How to Use Proxies to Rotate IP Addresses in Python

Discover how to use proxies in Python for IP rotation in web scraping, where to find reliable proxies, and tips to avoid website blocks.

IP rotation using proxies is essential in web scraping, particularly when dealing with modern websites that impose restrictions on automated traffic. Distributing your requests across multiple IP addresses makes it harder for websites to trace and throttle your scraping activity, which improves both the efficiency and the reliability of your data extraction. In short, rotating IP addresses lets you avoid IP-based bans and penalties, overcome rate limits, and access geo-restricted content.

This article explains how to implement proxies in your web scraping workflow to rotate the IP addresses you use. You'll learn where to get effective proxies, tips for rotating IPs, and how to avoid getting blocked by your target website.

IP Rotation with Python

A regular scraping process with Python commonly uses a library like Requests or Scrapy to access a website and parse its contents. You can then filter the website content for the information you want to extract. The following is an example of a typical scraping process:


import requests

url = 'http://example.com'

# Make a request to the target URL
response = requests.get(url)
print(response.text)

This process gets you the information you need and is fine for single-use cases or cases where you only need to extract data once. However, it uses your system IP to make requests and can run into issues with repeated or continuous requests where the website limits access over time.

The results of the example scraping process are as follows:

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", 
…
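This output was fetched from your machine's own IP address. To confirm which address your requests expose, you can call a service that echoes the caller's IP, such as httpbin.org/ip (used here as an example endpoint):

import requests

# Without a proxy, the echoed "origin" field is your own public IP
response = requests.get("https://httpbin.org/ip")
print(response.json())  # e.g. {'origin': '203.0.113.7'}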

Most Python libraries aimed at scraping or making web requests, such as Requests or Scrapy, provide a way to change the IP address used to make those requests. To take advantage of this, however, you need a list or source of valid proxy IP addresses. These sources can be free or commercial, such as Bright Data proxies.

Commercial options guarantee validity and provide helpful tools to manage and rotate your proxies so that your scraping process suffers no downtime. For example, Bright Data offers several categories of proxies priced according to the use case they're built for, how well they scale, and the level of assurance that your requested data stays unblocked:

Bright Data's proxy services for seamless data scraping
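With a commercial provider, you typically route requests through an authenticated gateway rather than juggling raw IP lists. As a sketch, with placeholder credentials and endpoint (not real Bright Data values), authenticated proxy usage with Requests looks like this:

import requests

# Placeholder gateway and credentials — substitute your provider's real values
proxy = "http://USERNAME:PASSWORD@proxy.example.com:22225"

response = requests.get("http://example.com", proxies={"http": proxy, "https": proxy})
print(response.status_code)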

Using free proxies, you can create a list in Python containing valid proxies that you can rotate throughout your scraping process:


proxies = ["103.155.217.1:41317", "47.91.56.120:8080", "103.141.143.102:41516", "167.114.96.13:9300", "103.83.232.122:80"]

With this, all you need is a rotation mechanism that selects a different IP address from the list for each request you make. In Python, that looks similar to the following function:

import random
import requests

def scraping_request(url):
    # Pick a random index into the proxy list defined earlier
    ip = random.randrange(0, len(proxies))

    ips = {"http": proxies[ip], "https": proxies[ip]}
    response = requests.get(url, proxies=ips)
    print(f"Proxy currently being used: {ips['https']}")
    return response.text

This code selects a random proxy from your list each time the function is called and uses that proxy for the scraping request.
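If you prefer a predictable order over random selection, round-robin rotation is a minimal alternative; this sketch uses itertools.cycle to walk the same list sequentially:

from itertools import cycle

import requests

proxies = ["103.155.217.1:41317", "47.91.56.120:8080", "103.141.143.102:41516", "167.114.96.13:9300", "103.83.232.122:80"]
proxy_pool = cycle(proxies)  # repeats the list endlessly, in order

def scraping_request(url):
    ip = next(proxy_pool)  # take the next proxy in round-robin order
    response = requests.get(url, proxies={"http": ip, "https": ip})
    print(f"Proxy currently being used: {ip}")
    return response.text

The rest of this article sticks with random selection, but either mechanism works.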

Including an error case to handle invalid proxies would result in the complete scraping code looking like this:

import random
import requests

proxies = ["103.155.217.1:41317", "47.91.56.120:8080", "103.141.143.102:41516", "167.114.96.13:9300", "103.83.232.122:80"]
def scraping_request(url):
    # Pick a random proxy for this request
    ip = random.choice(proxies)
    try:
        response = requests.get(url, proxies={"http": ip, "https": ip})

        if response.status_code == 200:
            print(f"Proxy currently being used: {ip}")
            print(response.text)

        elif response.status_code == 403:
            print("Forbidden client")

        elif response.status_code == 429:
            print("Too many requests")

    except Exception as e:
        print(f"An unexpected error occurred: {e}")


scraping_request("http://example.com")
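As a further refinement (a sketch, not part of the script above), you can retry a failed request with a different proxy and drop proxies that raise connection errors:

import random
import requests

proxies = ["103.155.217.1:41317", "47.91.56.120:8080", "103.141.143.102:41516", "167.114.96.13:9300", "103.83.232.122:80"]

def scraping_request_with_retries(url, max_retries=3):
    pool = proxies.copy()  # work on a copy so failures don't mutate the shared list
    for _ in range(max_retries):
        if not pool:
            break
        ip = random.choice(pool)
        try:
            response = requests.get(url, proxies={"http": ip, "https": ip}, timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException:
            pool.remove(ip)  # drop the failing proxy and try another
    raise RuntimeError(f"All retries failed for {url}")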

You can also use this rotating list of proxies to perform your requests with any other scraping framework, such as Scrapy.

Scraping with Scrapy

With Scrapy, you need to install the library and create the necessary project artifacts before you can successfully crawl the web.

You can install Scrapy using the pip package manager in your Python-enabled environment:

pip install Scrapy

Once installed, you can generate a Scrapy project with some template files in your current directory using the following commands:

scrapy startproject sampleproject

cd sampleproject

scrapy genspider samplebot example.com

These commands also generate a basic code file that you can flesh out with an IP rotation mechanism.

Open up the sampleproject/spiders/samplebot.py file and update it with the following code:


import scrapy
import random

proxies = ["103.155.217.1:41317", "47.91.56.120:8080", "103.141.143.102:41516", "167.114.96.13:9300", "103.83.232.122:80"]

class SampleSpider(scrapy.Spider):
    name = "samplebot"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            # Pick a random proxy for each outgoing request
            proxy = random.choice(proxies)
            yield scrapy.Request(url, meta={"proxy": f"http://{proxy}"})

    def parse(self, response):
        # Log the proxy being used in the request
        proxy_used = response.meta.get("proxy")
        self.logger.info(f"Proxy used: {proxy_used}") 
        print(response.text)

Execute the following command from the top level of the project directory to run this scraping script:

scrapy crawl samplebot
Running the script
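In this spider, rotation only happens for the URLs in start_urls. If you want every request the spider schedules (including followed links) to pick a fresh proxy, one option is a small custom downloader middleware. This is a minimal sketch; the class name RotatingProxyMiddleware and the priority value 350 are illustrative choices, not Scrapy built-ins:

# sampleproject/middlewares.py
import random

proxies = ["103.155.217.1:41317", "47.91.56.120:8080", "103.141.143.102:41516"]

class RotatingProxyMiddleware:
    def process_request(self, request, spider):
        # Attach a random proxy before each request is downloaded;
        # Scrapy's built-in HttpProxyMiddleware honors the "proxy" meta key
        request.meta["proxy"] = f"http://{random.choice(proxies)}"

You would then enable it in sampleproject/settings.py:

DOWNLOADER_MIDDLEWARES = {
    "sampleproject.middlewares.RotatingProxyMiddleware": 350,
}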

Tips for IP Rotation

Web scraping has evolved into an ongoing contest between websites and scrapers: scrapers keep devising new methods and techniques to get the data they need, and websites keep finding new ways to block access.

IP rotation is a technique that aims to bypass limitations set by websites. To maximize the effectiveness of IP rotation and minimize the chances of getting blocked by your target website, consider the following tips:

  • Ensure a large and diverse proxy pool: When using IP rotation, you need a significant proxy pool with a large number of proxies and a wide variety of IP addresses. This diversity helps achieve proper rotation and reduces your risk of overusing proxies, which could lead to rate limits and bans. Consider using multiple proxy providers with different IP ranges and locations. Also, consider varying the timing and intervals between your requests with your different proxies to simulate natural user behavior better.
  • Have robust error-handling mechanisms: During your web scraping process, you may encounter a number of errors due to temporary connectivity issues, blocked proxies, or changes in your target website. By implementing error handling in your scripts, you can ensure the smooth execution of your scraping process, catching and handling common exceptions such as connection errors, timeouts, and HTTP status errors. Consider setting up circuit breakers to temporarily pause your scraping process if a high number of errors occur within a short period.
  • Test your proxies before use: Before deploying your scraping script in production, use a sample of your proxy pool to test the IP rotation functionality and error-handling mechanisms under different scenarios. You can use sample websites to simulate real-world conditions and ensure your script can handle these cases.
  • Monitor proxy performance and efficiency: Regularly monitor the performance of your proxies to detect any issues, such as slow response times or frequent failures. You should keep track of the success rate of each proxy to identify inefficient ones. Proxy providers such as Bright Data offer tools to check the health and performance of their proxies. By monitoring proxy performance, you can quickly switch to more reliable proxies and remove underperforming ones from your rotation pool, as shown in the sketch after this list.
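As a concrete illustration of the last two tips, the following sketch checks each proxy against httpbin.org/ip (any stable endpoint that echoes your IP would do) and keeps a simple per-proxy success tally you could use to prune the pool:

import requests

proxies = ["103.155.217.1:41317", "47.91.56.120:8080", "103.141.143.102:41516", "167.114.96.13:9300", "103.83.232.122:80"]
TEST_URL = "https://httpbin.org/ip"  # echoes the requesting IP back as JSON

def check_proxy(ip, timeout=5):
    # Return True if the proxy answers the test URL within the timeout
    try:
        response = requests.get(TEST_URL, proxies={"http": ip, "https": ip}, timeout=timeout)
        return response.status_code == 200
    except requests.RequestException:
        return False

# Per-proxy success/failure tally for spotting underperformers
stats = {ip: {"ok": 0, "fail": 0} for ip in proxies}

for ip in proxies:
    key = "ok" if check_proxy(ip) else "fail"
    stats[ip][key] += 1

healthy = [ip for ip, s in stats.items() if s["ok"] > 0]
print(f"Healthy proxies: {healthy}")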

Web scraping is an iterative process, and websites may change their structure and response patterns or implement new measures to prevent scraping. Regularly monitor your scraping process and adapt to any changes to maintain the effectiveness of your scraping efforts.

Conclusion

This article explored IP rotation and how to implement it in your Python scraping process. You also learned some practical tips for maintaining the effectiveness of your scraping efforts.

Bright Data is your one-stop platform for web scraping solutions. It provides high-quality, ethically sourced proxies, a Scraping Browser, an IDE for developing your scraping bots and processes, ready-to-use datasets, and several tools for rotating and managing proxies during scraping.
