Web Scraping With Selenium Wire in Python

Learn to use Selenium Wire for web scraping, featuring request interception and dynamic proxy rotation to boost your scraping efficiency.
16 min read

In this guide, you will learn:

  • What Selenium Wire is
  • Why you should use Selenium Wire for web scraping
  • The key features of Selenium Wire
  • A web scraping use case of Selenium Wire with proxy rotation
  • Bright Data proxy integration with Selenium Wire

Let’s dive in!

What Is Selenium Wire?

Selenium Wire is an extension for Selenium’s Python bindings that provides control over browser requests. Specifically, it allows you to intercept and modify both requests and responses in real time directly from your Python code while using Selenium.

Note: While the library is no longer being maintained, several scraping technologies and scripts still rely on it.

Why Use Selenium Wire for Web Scraping?

Selenium is a popular browser automation framework used in web scraping to interact with sites as regular human users would. Find out more in our Selenium web scraping guide.

The problem is that browsers have certain limitations that can make web scraping challenging. For example, they do not enable you to set authorized proxy URLs or rotate proxies on the fly. Selenium Wire helps you overcome those limitations.

Here are three good reasons why you should use Selenium Wire for web scraping:

  • Access the network layer: Intercept, inspect, and modify AJAX network traffic for advanced data extraction.
  • Bypass anti-bot systems: ChromeDriver exposes a significant amount of information that anti-bot systems can use to identify you as a bot. Selenium Wire is used by technologies like undetected-chromedriver to avoid that and help bypass most anti-bot solutions.
  • Overcome browsers’ limitations: Modern browsers use flags to configure behaviors at startup, but these settings are static and require a restart to be changed. Selenium Wire overcomes this limitation by supporting dynamic modifications. This way, you can update request headers or proxies during the same browser session, which is ideal for web scraping.

Key Features of Selenium Wire

Now you know what Selenium Wire is and why you should use it for web scraping. It is time to explore its most important features!

Access Requests and Responses

Selenium Wire can capture the HTTP/HTTPS traffic made by the browser, giving you access to the following attributes:

| Attribute | Description |
| --- | --- |
| driver.requests | The list of captured requests, in chronological order. |
| driver.last_request | The most recently captured request. This is more efficient than using driver.requests[-1]. |
| driver.wait_for_request(pat, timeout=10) | Waits up to timeout seconds (10 by default) for a request matching pat, which can be a substring or a regular expression. |
| driver.har | A JSON-formatted HAR archive of the HTTP transactions that have taken place. |
| driver.iter_requests() | Returns an iterator over the captured requests. |
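
For instance, here is a minimal sketch that combines several of these attributes. The match pattern and the output file name are just illustrative, and note that HAR capture must be enabled explicitly via the enable_har option:

from seleniumwire import webdriver

# Enable HAR capture so that driver.har gets populated
driver = webdriver.Chrome(seleniumwire_options={"enable_har": True})

try:
    driver.get("https://brightdata.com/")

    # Block for up to 15 seconds until a request whose URL matches
    # the pattern (a substring or a regular expression) is captured
    request = driver.wait_for_request("brightdata.com", timeout=15)
    print(f"Matched request: {request.url}")

    # The most recently captured request
    print(f"Last request: {driver.last_request.url}")

    # Iterate over the captured requests without building a new list
    for req in driver.iter_requests():
        print(req.method, req.url)

    # driver.har is a JSON string you can save for later analysis
    with open("traffic.har", "w") as f:
        f.write(driver.har)
finally:
    driver.quit()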

In detail, a Selenium Wire Request object has the following attributes:

| Attribute | Description |
| --- | --- |
| body | The request body as bytes. If the request has no body, body is empty (for example, b''). |
| cert | Information about the server's SSL certificate as a dictionary (empty for non-HTTPS requests). |
| date | The datetime at which the request was made. |
| headers | A dictionary-like object of the request's headers (note that in Selenium Wire headers are case-insensitive and duplicates are permitted). |
| host | The request host (for example, brightdata.com). |
| method | The HTTP method (GET, POST, etc.). |
| params | A dictionary of the request's parameters (note that if a parameter with the same name appears more than once in the request, its value in the dictionary will be a list). |
| path | The request path. |
| querystring | The query string. |
| response | The response object associated with the request (note that the value will be None if the request has no response). |
| url | The request URL, complete with host, path, and querystring. |
| ws_messages | If the request is a WebSocket handshake (in which case the URL generally starts with wss://), ws_messages contains the WebSocket messages sent and received. |

Instead, a Response object exposes these attributes:

| Attribute | Description |
| --- | --- |
| body | The response body as bytes. If the response has no body, body is empty (for example, b''). |
| date | The datetime at which the response was received. |
| headers | A dictionary-like object of the response's headers (note that in Selenium Wire headers are case-insensitive and duplicates are permitted). |
| reason | The reason phrase of the response, like OK, Not Found, etc. |
| status_code | The status code of the response, like 200, 404, etc. |
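
For example, once the browser has completed at least one request, you can inspect the latest response like this (a minimal sketch building on driver.last_request from the table above):

# Inspect the response of the most recently captured request
response = driver.last_request.response
if response:  # None if the request received no response
    print(response.status_code, response.reason)
    print(response.headers)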

To test this feature, you can create a Python script like the following:

from seleniumwire import webdriver

# Initialize the WebDriver with Selenium Wire
driver = webdriver.Chrome()

try:
    # Open the target website
    driver.get("https://brightdata.com/")

    # Access and print all captured requests
    for request in driver.requests:
        print(f"URL: {request.url}")
        print(f"Method: {request.method}")
        print(f"Headers: {request.headers}")
        print(f"Response Status Code: {request.response.status_code if request.response else 'No Response'}")
        print("-" * 50)

finally:
    # Close the browser
    driver.quit()

The above code opens the target website and captures its requests via driver.requests. Then, it iterates over them in a for loop to access request attributes like url, method, and headers.

Here is the expected result:

Some of the logged requests

The destination page makes multiple requests, and the script tracks all of them.

Intercept Requests and Responses

Selenium Wire can intercept and modify requests and responses thanks to interceptors. An interceptor is a function invoked with requests and responses as they pass through the browser.

There are two separate interceptors:

  • driver.request_interceptor: It intercepts requests and accepts a single argument.
  • driver.response_interceptor: It intercepts responses and accepts two arguments, one for the originating request and one for the response.

Here is an example that shows how to use a request interceptor:

from seleniumwire import webdriver

# Define the request interceptor function
def interceptor(request):
    # Add a custom header to all requests
    request.headers["X-Test-Header"] = "MyCustomHeaderValue"

    # Block requests to a specific domain
    if "example.com" in request.url:
        print(f"Blocking request to: {request.url}")
        request.abort()  # Abort the request

# Initialize the WebDriver with Selenium Wire
driver = webdriver.Chrome()

# Assign the interceptor function to the driver
driver.request_interceptor = interceptor

try:
    # Open a website that makes multiple requests
    driver.get("https://brightdata.com/")

    # Print all captured requests
    for request in driver.requests:
        print(f"URL: {request.url}")
        print(f"Headers: {request.headers}")
        print("-" * 50)

finally:
    # Close the browser
    driver.quit()

This is what this snippet does:

  • Interceptor function: It defines an interceptor function that is called for every outgoing request. The function adds a custom header to all outgoing requests via request.headers and blocks browser requests to the example.com domain with request.abort().
  • Captures requests: After the page loads, all captured requests are printed, including the modified headers.

Note: Request blocking is helpful when pages load additional resources such as ads, analytics scripts, or third-party widgets that are irrelevant to your goal. Blocking these requests can significantly improve scraping speed and reduce the browser’s bandwidth usage.

The expected result is something like this:

Note the X-Test-Header

Note how the request made by the browser was intercepted and the custom header was added to it.
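
Response interceptors work the same way, except the function receives both the originating request and the response. Below is a minimal sketch that logs failed responses and tags every response with a custom header (the header name is just illustrative):

def response_interceptor(request, response):
    # Log non-successful responses for debugging
    if response.status_code >= 400:
        print(f"{response.status_code} {response.reason}: {request.url}")

    # Tag every response with an illustrative custom header
    response.headers["X-Inspected"] = "true"

# Assign the interceptor function to the driver
driver.response_interceptor = response_interceptor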

WebSocket Monitoring

Many modern web pages use WebSockets for real-time communication with servers. WebSockets establish a persistent connection between the browser and the server. That way, data can be exchanged continuously without the overhead of traditional HTTP requests.

Often, critical data flows through these channels, and accessing it directly can be invaluable for data retrieval. By intercepting WebSocket communication, you can extract raw data sent by the server without waiting for the browser to transform it or the page to render it.

You have already learned that request objects have the ws_messages attribute to manage WebSockets. These are the attributes of a Selenium Wire WebSocket object:

| Attribute | Description |
| --- | --- |
| content | The message's content, either as a str or as bytes. |
| date | The datetime of the message. |
| headers | A dictionary-like object of the message's headers (note that in Selenium Wire headers are case-insensitive and duplicates are permitted). |
| from_client | A boolean that is True when the message was sent by the client and False when it was sent by the server. |
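
As a minimal sketch, assuming the target page (a placeholder URL below) communicates over WebSockets, you could dump every captured message like this:

from seleniumwire import webdriver

driver = webdriver.Chrome()

try:
    # Placeholder URL: replace it with a page that uses WebSockets
    driver.get("https://example.com/live-data")

    # Find the WebSocket handshake requests and print their messages
    for request in driver.requests:
        if request.ws_messages:
            print(f"WebSocket: {request.url}")
            for message in request.ws_messages:
                sender = "client" if message.from_client else "server"
                print(f"[{sender}] {message.content}")
finally:
    driver.quit()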

Manage Proxies

Proxy servers act as intermediaries between your device and the target sites, masking your IP address in the process. They are essential for web scraping as they:

  1. Help bypass IP-based restrictions
  2. Prevent blocking in case of rate limiters
  3. Enable scraping content from geo-restricted sites

Below is how you can configure a proxy in Selenium Wire:

from seleniumwire import webdriver

# Set up Selenium Wire options
options = {
    "proxy": {
        "http": "<YOUR_HTTP_PROXY_URL>",
        "https": "<YOUR_HTTPS_PROXY_URL>"
    }
}

# Initialize the WebDriver with Selenium Wire
driver = webdriver.Chrome(seleniumwire_options=options)

This setup differs from configuring a proxy in vanilla Selenium, where you need to rely on Chrome’s --proxy-server flag. This means that proxy configuration is static in vanilla Selenium.

Once you set a proxy, it applies to the entire browser session and cannot be changed without restarting the browser. This limitation can be restrictive, especially in scenarios where you need to rotate proxies dynamically.

In contrast, Selenium Wire provides the flexibility to change proxies dynamically within the same browser instance. That is possible thanks to the proxy attribute:

# Dynamically change the proxy
driver.proxy = {
    "http": "<NEW_HTTP_PROXY_URL>",
    "https": "<NEW_HTTPS_PROXY_URL>"
}

Plus, Chrome’s --proxy-server flag does not support proxies with authentication credentials in the URL:

protocol://username:password@host:port

Instead, Selenium Wire fully supports authenticated proxies, making it the better choice for web scraping.
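
For example, this is a minimal sketch with placeholder credentials and a placeholder endpoint:

from seleniumwire import webdriver

# Placeholder authenticated proxy URL: replace with your real values
proxy = "http://username:password@proxy.example.com:8080"

# Route both HTTP and HTTPS traffic through the authenticated proxy
options = {
    "proxy": {
        "http": proxy,
        "https": proxy
    }
}

# Initialize the WebDriver with Selenium Wire
driver = webdriver.Chrome(seleniumwire_options=options)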

Since proxy configuration is one of the most significant advantages of Selenium Wire, we will explore this topic further in the next section.

Web Scraping Use Case: Proxy Rotation in Selenium Wire

As mentioned earlier, the primary reason to use Selenium Wire for web scraping is its advanced proxy management capabilities.

In this guided section, you will see how to set up a Selenium Wire project for proxy rotation. This will help you change your exit IP at every request.

Requirements

To replicate this tutorial, your system must match the following prerequisites:

  • Python 3.7 or higher: Any Python version above 3.7 will do. Specifically, we will install the dependencies via pip, which comes bundled with any Python version from 3.4 onward.
  • A supported web browser: Selenium Wire extends Selenium, so you need a supported browser.

Before installing Selenium Wire, you can create a virtual environment directory like so:

python -m venv venv

To activate it, on Windows, run:

venv\Scripts\activate

Equivalently, on macOS/Linux, execute:

source venv/bin/activate

Now you can install Selenium Wire with:

pip install selenium-wire

Note: You do not need to install Selenium separately. It is installed automatically with Selenium Wire, as it is one of its dependencies.

Suppose you call your main folder selenium_wire/. At the end of this step, the folder will have the following structure:

selenium_wire/
├── selenium_wire.py
└── venv/

Where selenium_wire.py is the Python file that will contain all the logic you will implement in the next steps.

Step 1: Randomize Proxies

First, you need a list of valid proxy URLs. If you do not know where to get them, take a look at our list of free proxies. Add them to a list and use random.choice() to pick a random element from it:

def get_random_proxy():
    proxies = [
        "http://PROXY_1:PORT_NUMBER_X",
        "http://PROXY_2:PORT_NUMBER_Y",
        "http://PROXY_3:PORT_NUMBER_Z",
        # ...
    ]
    
    # Pick a random proxy from the list
    return random.choice(proxies)

Once called, this function returns a random proxy URL from the list.

To make it work, do not forget to import random:

import random

Step 2: Set the Proxy

Call the get_random_proxy() function to get a proxy URL:

proxy = get_random_proxy()

Then, initialize the browser instance and set the selected proxy:

# Selenium Wire configuration with the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy,
        "https": proxy
    }
}

# Browser configuration
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run the browser in headless mode 

# Initialize a browser instance with the given configurations
driver = webdriver.Chrome(service=Service(), options=chrome_options, seleniumwire_options=seleniumwire_options)

The above snippet requires the following imports:

from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

To change the proxy dynamically during the same browser session, you would use this code instead:

driver.proxy = {
    "http": proxy,
    "https": proxy
}

Amazing! The controlled Chrome instance will now route its requests through the given proxy.

Step 3: Visit the Target Page

Visit the target website, extract the output, and close the browser:

try:
    # Visit the target page
    driver.get("https://httpbin.io/ip")

    # Extract the page output
    body = driver.find_element(By.TAG_NAME, "body").text
    print(body)
except Exception as e:
    # Handle any errors that occur with the browser or the proxy
    print(f"Error with proxy {proxy}: {e}")
finally:
    # Close the browser
    driver.quit()

To make it work, import By from Selenium:

from selenium.webdriver.common.by import By

In this example, the destination page is the /ip endpoint from the HTTPBin project. This is a deliberate choice, as the page returns the IP address of the caller. If everything goes as expected, the script should print a different IP from the proxy list on each run.

Time to verify that!

Step 4: Put It All Together

This is the whole Selenium Wire proxy rotation logic that should be in your selenium_wire.py file:

import random
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def get_random_proxy():
    proxies = [
        "http://PROXY_1:PORT_NUMBER_X",
        "http://PROXY_2:PORT_NUMBER_Y",
        "http://PROXY_3:PORT_NUMBER_Z",
        # Add more proxies here...
    ]
    
    # Randomly pick a proxy
    return random.choice(proxies)
 
# Pick a random proxy URL 
proxy = get_random_proxy()

# Selenium Wire configuration with the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy,
        "https": proxy
    }
}

# Browser configuration
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run the browser in headless mode 

# Initialize a browser instance with the given configurations
driver = webdriver.Chrome(service=Service(), options=chrome_options, seleniumwire_options=seleniumwire_options)

try:
    # Visit the target page
    driver.get("https://httpbin.io/ip")

    # Extract the page output
    body = driver.find_element(By.TAG_NAME, "body").text
    print(body)
except Exception as e:
    # Handle any errors that occur with the browser or the proxy
    print(f"Error with proxy {proxy}: {e}")
finally:
    # Close the browser
    driver.quit()

To run the file, launch:

python3 selenium_wire.py

At each run, the output should be:

{
  "origin": "PROXY_1:XXXX"
}

Or:

{
  "origin": "PROXY_2:YYYY"
}

And so on…

Run the script multiple times, and you will see a different IP address each time. Proxy rotation is working!
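
If you prefer to rotate proxies without restarting the browser, you can combine get_random_proxy() with the driver.proxy attribute seen earlier. Below is a minimal sketch, reusing get_random_proxy() and a driver initialized as in Step 2, that switches the exit IP before each request within a single session:

# Rotate the proxy dynamically across several requests in one session
for _ in range(3):
    proxy = get_random_proxy()
    driver.proxy = {
        "http": proxy,
        "https": proxy
    }

    driver.get("https://httpbin.io/ip")
    print(driver.find_element(By.TAG_NAME, "body").text)

    # Clear the captured requests between iterations to save memory
    del driver.requests

driver.quit()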

A Better Approach to Proxy Rotation: Bright Data Proxies

As we just saw, manual proxy rotation in Selenium Wire involves a lot of boilerplate code and requires maintaining a list of valid proxy URLs.

Luckily, Bright Data’s rotating proxies are a more efficient solution!

Our rotating proxies automatically handle IP address changes, eliminating the need for manual proxy management. With coverage in 195 countries, we guarantee exceptional network uptime and a success rate of 99.9%. Our worldwide proxy network includes residential, datacenter, ISP, and mobile proxies.

Follow the steps below and learn how to use Bright Data’s proxies in Selenium Wire.

If you already have an account, log in to Bright Data. Otherwise, create an account for free. You will gain access to the following user dashboard:

The Bright Data dashboard

Click the “View proxy products” button:

View proxy products

You will be redirected to the “Proxies & Scraping Infrastructure” page below:

The Proxies & Scraping Infrastructure page

Scroll down, find the “Residential Proxies” card, and click on the “Get started” button:

Residential proxies

You will reach the residential proxy configuration dashboard. Follow the guided wizard and set up the proxy service based on your needs. If you have any doubts about how to configure the proxy, feel free to contact the 24/7 support:

Configuring your residential proxies

Go to the “Access parameters” tab and retrieve your proxy’s host, port, username, and password as follows:

access parameter

Note that the “Host” field already includes the port.

That is all you need to build the proxy URL and set it in Selenium Wire. Put all the information together, and build a URL with the following syntax:

<username>:<password>@<host>

For example, in this case it would be:

brd-customer-hl_4hgu8dwd-zone-residential:<PASSWORD>@brd.superproxy.io:XXXXX

Toggle “Active proxy,” follow the last instructions, and you are ready to go!

Active proxy toggle

Your Selenium Wire proxy snippet for the Bright Data integration will look as follows:

# Bright Data proxy URL
proxy = "brd-customer-hl_4hgu8dwd-zone-residential:<PASSWORD>@brd.superproxy.io:XXXXX"

# Set up Selenium Wire options
options = {
    "proxy": {
        "http": proxy,
        "https": proxy
    }
}

# Initialize the WebDriver with Selenium Wire
driver = webdriver.Chrome(seleniumwire_options=options)

Proxy rotation is much easier with this approach!

Selenium vs Selenium Wire for Web Scraping

To sum up, take a look at the Selenium vs Selenium Wire comparison table below:

| | Selenium | Selenium Wire |
| --- | --- | --- |
| Purpose | A tool for automating web browsers to perform UI testing and web interactions | Extends Selenium to provide additional capabilities for inspecting and modifying HTTP/HTTPS requests and responses |
| HTTP/HTTPS request handling | Does not provide direct access to HTTP/HTTPS requests or responses | Allows inspection, modification, and capturing of HTTP/HTTPS requests and responses |
| Proxy support | Limited proxy support (requires manual configuration) | Advanced proxy management, with support for dynamic settings |
| Performance | Lightweight and fast | Slightly slower due to the overhead of capturing and processing network traffic |
| Use cases | Primarily used for functional testing of web applications, but also useful for basic web scraping cases | Useful for testing APIs, debugging network traffic, and web scraping |

Conclusion

In this blog post, you learned what Selenium Wire is and how it can be used for web scraping. In particular, we focused on proxy integration and rotating proxies. Keep in mind that while Selenium Wire is useful, it is not a one-size-fits-all solution. Also, it is no longer actively maintained.

The better approach is not to extend Selenium Wire, but rather to use vanilla Selenium or another browser automation tool along with a dedicated scraping browser.

Scraping Browser from Bright Data is a scalable cloud browser that works with Playwright, Puppeteer, Selenium, and others. It automatically rotates exit IPs with every request and can handle browser fingerprinting, retries, CAPTCHA resolution, and much more. Forget about getting blocked and streamline your scraping operation.

Sign up now and start your free trial!

No credit card required