In this tutorial, you will learn:
- Why you need to set the user agent header
- The default Python requests user agent
- How to change and unset the user agent in Requests
- How to implement user agent rotation in Python
Let’s dive in!
Why You Should Always Set the User Agent Header
The User-Agent HTTP header is set by browsers, applications performing web requests, and HTTP clients to identify the client software making the request. This value typically includes details about the browser or application type, operating system, and architecture the request comes from.
For instance, here is the user agent set by Chrome as of this writing when visiting web pages:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
The components of this user agent are:
- Mozilla/5.0: Historically used to indicate compatibility with Mozilla browsers. It is now a common prefix added to user agents for compatibility reasons.
- Windows NT 10.0; Win64; x64: Operating system (Windows NT 10.0), platform (Win64), and architecture (x64).
- AppleWebKit/537.36: Browser engine used by the version of Chrome making the request.
- KHTML, like Gecko: Compatibility with the KHTML engine and the Gecko layout engine used by Mozilla.
- Chrome/125.0.0.0: Browser name and its version.
- Safari/537.36: Compatibility with Safari.
In short, the user agent is crucial for identifying whether a request originates from a well-known browser or another type of software.
Scraping bots tend to use default or inconsistent user agent strings, revealing their automated nature. Consequently, anti-scraping solutions protect data on web pages by looking at the User-Agent header to determine whether the current user is legitimate or a bot.
For more details, read our guide on user agents for web scraping.
What Is the Default Requests Python User Agent?
Like most HTTP clients, Requests sets a User-Agent header when making HTTP requests. In particular, the default user agent set by requests follows the format below:
python-requests/X.Y.Z
Where X.Y.Z is the version of the requests package installed in your project.
Verify that the above string is actually the Requests user agent by making a GET request to the httpbin.io /user-agent endpoint. This API returns the User-Agent header read from the incoming request. In other words, it allows you to check the user agent automatically set by an HTTP client.
Import requests and use its get() method to perform the desired HTTP request:
import requests
# make an HTTP GET request to the specified URL
response = requests.get('https://httpbin.io/user-agent')
# parse the API response as JSON and print it
print(response.json())
Execute the above Python snippet, and you will get something like this:
{'user-agent': 'python-requests/2.32.3'}
The user agent is python-requests/2.32.3, which clearly identifies the request as originating from the requests library. As a result, anti-bot systems can mark such a request as not coming from a human user and immediately block it. That is why it is so crucial to change the Python Requests user agent value!
For more information, check out our complete guide on the Python Requests library.
How to Change the Python Requests User Agent
Let’s see how to change and unset the value of the User-Agent header in Requests!
Set a Custom User Agent
Requests does not provide a direct option for setting the user agent value. At the same time, User-Agent is nothing more than an HTTP header, so you can customize its value like any other HTTP header by using the headers option as below:
import requests
# custom user agent header
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'
}
# make an HTTP GET request to the specified URL
# setting custom headers
response = requests.get('https://httpbin.io/user-agent', headers=headers)
# parse the API response as JSON and print it
print(response.json())
Execute the above Python snippet again, and this time it will print:
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'}
Wonderful! You just learned that to set a custom Python requests user agent, you have to:
- Define a Python dictionary with a user-agent property.
- Pass the dictionary to the headers parameter of the requests method you are using to make the HTTP request.
Do not forget that HTTP header names are case-insensitive, so the keys in the headers dictionary can use whatever capitalization you prefer.
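This case-insensitive behavior comes from the CaseInsensitiveDict class that Requests uses internally to store headers. A quick local sketch (no network required, the agent string is just a placeholder) shows that lookups ignore capitalization:

```python
from requests.structures import CaseInsensitiveDict

# Requests stores headers in a case-insensitive dictionary,
# so 'user-agent', 'User-Agent', and 'USER-AGENT' are the same key
headers = CaseInsensitiveDict()
headers['User-Agent'] = 'my-custom-agent/1.0'

print(headers['user-agent'])   # my-custom-agent/1.0
print(headers['USER-AGENT'])   # my-custom-agent/1.0
```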
Note: This approach also works with the request(), post(), patch(), put(), delete(), and head() methods.
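As a quick sketch, you can confirm that the headers parameter is honored regardless of the HTTP method by building a prepared request and inspecting its headers, without sending anything over the network. The URL and agent string below are just placeholders:

```python
import requests

# hypothetical custom user agent for illustration
headers = {'user-agent': 'my-custom-agent/1.0'}

# prepare a POST request without sending it
req = requests.Request('POST', 'https://httpbin.io/anything', headers=headers)
prepared = req.prepare()

# the prepared request carries the custom header
print(prepared.headers['User-Agent'])  # my-custom-agent/1.0
```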
To set a global requests user agent, you need to configure a custom HTTP session as follows:
import requests
# initialize an HTTP session
session = requests.Session()
# set a custom header in the session
session.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'
# perform a GET request within the HTTP session
response = session.get('https://httpbin.io/user-agent')
# print the data returned by the API
print(response.json())
# other requests with a custom user agent within the session ...
This will produce the same output as before. If you are not familiar with HTTP sessions in Requests, check out the docs.
Unset the User Agent
Making HTTP requests without setting the user agent is a bad practice that can easily trigger anti-bot solutions. However, there are situations where you may need to remove the User-Agent header.
The first approach to unsetting the user agent in Requests you might come up with is to set the User-Agent header to None:
import requests
# custom user agent header
headers = {
'user-agent': None
}
# make an HTTP GET request to the specified URL
# setting custom headers
response = requests.get('https://httpbin.io/user-agent', headers=headers)
# parse the API response as JSON and print it
print(response.json())
This will not work because requests uses urllib3 behind the scenes. Thus, it will default to the urllib3 user agent value:
python-urllib3/2.2.1
In detail, the /user-agent endpoint will return something like:
{'user-agent': 'python-urllib3/2.2.1'}
What you need to do instead is configure urllib3 to skip the default user agent value using urllib3.util.SKIP_HEADER. Verify that the user agent has been unset by targeting the /headers endpoint of httpbin.io, which returns the HTTP headers of the incoming request:
import requests
import urllib3
# exclude the default user agent value
headers = {
'user-agent': urllib3.util.SKIP_HEADER
}
# prepare the HTTP request to make
req = requests.Request('GET', 'https://httpbin.io/headers')
prepared_request = req.prepare()
# set the custom headers with no user agent
prepared_request.headers = headers
# create a requests session and perform
# the request
session = requests.Session()
response = session.send(prepared_request)
# print the returned data
print(response.json())
Run the above Python code, and you will receive:
{'headers': {'Accept-Encoding': ['identity'], 'Host': ['httpbin.io']}}
Amazing! As expected, no Python requests user agent.
Implement User Agent Rotation in Requests
Changing the default User-Agent header to a proper value from a real browser may not be enough. If you make too many requests from the same IP address using the same user agent, it may trigger suspicion from anti-bot technologies. These systems monitor all incoming requests, knowing that automated requests usually follow regular patterns.
The key to avoiding bot detection is to randomize your requests. A good way to make each request different from the others is user agent rotation. The idea behind this technique is to keep changing the user agent header used by the HTTP client. This way, your automated requests appear to come from different browsers, reducing the risk of triggering blocks or temporary bans.
Now, follow the steps below to implement user agent rotation in Requests!
Step #1: Retrieve a List of Valid User Agents
Gather a list of proper user agents from a site like User Agent String.com and store it in a Python array:
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0"
# other user agents...
]
Step #2: Extract a Random User Agent
Randomly extract a user agent string from the array using random.choice():
random_user_agent = random.choice(user_agents)
Do not forget that the above line requires the following import:
import random
Step #3: Set the Random User Agent and Make the HTTP Request
Define the headers dictionary with the random user agent and use it in the requests request:
headers = {
'user-agent': random_user_agent
}
response = requests.get('https://httpbin.io/user-agent', headers=headers)
print(response.json())
These instructions require this import:
import requests
Step #4: Put It All Together
This is what your Python Requests user agent rotation logic will look like:
import random
import requests
# list of user agents
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0"
# other user agents...
]
# pick a random user agent from the list
random_user_agent = random.choice(user_agents)
# set the random user agent
headers = {
'user-agent': random_user_agent
}
# perform a GET request to the specified URL
# and print the response data
response = requests.get('https://httpbin.io/user-agent', headers=headers)
print(response.json())
Execute this script a few times, and you will get different user agent strings.
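If you rotate user agents across many requests, it can help to wrap the extraction logic in a small helper that returns a fresh headers dictionary on each call. The sketch below is one possible arrangement (the function name is an arbitrary choice):

```python
import random

# list of real-browser user agents to rotate through
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0"
]

def random_headers():
    # build a headers dictionary with a randomly chosen user agent
    return {'user-agent': random.choice(user_agents)}

# each request can then use a fresh set of headers, e.g.:
# response = requests.get('https://httpbin.io/user-agent', headers=random_headers())
for _ in range(3):
    print(random_headers()['user-agent'])
```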
Et voilà! You are now a master at setting Requests Python user agent values.
Conclusion
In this guide, you learned the importance of setting the User-Agent header and how to do that in requests. That way, you can trick basic anti-bot systems into thinking that your requests come from legitimate browsers. However, advanced solutions may still be able to detect and block you. To prevent IP bans, you could use a proxy with requests, but even that might not be enough!
Avoid these complications with Web Scraper API. This next-generation scraping API provides everything you need to perform automated web requests using requests or any other HTTP client. It effortlessly bypasses anti-bot technologies for you, relying on features like IP and user agent rotation. Making successful automated requests has never been easier!
Talk to one of our data experts about our scraping solutions or simply explore all the available products by registering now. Free trials available!