Wget User Agent Guide: Setting and Changing

Learn how to set and change the Wget user agent to enhance web scraping and avoid detection.

At the end of this article, you will know:

  • What a user agent is and why you should set it in your HTTP requests
  • The default user agent set by Wget
  • How to change the Wget user agent string
  • How to implement user agent rotation with Wget

Let’s dive in!  

User Agent: Definition and Why To Set It

A user agent is a string set in the User-Agent HTTP header by browsers, applications making web requests, and HTTP clients to identify the client software from which the request originates. This string usually contains information such as the browser or application type, operating system, and other relevant details.

For example, this is the user agent set by Chrome as of this writing when visiting web pages:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36

This string contains the following information:

  • Mozilla/5.0: Historically used to indicate compatibility with Mozilla browsers, now a common prefix kept in user agents for compatibility purposes.
  • Windows NT 10.0; Win64; x64: Operating system (Windows NT 10.0), platform (Win64), and architecture (x64).
  • AppleWebKit/537.36: Rendering engine token, kept for compatibility (Chrome's Blink engine is a fork of WebKit).
  • KHTML, like Gecko: Compatibility with the KHTML engine and the Gecko layout engine used by Mozilla.
  • Chrome/125.0.0.0: Browser name and its version.
  • Safari/537.36: Compatibility with Safari.
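Because these tokens follow a fairly regular "Product/version (details)" pattern, a few lines of Bash can pull them apart. This is a minimal sketch that assumes the Chrome-style layout shown above; platform_of and version_of are hypothetical helper names, and real-world user agents vary enough that this should not be treated as a complete parser.

```bash
# Sketch: split a Chrome-style user agent into a few of its parts.
# Assumption: the string follows the layout of the Chrome example above.
ua='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'

# Print the text inside the first pair of parentheses (platform details)
platform_of() {
  local re='\(([^)]*)\)'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Print the version that follows a product token, e.g. "Chrome" -> "125.0.0.0"
version_of() {
  local re="$2/([0-9.]+)"
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

platform_of "$ua"        # Windows NT 10.0; Win64; x64
version_of "$ua" Chrome  # 125.0.0.0
version_of "$ua" Safari  # 537.36
```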

In other words, the user agent is essential for determining whether a request comes from a known browser or from some other software.

Scraping bots often use inconsistent or default user agent strings, revealing their automated nature. As a result, the User-Agent header helps anti-bot solutions—employed by sites to protect their pages and data—determine whether the current user is genuine or a bot.

For more information, read our guide on user agents for web scraping.

What Is the Default Wget User Agent?

When making an HTTP request, Wget sets the User-Agent header to the following value:

Wget/X.Y.Z

The X.Y.Z string matches the version of Wget installed on your machine. 
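If you want to know in advance which string Wget will send, you can derive it from the version banner. The sketch below assumes GNU Wget 1.x, whose `wget --version` output starts with a line such as `GNU Wget 1.21.4 built on linux-gnu.`; the function name ua_from_version_output is hypothetical.

```bash
# Sketch: build the default Wget user agent from the `wget --version` banner.
# Assumption: GNU Wget 1.x, where the version is the third field of line 1.
ua_from_version_output() {
  # Read only the first line from stdin and print "Wget/<version>"
  awk 'NR == 1 { printf "Wget/%s\n", $3; exit }'
}

# Usage (requires Wget on the PATH):
#   wget --version | ua_from_version_output
```

Running the usage line should print the same Wget/X.Y.Z value that the verification step below reports.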

To verify that Wget/X.Y.Z is actually the string Wget sends, make a GET request to the /user-agent endpoint of httpbin.io. This endpoint returns the value of the User-Agent header of the incoming request, which makes it a convenient way to check the user agent used by an HTTP client.

Make a GET request to /user-agent with Wget using this instruction:

wget -O "response.json" "https://httpbin.io/user-agent"

Note: On Windows, replace wget with wget.exe. This is required because, in Windows PowerShell, wget is an alias for Invoke-WebRequest, while wget.exe points to the Wget Windows executable.

The previous command downloads the response returned by the endpoint and stores it in a local response.json file, which contains something like this:

{
  "user-agent": "Wget/1.21.4"
}

In this case, the user agent set by Wget is Wget/1.21.4, which clearly identifies the request as originating from Wget. Anti-bot solutions can easily flag such a request as not coming from a real user and block it immediately. That is why it is so important to know how to change the user agent in Wget!
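To run this check from a script instead of opening the file by hand, you can extract the value from response.json and test whether it still starts with Wget/. This is a minimal sketch: the sed expression assumes the simple one-key JSON shape shown above (a real JSON parser such as jq is more robust), and the prefix test is only a toy version of what anti-bot systems do, as they typically combine many more signals. Both function names are hypothetical.

```bash
# Sketch: read the "user-agent" value from an httpbin.io /user-agent response
# and flag it if it is still Wget's default. Assumes the one-key JSON above.
read_user_agent() {
  sed -n 's/.*"user-agent": *"\([^"]*\)".*/\1/p' "$1"
}

# Toy heuristic: succeed (exit status 0) if the string looks like "Wget/X.Y.Z"
is_default_wget_ua() {
  case "$1" in
    Wget/*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Usage:
#   ua=$(read_user_agent "response.json")
#   if is_default_wget_ua "$ua"; then echo "Request identified as Wget: $ua"; fi
```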

How to Set a Wget User Agent

There are two possible ways to set a user agent in Wget. Let’s explore them both!

Set a Custom User Agent Directly

Wget provides a dedicated option for changing the user agent: -U (or its long form --user-agent) lets you override the default string that Wget sends in the User-Agent header. Use the following syntax to set a custom user agent string in Wget:

wget [other_options] -U|--user-agent "<user-agent_string>" "<url>"

Now, take a look at the example below:

wget -O "response.json" -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36" "https://httpbin.io/user-agent"

Open response.json, and you will see: 

{
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
}

Awesome, the custom user agent worked like a charm!

Keep in mind that the previous Wget command is equivalent to the following one, which uses the long form of the option:

wget -O "response.json" --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36" "https://httpbin.io/user-agent"
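When several commands should share the same browser user agent, storing it once in a variable avoids repeating the long string. The sketch below wraps Wget in a small function that forwards every other argument unchanged; BROWSER_UA and wget_as_browser are hypothetical names chosen for this example.

```bash
# Sketch: define the browser user agent once and reuse it for every Wget call.
BROWSER_UA='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'

wget_as_browser() {
  # Prepend the custom user agent, then pass all other arguments to Wget
  wget --user-agent "$BROWSER_UA" "$@"
}

# Usage:
#   wget_as_browser -O "response.json" "https://httpbin.io/user-agent"
```

The usage line should write the same response.json as the -U example above.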

To remove the User-Agent header from the request entirely, pass an empty string to -U. You can verify that by targeting the /headers endpoint of httpbin.io, which returns the HTTP headers of the incoming request:

wget -O "response.json" -U "" "https://httpbin.io/headers"

The response.json file will contain:

{
  "headers": {
    "Accept": [
      "*/*"
    ],
    "Accept-Encoding": [
      "identity"
    ],
    "Connection": [
      "Keep-Alive"
    ],
    "Host": [
      "httpbin.io"
    ]
  }
}

As expected, the request contains no User-Agent header.

If you instead want to keep the User-Agent header but send it with an empty value, pass a single space to -U:

wget -O "response.json" -U " " "https://httpbin.io/headers"

The content in response.json will be:

{
  "headers": {
    "Accept": [
      "*/*"
    ],
    "Accept-Encoding": [
      "identity"
    ],
    "Connection": [
      "Keep-Alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "User-Agent": [
      ""
    ]
  }
}

The User-Agent header is there, but it contains an empty string as desired. 

Note: Removing the User-Agent header or sending it with an empty value is bad practice, as it can trigger anti-bot technologies.

Set a Custom User-Agent HTTP Header

Since User-Agent is an HTTP header, you can set it like any other header in Wget using the --header option, with this syntax:

wget [other_options] --header "User-Agent: <user-agent_string>" "<url>"

See the --header option in action in the example below:

wget -O "response.json" --header "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36" "https://httpbin.io/user-agent"

The result in response.json will be:

{
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
}

Great, the user agent value in the response matches the string passed to the --header option.

To send the User-Agent header with an empty value, pass "User-Agent:" (with nothing after the colon) to --header. If you need to remove the header entirely, you must use -U "" as explained earlier.
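To recap the behaviors covered in this section, the sketch below gathers them in one hypothetical helper, wget_with_ua_mode. It relies only on the -U and --header options shown above: custom sends a given value, empty sends the header with no value, and none omits the header entirely.

```bash
# Sketch: choose how Wget handles the User-Agent header.
# Usage: wget_with_ua_mode custom "<ua>" [wget args...]
#        wget_with_ua_mode empty|none [wget args...]
wget_with_ua_mode() {
  local mode=$1
  shift
  case "$mode" in
    custom)
      # Send the header with the given value
      local ua=$1
      shift
      wget --header "User-Agent: $ua" "$@"
      ;;
    empty)
      # Send the header, but with an empty value
      wget --header "User-Agent:" "$@"
      ;;
    none)
      # Do not send the User-Agent header at all
      wget -U "" "$@"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2
      ;;
  esac
}

# Usage:
#   wget_with_ua_mode none -O "response.json" "https://httpbin.io/headers"
```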

Implement User Agent Rotation with Wget

Using a static User-Agent value—even if it is a user agent from a real-world browser—may not be a successful approach when making automated requests with Wget. The problem is that anti-bot technologies monitor all incoming requests, and when they detect too many requests with the same headers from a particular IP, they might ban it.

Randomizing requests is key to avoiding detection and blocking. How can you make requests less similar to each other? With user agent rotation! This technique simulates requests coming from different browsers, reducing the risk of triggering blocks or temporary bans.

You can achieve Wget user agent rotation with the following three-step approach:

  1. Retrieve some user agents: Collect a list of real user agent strings from browsers.
  2. Implement rotation logic: Randomly pick a user agent from the list.
  3. Randomize the request: Set the selected user agent string in the Wget request.

Implementing this procedure takes only a few lines of code, which you can write in Bash on Unix or in PowerShell on Windows. You could also accomplish that by integrating Wget with Python.

Now, dig into how to handle user agent rotation in Wget on both Windows and UNIX-based systems!

Bash

Gather a list of valid user agents from a site like User Agent String.com and store it in an array:

user_agents=(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0"
    # ...
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0"
)
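Hard-coding the array works for a handful of strings, but a longer list is easier to maintain in a separate text file with one user agent per line. This sketch assumes a hypothetical user_agents.txt file and a hypothetical load_user_agents function; it fills the same user_agents array used by the rest of the script, skipping blank lines and lines that start with #.

```bash
# Sketch: load user agents from a text file (one per line) into the
# global user_agents array. "user_agents.txt" is a hypothetical file name.
load_user_agents() {
  local file=$1
  local line
  user_agents=()
  # "|| [[ -n $line ]]" keeps a last line that has no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    # Skip blank lines and comment lines
    [[ -z $line || $line == \#* ]] && continue
    user_agents+=("$line")
  done < "$file"
}

# Usage:
#   load_user_agents "user_agents.txt"
#   echo "Loaded ${#user_agents[@]} user agents"
```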

Next, create a function that randomly extracts a user agent string from the list using RANDOM:

get_random_user_agent() {
    # number of user agents in the list
    local count=${#user_agents[@]}
    # generate a random index from 0 to count - 1
    local index=$((RANDOM % count))
    # extract a user agent string from the list
    # and return it
    echo "${user_agents[$index]}"
}

Call the function to get a random user agent and use it in the Wget command:

# get the random user agent
user_agent=$(get_random_user_agent)

# perform a Wget request to a given URL
# using the random user agent
wget -O "response.json" -U "$user_agent" "https://httpbin.io/user-agent"

Note: Modify the target URL to suit your goals.

Put it all together, and you will get the following bash script:

#!/bin/bash

# list of user agent strings
user_agents=(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0"
    # ...
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0"
)

get_random_user_agent() {
    # number of user agents in the list
    local count=${#user_agents[@]}
    # generate a random index from 0 to count - 1
    local index=$((RANDOM % count))
    # extract a user agent string from the list
    # and return it
    echo "${user_agents[$index]}"
}

# get the random user agent
user_agent=$(get_random_user_agent)

# perform a Wget request to a given URL
# using the random user agent
wget -O "response.json" -U "$user_agent" "https://httpbin.io/user-agent"

Add the above code to a .sh script and launch it. This will produce a response.json file in the current working directory. Open it to see the user agent returned by the /user-agent endpoint. Run the script a few more times, and you should see different user agents.

Well done! You have implemented user agent rotation in Wget.
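The script above picks one user agent per run. When a single run downloads several pages, you can rotate on every request instead by calling the picker inside a loop. In this sketch, the URL list, the response_N.json output names, and fetch_all_with_rotation are placeholders, and a shortened copy of the user_agents array and get_random_user_agent function is repeated so the block runs on its own.

```bash
# Sketch: pick a new random user agent for each URL in a list.
user_agents=(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0"
)

get_random_user_agent() {
  # random index from 0 to count - 1
  echo "${user_agents[$((RANDOM % ${#user_agents[@]}))]}"
}

# placeholder target URLs
urls=(
  "https://httpbin.io/user-agent"
  "https://httpbin.io/headers"
)

fetch_all_with_rotation() {
  local i=0 url ua
  for url in "${urls[@]}"; do
    ua=$(get_random_user_agent)  # fresh user agent for every request
    wget -O "response_${i}.json" -U "$ua" "$url"
    i=$((i + 1))
  done
}

# Usage:
#   fetch_all_with_rotation
```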

PowerShell

Get a list of real-world user agents from a site like WhatIsMyBrowser.com. Then, store those strings in a PowerShell array variable:

$user_agents = @(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0"
    # ...
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0"
)

Implement a function that randomly picks a user agent string from the list using Get-Random and returns it:

function Get-RandomUserAgent {
    # number of user agents in the list
    $count = $user_agents.Count
    # generate a random index from 0 to $count - 1 (-Maximum is exclusive)
    $index = Get-Random -Maximum $count
    # extract a user agent string and return it
    return $user_agents[$index]
}

Call the function to retrieve a random user agent string and use it in the Wget request:

# get the random user agent
$user_agent = Get-RandomUserAgent

# make an HTTP request to a given URL
# using the random user agent
wget.exe -O "response.json" -U "$user_agent" "https://httpbin.io/user-agent"

Put it all together to get the following code:

# list of user agents
$user_agents = @(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:126.0) Gecko/20100101 Firefox/126.0"
    # ...
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0"
)

function Get-RandomUserAgent {
    # number of user agents in the list
    $count = $user_agents.Count
    # generate a random index from 0 to $count - 1 (-Maximum is exclusive)
    $index = Get-Random -Maximum $count
    # extract a user agent string and return it
    return $user_agents[$index]
}

# get a random user agent
$user_agent = Get-RandomUserAgent

# make an HTTP request to a given URL
# using the random user agent
wget.exe -O "response.json" -U "$user_agent" "https://httpbin.io/user-agent"

Store the above logic in a .ps1 script. Execute it a few times, and you will get different user agent strings in the response.json output file.

Et voilà! You now know how to change the user agent in Wget.

Conclusion

In this guide, you explored why you should always set the User-Agent header in an HTTP client and how to do it in Wget. This approach can trick simple anti-bot systems into believing that your requests are coming from legitimate browsers. However, advanced anti-bot solutions can still detect and block your requests. To circumvent anti-scraping measures like rate limiting, you could use a proxy with Wget. Unfortunately, that may not be enough!

Avoid all this hassle and try Scraper API. As a full-featured scraping API, it comes with everything you need to perform automated web requests with Wget or any other HTTP client. This all-in-one solution is built to bypass anti-bot technologies and also features IP and user agent rotation. Making automated requests has never been easier!
