Hypertext Transfer Protocol (HTTP) is a stateless protocol that follows the client-server model where a client makes a request and then waits for a response from the server. The request includes details such as the HTTP method, server location, path, query string, and headers.
HTTP headers are key-value pairs that carry metadata and instructions alongside a request or response. They define parameters such as content type, caching behavior, and authentication, which helps keep interactions between clients and servers efficient and secure. In web scraping, HTTP headers let you customize requests so that a scraper can set its user agent, control content negotiation, and handle authentication per website policies and protocols.
Some common use cases of HTTP headers in web scraping include changing the user-agent (UA) or response type, performing conditional requests, and authenticating to application programming interfaces (APIs).
In this article, you’ll learn how to send HTTP headers with curl.
Sending HTTP Headers With cURL
Before beginning this tutorial, verify that curl is installed on your operating system by running the following command in your terminal:
curl --version
If it’s installed, you’ll see a version number in the output, like this:
curl 7.55.1 (Windows) libcurl/7.55.1 WinSSL
Release-Date: [unreleased]
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL
If you get an error message, such as “curl is not recognized as an internal or external command, operable program or batch file” or “command not found”, then you need to install curl.
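As a rough sketch, the same check can be scripted so that it prints curl’s version line when curl is available and a short message otherwise. The helper name have_cmd is an assumption made for this example and is not part of curl:

```shell
# Return success if the named command can be found on PATH.
# have_cmd is a hypothetical helper name chosen for this sketch.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd curl; then
  # Print only the first line of the version output.
  curl --version | head -n 1
else
  echo "curl is not installed" >&2
fi
```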
You also need a tool to inspect headers, such as httpbin.org, which is a simple HTTP request and response service.
If you’ve worked with curl before, you know that the curl syntax looks like this:
curl [options] [url]
That means if you want to download a web page from mywebpage.com, you’d need to run the following command:
curl www.mywebpage.com
cURL Headers
To view headers sent by curl using httpbin.org, open your command line and run this:
curl http://httpbin.org/headers
Your output should include a list of the headers:
{
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"User-Agent": "curl/7.55.1",
"X-Amzn-Trace-Id": "Root=1-65fd2eb0-0617353714d52f3777c9c267"
}
}
The Accept, Host, and User-Agent headers are sent by default with curl.
The Accept header tells the server which media types the client is willing to accept, enabling content negotiation between the client and server.
An Accept header showing that the client favors JSON looks like this:
Accept: application/json
The User-Agent field contains your client details, which, in this case, is the curl application running version 7.55.1 (this version number will match your version).
The Host header identifies the web domain (that is, the host) and the port where the HTTP request is being sent. If there isn’t a port included in the request, then the defaults are assumed (that is, port 80 for HTTP and port 443 for HTTPS).
X-Amzn-Trace-Id is the only header in the output that curl did not send by default. This header shows that your request passed through an Amazon Web Services (AWS) service, such as an AWS load balancer, which added it so that HTTP requests can be traced.
To verify that these headers were sent by curl by default, you can use verbose mode. The flag for this is either -v or --verbose, and it displays detailed information about the request and response, including the headers.
Run the following command to view the default headers sent by curl:
curl -v http://httpbin.org/headers
Your output should look like this:
* Trying 50.16.63.240...
* TCP_NODELAY set
* Connected to httpbin.org (50.16.63.240) port 80 (#0)
> GET /headers HTTP/1.1
> Host: httpbin.org
> User-Agent: curl/7.55.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 22 Mar 2024 07:18:00 GMT
< Content-Type: application/json
< Content-Length: 173
< Connection: keep-alive
< Server: gunicorn/19.9.0
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
<
{
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"User-Agent": "curl/7.55.1",
"X-Amzn-Trace-Id": "Root=1-65fd30a8-624365ad52781957578cd5b1"
}
}
* Connection #0 to host httpbin.org left intact
The lines with a greater than sign (>) show what your client (curl) sent to the endpoint. This output confirms that the following were sent:
GET (the HTTP method) to the endpoint /headers
Host with the value httpbin.org
User-Agent with the value curl/7.55.1
Accept with the value */*
In the output, the lines with a less than sign, such as < Content-Type: application/json, are the response headers that the server sent back. The JSON body that follows them is httpbin.org’s reflection of the headers you sent.
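If you only want one side of the exchange, the verbose output can be filtered by its prefix. The following is a minimal sketch; the function names request_lines and response_lines are assumptions made for illustration, and the usage line assumes network access to httpbin.org:

```shell
# Keep only the request lines, which curl's verbose mode prefixes with ">".
request_lines() {
  grep '^>' | sed 's/^> \{0,1\}//'
}

# Keep only the response header lines, which are prefixed with "<".
response_lines() {
  grep '^<' | sed 's/^< \{0,1\}//'
}

# Usage (needs network access). -s hides the progress meter, -o /dev/null
# discards the body, and 2>&1 is needed because verbose output goes to stderr:
#   curl -sv -o /dev/null http://httpbin.org/headers 2>&1 | request_lines
```

Both helpers read from standard input, so they can also be run against verbose output that was saved to a file.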
Change the Default Headers Using the -H Flag
The -H or --header flag is used to pass custom header(s) to the server and can also be used for testing.
For example, to change the User-Agent from curl/7.55.1 to Your-New-User-Agent, use the following command:
curl -H "User-Agent: Your-New-User-Agent" http://httpbin.org/headers
Your output looks like this:
{
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"User-Agent": "Your-New-User-Agent",
"X-Amzn-Trace-Id": "Root=1-65fd5123-3ebe566a4681427c6996c72c"
}
}
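curl also has a dedicated option for this particular header: -A (or --user-agent). A minimal sketch, assuming a hypothetical wrapper function named fetch_as, might look like this:

```shell
# Fetch a URL while presenting a custom User-Agent.
# -A (long form --user-agent) sets the User-Agent header directly.
# fetch_as is a hypothetical wrapper name used only in this sketch.
fetch_as() {
  curl -s -A "$1" "$2"
}

# Usage (needs network access):
#   fetch_as "Your-New-User-Agent" http://httpbin.org/headers
```

For the User-Agent header, this should have the same effect as passing it with -H.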
If you want to change the Accept header from */*, which accepts any type of content, to application/json, which accepts content in JSON format only, use the following command:
curl --header "Accept: application/json" http://httpbin.org/headers
Your output looks like this:
{
"headers": {
"Accept": "application/json",
"Host": "httpbin.org",
"User-Agent": "curl/7.55.1",
"X-Amzn-Trace-Id": "Root=1-65fd55c3-05c21f81770c1c5e6343b1fc"
}
}
Note: In the second example, --header was used instead of -H. --header and -H are interchangeable and perform the same function.
Since curl version 7.55.0, you can also pass a file with your headers. For instance, if the file with your headers is called header_file, you can use the following command:
curl -H @header_file http://httpbin.org/headers
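As a hedged illustration, the file holds one header per line in the same Name: value form used with -H. The header values below are reused from the earlier examples, and the wrapper function send_with_header_file is a hypothetical name:

```shell
# Write one header per line; the file name header_file matches the text above.
cat > header_file <<'EOF'
Accept: application/json
User-Agent: Your-New-User-Agent
EOF

# Send every header listed in the file (requires curl 7.55.0 or later).
# send_with_header_file is a hypothetical wrapper used only in this sketch.
send_with_header_file() {
  curl -s -H @header_file "$1"
}

# Usage (needs network access):
#   send_with_header_file http://httpbin.org/headers
```

Keeping headers in a file makes it easier to reuse the same set across many requests.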
Send Custom Headers
Custom headers are defined by developers and include additional information about HTTP requests beyond what is provided by standard headers.
To send a custom header with curl, you can use the -H flag. For instance, if you want to send a custom header called My-Custom-Header with the value Value of custom header, run the following command:
curl -H "My-Custom-Header: Value of custom header" http://httpbin.org/headers
Your output looks like this:
{
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"My-Custom-Header": "Value of custom header",
"User-Agent": "curl/7.55.1",
"X-Amzn-Trace-Id": "Root=1-65fd7d2a-3b683be160ff2965023b3a31"
}
}
Send Empty Headers
There are scenarios where sending empty headers is necessary, such as complying with specific API requirements that mandate certain headers even if they don’t have any content.
You can also use empty headers to clear or reset headers. For example, if you want to reset or clear a header that was previously set by default, sending an empty header can clear the value of the header.
To send an empty header with curl, you need to specify the header name followed by a semicolon to indicate an empty value. The following command shows you how you can send an empty custom header called My-Custom-Header:
curl -H "My-Custom-Header;" http://httpbin.org/headers
The output shows My-Custom-Header with an empty value:
{
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"My-Custom-Header": "",
"User-Agent": "curl/7.55.1",
"X-Amzn-Trace-Id": "Root=1-65fd84e2-7a42d9d62a42741e448c426f"
}
}
Remove a Header
To remove a header with curl, you need to specify the header name followed by a colon with no value.
For example, to remove the default User-Agent header, send the following command:
curl -H "User-Agent:" http://httpbin.org/headers
The response does not contain the User-Agent header, which verifies that the header was removed:
{
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"X-Amzn-Trace-Id": "Root=1-65fd862d-13b181583501ae11046374a1"
}
}
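Because the empty-header and remove-header syntaxes differ by a single character, they’re easy to mix up. The sketch below, built around a hypothetical helper called header_spec, prints the -H value for each case:

```shell
# Print the -H value for a header that should be sent empty or removed.
# header_spec is a hypothetical helper name used only in this sketch.
header_spec() {
  case $1 in
    empty)  printf '%s;\n' "$2" ;;  # "Name;" sends the header with an empty value
    remove) printf '%s:\n' "$2" ;;  # "Name:" stops curl from sending the header
    *)      echo "usage: header_spec empty|remove NAME" >&2; return 1 ;;
  esac
}

# Usage (needs network access):
#   curl -H "$(header_spec remove User-Agent)" http://httpbin.org/headers
#   curl -H "$(header_spec empty My-Custom-Header)" http://httpbin.org/headers
```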
Send Multiple Headers
So far, you’ve seen examples that include only one header, but it’s possible to send more than one header with curl. All you need to do is include the -H flag multiple times.
For instance, if you want to send two headers (Custom-Header-1 and Custom-Header-2) with the values one and two, respectively, run the following command:
curl -H "Custom-Header-1: one" -H "Custom-Header-2: two" http://httpbin.org/headers
Your output looks like this:
{
"headers": {
"Accept": "*/*",
"Custom-Header-1": "one",
"Custom-Header-2": "two",
"Host": "httpbin.org",
"User-Agent": "curl/7.55.1",
"X-Amzn-Trace-Id": "Root=1-65fd8781-143be3502c559bc5605fc6f1"
}
}
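When the list of headers varies, repeating -H by hand gets tedious. One possible approach, sketched here with a hypothetical function named curl_with_headers, is to build the -H arguments in a loop:

```shell
# Send a request to a URL with any number of headers.
# curl_with_headers is a hypothetical wrapper used only in this sketch.
# Arguments: the URL first, then one "Name: value" string per header.
curl_with_headers() {
  url=$1
  shift
  # Rewrite the positional parameters from: h1 h2 ...
  # into: -H h1 -H h2 ... (each pass appends a pair and drops one original).
  for header in "$@"; do
    set -- "$@" -H "$header"
    shift
  done
  curl -s "$@" "$url"
}

# Usage (needs network access):
#   curl_with_headers http://httpbin.org/headers "Custom-Header-1: one" "Custom-Header-2: two"
```

Rewriting the positional parameters keeps each header as a single argument, so values that contain spaces are passed through intact.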
Conclusion
In this article, you learned about HTTP headers and how to send HTTP headers with curl.
If you’re looking for a comprehensive web scraping solution, try Bright Data. It offers tools and services such as proxy services, which help ensure anonymity and prevent IP blocking, and Web Unlocker, which helps you access geographically restricted content without CAPTCHAs.