Guide to Web Scraping With cURL Impersonate

Learn how to use cURL Impersonate to mimic browser behavior for web scraping, with detailed guidance on command line and Python usage, plus advanced tips.

In this guide, you will learn:

  • What cURL Impersonate is
  • The reasons behind the project and how it works
  • How to use it via the command line
  • How to use it in Python
  • Advanced techniques and aspects

Let’s dive in!

What Is cURL Impersonate?

cURL Impersonate is a special build of cURL designed to mimic the behavior of major browsers (namely Chrome, Edge, Safari, and Firefox). In detail, this tool performs TLS and HTTP handshakes that closely resemble those of real browsers.

The HTTP client can be used either through the curl-impersonate command-line tool, similar to regular curl, or as a library in Python.

These are the browsers that can be impersonated:

| Browser | Simulated OS | Wrapper Script |
| --- | --- | --- |
| Chrome 99 | Windows 10 | curl_chrome99 |
| Chrome 100 | Windows 10 | curl_chrome100 |
| Chrome 101 | Windows 10 | curl_chrome101 |
| Chrome 104 | Windows 10 | curl_chrome104 |
| Chrome 107 | Windows 10 | curl_chrome107 |
| Chrome 110 | Windows 10 | curl_chrome110 |
| Chrome 116 | Windows 10 | curl_chrome116 |
| Chrome 99 | Android 12 | curl_chrome99_android |
| Edge 99 | Windows 10 | curl_edge99 |
| Edge 101 | Windows 10 | curl_edge101 |
| Firefox 91 ESR | Windows 10 | curl_ff91esr |
| Firefox 95 | Windows 10 | curl_ff95 |
| Firefox 98 | Windows 10 | curl_ff98 |
| Firefox 100 | Windows 10 | curl_ff100 |
| Firefox 102 | Windows 10 | curl_ff102 |
| Firefox 109 | Windows 10 | curl_ff109 |
| Firefox 117 | Windows 10 | curl_ff117 |
| Safari 15.3 | macOS Big Sur | curl_safari15_3 |
| Safari 15.5 | macOS Monterey | curl_safari15_5 |

Each supported browser has a dedicated wrapper script that configures curl-impersonate with the appropriate headers, flags, and settings to simulate that browser.

How curl-impersonate Works

When you send a request to a website over HTTPS, a process called the TLS handshake occurs. During this handshake, details about the HTTP client are shared with the web server, creating a unique TLS fingerprint.

Standard HTTP clients have capabilities and configurations that differ from those of a real browser. This discrepancy produces a TLS fingerprint that easily reveals the use of an HTTP client. As a result, anti-bot measures deployed by the target site can detect your requests as automated and potentially block them.

cURL Impersonate addresses this issue by modifying the standard curl tool to make its TLS fingerprint match that of real browsers. Here is how it achieves the goal:

  • TLS library modification: For the Chrome version of curl-impersonate, curl is compiled with BoringSSL, Google’s TLS library. For the Firefox version, curl is compiled with NSS, the TLS library used by Firefox.
  • Configuration adjustments: It modifies how cURL configures various TLS extensions and SSL options to mimic the settings of real browsers. It also adds support for new TLS extensions that are commonly used by browsers.
  • HTTP/2 handshake customization: It changes the settings cURL uses for HTTP/2 connections to align with those of real browsers.
  • Non-default flags: It runs with specific non-default flags, such as --ciphers, --curves, and some -H headers, which further helps in mimicking browser behavior.

Thus, curl-impersonate makes curl requests appear from a network perspective as if they were made by a real browser. This is useful for bypassing many bot detection mechanisms!

curl-impersonate: Command Line Tutorial

Follow the steps below to learn how to use cURL Impersonate from the command line.

Note: For completeness, multiple installation methods are shown below. However, you only need to choose one. The recommended method is Docker.

Installation From Pre-Compiled Binaries

You can download pre-compiled binaries for Linux and macOS from the GitHub releases page of the project. These binaries contain a statically compiled curl-impersonate. Before using them, ensure you have the following installed:

  • NSS (Network Security Services): A set of libraries designed to support cross-platform development of security-enabled client and server applications. NSS is used in Mozilla products like Firefox and Thunderbird for handling the TLS protocol.
  • CA certificates: A collection of digital certificates that authenticate the identity of servers and clients during secure communications. They ensure that your connection to a server is trustworthy by verifying that the server’s certificate has been signed by a recognized CA (Certificate Authority).

To meet the prerequisites, on Ubuntu, run: 

sudo apt install libnss3 nss-plugin-pem ca-certificates

On Red Hat, Fedora, or CentOS, execute: 

yum install nss nss-pem ca-certificates

On Arch Linux, run: 

pacman -S nss ca-certificates

On macOS, run: 

brew install nss ca-certificates

Also, ensure you have zlib installed on your system, as the pre-compiled binary packages are gzipped.

Installation Through Docker

Docker images—based on Alpine Linux and Debian—with curl-impersonate compiled and ready to use are available on Docker Hub. These images include the binary and all necessary wrapper scripts.

The Chrome images (*-chrome) can impersonate Chrome, Edge, and Safari, while the Firefox images (*-ff) can impersonate Firefox.

To download the Docker image you prefer, use one of the commands below.

For Chrome version on Alpine Linux:

docker pull lwthiker/curl-impersonate:0.5-chrome

For Firefox version on Alpine Linux:

docker pull lwthiker/curl-impersonate:0.5-ff

For Chrome version on Debian:

docker pull lwthiker/curl-impersonate:0.5-chrome-slim-buster

For Firefox version on Debian:

docker pull lwthiker/curl-impersonate:0.5-ff-slim-buster

Once downloaded, as you are about to see, you can execute curl-impersonate using a docker run command.

Installation From Distro Packages

On Arch Linux, curl-impersonate is available through the AUR package curl-impersonate-bin.

On macOS, you can install the unofficial Homebrew package for the Chrome version with the following commands:

brew tap shakacode/brew

brew install curl-impersonate

Basic Usage

Regardless of the installation method, you can now execute a curl-impersonate command using this syntax:

curl-impersonate-wrapper [options] [target-url]

Or, equivalently, on Docker, run something like:

docker run --rm lwthiker/curl-impersonate:[curl-impersonate-version] curl-impersonate-wrapper [options] [target-url]

Where:

  • curl-impersonate-wrapper is the cURL Impersonate wrapper you want to use (e.g., curl_chrome116, curl_edge101, curl_ff117, curl_safari15_5, etc.).
  • options are the optional flags that will be passed on to cURL. 
  • target-url is the URL of the web page to make an HTTP request to.

Be cautious when specifying custom options, as some flags alter cURL’s TLS signature, potentially making it detectable. To learn more, check out our introduction to cURL.

Note that the wrappers automatically set a default collection of HTTP headers. To customize these headers, modify the wrapper scripts to suit your needs.

Now, let’s use curl-impersonate to make a request to the Wikipedia homepage using a Chrome wrapper:

curl_chrome110 https://www.wikipedia.org

Or, if you are a Docker user:

docker run --rm lwthiker/curl-impersonate:0.5-chrome curl_chrome110 https://www.wikipedia.org

The result will be:

<html lang="en" class="no-js">
  <head>
    <meta charset="utf-8">
    <title>Wikipedia</title>
    <meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.">
<!-- omitted for brevity... -->

Wonderful! The server returned the HTML of the desired page as if you were accessing it via a browser.

You can now use cURL Impersonate for web scraping, just as you would use vanilla cURL.

curl-impersonate: Python Tutorial

Command line usage is great for testing, but web scraping processes typically rely on custom scripts written in languages like Python. Discover the best programming languages for web scraping!

Fortunately, you can use cURL Impersonate in Python thanks to curl-cffi. This is a Python binding for curl-impersonate via cffi. In particular, curl-cffi can impersonate browsers’ TLS/JA3 and HTTP/2 fingerprints to connect to web pages without getting blocked.

See how to use it in the step-by-step section below!

Prerequisites

Before getting started, make sure you have:

  • Python 3.8+ installed on your machine
  • A Python project with a virtual environment set up

Optionally, a Python IDE like Visual Studio Code with the Python extension is recommended.

Installation

Install curl_cffi via pip as follows:

pip install curl_cffi

Usage

curl_cffi provides both a low-level curl API and a high-level requests-like API. Find out more in the official documentation.

Typically, you want to use the requests-like API. To do this, import requests from curl_cffi:

from curl_cffi import requests

You can now use the Chrome version of cURL Impersonate in Python to connect to a web page with:

response = requests.get("https://www.wikipedia.org", impersonate="chrome")

Print the response HTML with:

print(response.text)

Put it all together, and you will get:

from curl_cffi import requests

# make a GET request to the target page with
# the Chrome version of curl-impersonate
response = requests.get("https://www.wikipedia.org", impersonate="chrome")

# print the server response
print(response.text)

Run the above Python script, and it will print:

<html lang="en" class="no-js">
  <head>
    <meta charset="utf-8">
    <title>Wikipedia</title>
    <meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.">
<!-- omitted for brevity... -->

Great! You are now ready to perform web scraping in Python, just as you would with Requests and Beautiful Soup. For more guidance, follow our guide on web scraping with Python.

cURL Impersonate Advanced Usage

Time to explore some advanced usages and techniques!

Proxy Integration

Simulating browser fingerprints may not be enough. Anti-bot solutions might still block you, especially if you make too many automated requests in a short amount of time. This is where proxies come in!

By routing your request through a proxy server, you can get a fresh IP address and protect your identity.

Suppose the URL to your proxy server is:

http://84.18.12.16:8888

cURL Impersonate supports proxy integration via the command line using the -x flag:

curl_chrome110 -x http://84.18.12.16:8888 https://httpbin.org/ip

For more details, read how to set a proxy in cURL.

In Python, you can set up a proxy similarly to how you would with requests:

from curl_cffi import requests

proxies = {"http": "http://84.18.12.16:8888", "https": "http://84.18.12.16:8888"}

response = requests.get("https://httpbin.org/ip", impersonate="chrome", proxies=proxies)

For additional information, see how to integrate a proxy with Python requests.

Libcurl Integration

libcurl-impersonate is a compiled version of libcurl that includes the same cURL Impersonate features. It also offers an extended API for adjusting TLS details and header configurations.

libcurl-impersonate can be installed using the pre-compiled package. Its goal is to facilitate the integration of cURL Impersonate into libraries in various programming languages, such as the curl-cffi Python package.

Conclusion

In this article, you learned what cURL Impersonate is, how it works, and how to use it both via CLI and in Python. You now understand that it is a tool for making HTTP requests while simulating the TLS fingerprint of real-world browsers.

The problem is that advanced anti-bot solutions like Cloudflare may still detect your requests as coming from a bot. The solution? Bright Data’s Scraper API—a next-generation, all-in-one, comprehensive scraping solution.

Scraper API provides everything you need to perform automated web requests using cURL or any other HTTP client. This full-featured solution handles browser fingerprinting, CAPTCHA solving, and IP rotation for you to bypass any anti-bot technology. Making automated HTTP requests has never been easier!

Register now for a free trial of Bright Data’s web scraping infrastructure or talk to one of our data experts about our scraping solutions.
