In this guide, you will learn:
- What `curl_cffi` is and the features it offers
- How it minimizes TLS fingerprint-based bot detection
- How to use it with Python for web scraping
- Advanced usage and methods
- A comparison with similar HTTP clients
Let’s dive in!
What Is curl_cffi?
`curl_cffi` is a library that provides Python bindings for the curl-impersonate fork via CFFI. In other words, it is an HTTP client capable of impersonating browser TLS/JA3/HTTP2 fingerprints. This makes the library an excellent solution for bypassing anti-bot blocks based on TLS fingerprinting.
⚙️ Features
- Supports JA3/TLS and HTTP/2 fingerprint impersonation, including recent browsers and custom fingerprints
- Much faster than `requests` and `httpx`, on par with `aiohttp`
- Mimics the `requests` API
- Supports `asyncio` for asynchronous HTTP requests
- Supports proxy rotation on each request
- Supports HTTP/2.0
- Supports WebSockets
How It Works
`curl_cffi` is built on cURL Impersonate, a library that generates TLS fingerprints matching real-world browsers.
When you send an HTTPS request, a TLS handshake occurs, producing a unique TLS fingerprint. Since HTTP clients differ from browsers, their fingerprints can expose automation, triggering anti-bot defenses.
cURL Impersonate modifies cURL to match real browser TLS fingerprints:
- TLS library tweaks: Use the TLS libraries adopted by browsers instead of cURL's default one.
- Configuration changes: Adjust TLS extensions and SSL options to mimic browsers.
- HTTP/2 customization: Match browser handshake settings.
- Non-default cURL flags: Set `--ciphers`, `--curves`, and custom headers for accuracy.
This makes the requests appear browser-like, helping bypass bot detection. For more information, refer to our guide on cURL Impersonate.
How to Use curl_cffi for Web Scraping: Step-By-Step Guide
Suppose your goal is to scrape the “Keyboard” page from Walmart:
If you try to access this page using any HTTP client, you will receive the following error page:
Do not be misled by the `200 OK` response status. The page returned by Walmart's server is actually a bot detection page. It specifically asks you to verify whether you are human with a CAPTCHA challenge.
You might wonder: how is this possible even if you set the `User-Agent` to simulate a real browser? The answer is TLS fingerprinting!
Now, let's see how to use `curl_cffi` to avoid anti-bot measures and perform web scraping with ease.
Step #1: Project Setup
First, make sure that you have Python 3+ installed on your machine. Otherwise, download it from the official site and follow the installation instructions.
Then, create a directory for your `curl_cffi` scraping project using this command:
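For example (the folder name below is arbitrary):

```bash
# create the project folder (any name works)
mkdir curl-cffi-scraper
```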
Navigate into that directory and set up a virtual environment inside it:
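Assuming the folder name above:

```bash
cd curl-cffi-scraper
python -m venv venv  # "venv" is an arbitrary name for the virtual environment
```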
Open the project folder in your preferred Python IDE. Visual Studio Code with the Python extension or PyCharm Community Edition are both valid choices.
Now, create a `scraper.py` file inside the project folder. It will be empty at first, but you will soon add the scraping logic to it.
In your IDE’s terminal, activate the virtual environment. On Linux or macOS, use:
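```bash
# assumes the virtual environment created above is named "venv"
source venv/bin/activate
```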
Equivalently, on Windows, launch:
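```powershell
venv\Scripts\activate
```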
Amazing! You are all set up and ready to go.
Step #2: Install curl_cffi
In an activated virtual environment, install the HTTP client via the `curl-cffi` pip package:
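```bash
pip install curl-cffi
```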
Behind the scenes, this library automatically downloads the `curl` impersonation binaries for Windows, macOS, and Linux.
Step #3: Connect to the Target Page
Import `requests` from `curl_cffi`:
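```python
from curl_cffi import requests
```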
This object exposes a high-level API that is similar to that of the Python Requests library.
You can use it to perform a GET HTTP request to the target page as follows:
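A minimal sketch (the Walmart "keyboard" search URL below is the example target):

```python
# perform a GET request, impersonating the latest version of Chrome
response = requests.get("https://www.walmart.com/search?q=keyboard", impersonate="chrome")
```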
The `impersonate="chrome"` argument tells `curl_cffi` to make the HTTP request look like it is coming from the latest version of Chrome. As a result, Walmart will treat the automated request as a regular browser request, returning the standard web page instead of an anti-bot page.
You can access the HTML content of the target page with:
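```python
html = response.text
```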
If you print `html`, you will see:
Great! That is the HTML of the regular Walmart “keyboard” product page.
Step #4: Add the Data Scraping Logic
`curl_cffi` is just an HTTP client that helps you retrieve the HTML of a page. If you want to perform web scraping, you will also need a library for HTML parsing like BeautifulSoup. For more guidance, refer to our guide on BeautifulSoup web scraping.
In the activated virtual environment, install BeautifulSoup:
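```bash
pip install beautifulsoup4
```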
Import it in `scraper.py`:
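```python
from bs4 import BeautifulSoup
```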
Then, use it to parse the HTML of the page:
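```python
soup = BeautifulSoup(html, "html.parser")
```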
"html.parser"
is the default HTML parser from Python’s standard library used by BeautifulSoup for parsing the HTML string. Now, soup
contains all the methods you need to select HTML elements on the page and extract data from them.
In this example, as data parsing is not what matters most, we will scrape only the page title. You can select it with the `find()` method and then access its text through the `text` attribute:
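```python
# select the <title> element and extract its text
title_element = soup.find("title")
title = title_element.text
```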
For more advanced scraping logic, refer to our guide on how to scrape Walmart.
Finally, print the page title:
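```python
print(title)
```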
Awesome! You implemented basic web scraping logic.
Step #5: Put It All Together
This is your final `curl_cffi` web scraping script:
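Below is a minimal version assembled from the steps above (the Walmart URL remains the example target):

```python
from curl_cffi import requests
from bs4 import BeautifulSoup

# connect to the target page, impersonating the latest version of Chrome
response = requests.get("https://www.walmart.com/search?q=keyboard", impersonate="chrome")

# get the HTML of the page
html = response.text

# parse the HTML content
soup = BeautifulSoup(html, "html.parser")

# scrape the page title and print it
title_element = soup.find("title")
title = title_element.text
print(title)
```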
Launch it with the following command:
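```bash
python3 scraper.py
```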
Or, equivalently, on Windows:
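```powershell
python scraper.py
```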
The result will be:
If you remove the `impersonate="chrome"` argument, you will get instead:
This demonstrates how browser impersonation makes all the difference when it comes to avoiding anti-scraping measures.
Mission complete!
curl_cffi: Advanced Usage
Now that you know how the library works, you are ready to explore some more advanced scenarios.
Browser Impersonation Selection
`curl_cffi` supports impersonating several browsers. Each browser is associated with a unique label that you can pass to the `impersonate` argument as below:
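A quick sketch (the URL is a placeholder):

```python
# impersonate a specific browser version via its label
response = requests.get("https://example.com", impersonate="chrome124")
```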
Here are the labels for the supported browsers:
- `chrome99`, `chrome100`, `chrome101`, `chrome104`, `chrome107`, `chrome110`, `chrome116`, `chrome119`, `chrome120`, `chrome123`, `chrome124`, `chrome131`
- `chrome99_android`, `chrome131_android`
- `edge99`, `edge101`
- `safari15_3`, `safari15_5`, `safari17_0`, `safari17_2_ios`, `safari18_0`, `safari18_0_ios`
Notes:
- To always impersonate the latest browser versions, you can simply use `chrome`, `safari`, and `safari_ios`.
- Firefox is currently not available, as only WebKit-based browsers are supported.
- Browser versions are added only when their fingerprints change. If a version, such as `chrome122`, is skipped, you can still impersonate it by using the headers of the previous version.
- For non-browser targets, use `ja3`, `akamai`, and similar arguments to specify your own custom TLS fingerprints. For details, refer to the documentation on impersonation.
Session Management
Just like the `requests` library, `curl_cffi` supports sessions. `Session` objects allow you to persist certain parameters across multiple requests, such as cookies, headers, or other session-specific data.
This is how you can define a session using the Python bindings for the cURL Impersonate library:
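A minimal sketch, using httpbin.org as an assumed test endpoint:

```python
from curl_cffi import requests

# create a session
session = requests.Session()

# ask the server to set a cookie (httpbin.org is just a demo endpoint)
session.get("https://httpbin.org/cookies/set/session_id/12345", impersonate="chrome")

# the cookie is automatically sent back on subsequent requests in the session
response = session.get("https://httpbin.org/cookies", impersonate="chrome")
print(response.json())
```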
The output of the above script will be:
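Assuming the httpbin sketch above, roughly:

```
{'cookies': {'session_id': '12345'}}
```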
The result proves that the session is maintaining state across requests, such as storing cookies defined by the server.
Proxy Integration
Just like the `requests` library, `curl_cffi` supports proxy integration through a `proxies` object:
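A minimal sketch (the proxy URL is a hypothetical placeholder):

```python
from curl_cffi import requests

# replace with the URL of your proxy server (placeholder values)
proxies = {
    "http": "http://<PROXY_HOST>:<PROXY_PORT>",
    "https": "http://<PROXY_HOST>:<PROXY_PORT>",
}

response = requests.get("https://example.com", impersonate="chrome", proxies=proxies)
```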
Since the underlying API is very similar to `requests`, refer to our guide on how to use a proxy in Requests.
Async API
`curl_cffi` supports asynchronous requests through `asyncio` via the `AsyncSession` object:
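A minimal sketch (the URL is a placeholder):

```python
import asyncio
from curl_cffi.requests import AsyncSession

async def main():
    async with AsyncSession() as session:
        # perform an asynchronous GET request impersonating Chrome
        response = await session.get("https://example.com", impersonate="chrome")
        print(response.status_code)

asyncio.run(main())
```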
Using `AsyncSession` makes it easier to handle multiple asynchronous requests efficiently, which is vital for speeding up web scraping.
WebSockets Connection
`curl_cffi` also supports WebSockets through the `WebSocket` class:
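A minimal sketch, assuming Gemini's public market-data feed as the example `wss://` endpoint:

```python
from curl_cffi.requests import WebSocket

# callback invoked for every incoming message
def on_message(ws, message):
    print(message)

# connect to the WebSocket endpoint and listen for messages
ws = WebSocket(on_message=on_message)
ws.run_forever("wss://api.gemini.com/v1/marketdata/BTCUSD")
```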
This is particularly useful for scraping real-time data from sites or APIs that use WebSockets to populate data dynamically. Some examples are sites with financial market data, live sports scores, or live chats.
Instead of scraping rendered pages, you can directly target the WebSocket channel for efficient data retrieval.
Note: You can use WebSockets asynchronously thanks to the `AsyncWebSocket` class.
curl_cffi vs Requests vs AIOHTTP vs HTTPX for Web Scraping
Below is a summary table to compare `curl_cffi` with other popular Python HTTP clients for web scraping:
| Feature | curl_cffi | Requests | AIOHTTP | HTTPX |
|---|---|---|---|---|
| Sync API | ✔️ | ✔️ | ❌ | ✔️ |
| Async API | ✔️ | ❌ | ✔️ | ✔️ |
| Support for WebSockets | ✔️ | ❌ | ✔️ | ❌ |
| Connection pooling | ✔️ | ✔️ | ✔️ | ✔️ |
| Support for HTTP/2 | ✔️ | ❌ | ❌ | ✔️ |
| User-Agent customization | ✔️ | ✔️ | ✔️ | ✔️ |
| TLS fingerprint spoofing | ✔️ | ❌ | ❌ | ❌ |
| Speed | High | Medium | High | Medium |
| Retry mechanism | ❌ | Available via `HTTPAdapter`s | Available only via a third-party library | Available via built-in `Transport`s |
| Proxy integration | ✔️ | ✔️ | ✔️ | ✔️ |
| Cookie handling | ✔️ | ✔️ | ✔️ | ✔️ |
curl_cffi Alternatives for Web Scraping
`curl_cffi` involves a manual approach to web scraping, where you need to write most of the code yourself. While suitable for simple static websites, this approach becomes challenging when targeting dynamic or more secure sites.
Bright Data provides a range of `curl_cffi` alternatives for web scraping:
- Scraping Browser API: Fully managed cloud browser instances integrated with Puppeteer, Selenium, and Playwright. These browsers offer built-in CAPTCHA solving and automated proxy rotation, bypassing anti-bot defenses while interacting with websites like real users.
- Web Scraper APIs: Pre-configured endpoints for retrieving fresh, structured data from over 100 popular domains. These APIs are ethical and compliant, allowing easy data extraction using HTTPX or any other HTTP client.
- No-Code Scraper: An intuitive, on-demand data collection service that eliminates coding. It offers control, scalability, and flexibility without dealing with infrastructure, proxies, or anti-scraping hurdles.
- Datasets: Access pre-built datasets from various websites or customize data collections to fit your requirements.
These solutions simplify scraping by offering robust, scalable, and compliant data extraction tools that reduce manual effort.
Conclusion
In this article, you discovered how to use the `curl_cffi` library for web scraping. You explored its purpose, key features, and advantages. This HTTP client excels as a fast and dependable option for making requests that mimic real browsers.
However, automated HTTP requests can expose your public IP address, potentially revealing your identity and location, which poses a privacy risk. To protect your security and anonymity, one of the most effective solutions is to use a proxy server to hide your IP address.
Bright Data controls the best proxy servers in the world, serving Fortune 500 companies and more than 20,000 customers. Its offer includes a wide range of proxy types:
- Datacenter proxies – Over 770,000 datacenter IPs.
- Residential proxies – Over 72M residential IPs in more than 195 countries.
- ISP proxies – Over 700,000 ISP IPs.
- Mobile proxies – Over 7M mobile IPs.
Create a free Bright Data account today to test our proxies and scraping solutions!
No credit card required