Using curl_cffi for Web Scraping in Python

Discover how curl_cffi empowers stealthy and efficient Python web scraping by mimicking real browser TLS fingerprints.

In this guide, you will learn:

  • What curl_cffi is and the features it offers
  • How it minimizes TLS fingerprint-based bot detection
  • How to use it with Python for web scraping
  • Advanced usage and methods
  • A comparison with similar HTTP clients

Let’s dive in!

What Is curl_cffi?

curl_cffi is a library that provides Python bindings for the curl-impersonate fork via CFFI. In other words, it is an HTTP client capable of impersonating browser TLS/JA3/HTTP2 fingerprints. This makes the library an excellent solution for bypassing anti-bot blocks based on TLS fingerprinting.

⚙️ Features

  • Supports JA3/TLS and HTTP2 fingerprint impersonation, including recent browsers and custom fingerprints
  • Much faster than requests and httpx, on par with aiohttp
  • Mimics the requests API
  • Supports asyncio for asynchronous HTTP requests
  • Supports proxy rotation on each request
  • Supports HTTP/2.0
  • Supports WebSockets

How It Works

curl_cffi is built on cURL Impersonate, a library that generates TLS fingerprints matching real-world browsers.

When you send an HTTPS request, a TLS handshake occurs, producing a unique TLS fingerprint. Since HTTP clients differ from browsers, their fingerprints can expose automation, triggering anti-bot defenses.

cURL Impersonate modifies cURL to match real browser TLS fingerprints:

  • TLS library tweaks: Rely on the TLS libraries used by real browsers instead of cURL’s default one.
  • Configuration changes: Adjust TLS extensions and SSL options to mimic browsers.
  • HTTP/2 customization: Match browser handshake settings.
  • Non-default cURL flags: Set --ciphers, --curves, and custom headers for accuracy.

This makes the requests appear browser-like, helping bypass bot detection. For more information, refer to our guide on cURL Impersonate.
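
To see the difference in practice, you can send the same request with and without impersonation to a TLS fingerprinting service and compare the resulting JA3 hashes. The snippet below is a minimal sketch that assumes https://tls.browserleaks.com/json as the fingerprint-echo endpoint and a ja3_hash field in its JSON response; any similar service works (it also assumes curl_cffi is already installed, as covered in the steps below):

from curl_cffi import requests

# Assumed fingerprint-echo endpoint that returns your TLS (JA3) fingerprint as JSON
FINGERPRINT_URL = "https://tls.browserleaks.com/json"

# Fingerprint produced by the default curl/TLS stack
plain = requests.get(FINGERPRINT_URL)
# Fingerprint produced while impersonating Chrome
impersonated = requests.get(FINGERPRINT_URL, impersonate="chrome")

print("Without impersonation:", plain.json().get("ja3_hash"))
print("With impersonation:", impersonated.json().get("ja3_hash"))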

How to Use curl_cffi for Web Scraping: Step-By-Step Guide

Suppose your goal is to scrape the “Keyboard” page from Walmart:

The Walmart “Keyboard” product page

If you try to access this page using any HTTP client, you will receive the following error page:

Note the response from the server

Do not be misled by the 200 OK response status. The page returned by Walmart’s server is actually a bot detection page. It specifically asks you to verify whether you are human with a CAPTCHA challenge.

You might wonder: how is this possible even if you set the User-Agent to simulate a real browser? The answer is TLS fingerprinting!

Now, let’s see how to use curl_cffi to avoid anti-bot measures and perform web scraping with ease.

Step #1: Project Setup

First, make sure that you have Python 3+ installed on your machine. Otherwise, download it from the official site and follow the installation instructions.

Then, create a directory for your curl_cffi scraping project using this command:

mkdir curl-cffi-scraper

Navigate into that directory and set up a virtual environment inside it:

cd curl-cffi-scraper
python -m venv env

Open the project folder in your preferred Python IDE. Visual Studio Code with the Python extension or PyCharm Community Edition are both valid choices.

Now, create a scraper.py file inside the project folder. It will be empty at first, but you will soon add the scraping logic to it.

In your IDE’s terminal, activate the virtual environment. On Linux or macOS, use:

source env/bin/activate

Equivalently, on Windows, launch:

env\Scripts\activate

Amazing! You are all set up and ready to go.

Step #2: Install curl_cffi

In an activated virtual environment, install the HTTP client via the curl-cffi pip package:

pip install curl-cffi

Behind the scenes, this library automatically downloads the curl impersonation binaries for Windows, macOS, and Linux.

Step #3: Connect to the Target Page

Import requests from curl_cffi:

from curl_cffi import requests

This object exposes a high-level API that is similar to that of the Python Requests library.

You can use it to perform a GET HTTP request to the target page as follows:

response = requests.get("https://www.walmart.com/search?q=keyboard", impersonate="chrome")

The impersonate="chrome" argument tells curl_cffi to make the HTTP request look like it is coming from the latest version of Chrome. As a result, Walmart will treat the automated request as a regular browser request, returning the standard web page instead of an anti-bot page.
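
Note that impersonation composes with the other requests-style options. As a hedged example, the snippet below adds an Accept-Language header and a timeout alongside impersonate="chrome"; both values are arbitrary and only for illustration:

response = requests.get(
    "https://www.walmart.com/search?q=keyboard",
    impersonate="chrome",
    # Example header, adjust to your needs
    headers={"Accept-Language": "en-US,en;q=0.9"},
    # Give up if the server takes longer than 30 seconds to respond
    timeout=30,
)
print(response.status_code)  # 200 on success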

You can access the HTML content of the target page with:

html = response.text

If you print html, you will see:

<!DOCTYPE html>
<html lang="en-US">
   <head>
      <meta charSet="utf-8"/>
      <meta property="fb:app_id" content="105223049547814"/>
      <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1, interactive-widget=resizes-content"/>
      <link rel="dns-prefetch" href="https://tap.walmart.com "/>
      <link rel="preload" fetchpriority="high" crossorigin="anonymous" href="https://i5.walmartimages.com/dfw/63fd9f59-a78c/fcfae9b6-2f69-4f89-beed-f0eeb4237946/v1/BogleWeb_subset-Bold.woff2" as="font" type="font/woff2"/>
      <link rel="preload" fetchpriority="high" crossorigin="anonymous" href="https://i5.walmartimages.com/dfw/63fd9f59-a78c/fcfae9b6-2f69-4f89-beed-f0eeb4237946/v1/BogleWeb_subset-Regular.woff2" as="font" type="font/woff2"/>
      <link rel="preconnect" href="https://beacon.walmart.com"/>
      <link rel="preconnect" href="https://b.wal.co"/>
      <title>Electronics - Walmart.com</title>
      <!-- omitted for brevity ... -->

Great! That is the HTML of the regular Walmart “keyboard” product page.

Step #4: Add the Data Scraping Logic

curl_cffi is just an HTTP client that helps you retrieve the HTML of a page. If you want to perform web scraping, you will also need a library for HTML parsing like BeautifulSoup. For more guidance, refer to our guide on BeautifulSoup web scraping.

In the activated virtual environment, install BeautifulSoup:

pip install beautifulsoup4

Import it in scraper.py:

from bs4 import BeautifulSoup

Then, use it to parse the HTML of the page:

soup = BeautifulSoup(response.text, "html.parser")

"html.parser" is the default HTML parser from Python’s standard library used by BeautifulSoup for parsing the HTML string. Now, soup contains all the methods you need to select HTML elements on the page and extract data from them.

In this example, since data parsing is not the main focus, we will scrape only the page title. You can select it by tag name using the find() method and then access its text through the text attribute:

title_element = soup.find("title")
title = title_element.text

For more advanced scraping logic, refer to our guide on how to scrape Walmart.

Finally, print the page title:

print(title)

Awesome! You implemented basic web scraping logic.

Step #5: Put It All Together

This is your final curl_cffi web scraping script:

from curl_cffi import requests
from bs4 import BeautifulSoup

# Send a GET request to the Walmart search page for "keyboard"
response = requests.get("https://www.walmart.com/search?q=keyboard", impersonate="chrome")

# Extract the HTML from the page
html = response.text

# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

# Find the title tag
title_element = soup.find("title")
# Extract its text
title = title_element.text

# More complex scraping logic...

# Print the scraped data
print(title)

Launch it with the following command:

python3 scraper.py

Or, equivalently, on Windows:

python scraper.py

The result will be:

Electronics - Walmart.com

If you remove the impersonate="chrome" argument, you will get instead:

Robot or human?

This demonstrates how browser impersonation makes all the difference when it comes to avoiding anti-scraping measures.
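
If you want the script to fail loudly instead of silently printing the bot page title, you can add a simple guard based on the title observed above. This is just a minimal sketch:

# Minimal check: abort if Walmart served its bot-detection page instead of the results
if "Robot or human" in title:
    raise RuntimeError("Blocked by bot detection: try another impersonation profile or a proxy")

print(title)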

Mission complete!

curl_cffi: Advanced Usage

Now that you know how the library works, you are ready to explore some more advanced scenarios.

Browser Impersonation Selection

curl_cffi supports impersonating several browsers. Each browser is associated with a unique label that you can pass to the impersonate argument as below:

response = requests.get("<YOUR_URL>", impersonate="<BROWSER_LABEL>")

Here are the labels for the supported browsers:

  • chrome99, chrome100, chrome101, chrome104, chrome107, chrome110, chrome116, chrome119, chrome120, chrome123, chrome124, chrome131
  • chrome99_android, chrome131_android
  • edge99, edge101
  • safari15_3, safari15_5, safari17_0, safari17_2_ios, safari18_0, safari18_0_ios

Notes:

  1. To always impersonate the latest browser versions, you can simply use chrome, safari, and safari_ios.
  2. Firefox is currently not available, as only Chromium-based browsers and Safari are supported.
  3. Browser versions are added only when their fingerprints change. If a version, such as chrome122, is skipped, you can still impersonate it by using the headers of the previous version.
  4. For non-browser targets, use the ja3, akamai, and similar arguments to specify your own custom TLS fingerprints. For details, refer to the documentation on impersonation.
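
In practice, you may want to try more than one profile. The sketch below attempts a couple of labels from the list above and stops at the first one that returns a successful response; the 200 status check is just an illustrative heuristic:

# Try impersonation profiles until one returns a successful response
for label in ["chrome124", "safari18_0"]:
    response = requests.get("<YOUR_URL>", impersonate=label)
    if response.status_code == 200:
        print(f"Success with {label}")
        break
else:
    print("All impersonation profiles were blocked")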

Session Management

Just like the requests library, curl_cffi supports sessions. Session objects allow you to persist certain parameters across multiple requests, such as cookies, headers, or other session-specific data.

This is how you can define a session using the Python bindings for the cURL Impersonate library:

# Create a new session
session = requests.Session()

# This endpoint sets a cookie in the response
session.get("https://httpbin.org/cookies/set/userId/5", impersonate="chrome")

# Print the session's cookies to confirm they are being stored
print(session.cookies)

The output of the above script will be:

<Cookies[<Cookie userId=5 for httpbin.org />]>

The result proves that the session is maintaining state across requests, such as storing cookies defined by the server.
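
As a quick follow-up, you can confirm that the stored cookie is automatically attached to later requests made through the same session. This sketch reuses httpbin’s /cookies endpoint, which echoes the cookies it receives:

# The cookie set above is sent automatically on subsequent requests in the same session
response = session.get("https://httpbin.org/cookies", impersonate="chrome")
print(response.json())  # expected: {'cookies': {'userId': '5'}}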

Proxy Integration

Just like the requests library, curl_cffi supports proxy integration through a proxies object:

# Define your proxy URL
proxy = "YOUR_PROXY_URL"

# Create a dictionary of proxies for HTTP and HTTPS
proxies = {"http": proxy, "https": proxy}

# Make a request using a proxy and browser impersonation
response = requests.get("<YOUR_URL>", impersonate="chrome", proxies=proxies)

Since the underlying API is very similar to that of requests, refer to our guide on how to use a proxy in Requests.
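
Because the proxies dictionary is passed per request, you can also rotate through a pool of proxies so that each request exits from a different IP. Below is a minimal sketch in which the proxy URLs are placeholders:

import random

# Placeholder pool of proxy URLs: replace with your own proxies
proxy_pool = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]

# Pick a different proxy for each request
proxy = random.choice(proxy_pool)
proxies = {"http": proxy, "https": proxy}

response = requests.get("<YOUR_URL>", impersonate="chrome", proxies=proxies)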

Async API

curl_cffi supports asynchronous requests through asyncio via the AsyncSession object:

from curl_cffi.requests import AsyncSession
import asyncio

# Define an async function to execute the asynchronous code
async def fetch_data():
    async with AsyncSession() as session:
        # Perform the asynchronous GET request
        response = await session.get("https://httpbin.org/anything", impersonate="chrome")
        # Print the response text
        print(response.text)

# Run the async function
asyncio.run(fetch_data())

Using AsyncSession makes it easier to handle multiple asynchronous requests efficiently, which is vital for speeding up web scraping.
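
For example, the sketch below fires several requests concurrently with asyncio.gather; the URLs are placeholders pointing at httpbin:

from curl_cffi.requests import AsyncSession
import asyncio

# Placeholder list of pages to fetch concurrently
urls = [
    "https://httpbin.org/anything?page=1",
    "https://httpbin.org/anything?page=2",
    "https://httpbin.org/anything?page=3",
]

async def fetch_all():
    async with AsyncSession() as session:
        # Schedule all GET requests and await them concurrently
        tasks = [session.get(url, impersonate="chrome") for url in urls]
        responses = await asyncio.gather(*tasks)
        for response in responses:
            print(response.status_code)

asyncio.run(fetch_all())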

WebSockets Connection

curl_cffi also supports WebSockets through the WebSocket class:

from curl_cffi.requests import WebSocket


# Define a callback function to handle incoming messages
def on_message(ws, message):
    print(message)

# Initialize the WebSocket connection with the callback
ws = WebSocket(on_message=on_message)

# Connect to a sample WebSocket server and listen for messages
ws.run_forever("wss://api.gemini.com/v1/marketdata/BTCUSD")

This is particularly useful for scraping real-time data from sites or APIs that use WebSocket to populate data dynamically. Some examples are sites with financial market data, live sports scores, or live chats.

Instead of scraping rendered pages, you can directly target the WebSocket channel for efficient data retrieval.

Note: You can use WebSockets asynchronously thanks to the AsyncWebSocket class.

curl_cffi vs Requests vs AIOHTTP vs HTTPX for Web Scraping

Below is a summary table to compare curl_cffi with other popular Python HTTP clients for web scraping:

| Feature | curl_cffi | Requests | AIOHTTP | HTTPX |
| --- | --- | --- | --- | --- |
| Sync API | ✔️ | ✔️ | ❌ | ✔️ |
| Async API | ✔️ | ❌ | ✔️ | ✔️ |
| Support for WebSockets | ✔️ | ❌ | ✔️ | ❌ |
| Connection pooling | ✔️ | ✔️ | ✔️ | ✔️ |
| Support for HTTP/2 | ✔️ | ❌ | ❌ | ✔️ |
| User-Agent customization | ✔️ | ✔️ | ✔️ | ✔️ |
| TLS fingerprint spoofing | ✔️ | ❌ | ❌ | ❌ |
| Speed | High | Medium | High | Medium |
| Retry mechanism | ❌ | Available via HTTPAdapters | Available only via a third-party library | Available via built-in Transports |
| Proxy integration | ✔️ | ✔️ | ✔️ | ✔️ |
| Cookie handling | ✔️ | ✔️ | ✔️ | ✔️ |

curl_cffi Alternatives for Web Scraping

curl_cffi involves a manual approach to web scraping, where you need to write most of the code yourself. While that works for simple static websites, it quickly becomes challenging when targeting dynamic or more heavily protected sites.

Bright Data provides a range of curl_cffi alternatives for web scraping:

  • Scraping Browser API: Fully managed cloud browser instances integrated with Puppeteer, Selenium, and Playwright. These browsers offer built-in CAPTCHA solving and automated proxy rotation, bypassing anti-bot defenses while interacting with websites like real users.
  • Web Scraper APIs: Pre-configured endpoints for retrieving fresh, structured data from over 100 popular domains. These APIs are ethical and compliant, allowing easy data extraction using HTTPX or any other HTTP client.
  • No-Code Scraper: An intuitive, on-demand data collection service that eliminates coding. It offers control, scalability, and flexibility without dealing with infrastructure, proxies, or anti-scraping hurdles.
  • Datasets: Access pre-built datasets from various websites or customize data collections to fit your requirements.

These solutions simplify scraping by offering robust, scalable, and compliant data extraction tools that reduce manual effort.

Conclusion

In this article, you discovered how to use the curl_cffi library for web scraping. You explored its purpose, key features, and advantages. This HTTP client excels as a fast and dependable option for making requests that mimic real browsers.

However, automated HTTP requests can expose your public IP address, potentially revealing your identity and location, which poses a privacy risk. To protect your security and anonymity, one of the most effective solutions is to use a proxy server to hide your IP address.

Bright Data controls the best proxy servers in the world, serving Fortune 500 companies and more than 20,000 customers. Its offer includes a wide range of proxy types.

Create a free Bright Data account today to test our proxies and scraping solutions!

No credit card required