This Playwright Stealth tutorial will cover:
- What bot detection is and why it poses a problem for Playwright.
- What Playwright Stealth is.
- How to use it in both Python and JavaScript to avoid getting blocked.
Let’s dive in!
Bot Detection as The Biggest Limitation of Playwright
Playwright is one of the most popular libraries for browser automation. It is reliable and widely used, in part because it is developed and maintained by Microsoft. Its high-level, intuitive API makes it easy to control headless or headed browsers from several programming languages, including Python and JavaScript. That makes Playwright a great tool for cross-browser and cross-platform bot development, automated testing, and web scraping.
The main issue with the library is that it can be easily detected and blocked by anti-bot technologies, especially when using browsers in headless mode. How is that possible? Well, Playwright automatically changes the value of special properties and headers when controlling headless browsers. For example, it sets the navigator.webdriver browser property to true.
Bot detection solutions are aware of those configurations and analyze them to verify whether the current user is a human or a bot. When these mechanisms detect any suspicious settings, they categorize the user as a bot and block it right away.
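To make the idea concrete, here is a minimal sketch of the kind of check a detection script could run. It is a toy heuristic, not how any real anti-bot product works: the ClientSignals class and both function names are hypothetical, and the only signals it looks at are the navigator.webdriver flag and the "HeadlessChrome" token in the User-Agent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientSignals:
    """Values a detection script might read from the visiting browser."""
    user_agent: str
    webdriver: bool  # value of navigator.webdriver as seen by the page


def suspicious_signals(signals: ClientSignals) -> List[str]:
    """Return the reasons (if any) this toy heuristic would flag the client."""
    reasons = []
    if signals.webdriver:
        reasons.append("navigator.webdriver is true")
    if "HeadlessChrome" in signals.user_agent:
        reasons.append("User-Agent contains HeadlessChrome")
    return reasons


def is_probably_bot(signals: ClientSignals) -> bool:
    """Flag the client as a bot if at least one suspicious signal is present."""
    return bool(suspicious_signals(signals))


# example values made up for illustration
headless = ClientSignals(
    user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36",
    webdriver=True,
)
regular = ClientSignals(
    user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    webdriver=False,
)

print(is_probably_bot(headless))  # True
print(is_probably_bot(regular))   # False
```

Real systems likely combine many more signals, but even two simple checks like these are enough to flag an unmodified headless browser.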
For example, consider this bot detection test for headless mode (https://arh.antoinevastel.com/bots/areyouheadless). Visit the page in a regular browser, and it will report that you are not Chrome headless.
Perfect, that is the result you would expect!
Now, try visiting the same page with vanilla Playwright and extract the answer from the page:
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        # launch the browser
        browser = await p.chromium.launch()
        # open a new page
        page = await browser.new_page()
        # visit the target page
        await page.goto("https://arh.antoinevastel.com/bots/areyouheadless")
        # extract the answer contained on the page
        answer_element = page.locator("#res")
        answer = await answer_element.text_content()
        # print the resulting answer
        print(f'The result is: "{answer}"')
        # close the browser and release its resources
        await browser.close()

asyncio.run(main())
Execute the Python program, and it will print:
The result is: "You are Chrome headless"
That means the bot detection test page was able to identify the request made by your automated script as coming from a headless browser.
In other words, vanilla Playwright can be easily stopped by bot detection technologies. To avoid that, you can manually override its default configurations and hope for the best. Otherwise, install the Playwright Stealth plugin!
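For reference, a manual override could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (strip_headless_marker, open_patched_page) and the HIDE_WEBDRIVER_JS constant are made up for this example, it touches only two signals, and there is no guarantee it defeats any particular detector. It relies on Playwright's new_context(user_agent=...) and add_init_script() methods, and expects an already launched Playwright browser object.

```python
# JavaScript injected before any page script runs; it makes
# navigator.webdriver read as undefined instead of true.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(Navigator.prototype, 'webdriver', {
    get: () => undefined,
});
"""


def strip_headless_marker(user_agent: str) -> str:
    """Replace the 'HeadlessChrome' token with 'Chrome' in a User-Agent string."""
    return user_agent.replace("HeadlessChrome", "Chrome")


async def open_patched_page(browser):
    """Open a page on a Playwright `browser` with the two overrides applied."""
    # read the default User-Agent from a throwaway page
    probe = await browser.new_page()
    default_ua = await probe.evaluate("navigator.userAgent")
    await probe.close()
    # create a context that sends the cleaned User-Agent
    context = await browser.new_context(
        user_agent=strip_headless_marker(default_ua)
    )
    # register the override so it runs before every document loads
    await context.add_init_script(HIDE_WEBDRIVER_JS)
    return await context.new_page()


# the string helper can be tried on its own, without a browser
print(strip_headless_marker("Mozilla/5.0 HeadlessChrome/120.0.0.0 Safari/537.36"))
```

Inside a script, you would call page = await open_patched_page(browser) in place of browser.new_page(). Maintaining overrides like these by hand quickly becomes tedious, which is the gap a dedicated plugin aims to fill.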
Playwright Stealth Plugin: What It Is and How It Works
playwright-stealth is a Python package that extends Playwright by overriding specific configurations to avoid bot detection. Playwright Stealth is a port of the puppeteer-extra-plugin-stealth npm package, which uses built-in evasion modules to avoid leaks and change properties that expose automated browsers as bots. For example, it deletes the navigator.webdriver property and removes “HeadlessChrome” from the User-Agent header set by Chrome in headless mode by default.
The objective of the Stealth plugin is to enable an automated headless browser instance to pass all bot detection tests on sannysoft.com. At the time of writing, that objective has been met. However, as mentioned in the official documentation, there are still methods for detecting headless browsers, so what works today may not work tomorrow. Bypassing every bot detection mechanism is not entirely achievable, but the library aims to make detection as difficult as possible.
How To Use Playwright Stealth to Avoid Bot Detection
Follow the steps below to learn how to integrate Playwright Stealth into a Playwright Python script to avoid getting blocked.
Step 1: Set Up a Playwright Python Project
Disclaimer: If you already have a Playwright Python project in place, you can skip this step.
First, make sure you have Python 3 installed on your machine. Otherwise, download the installer, execute it, and follow the installation wizard.
Next, use the commands below to set up a Python project called playwright-demo:
mkdir playwright-demo
cd playwright-demo
These commands create the playwright-demo folder and enter it in the terminal.
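Initialize a Python virtual environment and activate it. The activation command below is for Windows; on macOS and Linux, run source env/bin/activate instead:
python -m venv env
env/Scripts/activate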
Launch the following command to install Playwright:
pip install playwright
This will take a while, so be patient.
After that, install the required browsers with:
playwright install
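If you want to confirm that the package landed in the active virtual environment, a quick check along these lines can help. This is just a convenience sketch: missing_packages is a hypothetical helper built on the standard library, and it only verifies that the module can be found, not that the browsers were downloaded.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# "playwright" is the module name installed by `pip install playwright`
missing = missing_packages(["playwright"])
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("Playwright is importable")
```

If the script reports a missing package, the most likely cause is that the virtual environment was not activated before running pip.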
Open the project folder in the Python IDE of your choice and create an index.py file. Initialize it with the following lines:
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        # browser automation logic...
        await browser.close()

asyncio.run(main())
The above script launches an instance of Chromium in headless mode, opens a new page, and finally closes the browser. This is what a basic Playwright Python script looks like.
To execute it, run:
python index.py
Great, you now have a Playwright project ready to be extended with the Stealth Plugin!
Step 2: Install and Use the Stealth Plugin
Install the Playwright Stealth plugin with:
pip install playwright-stealth
Open your index.py file and add the import below to your Playwright script:
from playwright_stealth import stealth_async
Or if you are using the sync API:
from playwright_stealth import stealth_sync
To register it in Playwright, pass the page object to the imported function as follows:
await stealth_async(page)
Or if you are using the sync API:
stealth_sync(page)
The stealth_async() function (or stealth_sync() with the sync API) extends page by overriding some default configurations to avoid bot detection.
Fantastic! All that remains is to visit the target page and repeat the test.
Step 3: Put It All Together
Integrate the Stealth plugin into the Playwright script presented at the beginning of the article:
import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def main():
    async with async_playwright() as p:
        # launch the browser
        browser = await p.chromium.launch()
        # open a new page
        page = await browser.new_page()
        # register the Playwright Stealth plugin
        await stealth_async(page)
        # visit the target page
        await page.goto("https://arh.antoinevastel.com/bots/areyouheadless")
        # extract the message contained on the page
        message_element = page.locator("#res")
        message = await message_element.text_content()
        # print the resulting message
        print(f'The result is: "{message}"')
        # close the browser and release its resources
        await browser.close()

asyncio.run(main())
Execute it again, and this time it will print:
The result is: "You are not Chrome headless"
Et voilà! The target page, equipped with bot detection capabilities, can no longer flag your Playwright automated script as a bot.
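If you prefer the sync API, the same test could be sketched as follows. Treat it as an illustration rather than a reference implementation: run_check and is_flagged_headless are hypothetical helpers, the Playwright imports are done lazily inside the function, and the code assumes a playwright-stealth release that exposes stealth_sync as shown in Step 2. The text comparison relies on the two messages the test page returned in this tutorial.

```python
def is_flagged_headless(message: str) -> bool:
    """Interpret the text shown by the test page used in this tutorial."""
    return message.strip().lower() == "you are chrome headless"


def run_check(use_stealth: bool) -> bool:
    """Visit the test page with the sync API and report whether it flags the browser."""
    # imported lazily so the helper above works without Playwright installed
    from playwright.sync_api import sync_playwright
    from playwright_stealth import stealth_sync  # assumes stealth_sync is available

    with sync_playwright() as p:
        # launch the browser and open a new page
        browser = p.chromium.launch()
        page = browser.new_page()
        # optionally register the Playwright Stealth plugin
        if use_stealth:
            stealth_sync(page)
        # visit the target page and read the verdict
        page.goto("https://arh.antoinevastel.com/bots/areyouheadless")
        message = page.locator("#res").text_content() or ""
        # close the browser and release its resources
        browser.close()
    return is_flagged_headless(message)


# the parsing helper can be tried without launching a browser
print(is_flagged_headless("You are Chrome headless"))      # True
print(is_flagged_headless("You are not Chrome headless"))  # False
```

Calling run_check(use_stealth=False) and then run_check(use_stealth=True) lets you compare the vanilla and Stealth behavior side by side.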
Well done! You now know how to use Playwright Stealth to make your automated script much harder for bot detection technologies to flag.
Extra: Playwright Stealth in JavaScript
If you are a Playwright JavaScript user and want to achieve the same result, you need to use the puppeteer-extra-plugin-stealth npm package. This works for both Puppeteer Extra and Playwright Extra. If you are not familiar with these projects, they are essentially enhanced versions of the two browser automation libraries. Specifically, they add extension functionality via plugins to Puppeteer and Playwright, respectively.
Thus, suppose you have the following Playwright JavaScript script and want to integrate it with the Stealth plugin:
import { chromium } from "playwright"

(async () => {
  // set up the browser and launch it
  const browser = await chromium.launch()
  // open a new blank page
  const page = await browser.newPage()
  // navigate the page to the target page
  await page.goto("https://arh.antoinevastel.com/bots/areyouheadless")
  // extract the message contained on the page
  const messageElement = page.locator('#res')
  const message = await messageElement.textContent()
  // print the resulting message
  console.log(`The result is: "${message}"`)
  // close the browser and release its resources
  await browser.close()
})()
First, install playwright-extra and puppeteer-extra-plugin-stealth:
npm install playwright-extra puppeteer-extra-plugin-stealth
Next, import chromium from playwright-extra instead of playwright and import StealthPlugin from puppeteer-extra-plugin-stealth:
import { chromium } from "playwright-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"
Then, register the Stealth Plugin with:
chromium.use(StealthPlugin())
Put it all together, and you will get:
import { chromium } from "playwright-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"

(async () => {
  // configure the Stealth plugin
  chromium.use(StealthPlugin())
  // set up the browser and launch it
  const browser = await chromium.launch()
  // open a new blank page
  const page = await browser.newPage()
  // navigate the page to the target page
  await page.goto("https://arh.antoinevastel.com/bots/areyouheadless")
  // extract the message contained on the page
  const messageElement = page.locator('#res')
  const message = await messageElement.textContent()
  // print the resulting message
  console.log(`The result is: "${message}"`)
  // close the browser and release its resources
  await browser.close()
})()
Awesome! You just integrated the Stealth plugin into Playwright in JavaScript.
Conclusion
In this guide, you learned why bot detection poses a challenge for Playwright and how to address it. Thanks to the Python library Playwright Stealth, you can adjust the default browser configuration to circumvent bot detection. As shown here, a similar approach can also be applied in JavaScript.
Regardless of how sophisticated your browser automation script in Playwright is, advanced bot detection systems will still represent a problem. While you could consider using another browser automation package, the root cause of detection lies within the browser, not the library itself. The solution lies in a scalable browser with anti-bot bypass functionality that seamlessly integrates with any browser automation library. Such a browser exists and is known as Scraping Browser!
Bright Data’s Scraping Browser is a highly scalable cloud-based browser compatible with Playwright, Puppeteer, and Selenium. It automatically rotates the exit IP at each request and can handle browser fingerprinting, automatic retries, and CAPTCHA resolution for you. These features are possible through the proxy-based unlocking features it relies on.
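As a rough sketch of what using a remote browser from Playwright might look like, the snippet below connects through Playwright's connect_over_cdp() method. Everything specific here is an assumption: it presumes the remote browser exposes a CDP WebSocket endpoint, the REMOTE_BROWSER_WS environment variable and both function names are hypothetical, and the actual connection details should come from the provider's documentation.

```python
import os
from typing import Mapping


def cdp_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Read the remote browser's WebSocket URL from the hypothetical REMOTE_BROWSER_WS variable."""
    url = env.get("REMOTE_BROWSER_WS", "")
    if not url.startswith(("ws://", "wss://")):
        raise ValueError("REMOTE_BROWSER_WS must be a ws:// or wss:// URL")
    return url


async def fetch_title(target_url: str) -> str:
    """Open target_url in the remote browser and return the page title."""
    # imported lazily so cdp_endpoint() can be used without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # attach to the remote browser instead of launching a local one
        browser = await p.chromium.connect_over_cdp(cdp_endpoint())
        page = await browser.new_page()
        await page.goto(target_url)
        title = await page.title()
        # close the connection and release its resources
        await browser.close()
    return title
```

With the variable set, asyncio.run(fetch_title("https://example.com")) would run the automation logic against the remote browser while the rest of the script stays the same.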
Bright Data’s proxies are used by Fortune 500 companies and over 20,000 customers. This reliable worldwide proxy network includes:
- Datacenter proxies – Over 770,000 datacenter IPs.
- Residential proxies – Over 72M residential IPs in more than 195 countries.
- ISP proxies – Over 700,000 ISP IPs.
- Mobile proxies – Over 7M mobile IPs.