Avoid Getting Blocked With Puppeteer Stealth

Learn how to integrate Puppeteer Stealth into a Puppeteer scraping script to avoid getting blocked.

This Puppeteer Stealth tutorial will cover:

  • What bot detection is and why it is a problem for Puppeteer.
  • What Puppeteer Extra is.
  • How to use the Puppeteer Extra Stealth Plugin to avoid blocks.

Bot Detection: The Biggest Enemy of Puppeteer

Puppeteer is one of the most widely used JavaScript libraries for browser automation. It is so popular because it is backed by the Chrome team at Google. Its high-level API allows you to control headless or headed browsers over the DevTools Protocol, making it a great tool for web scraping, automated testing, and bot development. 

However, Puppeteer can easily be stopped by bot detection technologies. That is particularly true when using Chrome/Chromium in headless mode. Why? Because a browser controlled by Puppeteer exposes default properties and headers that reveal it as an automated, headless instance. For example, the navigator.webdriver property is set to true.

Anti-bot solutions know that and analyze those settings to determine whether the current user is a human or a bot. When they find some suspicious configurations, they mark the user as a bot.
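To make this concrete, here is a minimal sketch of the kind of client-side check an anti-bot script might run. It looks only at two signals, navigator.webdriver and a "HeadlessChrome" token in the User-Agent. The function name detectAutomationSignals and the shape of its return value are made up for illustration, and real detection systems combine many more signals.

```javascript
// Hypothetical helper: inspects a navigator-like object and lists
// the automation signals it exposes. Real anti-bot scripts check far more.
function detectAutomationSignals(nav) {
    const signals = []

    // automated browsers typically report navigator.webdriver === true
    if (nav.webdriver === true) {
        signals.push("navigator.webdriver is true")
    }

    // headless Chrome can advertise itself in the User-Agent string
    if (typeof nav.userAgent === "string" && nav.userAgent.includes("HeadlessChrome")) {
        signals.push("User-Agent contains HeadlessChrome")
    }

    return { isLikelyBot: signals.length > 0, signals }
}
```

In a browser console you could call detectAutomationSignals(navigator). Outside a browser, any object with webdriver and userAgent fields works as input.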

For example, consider this headless mode bot detection test. If you open the test page in your regular browser, you will see:

You are not Chrome headless

Now, try visiting that site with vanilla Puppeteer and extract the test result from it:

import puppeteer from "puppeteer"

(async () => {
    // set up the browser and launch it
    const browser = await puppeteer.launch()

    // open a new blank page
    const page = await browser.newPage()

    // navigate the page to the target page
    await page.goto("https://arh.antoinevastel.com/bots/areyouheadless")

    // extract the message of the test result
    const resultElement = await page.$("#res")
    const message = await resultElement.evaluate(e => e.textContent)

    // print the resulting message
    console.log(`The result of the test is "${message}"`)

    // close the current browser session
    await browser.close()
})()

Launch the above script, and you will see:

The result of the test is "You are Chrome headless"

That means the test failed: the page detected the automated request as coming from a headless browser.

Out of the box, Puppeteer does nothing to hide these signals. To avoid bot detection, you would have to manually tweak it and override its default configurations. Avoid all that with Puppeteer Extra!
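For a sense of what such manual tweaking involves, the sketch below patches just two leaks by hand: the "HeadlessChrome" token in the User-Agent and the navigator.webdriver flag. The helper name applyManualEvasions is hypothetical. It assumes the browser and page objects expose Puppeteer's userAgent(), setUserAgent(), and evaluateOnNewDocument() methods, so check them against your Puppeteer version. Covering only two signals, it is far from a complete evasion setup.

```javascript
// Sketch of manual evasions applied to an existing Puppeteer page.
// `applyManualEvasions` is a hypothetical helper name, not a Puppeteer API.
async function applyManualEvasions(browser, page) {
    // replace the "HeadlessChrome" token in the browser's default User-Agent
    const defaultUserAgent = await browser.userAgent()
    const userAgent = defaultUserAgent.replace("HeadlessChrome/", "Chrome/")
    await page.setUserAgent(userAgent)

    // register a script that runs in every new document, before page scripts
    await page.evaluateOnNewDocument(() => {
        // make navigator.webdriver report false instead of true
        Object.defineProperty(Object.getPrototypeOf(navigator), "webdriver", {
            get: () => false,
        })
    })

    return userAgent
}
```

You would call it right after browser.newPage() and before page.goto(), since the injected script only affects documents loaded afterwards.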

Puppeteer Extra: An Extendable Version of Puppeteer

Puppeteer Extra is a lightweight wrapper built around Puppeteer that extends it with plugin support. In other words, puppeteer-extra is a drop-in replacement for puppeteer. In addition to working just like the popular browser automation library, it provides the use() method to register plugins.
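The "drop-in replacement with a use() method" idea can be illustrated with a toy wrapper. This is not the puppeteer-extra source code: createExtendableLauncher and the beforeLaunch and onBrowser hook names are all hypothetical, chosen only to show how a wrapper might keep the original launch() entry point while letting registered plugins hook into it.

```javascript
// Toy illustration of a wrapper with plugin support.
// NOT the puppeteer-extra implementation; all names here are hypothetical.
function createExtendableLauncher(baseLauncher) {
    const plugins = []

    return {
        // register a plugin and return the wrapper to allow chaining
        use(plugin) {
            plugins.push(plugin)
            return this
        },

        // same entry point as the wrapped library, plus plugin hooks
        async launch(options = {}) {
            // let each plugin adjust the launch options first
            for (const plugin of plugins) {
                if (plugin.beforeLaunch) {
                    options = (await plugin.beforeLaunch(options)) || options
                }
            }

            const browser = await baseLauncher.launch(options)

            // then let each plugin react to the launched browser
            for (const plugin of plugins) {
                if (plugin.onBrowser) {
                    await plugin.onBrowser(browser)
                }
            }

            return browser
        },
    }
}
```

Because the wrapper exposes the same launch() call as the library it wraps, existing scripts keep working and only the import and the use() calls change.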

Each plugin adds extra functionality to Puppeteer. Some of the most useful plugins available are:

  • puppeteer-extra-plugin-stealth: To make it harder for bot detection technology to detect headless browser instances.
  • puppeteer-extra-plugin-recaptcha: To solve reCAPTCHAs and hCaptchas automatically.
  • puppeteer-extra-plugin-adblocker: To remove ads and trackers, reducing bandwidth and load times as a result.
  • puppeteer-extra-plugin-devtools: To make browser debugging possible from anywhere by creating a secure tunnel to the DevTools.
  • puppeteer-extra-plugin-repl: To make debugging fun with an interactive REPL (Read-Eval-Print-Loop) interface.
  • puppeteer-extra-plugin-block-resources: To dynamically block page resources such as images, media, CSS, and JS files.
  • puppeteer-extra-plugin-anonymize-ua: To anonymize the User-Agent header on page navigation. Learn about why this is important in our guide on User-Agent for web scraping.
  • puppeteer-extra-plugin-user-preferences: To set custom Chrome/Chromium user preferences.

Let’s now dig deeper into the Puppeteer Stealth plugin.

What the Puppeteer Extra Stealth Plugin Is and What It Does 

puppeteer-extra-plugin-stealth is a plugin for Puppeteer Extra that bundles a set of configurations to avoid bot detection. In detail, Puppeteer Stealth relies on built-in evasion modules that patch the leaks and properties exposing Puppeteer as a bot. For example, it removes “HeadlessChrome” from the User-Agent header and deletes the navigator.webdriver property that is set by default.

The goal of the Puppeteer Extra Stealth plugin is to make a headless Chromium instance controlled via Puppeteer pass all bot detection tests on sannysoft.com. As of this writing, it achieves its goal. At the same time, as stated in the official documentation, there are still ways to detect headless Chromium. This means it is impossible to bypass all bot detection mechanisms, but the idea of the project is to make that process as hard as possible.
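One way to check this yourself is to screenshot the test page from a Puppeteer-controlled browser and inspect the results by eye. The sketch below assumes the tests are served at https://bot.sannysoft.com (an assumption, since only the domain is named above). The helper name captureDetectionReport and the default output file name are hypothetical.

```javascript
// Hypothetical helper: saves a full-page screenshot of the bot detection
// test page so the results can be reviewed manually.
async function captureDetectionReport(page, outputPath = "stealth-report.png") {
    // assumed URL of the sannysoft test page
    await page.goto("https://bot.sannysoft.com", { waitUntil: "networkidle2" })

    // capture the whole page, not just the visible viewport
    await page.screenshot({ path: outputPath, fullPage: true })

    return outputPath
}
```

Running it once with vanilla Puppeteer and once with the Stealth plugin registered gives two images you can compare side by side.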

How To Use Puppeteer Stealth to Avoid Bot Detection While Scraping Web Pages

Time to see how to integrate Puppeteer Stealth into a Puppeteer scraping script to avoid getting blocked.

Follow the steps below!

Step 1: Install Puppeteer Extra and the Stealth Plugin

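Run the command below to add Puppeteer Extra and the Puppeteer Stealth plugin to your project’s dependencies. Puppeteer Extra wraps the puppeteer package rather than bundling it, so puppeteer must remain installed as well: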

npm install puppeteer-extra puppeteer-extra-plugin-stealth

Great! You just met the prerequisites to integrate the Stealth plugin into your Puppeteer automated script.

Step 2: Set up Puppeteer Extra and Register the Stealth Plugin

First, replace the puppeteer import statement with this instruction:

import puppeteer from "puppeteer-extra"

In other words, make sure to import the puppeteer object from "puppeteer-extra" and not "puppeteer".

Then, import StealthPlugin from puppeteer-extra-plugin-stealth:

import StealthPlugin from "puppeteer-extra-plugin-stealth"

If you use CommonJS instead, you will need:

const puppeteer = require("puppeteer-extra")
const StealthPlugin = require("puppeteer-extra-plugin-stealth")

Next, register the Stealth plugin by passing it to the puppeteer object via the use() method:

puppeteer.use(StealthPlugin())

Awesome! You just added the default evasion capabilities supported by the plugin to Puppeteer.

Note that StealthPlugin() accepts an optional options object. Its enabledEvasions field is a Set of strings naming the evasions to enable:

// enable only a few evasion techniques
puppeteer.use(StealthPlugin({
    enabledEvasions: new Set(["chrome.app", "chrome.csi", "defaultArgs", "navigator.plugins"])
}))

Alternatively, use the logic below to dynamically remove a specific evasion strategy from the Stealth plugin:

const stealthPlugin = StealthPlugin()
puppeteer.use(stealthPlugin)

// ...

// remove the "user-agent-override" evasion method
stealthPlugin.enabledEvasions.delete("user-agent-override")
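Since enabledEvasions is a plain Set, deleting a misspelled evasion name fails silently. The sketch below wraps the deletion in a small helper that reports any names that were not enabled. The name disableEvasions is hypothetical, and the helper relies only on the plugin instance exposing enabledEvasions as a Set, as in the snippets above.

```javascript
// Hypothetical helper: removes evasions by name from a Stealth plugin
// instance and returns the names that were not enabled to begin with.
function disableEvasions(stealthPlugin, names) {
    const notFound = []

    for (const name of names) {
        // Set.prototype.delete returns false when the value was absent
        if (!stealthPlugin.enabledEvasions.delete(name)) {
            notFound.push(name)
        }
    }

    return notFound
}
```

For example, disableEvasions(stealthPlugin, ["user-agent-override"]) should return an empty array when the name is valid, while a non-empty result points to a typo or an evasion that was already disabled.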

Step 3: Put It All Together

Integrate Puppeteer Extra and its Stealth plugin into the script you saw at the beginning of the article:

import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"

(async () => {
    // configure the stealth plugin
    puppeteer.use(StealthPlugin())
    // set up the browser and launch it
    const browser = await puppeteer.launch()

    // open a new blank page
    const page = await browser.newPage()

    // navigate the page to the target page
    await page.goto("https://arh.antoinevastel.com/bots/areyouheadless")

    // extract the message of the test result
    const resultElement = await page.$("#res")
    const message = await resultElement.evaluate(e => e.textContent)

    // print the resulting message
    console.log(`The result of the test is "${message}"`)

    // close the current browser session
    await browser.close()
})()

Run this snippet, and it will now print:

The result of the test is "You are not Chrome headless"

Et voilà! The selected page with bot detection capabilities is no longer able to mark your Puppeteer automated script as a bot.
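One fragile spot in the script is the result extraction: page.$() returns null when the element is not in the DOM yet, which makes the following evaluate() call throw. A slightly more defensive sketch waits for the element first. The helper name readTestResult and its default timeout are assumptions made for illustration, while waitForSelector() and evaluate() are Puppeteer methods.

```javascript
// Hypothetical helper: waits for the result element before reading its text.
async function readTestResult(page, selector = "#res", timeout = 10000) {
    // resolves once the element is in the DOM, rejects after `timeout` ms
    const resultElement = await page.waitForSelector(selector, { timeout })

    // read the text content in the page context
    const message = await resultElement.evaluate(e => e.textContent)

    // normalize missing text and surrounding whitespace
    return (message ?? "").trim()
}
```

In the script, it would replace the two lines that call page.$("#res") and resultElement.evaluate().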

Congratulations! You are now a Puppeteer Stealth ninja, ready to face the most common bot detection techniques.

Conclusion

In this article, you learned why bot detection is a challenge for Puppeteer and how to deal with it. Thanks to Puppeteer Extra, you can extend Puppeteer’s functionality with plugins. In particular, the Stealth plugin is a great ally for avoiding bot detection, and here you saw how to use it.

No matter how sophisticated your Puppeteer Extra setup is, advanced anti-bot technologies like Cloudflare will still be able to spot and block your scripts. You could opt for another browser automation package, but the cause of detection is the browser, not the library. The solution is a scalable browser with anti-bot bypass functionality that can be integrated with any browser automation library. That browser exists and is called Scraping Browser!

Bright Data’s Scraping Browser is a highly scalable cloud browser that works with Puppeteer, Playwright, Selenium, and more. It automatically rotates the exit IP at each request and can handle browser fingerprinting, CAPTCHA resolution, and automated retries for you. That is possible thanks to the proxy-based unlocking features it relies on.

Bright Data’s proxies are used by Fortune 500 companies and over 20,000 customers, backed by a reliable worldwide proxy network.

Talk to one of our sales reps and see which of Bright Data’s products best suits your needs.