In this guide, you will learn:
- What CAPTCHAs are and whether you can bypass them
- How to use Puppeteer to bypass CAPTCHAs via a step-by-step tutorial
- What to do if the process with Puppeteer does not work
Let’s dive in!
What Are CAPTCHAs? And Can You Bypass Them?
A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a challenge-response test that distinguishes humans from automated bots. To achieve this, CAPTCHAs are designed to be easy for humans to solve but difficult for software.
Popular CAPTCHA providers include Google reCAPTCHA, hCaptcha, and BotDetect. Common CAPTCHA types are:
- Text-based: In these challenges, users have to recognize letters and numbers and type them.
- Image-based: These tests require users to identify specific objects in a grid of images by selecting the right images.
- Audio-based: In this type, users have to type the letters they hear.
- Puzzle challenges: This type of challenge requires users to solve a simple puzzle by sliding a piece into the dedicated place.
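The challenge-response idea behind a text-based CAPTCHA can be illustrated with a toy sketch: the server issues a random code, then checks that the response the user types matches it. This is a hypothetical, simplified model (`createChallenge` and `verifyResponse` are made-up names); real CAPTCHA services render the code as a distorted image, keep the answer server-side, and layer on many other defenses.

```javascript
// Characters that are hard to confuse visually (no 0/O or 1/I).
const ALPHABET = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';

// Issue a challenge: a random code the user must read and type back.
// Math.random() is fine for a toy but is not cryptographically secure.
function createChallenge(length = 6) {
  let answer = '';
  for (let i = 0; i < length; i++) {
    answer += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return { answer };
}

// Check the user's response, ignoring case and surrounding whitespace.
function verifyResponse(challenge, response) {
  return response.trim().toUpperCase() === challenge.answer;
}
```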
CAPTCHAs are designed to be hard for automated software and bots to bypass. So, one option is to integrate your software with CAPTCHA-solving libraries, or with services that rely on human operators to solve the challenges on your behalf.
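Many human-powered solving services work asynchronously: your script submits the CAPTCHA, then repeatedly asks for the answer until a worker has solved it. The polling half of that flow might look like the minimal sketch below. `pollForSolution` is a hypothetical helper, and `getResult` is an assumed callback that wraps whatever status endpoint your provider exposes (no real service API is shown here), resolving to the solution or to `null` while the task is still pending.

```javascript
// Hypothetical helper: poll a solving service until it returns an answer.
// `getResult` is an assumed async callback: it resolves to the solution
// string, or to null while the CAPTCHA is still being solved.
async function pollForSolution(getResult, { intervalMs = 5000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const solution = await getResult();
    if (solution !== null) return solution;
    // Wait before asking again to avoid hammering the service.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No solution within ${timeoutMs} ms`);
}
```

The timeout matters in practice: if no worker picks up the task, the script fails loudly instead of waiting forever.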
However, CAPTCHAs hard-coded into a page are not common because they hurt the overall user experience of the website. For this reason, CAPTCHAs are more commonly used as part of broader anti-bot solutions, such as WAFs (Web Application Firewalls).
In these cases, the system dynamically displays a CAPTCHA when it suspects that a bot is active on the website. To bypass these CAPTCHAs, you have to develop a bot that mimics human behavior. While this is possible, it requires a lot of effort, particularly because you need to update your scripts frequently to stay ahead of new bot detection techniques.
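To make the idea of a dynamically displayed CAPTCHA more concrete, here is a toy model of how an anti-bot layer might decide whether to challenge a visitor: it adds up points for suspicious signals and shows a CAPTCHA only above a threshold. The signal names, point values, and threshold are illustrative assumptions, not the rules of any real WAF.

```javascript
// Toy model of how an anti-bot layer might decide to show a CAPTCHA.
// Signal names, points, and the threshold are illustrative assumptions.
const SIGNAL_POINTS = {
  webdriver: 50,         // navigator.webdriver reports true
  headlessUserAgent: 30, // User-Agent contains "HeadlessChrome"
  highRequestRate: 30,   // more requests per minute than a human would make
  noMouseActivity: 20,   // no pointer events before submitting a form
};

// Sum the points of every signal observed for this visitor.
function riskScore(signals) {
  let score = 0;
  for (const [name, points] of Object.entries(SIGNAL_POINTS)) {
    if (signals[name]) score += points;
  }
  return score;
}

// Challenge the visitor only when the score reaches the threshold.
function shouldShowCaptcha(signals, threshold = 50) {
  return riskScore(signals) >= threshold;
}
```

This is why mimicking human behavior helps: every signal a bot avoids lowers its score and, with it, the chance of seeing a CAPTCHA.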
The good news is that there is a more effective solution to bypass CAPTCHAs: Bright Data’s CAPTCHA Solver! This always up-to-date tool solves all your problems related to bypassing CAPTCHAs without any headaches.
How to Bypass CAPTCHAs With Puppeteer: Step-By-Step Tutorial
Now it is time to create an automated script that mimics human behavior to bypass CAPTCHAs.
To do so, you can use Puppeteer: a JavaScript library that provides a high-level API to control web browsers and can, therefore, be used to mimic human behavior.
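As an illustration of what mimicking human behavior can mean in practice, here is a minimal sketch that generates a curved mouse path and random pauses instead of instant, straight-line moves. The helpers `humanMousePath` and `randomDelay` are hypothetical, and the curve shape and jitter ranges are arbitrary assumptions; they do not guarantee that any particular anti-bot system will be fooled.

```javascript
// Generate a curved, slightly irregular path between two points using a
// quadratic Bezier curve whose control point is randomly offset.
function humanMousePath(from, to, steps = 25) {
  // A random control point near the midpoint bends the path like a hand would.
  const control = {
    x: (from.x + to.x) / 2 + (Math.random() - 0.5) * 100,
    y: (from.y + to.y) / 2 + (Math.random() - 0.5) * 100,
  };
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const u = 1 - t;
    // Quadratic Bezier: B(t) = u^2 * P0 + 2ut * C + t^2 * P1
    points.push({
      x: u * u * from.x + 2 * u * t * control.x + t * t * to.x,
      y: u * u * from.y + 2 * u * t * control.y + t * t * to.y,
    });
  }
  return points;
}

// Random pause between actions, in milliseconds, within [minMs, maxMs).
function randomDelay(minMs = 50, maxMs = 150) {
  return minMs + Math.random() * (maxMs - minMs);
}
```

With Puppeteer, you could feed the points one at a time to `page.mouse.move(x, y)`, pausing with `await new Promise((r) => setTimeout(r, randomDelay()))` between actions.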
Let’s get started!
Step #1: Project Setup
Suppose you call the main folder of your project `bypass_captcha_puppeteer`. Here is the structure the repository should have:
```
bypass_captcha_puppeteer/
├── index.js
└── package.json
```
You can create the folder with:
```bash
mkdir bypass_captcha_puppeteer
```
Then, enter the project folder and launch `npm init` to initialize a Node.js application:
```bash
cd bypass_captcha_puppeteer
npm init -y
```
Next, create an `index.js` file inside it.
Then, install Puppeteer:
```bash
npm install puppeteer
```
Step #2: Use ESM JavaScript Notation
To use ECMAScript Modules (ESM) notation, such as `import` statements, in JavaScript, the `package.json` file must have the `"type": "module"` option.
Here is how the `package.json` file should look:
```json
{
  "name": "bypass_captcha_puppeteer",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "puppeteer": "^23.10.4"
  }
}
```
Step #3: Try To Bypass CAPTCHA With Puppeteer
Write the following code into the `index.js` file to see whether Puppeteer appears as a bot or not:
```javascript
import puppeteer from 'puppeteer';

const visitBotAnalyzerPage = async () => {
  let browser;
  try {
    // initialize the browser
    browser = await puppeteer.launch();
    // open a new browser page
    const page = await browser.newPage();
    // navigate to the target URL
    const url = 'https://bot.sannysoft.com/';
    console.log(`Navigating to ${url}...`);
    await page.goto(url, { waitUntil: 'networkidle2' });
    // save a full-page screenshot
    console.log('Taking full-page screenshot...');
    await page.screenshot({ path: 'anti-bot-analysis.png', fullPage: true });
    console.log('Screenshot taken');
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // close the browser, even if an error occurred
    if (browser) {
      await browser.close();
      console.log('Browser closed');
    }
  }
};

// run the script
visitBotAnalyzerPage();
```
Here is what this code does:
- Launches the browser: The `puppeteer.launch()` method starts a new browser instance. By default, it runs in headless mode (no visible UI); pass `{ headless: false }` to watch the browser as it works.
- Opens a new browser page: `browser.newPage()` creates a new blank browser page where further actions can be performed.
- Goes to the target page: The `page.goto()` method navigates to the target page, bot.sannysoft.com, which runs the Intoli.com tests (plus some additions) to determine whether a request comes from a bot or not.
- Saves a screenshot of the results: The `page.screenshot()` method captures the full page and saves it as `anti-bot-analysis.png`.
- Closes the browser and handles errors: `browser.close()` closes the browser, while the `try...catch` block logs any error that occurs.
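In scripts like this, it is easy to skip `browser.close()` when an earlier step throws, which can leave the browser process running. A minimal sketch of a reusable wrapper that guarantees cleanup is shown below; `withBrowser` is a hypothetical helper name, and it assumes you pass in a launcher function such as `() => puppeteer.launch()`.

```javascript
// Run `task` with a freshly launched browser and always close it afterwards,
// even if navigation or the screenshot throws. `launch` is injected so the
// helper works with either puppeteer or puppeteer-extra.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}
```

You would call it as `withBrowser(() => puppeteer.launch(), async (browser) => { /* navigate and take the screenshot */ })`, and any error still propagates to the caller after the browser has been closed.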
To run the code, type:
```bash
node index.js
```
You can now open the saved `anti-bot-analysis.png` image to inspect the results.
As the screenshot shows, Puppeteer fails a few of the tests. Consequently, WAFs are likely to show CAPTCHAs when you interact with pages through Puppeteer.
To solve these issues, let’s use the Puppeteer Stealth plugin!
Step #4: Install The Stealth Plugin
Puppeteer Extra is a lightweight wrapper around Puppeteer that, among other things, allows you to install the Stealth plugin. This plugin helps prevent bot detection by overriding several browser properties that reveal automation, making the browser instance appear natural and “human-like.”
Install these libraries like so:
```bash
npm install puppeteer-extra puppeteer-extra-plugin-stealth
```
Import Puppeteer from `puppeteer-extra` instead of `puppeteer`:
```javascript
import puppeteer from 'puppeteer-extra';
```
Fantastic! You are ready to use the Stealth plugin to try to avoid CAPTCHAs with Puppeteer.
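To give an idea of what overriding browser properties means, here is a simplified sketch of one such override: making `navigator.webdriver`, which is `true` in automated Chrome sessions, report `false` as it does in a regular browser. This is an illustration only, not the Stealth plugin's actual implementation, and `hideWebdriver` is a hypothetical name; the test uses a plain object in place of the browser's `navigator`.

```javascript
// Simplified illustration of a stealth-style override: replace the
// `webdriver` property with a getter that always reports false.
function hideWebdriver(nav) {
  Object.defineProperty(nav, 'webdriver', {
    get: () => false,
    configurable: true,
  });
}
```

Inside Puppeteer, the same `Object.defineProperty` call would have to run in the page before any site script does, for instance inside a function passed to `page.evaluateOnNewDocument()`. The Stealth plugin bundles many evasions of this kind so you do not have to write them yourself.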
Step #5: Repeat the Test With the Stealth Plugin
Now you have to enable the Stealth plugin with this line of code:
```javascript
puppeteer.use(StealthPlugin());
```
So, the code becomes:
```javascript
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

// add the Stealth plugin to Puppeteer
puppeteer.use(StealthPlugin());

const visitBotAnalyzerPage = async () => {
  let browser;
  try {
    // launch the browser with stealth settings
    console.log('Launching browser in stealth mode...');
    browser = await puppeteer.launch();
    // open a new page
    const page = await browser.newPage();
    // navigate to the target page
    const url = 'https://bot.sannysoft.com/';
    console.log(`Navigating to ${url}...`);
    await page.goto(url, { waitUntil: 'networkidle2' });
    // save the screenshot of the entire page
    console.log('Taking full-page screenshot...');
    await page.screenshot({ path: 'anti-bot-analysis.png', fullPage: true });
    console.log('Screenshot taken');
  } catch (error) {
    console.error('Error occurred:', error);
  } finally {
    // close the browser, even if an error occurred
    if (browser) {
      await browser.close();
      console.log('Browser closed');
    }
  }
};

// run the script
visitBotAnalyzerPage();
```
Now, run the code again with:
```bash
node index.js
```
Then, open the new screenshot to check the results.
Hooray! The script now passes the bot detection tests, which means you are less likely to receive CAPTCHAs with Puppeteer!
What To Do If the Above Procedure to Bypass CAPTCHAs With Puppeteer Does Not Work
Unfortunately, Puppeteer Extra is not a silver bullet. The reason is that browser settings are not the only signal anti-bot systems use to block automated software.
For example, the user agent is another factor anti-bot systems use to block automated software. To solve this issue, you can use the `puppeteer-extra-plugin-anonymize-ua` library, which anonymizes the user agent.
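The core idea of such a user-agent tweak can be sketched in a few lines: strip the `HeadlessChrome` token that headless Chrome reports and, optionally, swap the platform section for a common desktop one. This is a hypothetical helper (`normalizeUserAgent`), not the plugin's source code, and the Windows platform string used in the test is just an illustrative choice.

```javascript
// Hypothetical helper: rewrite a user-agent string so it looks less automated.
function normalizeUserAgent(ua, { platform = null } = {}) {
  // Headless Chrome identifies itself as "HeadlessChrome/<version>".
  let result = ua.replace('HeadlessChrome/', 'Chrome/');
  if (platform) {
    // Replace the first parenthesized section, e.g. "(X11; Linux x86_64)".
    result = result.replace(/\([^)]*\)/, `(${platform})`);
  }
  return result;
}
```

In a Puppeteer script, you could apply it with `await page.setUserAgent(normalizeUserAgent(await browser.userAgent()))` before calling `page.goto()`.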
However, the plugin-based approach described above works only against basic anti-bot measures. When dealing with more complex tools like Cloudflare, you need something more powerful.
So, are you looking for a real Puppeteer CAPTCHA solver? Try Bright Data’s web scraping solutions!
These provide superior unlocking capabilities with a dedicated CAPTCHA-solving feature to automatically handle reCAPTCHA, hCaptcha, px_captcha, SimpleCaptcha, GeeTest CAPTCHA, FunCaptcha, Cloudflare Turnstile, AWS WAF Captcha, KeyCAPTCHA, and many others.
Integrating Bright Data’s CAPTCHA Solver into your scripts is easy, as it works with any HTTP client or browser automation tool.
Find out more about how to use Bright Data’s CAPTCHA Solver and check out the documentation for all integration and configuration details.
Conclusion
In this article, you learned why bypassing CAPTCHAs with Puppeteer can be challenging, and how to use the Stealth plugin to override the default browser configuration to circumvent bot detection.
The problem with that approach is that it works only in simple scenarios. Advanced bot detection systems can still identify you as a bot and block you.
So, when bypassing CAPTCHAs, the actual solution is to connect to your target page through an unlocking API that can seamlessly return the CAPTCHA-free HTML of any web page. This solution exists and is called Web Unlocker. It automatically rotates the exit IP with each request via proxy integration and handles browser fingerprinting, automatic retries, and CAPTCHA resolution for you.
Sign up now to discover which of Bright Data’s scraping products best suit your needs. Start with a free trial, no credit card required!