For years, Undetected Chromedriver has been a staple in secure browsing and anti-bot bypass. The developer behind Undetected Chromedriver has since created NODRIVER. With NODRIVER, you’re no longer dependent on Selenium or Webdriver. A simple pip install and everything should be ready to go.
In this guide, you’ll learn:
- What is NODRIVER?
- How Is It Different From Other Headless Browsers?
- How To Use NODRIVER?
- What Are NODRIVER’s Limitations?
- How To Use NODRIVER With a Proxy?
- Solid Alternatives to NODRIVER
What Is NODRIVER and Why Should You Care?
What Exactly Is NODRIVER?
NODRIVER is the fully asynchronous successor to Undetected Chromedriver. Using “best practices” as the default for all kwargs, this thing has been designed to work right out of the box with only a small amount of code.
NODRIVER boasts the following features:
- Performance
- No External Dependencies (not even Chromedriver)
- Antibot Bypass
- Persistent Session cookies
- Fresh Browser Instance With Each Use
What Makes NODRIVER Different?
NODRIVER uses a radically different architecture from Undetected Chromedriver and even other headless browsers. Traditionally, these other tools have depended on Selenium or the Chrome DevTools Protocol (CDP).
NODRIVER uses its own custom implementation of the DevTools protocol. In the documentation, it’s actually referred to as a “chrome (-ish) automation library”. With NODRIVER, you’re not dependent on Selenium, nor are you directly dependent on CDP. To use NODRIVER, all you need is pip and a Chrome-based browser.
Scraping With NODRIVER
1. Getting Started
Before you get started, make sure you have Python and a Chrome-based browser installed. If you’re reading this article, I’m assuming you’ve already got these. You can install NODRIVER directly with pip: pip install nodriver.
2. Basic Structure
Our basic structure is really similar to what you’d get with Playwright or Puppeteer. If you’re interested in using Playwright in Python, you can view a full guide on scraping Amazon listings here. NODRIVER has a very similar feel to Playwright, but it’s still under heavy development.
Here’s our basic structure.
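A minimal sketch of that skeleton, assuming nodriver’s start(), get(), and stop() calls. The names visit, run, and URL are illustrative, and the import sits inside run() only so the file loads on machines where nodriver is not installed yet.

```python
URL = "https://quotes.toscrape.com"  # example target, not required by nodriver


async def visit(browser, url):
    """Open `url` and return the resulting page (tab) object."""
    page = await browser.get(url)
    # Scraping logic would go here.
    return page


def run():
    """Entry point. Needs `pip install nodriver` and a Chrome-based browser."""
    import nodriver as uc  # imported lazily so this file loads without it

    async def main():
        browser = await uc.start()  # fresh browser with default settings
        try:
            await visit(browser, URL)
        finally:
            browser.stop()  # always shut the browser down

    # nodriver's own examples drive the coroutine through its loop helper.
    uc.loop().run_until_complete(main())
```

Calling run() launches the browser, loads the page, and closes everything again.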
3. Getting A Page
As you probably noticed in our basic skeleton above, browser.get() returns a page object. You can even open multiple pages simultaneously. If you’re willing to get creative, you can run highly concurrent operations.
The snippet below is only theoretical.
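One way such a theoretical snippet could look, assuming browser.get() accepts new_tab=True and that each tab exposes get_content() for the page HTML. The helper names open_tabs and fetch_all_html are made up for this sketch, and it has not been tuned for real workloads.

```python
import asyncio


async def open_tabs(browser, urls):
    """Open every URL in its own tab and return the tab objects."""
    tabs = []
    for url in urls:
        # new_tab=True keeps earlier pages open instead of reusing one tab.
        tabs.append(await browser.get(url, new_tab=True))
    return tabs


async def fetch_all_html(browser, urls):
    """Return the HTML of each URL, in the same order as `urls`."""
    tabs = await open_tabs(browser, urls)
    # gather() awaits all get_content() calls concurrently.
    return await asyncio.gather(*(tab.get_content() for tab in tabs))
```

The tabs are opened one after another on purpose; only the content reads run concurrently, which keeps the sketch conservative.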
4. Dynamic Content
To handle dynamic content, you get two options. You can use the .sleep() method to wait an arbitrary amount of time, or you can use .wait_for() to wait for a specific selector on the page.
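Both options can be sketched in one helper. The function name wait_for_content, the div.quote selector, and the default timings are assumptions for illustration; tab is whatever browser.get() returned.

```python
async def wait_for_content(tab, selector="div.quote", pause=1.0, timeout=10):
    """Wait for dynamic content using both approaches, one after the other."""
    # Option 1: pause for a fixed, arbitrary number of seconds.
    await tab.sleep(pause)
    # Option 2: block until the selector shows up (or the timeout expires).
    return await tab.wait_for(selector=selector, timeout=timeout)
```

In practice you would normally pick one of the two; waiting on a selector is usually the more reliable choice because it ends as soon as the content exists.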
NOTE: In the snippet above, I used tab instead of page as a variable name. These are interchangeable. They are both tab objects. You can learn more about tabs in NODRIVER here.
5. Finding Elements
NODRIVER gives us a variety of methods for finding elements on the page. It seems they’re in the midst of handling some legacy methods.
There are four different text-based methods for finding elements. Two of them will likely disappear in the future.
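A short sketch of the two shorter text-based calls, find() and find_all(). The longer-named pair, find_element_by_text() and find_elements_by_text(), takes the same search text; treating those two as the legacy ones is an assumption here. The helper name find_by_text is illustrative.

```python
async def find_by_text(tab, text):
    """Return (best single match, list of all matches) for `text`."""
    # find() resolves to the single best-matching element.
    first = await tab.find(text)
    # find_all() resolves to every element containing the text.
    every = await tab.find_all(text)
    return first, every
```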
Like the methods above, there are also four selector-based methods for finding elements. Two of them will likely disappear. If the developers behind NODRIVER want to clearly align with CDP, the query_selector methods will likely survive.
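The four selector-based calls side by side, as a sketch: select(), select_all(), query_selector(), and query_selector_all(). The helper name find_by_selector and the returned dictionary layout are invented for this example.

```python
async def find_by_selector(tab, selector):
    """Run all four selector-based lookups and collect their results."""
    return {
        # Shorter names: first match and all matches.
        "select": await tab.select(selector),
        "select_all": await tab.select_all(selector),
        # CDP-style names: first match and all matches.
        "query_selector": await tab.query_selector(selector),
        "query_selector_all": await tab.query_selector_all(selector),
    }
```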
As you can see above, no matter how you want to find elements on the page, there are likely multiple ways to do it. In time, the developers behind NODRIVER might tighten this up. That said, at the moment, their parsing methods are like a Swiss Army chainsaw.
6. Extracting Their Data
NODRIVER offers a couple of methods to extract data. You can use the .attributes property to extract attributes directly, but this isn’t very user-friendly: it returns a flat array, not a JSON object.
Here’s a hacky workaround I made to extract the href from a link object. It’s ugly, but it works. I expect that the attributes property will be replaced soon with something a bit more functional.
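This is not necessarily the workaround used originally, only a sketch of one way to do it. It assumes the array follows the CDP layout, a flat list of alternating names and values; attributes_to_dict and get_href are hypothetical helper names.

```python
def attributes_to_dict(attributes):
    """Pair up a flat [name, value, name, value, ...] list into a dict."""
    attributes = attributes or []
    # Even positions hold names, odd positions hold the matching values.
    return dict(zip(attributes[::2], attributes[1::2]))


def get_href(link):
    """Return the href of a link element, or None if it has none."""
    return attributes_to_dict(link.attributes).get("href")
```

For example, attributes_to_dict(["class", "tag", "href", "/tag/love/"]) gives {"class": "tag", "href": "/tag/love/"}.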
NOTE: Most other headless browsers contain a get_attribute() method. However, this method isn’t working yet in NODRIVER.
Here’s how we extract text data. As you might notice, we don’t use await here. I suspect this will change in the future to align with other CDP-style browsers. In its current form, text is just an attribute, not a method, so using await with it will actually throw an error. This feels contrary to both Puppeteer and Playwright, but this is the current state of NODRIVER: still under heavy development.
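A small sketch of text extraction. The span.text selector is an assumption that matches the quote markup on Quotes to Scrape, and extract_text is an illustrative name.

```python
async def extract_text(tab, selector="span.text"):
    """Return the text of the first element matching `selector`."""
    # Looking the element up is asynchronous, so this call is awaited.
    element = await tab.select(selector)
    # .text is a plain attribute: no parentheses and no await.
    return element.text
```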
7. Storing The Data
We’ll store our data inside a neat little JSON file. When extracting quotes, each quote has a list of tags and lists don’t do very well in CSV form.
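Writing the file only needs the standard library. In this sketch, save_quotes and the default quotes.json path are illustrative, and each quote is assumed to be a plain dictionary.

```python
import json


def save_quotes(quotes, path="quotes.json"):
    """Write a list of quote dictionaries to `path` as formatted JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps curly quotes and accents readable.
        json.dump(quotes, f, indent=4, ensure_ascii=False)
```

Nested lists such as tags are stored as JSON arrays, which is exactly what CSV struggles with.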
8. Putting Everything Together
Now, let’s put all of these concepts together into a working script. In the example below, we use the concepts above to extract data from Quotes to Scrape, a site built just for scraping tutorials. Copy and paste the code below to get a feel for how NODRIVER actually works.
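A compact sketch of such a script. The selectors (div.quote, span.text, small.author, a.tag) are assumptions based on the Quotes to Scrape markup, the function names are illustrative, and nodriver is imported inside run() so the parsing helpers load without it.

```python
import json

URL = "https://quotes.toscrape.com"


async def parse_quote(card):
    """Turn one quote card element into a dictionary."""
    text = await card.query_selector("span.text")
    author = await card.query_selector("small.author")
    tags = await card.query_selector_all("a.tag")
    # .text is an attribute, so no await is needed here.
    return {
        "text": text.text,
        "author": author.text,
        "tags": [tag.text for tag in tags],
    }


async def scrape_quotes(tab):
    """Wait for the quotes to render, then parse every card on the page."""
    await tab.wait_for(selector="div.quote")
    cards = await tab.select_all("div.quote")
    return [await parse_quote(card) for card in cards]


def run(path="quotes.json"):
    """Scrape the first page and save it. Needs nodriver and Chrome."""
    import nodriver as uc  # lazy import, see the note above

    async def main():
        browser = await uc.start()
        try:
            tab = await browser.get(URL)
            quotes = await scrape_quotes(tab)
        finally:
            browser.stop()
        with open(path, "w", encoding="utf-8") as f:
            json.dump(quotes, f, indent=4, ensure_ascii=False)

    uc.loop().run_until_complete(main())
```

Calling run() scrapes the first page and writes quotes.json into the current directory.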
If you run the script above, you’ll get a JSON file with objects like what you see below.
Current Limitations of NODRIVER
Currently, NODRIVER has some serious limitations that are worth noting. Let’s go over those.
Headless Mode
NODRIVER throws an error whenever we run it in headless mode. We are not sure if this is intentional (as an antibot bypass) or a legitimate issue.
Page Interactions
While NODRIVER has numerous page interactions listed in its docs, most of them either partially work or don’t work at all. As you can see, this is documented in the screenshot below for both click_mouse() and mouse_click().
Attribute Extraction
The biggest pain point with NODRIVER is attribute extraction. As mentioned before, this outputs an array, and it’s extremely archaic, as you saw in our href workaround. Here’s the literal output from attributes. For production-level scraping, this needs to be addressed.
Proxy Usage With NODRIVER
Currently, proxy support for NODRIVER is limited at best. They do provide a create_context() method for proxy connection.
The snippet below comes straight from their issues page. However, after hours of trying this and various other methods, I was still unable to connect.
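A sketch of that kind of call, assuming browser.create_context() accepts a proxy_server keyword as in current nodriver releases. The helper name open_with_proxy is hypothetical, and given the connection problems described here, treat this as untested against a real proxy.

```python
async def open_with_proxy(browser, url, proxy_server):
    """Open `url` in a new browser context routed through `proxy_server`.

    `proxy_server` is a proxy URL string; its exact accepted format
    (scheme, credentials) is an assumption to verify against nodriver.
    """
    # create_context() returns a tab that belongs to the new context.
    return await browser.create_context(url, proxy_server=proxy_server)
```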
If you look at their documentation, they have a section on proxies [1]. Even though there’s an official proxy section, there’s no actual documentation. We presume this will be fixed in the near future.
Viable Alternatives
While it’s not currently ready for production use, I expect great things from NODRIVER in the future. If you’re looking for something more heavy duty, take a look at the browsers below.
- Selenium: Going strong since 2004. Selenium depends on Chromedriver, but it’s battle tested and production-ready. Learn more about Selenium web scraping.
- Playwright: Playwright feels like a polished, ready-to-go version of what you’ve seen in this tutorial with NODRIVER. Learn how to use Playwright for web scraping.
Conclusion
NODRIVER is an exciting new tool for browser automation, but rapid development means some features are still maturing. For large-scale, reliable web scraping, consider using robust solutions like:
- Residential Proxies: Real device connections to bypass geo-blocks.
- Web Unlocker: Managed proxies with built-in CAPTCHA solver.
- Scraping Browser: Remote browser automation with proxy and CAPTCHA support. The perfect solution for multi-step scraping projects.
- Custom Scraper: Run custom scraping jobs with no code required and expert-assisted data extraction.