For years, Undetected Chromedriver has been a staple in secure browsing and anti-bot bypass. The developer behind Undetected Chromedriver has since created NODRIVER. With NODRIVER, you're no longer dependent on Selenium or Webdriver. A simple pip install, and everything is ready to go.
In this guide, you’ll learn:
- What is NODRIVER?
- How Is It Different From Other Headless Browsers?
- How To Use NODRIVER?
- What Are NODRIVER’s Limitations?
- How To Use NODRIVER With a Proxy?
- Solid Alternatives to NODRIVER
What Is NODRIVER and Why Should You Care?
What Exactly Is NODRIVER?
NODRIVER is the fully asynchronous successor to Undetected Chromedriver. With "best practices" as the default for all kwargs, it's designed to work right out of the box with only a small amount of code.
NODRIVER boasts the following features:
- Performance
- No External Dependencies (not even Chromedriver)
- Antibot Bypass
- Persistent Session cookies
- Fresh Browser Instance With Each Use
What Makes NODRIVER Different?
NODRIVER uses a radically different architecture from Undetected Chromedriver and even other headless browsers. Traditionally, these other browsers have depended on Selenium, or Chrome DevTools Protocol (CDP).
NODRIVER uses its own custom implementation of the DevTools protocol. In the documentation, it’s actually referred to as “chrome (-ish) automation library”. With NODRIVER, you’re not dependent on Selenium, nor are you directly dependent on CDP. NODRIVER uses a custom implementation of CDP. To use NODRIVER, all you need is pip and a Chrome-based browser.
Scraping With NODRIVER
1. Getting Started
Before you get started, you need to make sure you have Python and a browser installed. If you’re reading this article — I’m assuming you’ve already got these. You can install NODRIVER directly with pip.
pip install nodriver
2. Basic Structure
Our basic structure is really similar to what you’d get with Playwright or Puppeteer. If you’re interested in using Playwright in Python, you can view a full guide on scraping Amazon listings here. NODRIVER has a very similar feel to Playwright, but it’s still under heavy development.
Here’s our basic structure.
import nodriver

async def main():
    #start the browser
    browser = await nodriver.start()
    base_url = "https://quotes.toscrape.com"
    #navigate to a page
    page = await browser.get(base_url)
    ###logic goes here###
    #close the browser
    await page.close()

if __name__ == '__main__':
    #in their docs, they advise directly against asyncio.run()
    nodriver.loop().run_until_complete(main())
3. Getting A Page
As you probably noticed in our basic skeleton above, browser.get() returns a page object. You can even open multiple pages simultaneously. If you're willing to get creative, you can build highly concurrent operations.
The snippet below is only theoretical.
#navigate to a page
page_1 = await browser.get(base_url)
page_2 = await browser.get(a_different_url)
####do stuff with the different pages#####
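To make that concurrency concrete, the standard asyncio.gather pattern applies. Below is a minimal sketch of the pattern only; the fetch() coroutine is a hypothetical stand-in I've made up in place of browser.get(), not part of NODRIVER.

```python
import asyncio

#hypothetical stand-in for browser.get(); in a real script,
#each call would open a NODRIVER tab and return it
async def fetch(url):
    await asyncio.sleep(0)  #simulate waiting on the network
    return f"page for {url}"

async def scrape_all(urls):
    #gather() schedules all the fetches at once instead of one by one,
    #and returns their results in the same order as the input urls
    return await asyncio.gather(*(fetch(url) for url in urls))

pages = asyncio.run(scrape_all([
    "https://quotes.toscrape.com/page/1/",
    "https://quotes.toscrape.com/page/2/",
]))
```

With real NODRIVER calls, you'd swap fetch() for browser.get() and drive the loop with nodriver.loop().run_until_complete(), since their docs advise against asyncio.run().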
4. Dynamic Content
To handle dynamic content, you get two options. You can use the .sleep() method to wait an arbitrary amount of time, or you can use .wait_for() to wait for a specific selector on the page.
#wait an arbitrary amount of time
await tab.sleep(1)
#wait for a specific element
await tab.wait_for("div[data-testid='some-value']")
NOTE: In the snippet above, I used tab instead of page as a variable name. These names are interchangeable; both refer to tab objects. You can learn more about tabs in NODRIVER here.
5. Finding Elements
NODRIVER gives us a variety of methods for finding elements on the page. It seems they’re in the midst of handling some legacy methods.
There are four different text-based methods for finding elements. Two of them will likely disappear in the future.
#find an element using its text
my_element = await page.find("some text here")
#find a list of elements by their text
my_elements = await page.find_all("some text here")
#find an element using its text
my_element = await page.find_element_by_text("some text here")
#find a list of elements using their text
my_elements = await page.find_elements_by_text("some text here")
Like the methods above, there are also four selector-based methods for finding elements. Two of them will likely disappear. If the developers behind NODRIVER want to clearly align with CDP, the query_selector methods will likely survive.
#find a single element using its css selector
my_element = page.select("div[class='your-classname']")
#find a list of elements using a css selector
my_elements = page.select_all("div[class='your-classname']")
#find a single element using its css selector
my_element = page.query_selector("div[class='your-classname']")
#find a list of elements using a css selector
my_elements = page.query_selector_all("div[class='your-classname']")
As you can see above, no matter how you want to find elements on the page, there are likely multiple ways to do it. In time, the developers behind NODRIVER might tighten this up. That said, at the moment, their parsing methods are like a Swiss Army chainsaw.
6. Extracting Their Data
NODRIVER offers a couple of methods to extract data. You can use the .attributes property to extract attributes directly, but this isn't very user-friendly: it returns a flat list of alternating names and values, not a dictionary.
Here's a hacky workaround I made to extract the href from a link object. It's ugly, but it works. I expect that the attributes property will be replaced soon with something a bit more functional.
next_button = await page.select("li[class='next'] > a")
#this returns a flat list
attributes = next_button.attributes
#use list indexing to find the href entry; its value is the next index
for i in range(len(attributes)):
    if attributes[i] == "href":
        next_url = attributes[i + 1]
NOTE: Most other headless browsers provide a get_attribute() method. However, this method isn't working yet in NODRIVER.
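Since the list alternates names and values, you can skip the index loop by zipping it into a dictionary. This is plain Python, not a NODRIVER feature, and it assumes the list keeps that alternating shape; the sample data below is made up to match what .attributes returns.

```python
#example of the flat list shape returned by .attributes
attributes = ["href", "/page/2/", "class", "next"]

#pair every name (even index) with the value that follows it (odd index)
attribute_map = dict(zip(attributes[::2], attributes[1::2]))

next_url = attribute_map["href"]
```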
Here's how we extract text data. As you might notice, we don't use await here. I suspect this will change in the future to align with other CDP-style browsers. In its current form, text is just an attribute, not a method; await will actually throw an error when used with attributes. This feels contrary to both Puppeteer and Playwright, but this is the current state of NODRIVER, which is still under heavy development.
#find the quote element
quote_element = await quote.query_selector("span[class='text']")
#extract its text
quote_text = quote_element.text
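One small optional cleanup: the extracted text keeps the site's typographic quote marks. If you'd rather store the bare text, you can strip them off; this is plain Python, nothing NODRIVER-specific.

```python
quote_text = "“The world as we have created it is a process of our thinking.”"

#strip() removes any leading and trailing characters from the given set
clean_text = quote_text.strip("“”")
```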
7. Storing The Data
We'll store our data inside a neat little JSON file. When extracting quotes, each quote has a list of tags, and lists don't translate well to CSV.
import json

with open("quotes.json", "w", encoding="utf-8") as f:
    json.dump(scraped_data, f, ensure_ascii=False, indent=4)
8. Putting Everything Together
Now, let's put all of these concepts together into a working script. In the example below, we use the concepts above to extract data from Quotes to Scrape, a site built just for scraping tutorials. Copy and paste the code below to get a feel for how NODRIVER actually works.
import nodriver
import json

async def main():
    #list to hold scraped data
    scraped_data = []
    browser = await nodriver.start()
    next_url = "/"
    base_url = "https://quotes.toscrape.com"
    #while we still have urls to scrape
    while next_url:
        #go to the page
        page = await browser.get(f"{base_url}{next_url}")
        #find quote divs using a selector
        quotes = await page.select_all("div[class='quote']")
        #iterate through the quotes
        for quote in quotes:
            #find the quote element and extract its text
            quote_element = await quote.query_selector("span[class='text']")
            quote_text = quote_element.text
            #find the author and extract the text
            author_element = await quote.query_selector("small[class='author']")
            author = author_element.text
            #find the tag elements
            tag_elements = await quote.query_selector_all("a[class='tag']")
            tags = []
            #iterate through the tags and extract their text
            for tag_element in tag_elements:
                text = tag_element.text
                tags.append(text)
            #add our extracted data to the list of scraped data
            scraped_data.append({
                "quote": quote_text,
                "author": author,
                "tags": tags
            })
        #check the page for a "next" button
        next_button = await page.select("li[class='next'] > a")
        #if it doesn't exist, close the browser and break the loop
        if next_button is None:
            await page.close()
            next_url = None
        #if it does, follow this block instead
        else:
            attributes = next_button.attributes
            #loop through the attributes to find your desired attribute, its value is the next index
            for i in range(len(attributes)):
                if attributes[i] == "href":
                    next_url = attributes[i + 1]
    #write the data to a json file
    with open("quotes.json", "w", encoding="utf-8") as f:
        json.dump(scraped_data, f, ensure_ascii=False, indent=4)

if __name__ == '__main__':
    nodriver.loop().run_until_complete(main())
If you run the script above, you’ll get a JSON file with objects like what you see below.
[
    {
        "quote": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”",
        "author": "Albert Einstein",
        "tags": [
            "change",
            "deep-thoughts",
            "thinking",
            "world"
        ]
    },
    {
        "quote": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",
        "author": "J.K. Rowling",
        "tags": [
            "abilities",
            "choices"
        ]
    },
Current Limitations of NODRIVER
Currently, NODRIVER has some serious limitations that are worth noting. Let’s go over those.
Headless Mode
NODRIVER throws an error whenever we run it in headless mode. We are not sure if this is intentional (as an antibot bypass) or a legitimate issue.
Page Interactions
While NODRIVER has numerous page interactions listed in their docs, most of them either partially work or don't work at all. This is documented in their docs for both click_mouse() and mouse_click().
Attribute Extraction
The biggest pain point with NODRIVER is attribute extraction. As mentioned before, it outputs a flat list, which is extremely archaic, as you saw in our href workaround. For production-level scraping, this needs to be addressed.
Proxy Usage With NODRIVER
Currently, proxy support for NODRIVER is limited at best. They do provide a create_context() method for proxy connections.
The snippet below comes straight from their issues page. However, after hours of trying this and various other methods, I was still unable to connect.
tab = await browser.create_context("https://www.google.nl", proxy_server='socks5://myuser:mypass@somehost')
# or add new_window=True if you would like a new window
If you look at their documentation, they have a section on proxies. Even though there's an official proxy section, there's no actual documentation. We presume this will be fixed in the near future.
Viable Alternatives
While it's not currently ready for production use, I expect great things from NODRIVER in the future. If you're looking for something more heavy-duty, take a look at the browsers below.
- Selenium: Going strong since 2004. Selenium depends on Chromedriver, but it’s battle tested and production-ready. Learn more about Selenium web scraping.
- Playwright: Playwright feels like a polished, ready-to-go version of what you’ve seen in this tutorial with NODRIVER. Learn how to use Playwright for web scraping.
Conclusion
NODRIVER is an exciting new tool for browser automation, but rapid development means some features are still maturing. For large-scale, reliable web scraping, consider using robust solutions like:
- Residential Proxies: Real device connections to bypass geo-blocks.
- Web Unlocker: Managed proxies with built-in CAPTCHA solver.
- Scraping Browser: Remote browser automation with proxy and CAPTCHA support. The perfect solution for multi-step scraping projects.
- Custom Scraper: Run custom scraping jobs with no code required and expert-assisted data extraction.
Sign up for a free trial and get started today!
No credit card required