Web Scraping With NODRIVER in 2025

Master web scraping with NODRIVER. This guide covers setup, key features, and methods to streamline your data extraction workflows.

For years, Undetected Chromedriver has been a staple in secure browsing and anti-bot bypass. The developer behind Undetected Chromedriver has since created NODRIVER. With NODRIVER, you’re no longer dependent on Selenium or Webdriver. A simple pip install and everything should be ready to go.

In this guide, you’ll learn:

  • What Is NODRIVER?
  • How Is It Different From Other Headless Browsers?
  • How To Use NODRIVER?
  • What Are NODRIVER’s Limitations?
  • How To Use NODRIVER With a Proxy?
  • Solid Alternatives to NODRIVER

What Is NODRIVER and Why Should You Care?

What Exactly Is NODRIVER?

NODRIVER is the fully asynchronous successor to Undetected Chromedriver. With “best practices” set as the default for all kwargs, it’s designed to work right out of the box with only a small amount of code.

NODRIVER boasts the following features:

  • Performance
  • No External Dependencies (not even Chromedriver)
  • Antibot Bypass
  • Persistent Session Cookies
  • Fresh Browser Instance With Each Use

What Makes NODRIVER Different?

NODRIVER uses a radically different architecture from Undetected Chromedriver and even other headless browsers. Traditionally, these tools have depended on Selenium or the Chrome DevTools Protocol (CDP).

NODRIVER ships its own custom implementation of the DevTools protocol. In the documentation, it’s actually referred to as a “chrome (-ish) automation library”. With NODRIVER, you’re not dependent on Selenium, nor are you directly dependent on CDP. To use NODRIVER, all you need is pip and a Chrome-based browser.

Scraping With NODRIVER

1. Getting Started

Before you get started, make sure you have Python and a browser installed. If you’re reading this article, I’m assuming you already have both. You can install NODRIVER directly with pip.

pip install nodriver

2. Basic Structure

Our basic structure is really similar to what you’d get with Playwright or Puppeteer. If you’re interested in using Playwright in Python, you can view a full guide on scraping Amazon listings here. NODRIVER has a very similar feel to Playwright, but it’s still under heavy development.

Here’s our basic structure.

import nodriver

async def main():
    #start the browser
    browser = await nodriver.start()

    base_url = "https://quotes.toscrape.com"

    #navigate to a page
    page = await browser.get(base_url)

    ###logic goes here###

    #close the browser
    await page.close()

if __name__ == '__main__':

    #in their docs, they advise directly against asyncio.run()
    nodriver.loop().run_until_complete(main())

3. Getting A Page

As you probably noticed in the basic skeleton above, browser.get() returns a page object. You can open multiple pages simultaneously, and if you’re willing to get creative, you can build highly concurrent operations.

The snippet below is only theoretical.

#navigate to a page
page_1 = await browser.get(base_url)

#open the second page in a new tab (otherwise the same tab is reused)
page_2 = await browser.get(a_different_url, new_tab=True)

####do stuff with the different pages#####
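Because every call here is a coroutine, you can drive both tabs at once with asyncio.gather(). Below is a minimal sketch, where scrape_page() stands in for a hypothetical per-tab coroutine of your own:

import asyncio

#scrape_page() is a placeholder for your own per-tab parsing logic
results = await asyncio.gather(
    scrape_page(page_1),
    scrape_page(page_2)
)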

4. Dynamic Content

To handle dynamic content, you have two options. You can use the .sleep() method to wait an arbitrary amount of time, or you can use .wait_for() to wait for a specific selector to appear on the page.

#wait an arbitrary amount of time
await tab.sleep(1)

#wait for a specific element
await tab.wait_for("div[data-testid='some-value']")

NOTE: In the snippet above, I used tab instead of page as a variable name. These are interchangeable. They are both tab objects. You can learn more about tabs in NODRIVER here.
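Putting the two together, a common pattern is to wait for the element you actually need and then pause briefly before parsing. Here’s a quick sketch against the JavaScript-rendered version of the quotes site:

#navigate, then block until the quote divs have rendered
page = await browser.get("https://quotes.toscrape.com/js/")
await page.wait_for("div[class='quote']")

#optionally give any remaining scripts a moment to finish
await page.sleep(0.5)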

5. Finding Elements

NODRIVER gives us a variety of methods for finding elements on the page. It seems the developers are in the middle of phasing out some legacy methods.

There are four different text-based methods for finding elements. Two of them will likely disappear in the future.

#find an element using its text
my_element = await page.find("some text here")

#find a list of elements by their text
my_elements = await page.find_all("some text here")

#find an element using its text
my_element = await page.find_element_by_text("some text here")

#find a list of elements using their text
my_elements = await page.find_elements_by_text("some text here")

Like the methods above, there are also four selector-based methods for finding elements. Two of them will likely disappear. If the developers behind NODRIVER want to clearly align with CDP, the query_selector methods will likely survive.

#find a single element using its css selector
my_element = await page.select("div[class='your-classname']")

#find a list of elements using a css selector
my_elements = await page.select_all("div[class='your-classname']")

#find a single element using its css selector
my_element = await page.query_selector("div[class='your-classname']")

#find a list of elements using a css selector
my_elements = await page.query_selector_all("div[class='your-classname']")

As you can see above, no matter how you want to find elements on the page, there are likely multiple ways to do it. In time, the developers behind NODRIVER might tighten this up. That said, at the moment, their parsing methods are like a Swiss Army chainsaw.
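For example, on the quotes site, both families of methods can reach the same element. The text-based find() performs a best-match search by default, so "Next" should land on the “Next →” pagination link:

#selector-based lookup
next_by_selector = await page.select("li[class='next'] > a")

#text-based lookup of the same element
next_by_text = await page.find("Next")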

6. Extracting Their Data

NODRIVER offers a couple of methods to extract data. You can use the .attributes property to extract attributes directly, but it isn’t very user-friendly: it returns a flat array, not a dictionary.

Here’s a hacky workaround I made to extract the href from a link element. It’s ugly, but it works. I expect the .attributes property to be replaced soon with something a bit more functional.

next_button = await page.select("li[class='next'] > a")

#this returns an array
attributes = next_button.attributes

#use array indexing to find the href object and its value
for i in range(len(attributes)):
    if attributes[i] == "href":
        next_url = attributes[i+1]

NOTE: Most other headless browsers contain a get_attribute() method. However, this method isn’t working yet in NODRIVER.
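Until get_attribute() lands, you can at least hide the ugliness behind a tiny helper. The sketch below assumes .attributes really is a flat list of alternating names and values, as the workaround above suggests:

def attrs_to_dict(attributes):
    #pair up alternating [name, value, name, value, ...] entries
    return dict(zip(attributes[::2], attributes[1::2]))

next_button = await page.select("li[class='next'] > a")
next_url = attrs_to_dict(next_button.attributes).get("href")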

Here’s how we extract text data. As you might notice, we don’t use await here. I suspect this will change in the future to align with other CDP-style browsers. In its current form, text is just an attribute, not a method, and await will actually throw an error when used on it. This feels contrary to both Puppeteer and Playwright, but that’s the current state of NODRIVER: still under heavy development.

#find the quote element
quote_element = await quote.query_selector("span[class='text']")
#extract its text
quote_text = quote_element.text

7. Storing The Data

We’ll store our data inside a neat little JSON file. Each quote has a list of tags, and lists don’t translate well to CSV.

import json

with open("quotes.json", "w", encoding="utf-8") as f:
    json.dump(scraped_data, f, ensure_ascii=False, indent=4)
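If you do need CSV output later, one option is to flatten each tag list into a single delimited string first. A minimal sketch:

import csv

with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["quote", "author", "tags"])
    writer.writeheader()
    for item in scraped_data:
        #join the tag list so each row fits CSV's flat structure
        writer.writerow({**item, "tags": "|".join(item["tags"])})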

8. Putting Everything Together

Now, let’s put all of these concepts together into a working script. In the example below, we use the concepts above to extract data from Quotes to Scrape, a site built just for scraping tutorials. Copy and paste the code below to get a feel for how NODRIVER actually works.

import nodriver
import json

async def main():

    #list to hold scraped data
    scraped_data = []

    browser = await nodriver.start()

    next_url = "/"

    base_url = "https://quotes.toscrape.com"

    #while we still have urls to scrape
    while next_url:

        #go to the page
        page = await browser.get(f"{base_url}{next_url}")

        #find quote divs using a selector
        quotes = await page.select_all("div[class='quote']")

        #iterate through the quotes
        for quote in quotes:

            #find the quote element and extract its text
            quote_element = await quote.query_selector("span[class='text']")
            quote_text = quote_element.text

            #find the author and extract the text
            author_element = await quote.query_selector("small[class='author']")
            author = author_element.text

            #find the tag elements
            tag_elements = await quote.query_selector_all("a[class='tag']")
            tags = []

            #iterate through the tags and extract their text
            for tag_element in tag_elements:
                text = tag_element.text
                tags.append(text)

            #add our extracted data to the list of scraped data
            scraped_data.append({
                "quote": quote_text,
                "author": author,
                "tags": tags
            })

        #check the page for a "next" button
        next_button = await page.select("li[class='next'] > a")

        #if it doesn't exist, close the browser and break the loop
        if next_button is None:
            await page.close()
            next_url = None

        #if it does, follow this block instead
        else:
            attributes = next_button.attributes

            #loop through the attributes to find the desired attribute; its value sits at the next index
            for i in range(len(attributes)):
                if attributes[i] == "href":
                    next_url = attributes[i+1]

    #write the data to a json file
    with open("quotes.json", "w", encoding="utf-8") as f:
        json.dump(scraped_data, f, ensure_ascii=False, indent=4)


if __name__ == '__main__':

    nodriver.loop().run_until_complete(main())

If you run the script above, you’ll get a JSON file with objects like what you see below.

[
    {
        "quote": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”",
        "author": "Albert Einstein",
        "tags": [
            "change",
            "deep-thoughts",
            "thinking",
            "world"
        ]
    },
    {
        "quote": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",
        "author": "J.K. Rowling",
        "tags": [
            "abilities",
            "choices"
        ]
    },

Current Limitations of NODRIVER

Currently, NODRIVER has some serious limitations that are worth noting. Let’s go over those.

Headless Mode

NODRIVER throws an error whenever we run it in headless mode. We are not sure if this is intentional (as an antibot bypass) or a legitimate issue.

[Screenshot: a Python traceback ending in a "maximum recursion depth exceeded" error, raised from NODRIVER while preparing a headless session.]
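For reference, this is the kind of call that produced the traceback for us. The headless flag below is exposed by NODRIVER’s Config object, so we assume it’s the intended switch:

import nodriver

async def main():
    #this currently crashes with a max recursion depth error
    browser = await nodriver.start(headless=True)
    page = await browser.get("https://quotes.toscrape.com")

if __name__ == '__main__':
    nodriver.loop().run_until_complete(main())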

Page Interactions

While NODRIVER has numerous page interactions listed in its docs, most of them either partially work or don’t work at all. This is documented in the screenshot below for both click_mouse() and mouse_click().

[Screenshot: the docs for the async mouse_click() and click_mouse() functions, covering button selection, modifier keys, and an internal wait event.]

Attribute Extraction

The biggest pain point with NODRIVER is attribute extraction. As mentioned before, .attributes outputs an array, which is extremely archaic, as you saw in our href workaround. Here’s the literal output from .attributes. For production-level scraping, this needs to be addressed.

[Screenshot: the raw .attributes output for the next-page link, an array containing "href" and "/page/2/".]

Proxy Usage With NODRIVER

Currently, proxy support for NODRIVER is limited at best. They do provide a create_context() method for proxy connection.

The snippet below comes straight from their issues page. However, after hours of trying this and various other methods, I was still unable to connect.

tab = await browser.create_context("https://www.google.nl", proxy_server='socks5://myuser:mypass@somehost')

# or add new_window=True if you would like a new window

If you look at their documentation, they have a section on proxies [1]. Even though there’s an official proxy section, there’s no actual documentation inside it. We presume this will be fixed in the near future.
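One workaround that may get you partway there is passing Chrome’s own --proxy-server switch through browser_args at startup. Treat this as a sketch rather than a guarantee: the switch is a standard Chromium flag, but it doesn’t accept embedded credentials, so it only helps with IP-whitelisted proxies:

import nodriver

async def main():
    #--proxy-server is a standard Chromium flag; it does not accept
    #username/password credentials, so use an IP-whitelisted proxy
    browser = await nodriver.start(
        browser_args=["--proxy-server=http://123.45.67.89:8080"]
    )

    #verify the proxy by checking your apparent IP address
    page = await browser.get("https://httpbin.org/ip")

if __name__ == '__main__':
    nodriver.loop().run_until_complete(main())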

Viable Alternatives

While it’s not currently ready for production use, I expect great things from NODRIVER in the future. If you’re looking for something more heavy-duty, take a look at the browsers below.

  • Selenium: Going strong since 2004. Selenium depends on Chromedriver, but it’s battle tested and production-ready. Learn more about Selenium web scraping.
  • Playwright: Playwright feels like a polished, ready-to-go version of what you’ve seen in this tutorial with NODRIVER. Learn how to use Playwright for web scraping.

Conclusion

NODRIVER is an exciting new tool for browser automation, but rapid development means some features are still maturing. For large-scale, reliable web scraping, consider one of the more robust, production-ready solutions covered above.
