How to Scrape Google AI Overviews: 2025 Tutorial

Learn what Google AI Overview is and how to scrape it programmatically, plus tips for reliable results and overcoming common challenges.

In this article, you will learn:

  • What the Google AI Overview is.
  • How it works and why scraping it can be valuable.
  • How to scrape the Google AI Overview with a step-by-step tutorial.
  • The challenges involved, and how to overcome them.

Let’s dive in!

What Is Google AI Overview?

Google AI Overview is a feature integrated into Google Search that provides AI-generated summaries at the top of search results. Behind the scenes, it is powered by Google’s Gemini large language model.

The Google AI Overview section

These overviews synthesize information from multiple web sources to offer concise answers to user queries. They often include links to the original source articles, helping users to delve deeper.

As of May 2025, AI Overviews are available in over 200 countries and territories and in more than 40 languages. Originally, the feature was available only in the United States.

Why Scrape Google AI Overviews?

Google AI Overview responses are more than just general responses that Gemini or any other AI provider could generate. The key distinction is that they are rooted in the SERP (Search Engine Results Pages) links and the content within those links.

In other words, their content is backed by real-world articles, pages, and sites, and often includes links for further reading and expansion. This is something that LLMs typically struggle to do.

Therefore, by programmatically scraping Google AI Overviews, you could build a sort of AI-powered SERP chatbot that leverages actual SERP results to produce RAG-optimized responses. The idea is to get answers grounded in current, verifiable web content.

As you will learn at the end of this article, while this approach is certainly interesting, there are some inherent challenges. So, you may consider exploring our guide on how to build a SERP chatbot through RAG.

How to Scrape Google AI Overview in Python: Step-by-Step Guide

In this tutorial section, we will guide you through the process of scraping the Google AI Overview. You will learn how to build a Python script that:

  • Connects to Google.
  • Performs a search query.
  • Waits for the AI Overview to load.
  • Scrapes the HTML from it.
  • Converts the content to Markdown.
  • Exports it to an output file.

Follow the steps below to see how to perform Google AI Overview scraping!

Step #1: Project Setup

Before getting started, make sure you have Python 3 installed on your machine. If not, download it and follow the installation wizard.

Open a terminal and run the following commands:

mkdir google-ai-overview-scraper
cd google-ai-overview-scraper
python -m venv venv

These will create a new folder google-ai-overview-scraper/ for your scraper project and initialize a virtual environment.

Load the project folder in your favorite Python IDE. PyCharm Community Edition or Visual Studio Code with the Python extension are two good options.

In the project’s folder, create a scraper.py file:

google-ai-overview-scraper/
├── venv/              # Your Python virtual environment
└── scraper.py         # Your scraping script

scraper.py is now a blank script, but it will soon contain the scraping logic.

In the IDE’s terminal, activate the virtual environment. In Linux or macOS, fire this command:

source ./venv/bin/activate

Alternatively, on Windows, execute:

venv\Scripts\activate

Great! You now have a clean Python environment set up for your scraping project.

Step #2: Install Playwright

Google is a dynamic platform, and with recent updates, it now requires JavaScript execution to fully load most pages. Also, crafting a valid Google Search URL manually can be tricky. That is why the best way to interact with Google Search is by simulating user behavior in a browser.

In other words, to scrape the “AI Overview” section, you need a browser automation tool. This enables you to launch a real browser, load web pages, and interact with them programmatically—just like a user would.

One of the best browser automation tools for Python is Playwright. In your activated Python virtual environment, install Playwright via the playwright pip package:

pip install playwright

Now, complete the Playwright installation with:

python -m playwright install

This command will download the necessary browser executables and other components that Playwright needs to control web browsers.
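
If you only need Chromium, as in this tutorial, you can limit the download by passing the browser name to the install command:

```shell
# Download only the Chromium build used in this tutorial
python -m playwright install chromium
```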

For more details on using this tool, read our guide to web scraping with Playwright.

Awesome! You now have everything set up to start scraping the AI Overview section from Google.

Step #3: Navigate to the Google Homepage

Open your scraper.py file, import Playwright and initialize a Chromium instance in headless mode:

import asyncio
from playwright.async_api import async_playwright

async def run():
    async with async_playwright() as p:
        # Start a new Chromium instance
        browser = await p.chromium.launch(headless=True) # Set to False while developing
        context = await browser.new_context()
        page = await context.new_page()

This snippet creates a Playwright Page instance, which allows you to programmatically control a browser tab. Setting headless=True runs the browser in the background, without a GUI. If you are developing or want to debug, set headless=False to observe what your script is doing as it runs.

Since async_playwright runs asynchronously, the script must use Python’s asyncio module.

Disclaimer: Keep in mind that new Google AI Overview features are usually rolled out first in the United States. For more accurate results, you may need to geolocate your machine to a city in the U.S. Achieve that by integrating Playwright with a web proxy. Specifically, take a look at our U.S. proxy options.

From now on, we will assume you are operating from within the United States.

Now, use Playwright’s goto() method to open the Google homepage:

await page.goto("https://google.com/")

Always remember to clean up resources by closing the browser at the end of your script:

await browser.close()

Put it all together, and you will get:

import asyncio
from playwright.async_api import async_playwright

async def run():
    async with async_playwright() as p:
        # Start a new Chromium instance
        browser = await p.chromium.launch(headless=True) # Set to False while developing
        context = await browser.new_context()
        page = await context.new_page()

        # Navigate to Google
        await page.goto("https://google.com/")

        # scraping logic goes here ...

        # Close the browser and free resources
        await browser.close()

asyncio.run(run())

Fantastic! You are ready to scrape dynamic websites like Google.

Step #4: Submit the Search Form

Reach the Google homepage in your browser. Right-click on the search bar and choose “Inspect” to open the browser’s Developer Tools:

Inspecting the Google search textarea

Google’s HTML structure often uses dynamically generated classes and attributes that are likely to change with each deployment. That makes them unreliable for scraping, as your selectors will break over time.

Instead, target stable HTML attributes. For example, the search textarea has a clear aria-label attribute:

textarea[aria-label="Search"]

Use the fill() method to select the search textarea and fill it out with the Google search query:

await page.fill("textarea[aria-label='Search']", search_query)

Here, the search_query variable is defined as follows:

search_query = "What is web scraping?"

Note that using a question-style query is a great way to prompt Google to generate the AI Overview section. That is important as that section is not always included in the search results pages. Feel free to adjust the search query to suit your specific use case.
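
As an alternative to filling out the form, you can navigate straight to a results URL built with the standard q query parameter. A minimal standard-library sketch (note that whether Google shows the AI Overview for direct URL visits may still vary):

```python
from urllib.parse import urlencode

def build_search_url(query: str) -> str:
    """Return a Google Search results URL for the given query.

    Only the standard "q" parameter is used here; Google accepts other
    parameters (e.g., for language or region), which are out of scope
    for this sketch.
    """
    return "https://www.google.com/search?" + urlencode({"q": query})

print(build_search_url("What is web scraping?"))
# → https://www.google.com/search?q=What+is+web+scraping%3F
```

You could then call `await page.goto(build_search_url(search_query))` instead of filling and submitting the form.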

Then, trigger the search by simulating an Enter key press:

await page.keyboard.press("Enter")

If you run the script in headful mode (headless=False) and set a breakpoint on the browser.close() line, this is what you should see:

The Chromium window controlled by Playwright

Notice the “AI Overview” section at the top of the results page. If it does not show up, try re-running the script with a different, more question-like query. Amazing!

Step #5: Scrape the Google AI Overview Section

If you explore how Google’s AI Overview feature works, you will notice that three scenarios are possible:

  1. Cached response: The AI Overview snippet is already cached and appears instantly.
  2. Real-time generation: The AI Overview is generated dynamically, with a short delay as Google processes the query.
  3. No AI Overview: Google does not show the AI Overview section at all.

In this section, let’s focus on Scenario 2, where the AI Overview is generated on the fly. That is the trickiest case, and it also covers Scenario 1.

To trigger it, try using fresh or less common question-style queries. For example:

Note how Google takes some time to generate the AI Overview section

As shown above, the AI Overview section appears after a short processing delay. Specifically, it can be considered ready only when its title element contains the text “AI Overview”.

Thus, inspect the element containing the AI Overview title:

Inspecting the AI Overview title HTML element

You can select the title using the following CSS selector:

div[jsname][role="heading"] strong

To ensure the AI Overview section is present, wait for this element to appear and contain the correct text:

await page.locator(
    "div[jsname][role='heading'] strong", has_text="ai overview"
).first.wait_for(timeout=30000)

This will wait up to 30 seconds (30000 milliseconds) for the element with the “ai overview” (case-insensitive) text to be on the page.
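
If you also want to handle Scenario 3 (no AI Overview at all) gracefully, wrap the wait in a timeout guard instead of letting the script crash. The generic pattern, sketched here with plain asyncio so it runs standalone, is the same one you would apply by wrapping locator.wait_for(timeout=...) in a try/except PlaywrightTimeoutError block:

```python
import asyncio

async def appears_within(awaitable, timeout_s: float) -> bool:
    """Return True if the awaitable finishes in time, False on timeout.

    With Playwright, the equivalent is catching PlaywrightTimeoutError
    around locator.wait_for() and skipping extraction when it fails.
    """
    try:
        await asyncio.wait_for(awaitable, timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False

async def main():
    fast = asyncio.sleep(0.01)  # stands in for a cached AI Overview
    slow = asyncio.sleep(10)    # stands in for a missing AI Overview
    print(await appears_within(fast, 1.0))   # True
    print(await appears_within(slow, 0.05))  # False

asyncio.run(main())
```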

Now that you are certain the AI Overview section has loaded, get ready to scrape it. In most cases, a portion of the content is initially hidden behind a “Show more” button:

Inspecting the “Show more” button

To scrape the full response, check if the “Show more” button is present and then click it:

try:
    # Click the "Show more" button if it appears within 5 seconds
    ai_overview_show_more_button = page.locator("div[aria-label='Show more AI Overview']").first
    await ai_overview_show_more_button.click(timeout=5000)
except PlaywrightTimeoutError:
    print("'Show more' button not present")

Do not forget to import PlaywrightTimeoutError, which click() raises when the button does not appear on the page within the given timeout:

from playwright.async_api import TimeoutError as PlaywrightTimeoutError

Once the full section is visible, inspect the HTML structure to determine how to select it:

Inspecting the AI Overview content element

As you can tell, the main content of the AI Overview can be selected using this CSS selector:

div[jsname][data-rl] div

Use the following code to locate the element and extract its HTML:

ai_overview_element = page.locator("div[jsname][data-rl] div").first
ai_overview_html = await ai_overview_element.evaluate("el => el.outerHTML")

If you are wondering why we extracted the HTML instead of just the text, keep reading.

Here we go! You have successfully scraped the Google AI Overview section.

Step #6: Convert the Google AI Overview HTML to Markdown

When it comes to web scraping, the most common goal is to extract text from elements—not their full HTML. However, in most cases, the AI-generated content inside the Google AI Overview section is not plain text.

Instead, it can include bullet points, links, subheadings, and even images. Treating that content as plain text would strip away all that structure and context, valuable information you need to preserve.

So, a better approach is to treat the AI Overview as raw HTML and then convert it to Markdown, an ideal format for AI applications.

To convert the HTML into Markdown, install Markdownify in your activated environment:

pip install markdownify

Import it:

from markdownify import markdownify as md

Then use it to convert the HTML to Markdown:

ai_overview_markdown = md(ai_overview_html)
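
Depending on the HTML, the conversion output may contain runs of blank lines or stray surrounding whitespace. A small standard-library cleanup pass (my own addition, not part of markdownify) keeps the export tidy:

```python
import re

def tidy_markdown(markdown: str) -> str:
    """Collapse runs of 3+ newlines into standard paragraph breaks."""
    collapsed = re.sub(r"\n{3,}", "\n\n", markdown)
    return collapsed.strip() + "\n"

# Usage sketch:
# ai_overview_markdown = tidy_markdown(ai_overview_markdown)
```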

Terrific! All that is left is to export the scraped AI Overview to a Markdown file.

Step #7: Export the Scraped Data

Employ the Python Standard Library to open an output file named ai_overview.md, and write the converted Markdown content to it:

with open("ai_overview.md", "w", encoding="utf-8") as f:
    f.write(ai_overview_markdown)
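
To keep multiple runs distinguishable, you might prepend a small provenance header before writing the file. The header format below is my own convention, not part of the tutorial's output:

```python
from datetime import datetime, timezone

def with_header(markdown: str, query: str) -> str:
    """Prepend the search query and a UTC timestamp to the Markdown export."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"# AI Overview for: {query}\n\n_Scraped at {timestamp}_\n\n{markdown}"

# Usage sketch:
# f.write(with_header(ai_overview_markdown, search_query))
```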

This marks the end of your Google AI Overview scraping journey!

Step #8: Put It All Together

Right now, scraper.py should contain:

import asyncio
from playwright.async_api import async_playwright
from playwright.async_api import TimeoutError as PlaywrightTimeoutError
from markdownify import markdownify as md

async def run():
    async with async_playwright() as p:
        # Start a new Chromium instance
        browser = await p.chromium.launch(headless=True) # Set to False while developing
        context = await browser.new_context()
        page = await context.new_page()

        # Navigate to Google
        await page.goto("https://google.com/")

        # Fill out search form
        search_query = "What is web scraping?" # Replace it with the search query of interest
        await page.fill("textarea[aria-label='Search']", search_query)
        await page.keyboard.press("Enter")

        # Wait for the AI overview section to be ready
        await page.locator(
            "div[jsname][role='heading'] strong", has_text="ai overview"
        ).first.wait_for(timeout=30000)

        try:
            # Click the "Show more" button if it appears within 5 seconds
            ai_overview_show_more_button = page.locator("div[aria-label='Show more AI Overview']").first
            await ai_overview_show_more_button.click(timeout=5000)
        except PlaywrightTimeoutError:
            print("'Show more' button not present")

        # Extract the AI overview HTML
        ai_overview_element = page.locator("div[jsname][data-rl] div").first
        ai_overview_html = await ai_overview_element.evaluate("el => el.outerHTML")

        # Convert the HTML to Markdown
        ai_overview_markdown = md(ai_overview_html)

        # Export the Markdown to a file
        with open("ai_overview.md", "w", encoding="utf-8") as f:
            f.write(ai_overview_markdown)

        # Close the browser and free resources
        await browser.close()

asyncio.run(run())

Wow! In fewer than 50 lines of code, you just scraped Google’s AI Overview section.

Launch the above Google AI Overview scraper with:

python scraper.py

If everything goes as expected, an ai_overview.md file will appear in the project’s folder. Open it and you should see something like:

Web scraping is the process of using automated tools (called scrapers or bots) to extract content and data from websites. Unlike screen scraping, which captures only the visible pixels, web scraping delves deeper to retrieve the underlying HTML code and data stored in a website's database. This extracted data can then be used for various purposes like price comparison, market research, or data analysis.

Here's a more detailed explanation:

* **Automated Extraction:**
  Web scraping involves using software to automatically visit websites, locate and extract specific data, and save it in a structured format like a CSV file or database.

* **HTML and Database Data:**
  Scrapers don't just copy the visual content; they access the HTML code and data stored in the website's database to retrieve more comprehensive information.

* **Various Use Cases:**
  Web scraping is employed for various purposes, including price comparison, market research, competitor analysis, lead generation, sentiment analysis, and more.

* **Not Just for Businesses:**
  While businesses often use web scraping for data-driven decision-making, it's also valuable for individuals seeking price comparisons, market trends, or general data analysis.

* **Consider Ethical and Legal Implications:**
  When web scraping, it's crucial to be aware of the website's terms of service and robots.txt file to ensure you are not violating their policies or engaging in illegal activities.

Copy the above Markdown content and paste it into a Markdown viewer like StackEdit:

Note the Markdown representation on the right

That is exactly the structured, easy-to-read, and information-rich version of the Google AI Overview snippet—converted from raw HTML to clean Markdown!

Et voilà! Mission complete.

Challenges in Scraping the Google AI Overview

If you keep running the script in headed mode, at some point, you will likely encounter this blocking page:

The Google CAPTCHA stopping your bot

When making too many automated requests or using an IP address with a low reliability score, Google will detect your activity as a bot and challenge you with a reCAPTCHA.

As a workaround, you can try bypassing CAPTCHAs in Python. This can work for simpler CAPTCHAs but often fails against more advanced or newer reCAPTCHA versions like reCAPTCHA v3.

In such cases, you will likely need a premium CAPTCHA solving service. Another approach is to configure Playwright to operate on a browser other than Chromium. The problem is that, by default, Playwright instruments Chromium (or any other browser) in a way that can be detected by Google’s anti-bot systems.

To avoid detection, you can integrate Playwright with AI Agent Browser. That is a cloud browser compatible with Playwright and specialized in web scraping and data retrieval in agentic workflows.

The benefits of this approach include nearly infinite scalability and a significant reduction in CAPTCHA challenges. Even when CAPTCHAs do appear, Agent Browser comes with CAPTCHA solving capabilities as well as proxy integration, allowing you to geolocate your Google AI Overview scraping to any country or language.

Conclusion

In this tutorial, you learned what the Google AI Overview is and how to scrape data from it. As you saw, building a simple Python script to automatically retrieve this data requires just a few lines of code.

While this solution works well for small projects, it is not practical for large-scale scraping. Google uses some of the most advanced anti-bot technologies in the industry, which can result in CAPTCHAs or IP bans. Plus, scaling this process across many pages would increase infrastructure costs significantly.

If you need Google SERP data for your AI workflows, consider using an API that provides AI-ready SERP data directly, such as Bright Data’s SERP API.

Create a free Bright Data account and gain access to all the solutions in our AI data infrastructure!

Antonello Zanini

Technical Writer

5.5 years experience

Antonello Zanini is a technical writer, editor, and software engineer with 5M+ views. Expert in technical content strategy, web development, and project management.

Expertise
Web Development Web Scraping AI Integration