How to Scrape Google in Python: Retrieve Search Result Data

Learn how to scrape Google SERP data using Python and Selenium in this detailed tutorial. Understand the importance of SERP data for marketing, competitor monitoring, and optimizing SEO strategies.

Python is one of the most popular languages when it comes to web scraping. What is the biggest source of information on the Internet? Google! That is why scraping Google with Python is so popular. The idea is to automatically retrieve SERP data and use it for marketing, competitor monitoring, and more.

Follow this guided tutorial and learn how to do Google scraping in Python with Selenium. Let’s dive in!

What Data To Scrape From Google?

Google is one of the largest sources of public data on the Internet. There is tons of interesting information that you can retrieve from it, from Google Maps reviews to “People also ask” answers:

Example of what the Google search results page looks like

However, what users and companies are usually interested in is SERP data. A SERP, short for “Search Engine Results Page,” is the page returned by search engines such as Google in response to a user query. Typically, it includes a list of cards with links and text descriptions to web pages proposed by the search engine.

Here is what a SERP page looks like:

Another example of how the results page looks

SERP data is crucial for businesses to understand their online visibility and study the competition. It provides insights into user preferences, keyword performance, and competitor strategies. By analyzing SERP data, businesses can optimize their content, improve SEO rankings, and tailor marketing strategies to better meet user needs.

So, now you know that SERP data is undoubtedly very valuable. All that remains is to figure out how to choose the right tool to retrieve it. Python is one of the best programming languages for web scraping and is perfectly suited for this purpose. But before diving into manual scraping, let’s explore the best and fastest option for scraping Google search results: Bright Data’s SERP API.

Introducing Bright Data’s SERP API

Before diving into the manual scraping guide, consider leveraging Bright Data’s SERP API for efficient and seamless data collection. The SERP API provides real-time access to search engine results from all major search engines, including Google, Bing, DuckDuckGo, Yandex, Baidu, Yahoo, and Naver. This powerful tool is built on Bright Data’s industry-leading proxy services and advanced anti-bot solutions, ensuring reliable and accurate data retrieval without the usual challenges associated with web scraping.

Why Choose Bright Data’s SERP API Over Manual Scraping?

  1. Real-Time Results and High Accuracy: The SERP API delivers real-time search engine results, ensuring you get accurate and up-to-date data. With location accuracy down to the city level, you see exactly what a real user would see anywhere in the world.
  2. Advanced Anti-Bot Solutions: Forget about getting blocked or dealing with CAPTCHA challenges. The SERP API includes automated CAPTCHA solving, browser fingerprinting, and full proxy management to ensure smooth and uninterrupted data collection.
  3. Customizable and Scalable: The API supports a variety of tailored search parameters, allowing you to customize your queries to match specific needs. It’s built for volume, handling growing traffic and peak periods with ease.
  4. Ease of Use: With simple API calls, you can retrieve structured SERP data in JSON or HTML format, making it easy to integrate into your existing systems and workflows. The response time is exceptional, typically under 5 seconds.
  5. Cost-Effective: Save on operational costs by using the SERP API. You only pay for successful requests, and there’s no need to invest in maintaining scraping infrastructure or dealing with server issues.

Start your free trial today and experience the efficiency and reliability of Bright Data’s SERP API!

No credit card required
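Curious what calling the SERP API might look like in practice? Here is a minimal, hypothetical sketch in Python using the requests library. The endpoint URL, parameter names, and token below are placeholders invented for illustration, not the real SERP API interface; check the official Bright Data documentation for the actual request format and authentication scheme:

import requests

# placeholder values: the real endpoint, parameters, and auth scheme are
# defined in Bright Data's documentation and will differ from this sketch
API_ENDPOINT = "https://api.example.com/serp"  # hypothetical URL
API_TOKEN = "<YOUR_API_TOKEN>"

response = requests.get(
    API_ENDPOINT,
    params={"q": "bright data", "country": "us", "format": "json"},  # hypothetical parameters
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
response.raise_for_status()

# the structured SERP data, ready to be processed
serp_data = response.json()
print(serp_data)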

Build a Google SERP Scraper in Python

Follow this step-by-step tutorial and see how to build a Google SERP scraping script in Python.

Step 1: Project setup

To follow this guide, you need Python 3 installed on your machine. If you have to install it, download the installer, launch it, and follow the wizard.

You now have everything you need to scrape Google in Python!

Use the commands below to create a Python project with a virtual environment:

mkdir google-scraper
cd google-scraper
python -m venv env

google-scraper will be the root directory of your project.

Load the project folder in your favorite Python IDE. PyCharm Community Edition or Visual Studio Code with the Python extension are both great options.

On Linux or macOS, activate the virtual environment with the following command:

source ./env/bin/activate

On Windows, run instead:

env\Scripts\activate

Note that some IDEs recognize the virtual environment for you, so you do not need to activate it manually.

Add a scraper.py file in your project’s folder and initialize it as below:

print("Hello, World!")

This is just a simple script that prints the “Hello, World!” message, but it will soon contain the Google scraping logic.

Verify that your script works as desired by launching it via the run button in your IDE or with this command:

python scraper.py

The script should print:

Hello, World!

Well done! You now have a Python environment for SERP scraping.

Before jumping into scraping Google with Python, consider taking a look at our guide on web scraping with Python.

Step 2: Install the scraping libraries

It is time to install the appropriate Python scraping library for extracting data from Google. There are a few options available, and choosing the best approach requires an analysis of the target site. At the same time, this is Google we are talking about: its pages are dynamic and protected by some of the most advanced anti-bot measures around.

Forging a Google search URL that does not trigger its anti-bot technologies is complex, because Google expects queries to come from real user interaction. That is why the easiest and most effective way to interact with the search engine is through a browser, simulating what a real user would do.

In other words, you will need a headless browser tool to render web pages in a controllable browser. Selenium will be perfect!

In an activated Python virtual environment, execute the command below to install the selenium package:

pip install selenium

The setup process may take some time, so be patient.

Awesome! You just added selenium to your project’s dependencies.
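Optionally, if you want to pin the project's dependencies so they can be reinstalled later, you can freeze them into a requirements.txt file (a common Python convention, not strictly required for this tutorial):

pip freeze > requirements.txt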

Step 3: Set up Selenium

Import Selenium by adding the following lines to scraper.py:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

Initialize a Chrome WebDriver instance to control a Chrome window in headless mode, as below:

# options to launch Chrome in headless mode
options = Options()
options.add_argument('--headless') # comment it while developing locally

# initialize a web driver instance with the
# specified options
driver = webdriver.Chrome(
    service=Service(),
    options=options
)

Note: The --headless flag makes sure that Chrome will be launched with no GUI. If you want to see the operations performed by your script on the Google page, comment out that option. In general, disable the --headless flag while developing locally but leave it in production. That is because running Chrome with the GUI takes a lot of resources.
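If you would rather not comment the flag in and out by hand, one small trick is to make it depend on an environment variable. Here is a minimal sketch of the idea; the HEADED variable name is an arbitrary choice for this example:

import os

# options to launch Chrome, headless unless HEADED=1 is set in the environment
options = Options()
if os.environ.get("HEADED") != "1":
    options.add_argument('--headless')

With this in place, HEADED=1 python scraper.py launches Chrome with its GUI, while a plain python scraper.py keeps it headless.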

As the last line of your script, do not forget to close the web driver instance:

driver.quit()

Your scraper.py file should now contain:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# options to launch Chrome in headless mode
options = Options()
options.add_argument('--headless') # comment it while developing locally

# initialize a web driver instance with the
# specified options
driver = webdriver.Chrome(
    service=Service(),
    options=options
)

# scraping logic...

# close the browser and free up its resources
driver.quit()

Wonderful! You have everything it takes to scrape dynamic websites.

Step 4: Visit Google

The first step in scraping Google with Python is to connect to the target site. Use the get() function from the driver object to instruct Chrome to visit the Google home page:

driver.get("https://google.com/")

This is what your Python SERP scraping script should look like so far:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# options to launch Chrome in headless mode
options = Options()
options.add_argument('--headless') # comment it while developing locally

# initialize a web driver instance with the
# specified options
driver = webdriver.Chrome(
    service=Service(),
    options=options
)

# connect to the target site
driver.get("https://google.com/")

# scraping logic...

# close the browser and free up its resources
driver.quit()

Launch the script in headed mode and you will see the following browser window for a split second before the quit() instruction terminates it:

the browser window we'll see that shows Google's main page

If you are a user located in the EU (European Union), the Google home page will also contain the GDPR popup below:

GDPR notice popup for EU residents

In both cases, the “Chrome is being controlled by automated test software.” message informs you that Selenium is controlling Chrome as you want.

Great! Selenium opens the Google page as desired.

Note: If Google showed the cookie policy dialog for GDPR reasons, follow the next step. Otherwise, you can skip to step 6.

Step 5: Deal with the GDPR cookie dialog

Whether the Google GDPR cookie dialog appears depends on the location of your IP. To avoid the issue altogether, you can integrate a proxy server into Selenium and choose an exit IP in the country you prefer.
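If you do have a proxy available, a minimal sketch of the integration looks like the following. The host and port are placeholders for your own provider's values, and note that the --proxy-server flag only covers unauthenticated proxies; proxies requiring a username and password typically need a browser extension or a library such as Selenium Wire:

# hypothetical proxy address: replace with your provider's host and port
PROXY = "http://proxy_host:proxy_port"
# add the flag to the Options object created in step 3
options.add_argument(f"--proxy-server={PROXY}")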

Inspect the HTML element of the cookie dialog with the DevTools:

Inspecting the HTML of the cookie dialog

Expand the code and you will notice that you can select this HTML element with the CSS selector below:

[role='dialog']

If you inspect the “Accept all” button, you will notice that there is no simple CSS selection strategy to select it:

Inspecting the popup 'accept all' button

In detail, the CSS classes in the HTML code seem to be randomly generated. To select the button, get all buttons in the cookie dialog element and find the one with the “Accept all” text. The CSS selector to get all the buttons inside the cookie dialog is:

[role='dialog'] button

Apply a CSS selector on the DOM by passing it to the find_elements() Selenium method. This selects HTML elements on the page based on the specified strategy, which is a CSS selector in this case:

buttons = driver.find_elements(By.CSS_SELECTOR, "[role='dialog'] button")

To work properly, the above line requires the following import:

from selenium.webdriver.common.by import By

Use next() to find the “Accept all” button, then click it:

accept_all_button = next((b for b in buttons if "Accept all" in b.get_attribute("innerText")), None)

# click the "Accept all" button, if present
if accept_all_button is not None:
    accept_all_button.click()

This instruction will locate the <button> element in the dialog whose text contains the “Accept all” string. If present, it will then click it by calling the Selenium click() method.

Fantastic! You are ready to simulate a Google search in Python to collect some SERP data.

Step 6: Simulate a Google search

Open Google in your browser and inspect the search form in the DevTools:

Inspecting the search form on Google's main page

The CSS classes appear to be randomly generated, but you can select the form by targeting its action attribute with this CSS selector:

form[action='/search']

Apply it in Selenium to retrieve the form element via the find_element() method:

search_form = driver.find_element(By.CSS_SELECTOR, "form[action='/search']")

If you skipped step 5, you will need to add the following import:

from selenium.webdriver.common.by import By

Expand the HTML code of the form and focus on the search textarea:

Expanding the HTML of the search form

Again, the CSS class is randomly generated but you can select it by targeting its aria-label value:

textarea[aria-label='Search']

Thus, locate the textarea inside the form and use the send_keys() method to type in the Google search query:

search_form_textarea = search_form.find_element(By.CSS_SELECTOR, "textarea[aria-label='Search']")

google_search_query = "bright data"
search_form_textarea.send_keys(google_search_query)

In this case, the Google query will be “bright data.” Keep in mind that any other query will do.
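If you prefer not to hardcode the query, an optional tweak is to read it from the command line with sys.argv, falling back to a default value. A quick sketch (the import sys line belongs at the top of the script):

import sys

# use the first command-line argument as the query, or fall back to a default
google_search_query = sys.argv[1] if len(sys.argv) > 1 else "bright data"
search_form_textarea.send_keys(google_search_query)

With that change, python scraper.py "web scraping tools" searches for the given string.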

Now, call submit() on the form element to submit the form and simulate a Google search:

search_form.submit()

Google will perform the search based on the specified query and redirect you to the desired SERP page:

Performing a search on Google on the specified query

The lines to simulate a Google search in Python with Selenium are:

# select the Google search form
search_form = driver.find_element(By.CSS_SELECTOR, "form[action='/search']")

# select the textarea inside the form
search_form_textarea = search_form.find_element(By.CSS_SELECTOR, "textarea[aria-label='Search']")

# fill out the text area with a given query
google_search_query = "bright data"
search_form_textarea.send_keys(google_search_query)

# submit the form and perform the Google search
search_form.submit()

Here we go! Get ready to retrieve SERP data by scraping Google in Python.

Step 7: Select the search result elements

Inspect the right-hand column in the result section:

Inspecting the right-hand column

As you can see, this is a <div> element you can select with the CSS selector below:

#search

Do not forget that Google pages are dynamic. As a result, you should wait for this element to be present on the page before interacting with it. Achieve that with the following line:

search_div = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#search')))

WebDriverWait is a special class offered by Selenium to implement explicit waits. In particular, it enables you to wait for a specific event to occur on the page.

In this case, the script will wait up to 10 seconds for the #search HTML node to be present on the page. This way, you can ensure that the Google SERP has loaded as desired.

WebDriverWait requires some extra imports, so add them to scraper.py:

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Now, inspect the Google search elements:

Inspecting Google's search elements

Again, selecting them via CSS classes is not a good approach. Instead, focus on their unusual HTML attributes. A proper CSS selector to get the Google search elements is:

div[jscontroller][lang][jsaction][data-hveid][data-ved]

This identifies all <div> elements that have the jscontroller, lang, jsaction, data-hveid, and data-ved attributes.

Pass it to find_elements() to select all Google search elements in Python via Selenium:

google_search_elements = search_div.find_elements(By.CSS_SELECTOR, "div[jscontroller][lang][jsaction][data-hveid][data-ved]")

The entire logic will be:

# wait up to 10 seconds for the search div to be on the page
# and select it
search_div = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#search')))

# select the Google search elements in the SERP
google_search_elements = search_div.find_elements(By.CSS_SELECTOR, "div[jscontroller][lang][jsaction][data-hveid][data-ved]")

Terrific! You are just one step away from scraping SERP data in Python.

Step 8: Extract the SERP data

Not all Google SERPs are created equal. In some cases, the first search result on the page has a different HTML code than the other search elements:

Example of the first result having a different HTML code

For example, in this case, the first search result element can be retrieved with this CSS selector:

div.g[data-hveid]

Aside from that, what a Google search element contains is pretty much the same. This includes:

  • The page title in an <h3> node.
  • A URL to the specific page in an <a> element that is the parent of the <h3> above.
  • A description in the [data-sncf='1'] <div>.
what a Google search element contains

Since a single SERP contains several search results, initialize an array in which to store your scraped data:

serp_elements = []

You will also need a rank integer to keep track of their ranking on the page:

rank = 1

Define a function to scrape Google search elements in Python as follows:

def scrape_search_element(search_element, rank):
    # select the elements of interest inside the
    # search element, ignoring the missing ones, and apply
    # the data extraction logic
    try:
        title_h3 = search_element.find_element(By.CSS_SELECTOR, "h3")
        title = title_h3.get_attribute("innerText")
    except NoSuchElementException:
        title = None

    try:
        # get the "a" element that has an "h3" child
        url_a = search_element.find_element(By.CSS_SELECTOR, "a:has(> h3)")
        url = url_a.get_attribute("href")
    except NoSuchElementException:
        url = None

    try:
        description_div = search_element.find_element(By.CSS_SELECTOR, "[data-sncf='1']")
        description = description_div.get_attribute("innerText")
    except NoSuchElementException:
        description = None

    # return a new SERP data element
    return {
        'rank': rank,
        'url': url,
        'title': title,
        'description': description
    }

Google tends to change its SERP pages a lot. Nodes inside the search elements may disappear, so you should protect against that with try ... except blocks. In detail, when an element is not in the DOM, find_element() raises a NoSuchElementException.

Import the exception:

from selenium.common import NoSuchElementException

Note the use of the :has() CSS pseudo-class to select a node with a specific child. Find out more about it in the official documentation.
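If :has() is not supported in your browser or driver version, an equivalent XPath expression can select the same node. A minimal sketch for the scrape_search_element() function, assuming the same page structure:

# select the <a> element that has a direct <h3> child, via XPath
url_a = search_element.find_element(By.XPATH, ".//a[h3]")
url = url_a.get_attribute("href")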

Now, pass the first search element and the remaining ones to the scrape_search_element() function. Then, add the returned objects to the serp_elements array:

# scrape data from the first element on the SERP
# (if present)
try:
    first_search_element = driver.find_element(By.CSS_SELECTOR, "div.g[data-hveid]")
    serp_elements.append(scrape_search_element(first_search_element, rank))
    rank += 1
except NoSuchElementException:
    pass

# scrape data from all search elements on the SERP
for google_search_element in google_search_elements:
    serp_elements.append(scrape_search_element(google_search_element, rank))
    rank += 1

At the end of these instructions, serp_elements will store all SERP data of interest. Verify that by printing it in the terminal:

print(serp_elements)

That will produce something like:

[
    {'rank': 1, 'url': 'https://brightdata.com/', 'title': 'Bright Data - All in One Platform for Proxies and Web Data', 'description': None},
    {'rank': 2, 'url': 'https://il.linkedin.com/company/bright-data', 'title': 'Bright Data', 'description': "Bright Data is the world's largest data collection platform dedicated to helping all businesses view the Internet just like their consumers and potential..."},
    # omitted for brevity...
    {'rank': 6, 'url': 'https://aws.amazon.com/marketplace/seller-profile?id=bf9b4324-6ee3-4eb3-9ca4-083e558d04c4', 'title': 'Bright Data - AWS Marketplace', 'description': 'Bright Data is a leading data collection platform, which enables our customers to collect structured and unstructured data sets from millions of websites...'},
    {'rank': 7, 'url': 'https://techcrunch.com/2024/02/26/meta-drops-lawsuit-against-web-scraping-firm-bright-data-that-sold-millions-of-instagram-records/', 'title': 'Meta drops lawsuit against web-scraping firm Bright Data ...', 'description': 'Feb 26, 2024 — Meta has dropped its lawsuit against Israeli web-scraping company Bright Data, after losing a key claim in its case a few weeks ago.'}
]

Incredible! It only remains to export the scraped data to CSV.

Step 9: Export the scraped data to CSV

Now that you know how to scrape Google with Python, see how to export the retrieved data to a CSV file.

First, import the csv package from the Python standard library:

import csv

Next, use the csv package to populate the output serp_data.csv file with your SERP data:

header = ["rank", "url", "title", "description"]

with open("serp_data.csv", 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header)
    writer.writeheader()
    writer.writerows(serp_elements)
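If you prefer JSON as the output format, the json module from the standard library can export the same list of dictionaries just as easily. A quick optional sketch:

import json

# export the scraped data to a JSON file instead of (or in addition to) CSV
with open("serp_data.json", "w", encoding="utf-8") as jsonfile:
    json.dump(serp_elements, jsonfile, ensure_ascii=False, indent=2)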

Et voilà! Your Google Python scraping script is ready.

Step 10: Put it all together

This is the final code of your scraper.py script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common import NoSuchElementException
import csv

def scrape_search_element(search_element, rank):
    # select the elements of interest inside the
    # search element, ignoring the missing ones, and apply
    # the data extraction logic
    try:
        title_h3 = search_element.find_element(By.CSS_SELECTOR, "h3")
        title = title_h3.get_attribute("innerText")
    except NoSuchElementException:
        title = None

    try:
        # get the "a" element that has an "h3" child
        url_a = search_element.find_element(By.CSS_SELECTOR, "a:has(> h3)")
        url = url_a.get_attribute("href")
    except NoSuchElementException:
        url = None

    try:
        description_div = search_element.find_element(By.CSS_SELECTOR, "[data-sncf='1']")
        description = description_div.get_attribute("innerText")
    except NoSuchElementException:
        description = None

    # return a new SERP data element
    return {
        'rank': rank,
        'url': url,
        'title': title,
        'description': description
    }

# options to launch Chrome in headless mode
options = Options()
options.add_argument('--headless') # comment it while developing locally

# initialize a web driver instance with the
# specified options
driver = webdriver.Chrome(
    service=Service(),
    options=options
)

# connect to the target site
driver.get("https://google.com/?hl=en-US")

# select the buttons in the cookie dialog
buttons = driver.find_elements(By.CSS_SELECTOR, "[role='dialog'] button")
accept_all_button = next((b for b in buttons if "Accept all" in b.get_attribute("innerText")), None)

# click the "Accept all" button, if present
if accept_all_button is not None:
    accept_all_button.click()

# select the Google search form
search_form = driver.find_element(By.CSS_SELECTOR, "form[action='/search']")

# select the textarea inside the form
search_form_textarea = search_form.find_element(By.CSS_SELECTOR, "textarea[aria-label='Search']")

# fill out the text area with a given query
google_search_query = "bright data"
search_form_textarea.send_keys(google_search_query)

# submit the form and perform the Google search
search_form.submit()

# wait up to 10 seconds for the search div to be on the page
# and select it
search_div = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#search')))

# select the Google search elements in the SERP
google_search_elements = search_div.find_elements(By.CSS_SELECTOR, "div[jscontroller][lang][jsaction][data-hveid][data-ved]")

# where to store the scraped data
serp_elements = []

# to keep track of the current ranking
rank = 1

# scrape data from the first element on the SERP
# (if present)
try:
    first_search_element = driver.find_element(By.CSS_SELECTOR, "div.g[data-hveid]")
    serp_elements.append(scrape_search_element(first_search_element, rank))
    rank += 1
except NoSuchElementException:
    pass

# scrape data from all search elements on the SERP
for google_search_element in google_search_elements:
    serp_elements.append(scrape_search_element(google_search_element, rank))
    rank += 1

# export the scraped data to CSV
header = ["rank", "url", "title", "description"]
with open("serp_data.csv", 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header)
    writer.writeheader()
    writer.writerows(serp_elements)

# close the browser and free up its resources
driver.quit()

Wow! In just over 100 lines of code, you can build a Google SERP scraper in Python.

Verify that it produces the expected results by running it in your IDE or by using this command:

python scraper.py

Wait for the scraper execution to end, and a serp_data.csv file will appear in the root folder of the project. Open it, and you will see:

The data you will see in the results csv file

Congrats! You just performed Google scraping in Python.

Conclusion

In this tutorial, you saw what data can be collected from Google and why SERP data is the most interesting. In particular, you learned how to use browser automation to build a SERP scraper in Python using Selenium.

That works on simple examples, but there are three main challenges in scraping Google with Python:

  1. Google keeps changing the page structure of SERPs.
  2. Google has some of the most advanced anti-bot solutions on the market.
  3. Building an effective scraping process that can retrieve tons of SERP data in parallel is complex and costs a lot of money.

Forget about those challenges with Bright Data’s SERP API. This next-generation API provides a set of endpoints that expose real-time SERP data from all major search engines. SERP API is built on Bright Data’s top-notch proxy services and anti-bot bypass solutions, letting you target several search engines with no effort.

Make a simple API call and get your SERP data in JSON or HTML format thanks to SERP API. Start your free trial today!

No credit card required