How to Scrape Google Trends with Python

Learn how to scrape Google Trends data with Python and use it for keyword research, market insights, and trend analysis.

Google Trends is a free tool that provides insights into what people are searching for online. By analyzing these search trends, businesses can identify emerging market trends, understand consumer behavior, and make data-driven decisions to boost sales and marketing efforts. Extracting data from Google Trends allows companies to stay ahead of the competition by tailoring their strategies.

In this article, you’ll learn how to scrape data from Google Trends using Python and how to store and analyze that data effectively.

Why Scrape Google Trends

Scraping and analyzing Google Trends data can be valuable in various scenarios, including the following:

  • Keyword research: Content creators and SEO specialists need to know what keywords are gaining traction so that they can drive more organic traffic to their websites. Google Trends helps explore trending search terms by region, category, or time, enabling you to optimize your content strategy based on evolving user interest.
  • Market research: Marketers must understand customer interests and anticipate shifts in demands to make informed decisions. Scraping and analyzing Google Trends data enables them to understand customer search patterns and monitor trends over time.
  • Societal research: Several factors, including local and global events, technological innovations, economic shifts, and political developments, can significantly impact public interest and search trends. Google Trends data provides valuable insights into these changing trends over time, enabling comprehensive analysis and informed future predictions.
  • Brand monitoring: Businesses and marketing teams must monitor how their brand is perceived in the market. When you scrape Google Trends data, you can compare your brand’s visibility with competitors and swiftly react to changes in public perception.

An Alternative to Scraping Google Trends: Bright Data’s SERP API

Instead of manually scraping Google Trends, use Bright Data’s SERP API to automate real-time data collection from search engines. The SERP API offers structured data like search results and trends, with precise geo-targeting and no risk of blocks or CAPTCHAs. You only pay for successful requests, and data is delivered in JSON or HTML formats for easy integration.

This solution is faster, more scalable, and eliminates the need for complex scraping scripts. Start your free trial and streamline your data collection with Bright Data’s Google Trends scraper.

How to Scrape Data from Google Trends

Google Trends doesn’t offer an official API for retrieving trends data, but you can use third-party libraries to access this information. One option is pytrends, a Python library with user-friendly APIs that let you automatically download reports from Google Trends. However, while pytrends is easy to use, it provides limited data because it cannot access content that is dynamically rendered or hidden behind interactive elements. To address this limitation, you can pair Selenium with Beautiful Soup to scrape Google Trends and extract data from dynamically rendered web pages. Selenium is an open source browser automation tool that lets you interact with and scrape websites that use JavaScript to load content dynamically, while Beautiful Soup parses the scraped HTML so you can extract specific data from the page.
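
For comparison, here is a minimal pytrends sketch of the quick route described above. It assumes the pytrends package is installed (pip install pytrends); the calls shown follow the library’s documented API, but verify them against the version you install:

from pytrends.request import TrendReq

# Connect to Google Trends (hl is the host language, tz the timezone offset in minutes)
pytrends = TrendReq(hl="en-US", tz=360)

# Build a payload for the "coffee" keyword over the past seven days in the US
pytrends.build_payload(kw_list=["coffee"], timeframe="now 7-d", geo="US")

# Both calls return pandas DataFrames
print(pytrends.interest_over_time().head())
print(pytrends.interest_by_region().sort_values("coffee", ascending=False).head())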

Before you begin this tutorial, you need to have Python installed and set up on your machine. You also need to create an empty project directory for the Python scripts that you’ll build in the next few sections.

Create a Virtual Environment

A virtual environment allows you to isolate Python packages into separate directories to avoid version conflicts. To create a new virtual environment, execute the following command in your terminal:

# navigate to the root of your project directory before executing the command
python -m venv myenv

This command creates a folder named myenv in the project directory. Activate the virtual environment by executing the following command:

source myenv/bin/activate

Any subsequent Python or pip commands are executed inside this environment. On Windows, activate the environment by running myenv\Scripts\activate instead.

Install Your Dependencies

As discussed previously, you need Selenium and Beautiful Soup to scrape and parse web pages. Additionally, to analyze and visualize the scraped data, you need to install the pandas and Matplotlib Python modules. Use the following command to install these packages:

pip install beautifulsoup4 pandas matplotlib selenium

Query Google Trends Search Data

The Google Trends dashboard lets you explore search trends by region, date range, and category. For instance, this URL shows search trends for coffee in the United States for the past seven days:

https://trends.google.com/trends/explore?date=now%207-d&geo=US&q=coffee

When you open this web page in your browser, you’ll notice that the data loads dynamically using JavaScript. To scrape dynamic content, you can use the Selenium WebDriver, which mimics user interactions, such as clicking, typing, or scrolling.

You can use webdriver in your Python script to load the web page in a browser window and extract its page source once the content has loaded. To handle dynamic content, you can add an explicit time.sleep to ensure all content is loaded before you fetch the page source. If you want to learn more techniques to handle dynamic content, check out this guide.
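
As a side note, if you prefer not to hand-encode the query parameters (the date range contains a space, which appears as %20 in the URL above), you can build the URL with Python’s standard urllib.parse module. This is optional; the scraping function later in this tutorial builds the URL with an f-string:

from urllib.parse import urlencode, quote

params = {"date": "now 7-d", "geo": "US", "q": "coffee"}
# quote_via=quote percent-encodes the space in "now 7-d" as %20
url = "https://trends.google.com/trends/explore?" + urlencode(params, quote_via=quote)
print(url)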

Create a main.py file in the project’s root and add the following code snippet to it:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

import time

def get_driver():
    # update the path to the location of your Chrome binary
    CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

    options = Options()
    # options.add_argument("--headless=new")
    options.binary_location = CHROME_PATH

    driver = webdriver.Chrome(options=options)

    return driver


def get_raw_trends_data(
    driver: webdriver.Chrome, date_range: str, geo: str, query: str
) -> str:
    url = f"https://trends.google.com/trends/explore?date={date_range}&geo={geo}&q={query}"

    print(f"Getting data from {url}")

    driver.get(url)
    # workaround to get the page source after initial 429 error
    driver.get(url)
    driver.maximize_window()

    # Wait for the page to load
    time.sleep(5)

    return driver.page_source

The get_raw_trends_data function accepts a date range, geographic region, and search query as parameters and uses the Chrome WebDriver to fetch the page contents. Notice that driver.get is called twice as a workaround for the 429 (Too Many Requests) error that Google returns when the URL is loaded for the first time.

You’ll use this function in the following sections to fetch data.
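
A fixed time.sleep works but either wastes time or fails on slow connections. As an optional refinement, you could replace it with one of Selenium’s explicit waits, which blocks only until a specific element appears. The sketch below waits for the sub-region widget, using the same class name as the parsing step later in this tutorial (which, as noted there, may change over time):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def wait_for_subregion_widget(driver: webdriver.Chrome, timeout: int = 15):
    # Block until the "Interest by sub-region" widget is present in the DOM
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(
            (By.CSS_SELECTOR, "div.geo-widget-wrapper.geo-resolution-subregion")
        )
    )

# Inside get_raw_trends_data, this call could replace time.sleep(5):
# wait_for_subregion_widget(driver)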

Parse Data Using Beautiful Soup

The Trends page for a search term includes an Interest by sub-region widget that contains paginated records with values between 0 and 100, indicating the popularity of the search term based on location. Use the following code snippet to parse this data with Beautiful Soup:

# Add import
from bs4 import BeautifulSoup

def extract_interest_by_sub_region(content: str) -> dict:
    soup = BeautifulSoup(content, "html.parser")

    interest_by_subregion = soup.find("div", class_="geo-widget-wrapper geo-resolution-subregion")

    related_queries = interest_by_subregion.find_all("div", class_="fe-atoms-generic-content-container")

    # Dictionary to store the extracted data
    interest_data = {}

    # Extract the region name and interest percentage
    for query in related_queries:
        items = query.find_all("div", class_="item")
        for item in items:
            region = item.find("div", class_="label-text").text.strip()
            interest = item.find("div", class_="progress-value").text.strip()
            interest_data[region] = interest

    return interest_data

This code snippet finds the matching div for sub-region data using its class name and iterates over the result to construct an interest_data dictionary.

Note that the class name could change in the future, and you might need to use the Chrome DevTools Inspect element feature to find the correct name.
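
If you want the script to fail with a clearer message when that happens, you could add a small guard inside extract_interest_by_sub_region, right after the soup.find call (a minimal sketch that assumes the same class name as above):

    interest_by_subregion = soup.find(
        "div", class_="geo-widget-wrapper geo-resolution-subregion"
    )

    # If Google has renamed the widget, fail loudly instead of raising AttributeError later
    if interest_by_subregion is None:
        raise RuntimeError(
            "Could not find the 'Interest by sub-region' widget; "
            "inspect the page and update the class names."
        )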

Now that you’ve defined the helper methods, use the following code snippet to query data for “coffee”:

# Parameters
date_range = "now 7-d"
geo = "US"
query = "coffee"

# Get the raw data
driver = get_driver()

raw_data = get_raw_trends_data(driver, date_range, geo, query)

# Extract the interest by region
interest_data = extract_interest_by_sub_region(raw_data)

# Print the extracted data
for region, interest in interest_data.items():
    print(f"{region}: {interest}")

Your output looks like this:

Hawaii: 100
Montana: 96
Oregon: 90
Washington: 86
California: 84

Manage Data Pagination

As the data in the widget is paginated, the code snippet in the previous section returns only the data from the first page of the widget. To fetch more data, you can use the Selenium WebDriver to find and click the Next button. Additionally, your script must handle the cookie consent banner by clicking the Accept button to ensure that the banner doesn’t obstruct other elements on the page.

To handle cookies and pagination, add this code snippet at the end of main.py:

# Add import
from selenium.webdriver.common.by import By


# Start with the data already extracted from the first page of the widget
all_data = dict(interest_data)

# Accept the cookies
driver.find_element(By.CLASS_NAME, "cookieBarConsentButton").click()

# Get paginated interest data
while True:
    # Click the md-button to load more data if available
    try:
        geo_widget = driver.find_element(
            By.CSS_SELECTOR, "div.geo-widget-wrapper.geo-resolution-subregion"
        )
        
        # Find the load more button with class name "md-button" and aria-label "Next"
        load_more_button = geo_widget.find_element(
            By.CSS_SELECTOR, "button.md-button[aria-label='Next']"
        )
        
        icon = load_more_button.find_element(By.CSS_SELECTOR, ".material-icons")
        
        # Check if the button is disabled by checking class-name includes arrow-right-disabled
        if "arrow-right-disabled" in icon.get_attribute("class"):
            print("No more data to load")
            break
        
        load_more_button.click()
        time.sleep(2)
        
        extracted_data = extract_interest_by_sub_region(driver.page_source)
        all_data.update(extracted_data)
    except Exception as e:
        print("No more data to load", e)
        break

driver.quit()

This snippet seeds all_data with the first page of results, then uses the existing driver instance to find and click the Next button by matching its class name and aria-label attribute. On each iteration, it checks for the arrow-right-disabled class on the button’s icon to determine whether the button is disabled, which indicates that you have reached the last page of the widget, and it exits the loop when that condition is met.

Visualize the Data

To easily access and further analyze the data you’ve scraped, you can persist the extracted sub-region data to a CSV file using a csv.DictWriter.

Start by defining a save_interest_by_sub_region function in the main.py file to save the all_data dictionary to a CSV file:

# Add import
import csv


def save_interest_by_sub_region(interest_data: dict):
    # Convert the {region: interest} dictionary into rows for DictWriter
    rows = [{"Region": region, "Interest": interest} for region, interest in interest_data.items()]

    csv_file = "interest_by_region.csv"

    # Open the CSV file for writing
    with open(csv_file, mode='w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=["Region", "Interest"])
        writer.writeheader()  # Write the header
        writer.writerows(rows)  # Write the data

    print(f"Data saved to {csv_file}")
    return csv_file

Then, you can use pandas to open the CSV file as a DataFrame and perform analytics, such as filtering data by specific conditions, aggregating data with group-by operations, or visualizing trends with plots.
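
For instance, here is a minimal sketch of that kind of analysis on the CSV produced above (the file name and column names match those written by save_interest_by_sub_region):

import pandas as pd

df = pd.read_csv("interest_by_region.csv")

# Top five sub-regions by interest
print(df.sort_values("Interest", ascending=False).head(5))

# Sub-regions where interest is at least 80
print(df[df["Interest"] >= 80])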

For instance, let’s visualize the data as a bar chart to compare interest by sub-regions. To create plots, use the matplotlib Python library, which works seamlessly with DataFrames. Add the following function to the main.py file to create a bar chart and save it as an image:

# Add imports
import pandas as pd
import matplotlib.pyplot as plt


def plot_sub_region_data(csv_file_path, output_file_path):
    # Load the data from the CSV file
    df = pd.read_csv(csv_file_path)

    # Create a bar chart for comparison by region
    plt.figure(figsize=(30, 12))
    
    plt.bar(df["Region"], df["Interest"], color="skyblue")
    
    # Add titles and labels
    plt.title('Interest by Region')
    plt.xlabel('Region')
    plt.ylabel('Interest')

    # Rotate the x-axis labels if needed
    plt.xticks(rotation=45)

    # Save the chart to the output file
    plt.savefig(output_file_path)

Add the following code snippet at the end of the main.py file to call the earlier functions:

csv_file_path = save_interest_by_sub_region(all_data)

output_file_path = "interest_by_region.png"
plot_sub_region_data(csv_file_path, output_file_path)

This snippet creates a plot that looks like this:

Plot of interest in coffee by sub-regions

All the code for this tutorial is available in this GitHub repo.

Scraping Challenges

In this tutorial, you scraped a small amount of data from Google Trends, but as your scraping scripts grow in size and complexity, you’re likely to run into challenges like IP bans and CAPTCHAs.

For instance, as you send requests to a website more frequently with a script like this, you may face IP bans because many websites have safeguards in place to detect and block bot traffic. To avoid this, you can use manual IP rotation or one of the best proxy services. If you are not sure what proxy type you should use, read our article that covers the best proxy types for web scraping.

CAPTCHA and reCAPTCHA challenges are another common obstacle; websites present them when they detect or suspect bot traffic or other anomalies. To reduce the chance of triggering them, you can lower your request frequency, send realistic request headers, or use third-party services that solve these challenges for you.
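
As a rough illustration of those mitigations with Selenium, the sketch below sets a realistic user agent, routes traffic through a proxy, and adds a randomized delay between requests. The user agent string and proxy address are placeholders that you would replace with your own:

import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def get_hardened_driver() -> webdriver.Chrome:
    options = Options()
    # Present a realistic browser user agent instead of the automation default (placeholder value)
    options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0 Safari/537.36"
    )
    # Route traffic through a proxy (placeholder address)
    options.add_argument("--proxy-server=http://your-proxy-host:8080")
    return webdriver.Chrome(options=options)

def polite_get(driver: webdriver.Chrome, url: str):
    # Wait a random interval between requests to keep traffic looking human
    time.sleep(random.uniform(2, 6))
    driver.get(url)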

Conclusion

In this article, you learned how to scrape Google Trends data with Python using Selenium and Beautiful Soup.

As you continue your web scraping journey, you may encounter challenges like IP bans and CAPTCHAs. Rather than maintaining complex scraping scripts, consider using Bright Data’s SERP API, which automates the collection of accurate, real-time search engine data, including Google Trends. The SERP API handles dynamic content and location-based targeting and delivers high success rates, saving you time and effort.

Sign up now and start your SERP API free trial!

No credit card required