Google Trends is a free tool that provides insights into what people are searching for online. By analyzing these search trends, businesses can identify emerging market trends, understand consumer behavior, and make data-driven decisions to boost sales and marketing efforts. Extracting data from Google Trends allows companies to stay ahead of the competition by tailoring their strategies.
In this article, you’ll learn how to scrape data from Google Trends using Python and how to store and analyze that data effectively.
Why Scrape Google Trends
Scraping and analyzing Google Trends data can be valuable in various scenarios, including the following:
- Keyword research: Content creators and SEO specialists need to know what keywords are gaining traction so that they can drive more organic traffic to their websites. Google Trends helps explore trending search terms by region, category, or time, enabling you to optimize your content strategy based on evolving user interest.
- Market research: Marketers must understand customer interests and anticipate shifts in demand to make informed decisions. Scraping and analyzing Google Trends data enables them to understand customer search patterns and monitor trends over time.
- Societal research: Several factors, including local and global events, technological innovations, economic shifts, and political developments, can significantly impact public interest and search trends. Google Trends data provides valuable insights into these changing trends over time, enabling comprehensive analysis and informed future predictions.
- Brand monitoring: Businesses and marketing teams must monitor how their brand is perceived in the market. When you scrape Google Trends data, you can compare your brand’s visibility with competitors and swiftly react to changes in public perception.
Bright Data’s Alternative to Scraping Google Trends: The SERP API
Instead of manually scraping Google Trends, use Bright Data’s SERP API to automate real-time data collection from search engines. The SERP API offers structured data like search results and trends, with precise geo-targeting and no risk of blocks or CAPTCHAs. You only pay for successful requests, and data is delivered in JSON or HTML formats for easy integration.
This solution is faster, more scalable, and eliminates the need for complex scraping scripts. Start your free trial and streamline your data collection with Bright Data’s Google Trends scraper.
How to Scrape Data from Google Trends
Google Trends doesn’t offer an official API for accessing trends data, but several third-party libraries can help, such as pytrends, a Python library with user-friendly APIs for automatically downloading Google Trends reports. However, while pytrends is easy to use, it provides limited data because it cannot access content that is rendered dynamically or hidden behind interactive elements. To address this limitation, you can combine Selenium with Beautiful Soup. Selenium is an open source browser automation tool that can interact with and scrape websites that load their content dynamically with JavaScript, while Beautiful Soup parses the scraped HTML so that you can extract specific data from the page.
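For context, here’s a minimal sketch of what a pytrends query looks like (this assumes you’ve run pip install pytrends; because the library relies on unofficial endpoints, it may break when Google changes them):

from pytrends.request import TrendReq

# Build a payload for "coffee" over the past seven days in the US
pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(["coffee"], timeframe="now 7-d", geo="US")

# Returns a pandas DataFrame of interest over time
print(pytrends.interest_over_time().head())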
Before you begin this tutorial, you need to have Python installed and set up on your machine. You also need to create an empty project directory for the Python scripts that you’ll build in the next few sections.
Create a Virtual Environment
A virtual environment allows you to isolate Python packages into separate directories to avoid version conflicts. To create a new virtual environment, execute the following command in your terminal:
# navigate to the root of your project directory before executing the command
python -m venv myenv
This command creates a folder named myenv in the project directory. Activate the virtual environment by executing the following command:
source myenv/bin/activate
Any subsequent Python or pip commands will run inside this environment.
Install Your Dependencies
As discussed previously, you need Selenium and Beautiful Soup to scrape and parse web pages. Additionally, to analyze and visualize the scraped data, you need to install the pandas and Matplotlib Python modules. Use the following command to install these packages:
pip install beautifulsoup4 pandas matplotlib selenium
Query Google Trends Search Data
The Google Trends dashboard lets you explore search trends by region, date range, and category. For instance, this URL shows search trends for coffee in the United States for the past seven days:
https://trends.google.com/trends/explore?date=now%207-d&geo=US&q=coffee
When you open this web page in your browser, you’ll notice that the data loads dynamically using JavaScript. To scrape dynamic content, you can use the Selenium WebDriver, which mimics user interactions, such as clicking, typing, or scrolling.
You can use webdriver in your Python script to load the web page in a browser window and extract its page source once the content has loaded. To handle dynamic content, you can add an explicit time.sleep call to ensure all content is loaded before you fetch the page source. If you want to learn more techniques for handling dynamic content, check out this guide.
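As an alternative to a fixed sleep, you can wait explicitly for a known element to appear. Here’s a minimal sketch using Selenium’s explicit waits; the CSS selector is an assumption based on the sub-region widget used later in this tutorial, and driver is the WebDriver instance you’ll create next:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for the sub-region widget to appear instead of sleeping blindly
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located(
        (By.CSS_SELECTOR, "div.geo-widget-wrapper.geo-resolution-subregion")
    )
)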
Create a main.py file in the project’s root and add the following code snippet to it:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time


def get_driver():
    # update the path to the location of your Chrome binary
    CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

    options = Options()
    # options.add_argument("--headless=new")
    options.binary_location = CHROME_PATH

    driver = webdriver.Chrome(options=options)
    return driver


def get_raw_trends_data(
    driver: webdriver.Chrome, date_range: str, geo: str, query: str
) -> str:
    url = f"https://trends.google.com/trends/explore?date={date_range}&geo={geo}&q={query}"
    print(f"Getting data from {url}")

    driver.get(url)
    # workaround to get the page source after the initial 429 error
    driver.get(url)
    driver.maximize_window()

    # Wait for the page to load
    time.sleep(5)

    return driver.page_source
The get_raw_trends_data function accepts a date range, geographical region, and query as parameters and uses the Chrome WebDriver to fetch the page contents. Notice that driver.get is called twice as a workaround for the 429 error that Google throws when the URL is loaded for the first time.
You’ll use this function in the following sections to fetch data.
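One detail worth noting: the date parameter ("now 7-d") contains a space, which Chrome normally encodes for you when the URL is loaded. If you prefer to build the URL with explicit encoding, here’s a small sketch using Python’s standard library; the parameter names match the function above:

from urllib.parse import urlencode, quote

# Encode the query parameters so that spaces become %20, as in the original URL
params = {"date": "now 7-d", "geo": "US", "q": "coffee"}
url = "https://trends.google.com/trends/explore?" + urlencode(params, quote_via=quote)
print(url)  # ...explore?date=now%207-d&geo=US&q=coffee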
Parse Data Using Beautiful Soup
The Trends page for a search term includes an Interest by sub-region widget that contains paginated records with values between 0 and 100, indicating the popularity of the search term based on location. Use the following code snippet to parse this data with Beautiful Soup:
# Add import
from bs4 import BeautifulSoup


def extract_interest_by_sub_region(content: str) -> dict:
    soup = BeautifulSoup(content, "html.parser")
    interest_by_subregion = soup.find(
        "div", class_="geo-widget-wrapper geo-resolution-subregion"
    )
    related_queries = interest_by_subregion.find_all(
        "div", class_="fe-atoms-generic-content-container"
    )

    # Dictionary to store the extracted data
    interest_data = {}

    # Extract the region name and interest percentage
    for query in related_queries:
        items = query.find_all("div", class_="item")
        for item in items:
            region = item.find("div", class_="label-text").text.strip()
            interest = item.find("div", class_="progress-value").text.strip()
            interest_data[region] = interest

    return interest_data
This code snippet finds the matching div for the sub-region data using its class name and iterates over the results to construct an interest_data dictionary.
Note that the class name could change in the future, and you might need to use the Chrome DevTools Inspect element feature to find the correct name.
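If you’d rather have the scraper fail with a descriptive message instead of an AttributeError when the markup changes, you can add a small guard. This is a hypothetical helper, not part of the original script:

def find_subregion_widget(soup: BeautifulSoup):
    # Raise a clear error if Google renames the widget's CSS classes
    widget = soup.find("div", class_="geo-widget-wrapper geo-resolution-subregion")
    if widget is None:
        raise ValueError(
            "Sub-region widget not found; inspect the page and update the class names"
        )
    return widget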
Now that you’ve defined the helper methods, use the following code snippet to query data for “coffee”:
# Parameters
date_range = "now 7-d"
geo = "US"
query = "coffee"

# Get the raw data
driver = get_driver()
raw_data = get_raw_trends_data(driver, date_range, geo, query)

# Extract the interest by sub-region
interest_data = extract_interest_by_sub_region(raw_data)

# Print the extracted data
for region, interest in interest_data.items():
    print(f"{region}: {interest}")
Your output looks like this:
Hawaii: 100
Montana: 96
Oregon: 90
Washington: 86
California: 84
Manage Data Pagination
As the data in the widget is paginated, the code snippet in the previous section returns only the data from the first page of the widget. To fetch more data, you can use the Selenium WebDriver to find and click the Next button. Additionally, your script must handle the cookie consent banner by clicking the Accept button to ensure that the banner doesn’t obstruct other elements on the page.
To handle cookies and pagination, add this code snippet at the end of main.py:
# Add import
from selenium.webdriver.common.by import By

# Start with the data already extracted from the first page
all_data = dict(interest_data)

# Accept the cookies
driver.find_element(By.CLASS_NAME, "cookieBarConsentButton").click()

# Get paginated interest data
while True:
    # Click the md-button to load more data if available
    try:
        geo_widget = driver.find_element(
            By.CSS_SELECTOR, "div.geo-widget-wrapper.geo-resolution-subregion"
        )
        # Find the load more button with class name "md-button" and aria-label "Next"
        load_more_button = geo_widget.find_element(
            By.CSS_SELECTOR, "button.md-button[aria-label='Next']"
        )
        icon = load_more_button.find_element(By.CSS_SELECTOR, ".material-icons")

        # Check if the button is disabled by looking for the arrow-right-disabled class
        if "arrow-right-disabled" in icon.get_attribute("class"):
            print("No more data to load")
            break

        load_more_button.click()
        time.sleep(2)

        extracted_data = extract_interest_by_sub_region(driver.page_source)
        all_data.update(extracted_data)
    except Exception as e:
        print("No more data to load", e)
        break

driver.quit()
This snippet uses the existing driver instance to find and click the Next button, matching it by its class name and aria-label attribute. Because all_data is seeded with the first page’s results, each additional page is simply merged in. The snippet checks for the arrow-right-disabled class on the button’s icon to determine whether the button is disabled, which indicates that you’ve reached the last page of the widget, and it exits the loop when that condition is met.
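If the click ever fails because the Next button sits outside the visible viewport, you can scroll it into view first. This is optional hardening you could add right before load_more_button.click() in the loop above:

# Scroll the Next button into view before clicking it
driver.execute_script(
    "arguments[0].scrollIntoView({block: 'center'});", load_more_button
)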
Visualize the Data
To easily access and further analyze the data you’ve scraped, you can persist the extracted sub-region data to a CSV file using csv.DictWriter.
Start by defining a save_interest_by_sub_region function in main.py that saves the all_data dictionary to a CSV file:
# Add import
import csv


def save_interest_by_sub_region(interest_data: dict):
    interest_data = [
        {"Region": region, "Interest": interest}
        for region, interest in interest_data.items()
    ]
    csv_file = "interest_by_region.csv"

    # Open the CSV file for writing
    with open(csv_file, mode='w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=["Region", "Interest"])
        writer.writeheader()  # Write the header
        writer.writerows(interest_data)  # Write the data

    print(f"Data saved to {csv_file}")
    return csv_file
Then, you can use pandas to open the CSV file as a DataFrame and perform analytics, such as filtering data by specific conditions, aggregating it with group-by operations, or visualizing trends with plots.
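A minimal sketch of that kind of analysis might look like this; it assumes the interest_by_region.csv file written above, with its Region and Interest columns:

import pandas as pd

df = pd.read_csv("interest_by_region.csv")

# Interest values are scraped as text, so convert them to numbers first
df["Interest"] = pd.to_numeric(df["Interest"], errors="coerce")

# Filter to regions with strong interest and sort them
high_interest = df[df["Interest"] > 80].sort_values("Interest", ascending=False)
print(high_interest)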
For instance, let’s visualize the data as a bar chart to compare interest across sub-regions. To create plots, use the matplotlib Python library, which works seamlessly with DataFrames. Add the following function to the main.py file to create a bar chart and save it as an image:
# Add imports
import pandas as pd
import matplotlib.pyplot as plt


def plot_sub_region_data(csv_file_path, output_file_path):
    # Load the data from the CSV file
    df = pd.read_csv(csv_file_path)

    # Create a bar chart for comparison by region
    plt.figure(figsize=(30, 12))
    plt.bar(df["Region"], df["Interest"], color="skyblue")

    # Add titles and labels
    plt.title('Interest by Region')
    plt.xlabel('Region')
    plt.ylabel('Interest')

    # Rotate the x-axis labels for readability
    plt.xticks(rotation=45)

    # Save the plot as an image
    plt.savefig(output_file_path)
Add the following code snippet at the end of the main.py file to call the earlier functions:
csv_file_path = save_interest_by_sub_region(all_data)
output_file_path = "interest_by_region.png"
plot_sub_region_data(csv_file_path, output_file_path)
This snippet creates a plot that looks like this:
All the code for this tutorial is available in this GitHub repo.
Scraping Challenges
In this tutorial, you scraped a small amount of data from Google Trends, but as your scraping scripts grow in size and complexity, you’re likely to run into challenges like IP bans and CAPTCHAs.
For instance, as you send traffic to a website more frequently with this script, you could face IP bans, since many websites have safeguards in place to detect and block bot traffic. To avoid this, you can rotate IPs manually or use one of the best proxy services. If you’re not sure what proxy type to use, read our article covering the best proxy types for web scraping.
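For example, here’s a rough sketch of routing the Chrome WebDriver through a proxy; the host and port are placeholders, and authentication handling depends on your provider:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

PROXY = "proxy.example.com:8000"  # placeholder address from your proxy provider

options = Options()
options.add_argument(f"--proxy-server=http://{PROXY}")
driver = webdriver.Chrome(options=options)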
CAPTCHAs and reCAPTCHAs are another common challenge; websites present them when they detect or suspect bot traffic or other anomalies. To avoid them, you can reduce your request frequency, send proper request headers, or use third-party services that solve these challenges for you.
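Two of these mitigations are easy to sketch with the existing setup: send a realistic user agent via the Chrome options and pause between requests. The user-agent string below is only an example:

import random
import time
from selenium.webdriver.chrome.options import Options

options = Options()
# Example user-agent string; replace it with one that matches a real, current browser
options.add_argument(
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

# Pause a random interval between consecutive requests to reduce request frequency
time.sleep(random.uniform(3, 8))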
Conclusion
In this article, you learned how to scrape Google Trends data with Python using Selenium and Beautiful Soup.
As you continue in your web scraping journey, you may encounter challenges like IP bans and CAPTCHAs. Rather than managing complex scraping scripts, consider using Bright Data’s SERP API, which automates the process of collecting accurate, real-time search engine data, including Google Trends. The SERP API handles dynamic content and location-based targeting and ensures high success rates, saving you time and effort.
Sign up now and start your SERP API free trial!
No credit card required