In this tutorial, you will learn how to build a Python script to scrape Google’s “People Also Ask” section, which features commonly asked questions related to your search query and contains valuable information.
Let’s dive in!
Understanding Google’s “People Also Ask” Feature
“People also ask” (PAA) is a section in Google SERPs (Search Engine Results Pages) that features a dynamic list of questions related to your search query:
This section helps you explore topics related to your search query more deeply. First launched around 2015, PAA appears within search results as a series of expandable questions. When a question is clicked, it expands to reveal a brief answer sourced from a relevant webpage, along with a link to the source:
The “People also ask” section is frequently updated and adapts based on user searches, offering fresh and relevant information. New questions are loaded dynamically as you open dropdowns.
Scraping “People Also Ask” Google: Step-By-Step Guide
Follow this guided section and learn how to build a Python script to scrape “People also ask” from a Google SERP.
The end goal is to retrieve the data contained in each question in the “People also ask” section of the page. If you are interested in scraping Google search results as a whole instead, follow our tutorial on SERP scraping.
Step 1: Project Setup
Before getting started, make sure that you have Python 3 installed on your machine. Otherwise, download it, launch the executable, and follow the installation wizard.
Next, use the commands below to initialize a Python project with a virtual environment:
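```bash
# create the project folder and enter it
mkdir people-also-ask-scraper
cd people-also-ask-scraper

# create a virtual environment
python -m venv venv
```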
The people-also-ask-scraper directory represents the project folder of your Python PAA scraper.
Load the project folder in your favorite Python IDE. PyCharm Community Edition or Visual Studio Code with the Python extension are two great options.
In the project’s folder, create a scraper.py file. It is a blank script for now, but it will soon contain the scraping logic:
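```
people-also-ask-scraper/
├── venv/
└── scraper.py
```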
In the IDE’s terminal, activate the virtual environment. In Linux or macOS, execute this command:
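```bash
source ./venv/bin/activate
```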
Alternatively, on Windows, run:
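```powershell
venv\Scripts\activate
```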
Great, you now have a Python environment for your scraper!
Step 2: Install Selenium
Google is a platform that requires user interaction. Also, forging a valid Google search URL can be challenging. So, the best way to work with the search engine is within a browser.
In other words, to scrape the “People Also Ask” section, you need a browser automation tool. If you are not familiar with this concept, browser automation tools enable you to render and interact with web pages within a controllable browser. One of the best options in Python is Selenium!
Install Selenium by running the command below in an activated Python virtual environment:
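```bash
pip install selenium
```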
The selenium pip package will be added to your project’s dependencies. This may take a while, so be patient.
For more details on how to use this tool, read our guide on web scraping with Selenium.
Wonderful, you now have everything you need to start scraping Google pages!
Step 3: Navigate to the Google Home Page
Import Selenium in scraper.py and initialize a WebDriver object to control a Chrome instance in headless mode:
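```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# configure Chrome to start in headless mode
options = Options()
options.add_argument("--headless")

# initialize a Chrome WebDriver instance
# (Selenium 4+ downloads the right driver automatically)
driver = webdriver.Chrome(options=options)
```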
The above snippet creates a Chrome WebDriver instance, the object used to programmatically control a Chrome window. The --headless option configures Chrome to run in headless mode. For debugging purposes, comment out that line so that you can observe the automated script’s actions in real time.
Then, use the get() method to connect to the Google home page:
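```python
driver.get("https://google.com/")
```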
Do not forget to release the driver resources at the end of the script:
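```python
driver.quit()
```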
Put it all together, and you will get:
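```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# set up Chrome in headless mode
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

# connect to the Google home page
driver.get("https://google.com/")

# scraping logic...

# close the browser and release its resources
driver.quit()
```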
Fantastic, you are ready to scrape dynamic websites!
Step 4: Deal With the GDPR Cookie Dialog
Note: If you are not located in the EU (European Union), you can skip this step.
Run the scraper.py script in headed mode. This will briefly open a Chrome browser window displaying a Google page before the quit() command closes it. If you are in the EU, here is what you will see:
The “Chrome is being controlled by automated test software.” message confirms that Selenium is controlling Chrome as expected.
EU users are shown a cookie policy dialog for GDPR reasons. If this is your case, you need to handle the dialog before you can interact with the underlying page. Otherwise, you can skip to step 5.
Open a Google page in incognito mode and inspect the GDPR cookie dialog. Right-click on it and choose the “Inspect” option:
Note that you can locate the dialog HTML element with:
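```python
# the selector below is an assumption based on the inspected markup;
# verify it against the dialog on your page
cookie_dialog = driver.find_element(By.CSS_SELECTOR, "[role='dialog']")
```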
find_element() is a method provided by Selenium to locate HTML elements on the page via different strategies. In this case, we used a CSS selector.
Do not forget to import By as follows:
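```python
from selenium.webdriver.common.by import By
```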
Now, focus on the “Accept all” button:
As you can tell, there is not an easy way to select it, as its CSS class seems to be randomly generated. So, you can retrieve it using an XPath expression that targets its content:
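```python
# find the first button in the dialog whose text contains "Accept"
accept_button = cookie_dialog.find_element(By.XPATH, ".//button[contains(., 'Accept')]")
```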
This instruction will locate the first button in the dialog whose text contains the “Accept” string. For more information, read our guide on XPath vs CSS selector.
Here is how everything fits together to handle the optional Google cookie dialog:
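```python
try:
    # select the cookie dialog, if present
    cookie_dialog = driver.find_element(By.CSS_SELECTOR, "[role='dialog']")
    # click the "Accept all" button to close it
    accept_button = cookie_dialog.find_element(By.XPATH, ".//button[contains(., 'Accept')]")
    if accept_button is not None:
        accept_button.click()
except NoSuchElementException:
    print("Cookie dialog not present")
```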
The click() instruction clicks the “Accept all” button to close the dialog and allow user interaction. If the cookie policy dialog is not present, a NoSuchElementException will be thrown instead. The script will catch it and continue.
Remember to import NoSuchElementException:
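```python
from selenium.common.exceptions import NoSuchElementException
```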
Well done! You are ready to reach the page with the “People also ask” section.
Step 5: Submit the Search Form
Reach the Google home page in your browser and inspect the search form. Right-click on it and select the “Inspect” option:
This element has no CSS class, but you can select it via its action attribute:
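```python
# the Google home page search form submits to the /search endpoint
search_form = driver.find_element(By.CSS_SELECTOR, "form[action='/search']")
```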
If you skipped step 4, import By with:
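```python
from selenium.webdriver.common.by import By
```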
Expand the HTML code of the form and take a look at the search textarea:
The CSS class of this node seems to be randomly generated. Thus, select it through its aria-label attribute. Then, use the send_keys() method to type in the target search query:
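```python
# the aria-label value below is an assumption and may differ by locale
search_textarea = search_form.find_element(By.CSS_SELECTOR, "textarea[aria-label='Search']")
search_textarea.send_keys("Bright Data")
```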
In this example, the search query is “Bright Data,” but any other search is fine.
Submit the form to trigger a page change:
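```python
search_form.submit()
```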
Terrific! The controlled browser will now be redirected to the Google page containing the “People also ask” section.
If you execute the script in headed mode, this is what you should be seeing before the browser closes:
Note the “People also ask” section at the bottom of the above screenshot.
Step 6: Select the “People also ask” Node
Inspect the “People also ask” HTML element:
Again, there is no easy way to select it. This time, what you can do is retrieve the <div> element with the jscontroller, jsname, and jsaction attributes that contains a div with role="heading" containing the “People also ask” text:
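```python
# wait up to 5 seconds for the "People also ask" section to appear;
# the XPath below encodes the criteria described above and may need
# tweaking if Google changes its markup
people_also_ask_element = WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((
        By.XPATH,
        "//div[@jscontroller and @jsname and @jsaction][.//div[@role='heading' and contains(., 'People also ask')]]",
    ))
)
```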
WebDriverWait is a special Selenium class that pauses the script until a specific condition is met on the page. Above, it waits up to 5 seconds for the desired HTML element to appear. This is required to let the page fully load after submitting the form.
The XPath expression used within presence_of_element_located() is complex, but it accurately describes the criteria needed to select the “People also ask” element.
Do not forget to add the required imports:
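```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
```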
Time to scrape data from Google’s “People also ask” section!
Step 7: Scrape “People Also Ask”
First, initialize a data structure where to store the scraped data:
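```python
people_also_ask_questions = []
```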
This must be a list, as the “People also ask” section contains several questions.
Now, inspect the first question dropdown in the “People also ask” node:
Here, you can see that the elements of interest are the children of the data-sgrd="true" <div> inside the “People also ask” element that have only the jsname attribute. The last two children are used by Google as placeholders and are populated dynamically as you open the dropdowns.
Select the question dropdowns with the following logic:
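```python
# select the question dropdowns: the direct children of the
# data-sgrd="true" <div> whose only attribute is jsname
# (this XPath is an assumption; adjust it to the live markup)
people_also_ask_divs = people_also_ask_element.find_elements(
    By.XPATH, ".//div[@data-sgrd='true']/div[@jsname and count(@*) = 1]"
)
```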
Click the element to expand it:
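```python
for people_also_ask_div in people_also_ask_divs:
    # expand the question dropdown
    people_also_ask_div.click()

    # question scraping logic...
```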
Next, focus on the content inside the question elements:
Note that the question is contained in the <span> inside the aria-expanded="true" node. Scrape it as follows:
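```python
# inside the for loop
question_element = people_also_ask_div.find_element(
    By.CSS_SELECTOR, "[aria-expanded='true'] span"
)
question = question_element.text
```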
Then, inspect the answer element:
Notice how you can retrieve it by collecting the text in the <span> node with the lang attribute inside the data-attrid="wa:/description" element:
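```python
answer_element = people_also_ask_div.find_element(
    By.CSS_SELECTOR, "[data-attrid='wa:/description'] span[lang]"
)
answer = answer_element.text
```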
Next, inspect the optional image in the answer box:
You can get its URL by accessing the src attribute from the <img> element with the data-ilt attribute:
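```python
image_element = people_also_ask_div.find_element(By.CSS_SELECTOR, "img[data-ilt]")
image = image_element.get_attribute("src")
```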
Since the image element is optional, you must wrap the above code in a try ... except block. If the node is not present in the current question, find_element() will raise a NoSuchElementException. In that case, the code will intercept it and move on:
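```python
try:
    image_element = people_also_ask_div.find_element(By.CSS_SELECTOR, "img[data-ilt]")
    image = image_element.get_attribute("src")
except NoSuchElementException:
    image = None
```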
If you skipped step 4, import the exception:
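```python
from selenium.common.exceptions import NoSuchElementException
```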
Lastly, inspect the source section:
You can get the URL of the source by selecting the <a> parent of the <h3> element:
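```python
# select the <a> parent of the <h3> node and read its href
source_element = people_also_ask_div.find_element(By.XPATH, ".//h3/parent::a")
source = source_element.get_attribute("href")
```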
Use the scraped data to populate a new object and add it to the people_also_ask_questions list:
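```python
people_also_ask_question = {
    "question": question,
    "answer": answer,
    "image": image,
    "source": source,
}
people_also_ask_questions.append(people_also_ask_question)
```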
Way to go! You just scraped the “People also ask” section from a Google page.
Step 8: Export the Scraped Data to CSV
If you print people_also_ask_questions, you will see the following output:
Sure, this is great, but it would be much better in a format you can easily share with other team members. So, export people_also_ask_questions to a CSV file!
Import the csv package from the Python standard library:
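```python
import csv
```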
Next, use it to populate an output CSV file with your SERP data:
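```python
# a minimal export sketch: the column names mirror the keys of the
# scraped records
header = ["question", "answer", "image", "source"]
with open("people_also_ask.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=header)
    writer.writeheader()
    writer.writerows(people_also_ask_questions)
```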
Finally! Your “People also ask” scraping script is complete.
Step 9: Put It All Together
Your final scraper.py script should contain the following code:
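```python
# NOTE: Google's markup changes frequently; the selectors below reflect
# the structure inspected in this tutorial and may need updating
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
import csv

# control a Chrome instance in headless mode
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

# connect to the Google home page
driver.get("https://google.com/")

# deal with the GDPR cookie dialog, if present
try:
    cookie_dialog = driver.find_element(By.CSS_SELECTOR, "[role='dialog']")
    accept_button = cookie_dialog.find_element(By.XPATH, ".//button[contains(., 'Accept')]")
    if accept_button is not None:
        accept_button.click()
except NoSuchElementException:
    print("Cookie dialog not present")

# fill in and submit the search form
search_form = driver.find_element(By.CSS_SELECTOR, "form[action='/search']")
search_textarea = search_form.find_element(By.CSS_SELECTOR, "textarea[aria-label='Search']")
search_textarea.send_keys("Bright Data")
search_form.submit()

# wait up to 5 seconds for the "People also ask" section to load
people_also_ask_element = WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((
        By.XPATH,
        "//div[@jscontroller and @jsname and @jsaction][.//div[@role='heading' and contains(., 'People also ask')]]",
    ))
)

# where to store the scraped data
people_also_ask_questions = []

# select the question dropdowns
people_also_ask_divs = people_also_ask_element.find_elements(
    By.XPATH, ".//div[@data-sgrd='true']/div[@jsname and count(@*) = 1]"
)

for people_also_ask_div in people_also_ask_divs:
    # expand the question dropdown
    people_also_ask_div.click()

    # scrape the question
    question_element = people_also_ask_div.find_element(By.CSS_SELECTOR, "[aria-expanded='true'] span")
    question = question_element.text

    # scrape the answer
    answer_element = people_also_ask_div.find_element(By.CSS_SELECTOR, "[data-attrid='wa:/description'] span[lang]")
    answer = answer_element.text

    # scrape the optional image
    try:
        image_element = people_also_ask_div.find_element(By.CSS_SELECTOR, "img[data-ilt]")
        image = image_element.get_attribute("src")
    except NoSuchElementException:
        image = None

    # scrape the source link
    source_element = people_also_ask_div.find_element(By.XPATH, ".//h3/parent::a")
    source = source_element.get_attribute("href")

    # populate a new record and append it to the list
    people_also_ask_questions.append({
        "question": question,
        "answer": answer,
        "image": image,
        "source": source,
    })

driver.quit()

# export the scraped data to CSV
header = ["question", "answer", "image", "source"]
with open("people_also_ask.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=header)
    writer.writeheader()
    writer.writerows(people_also_ask_questions)
```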
In 100 lines of code, you just built a PAA scraper!
Verify that it works by executing it. On Windows, launch the scraper with:
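```powershell
python scraper.py
```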
Alternatively, on Linux or macOS, run:
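```bash
python3 scraper.py
```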
Wait for the scraper execution to terminate, and a people_also_ask.csv file will appear in the root directory of your project. Open it, and you will see:
Congrats, mission complete!
Conclusion
In this tutorial, you learned what the “People Also Ask” section is on Google pages, the data it contains, and how to scrape it using Python. As you learned here, building a simple script to automatically retrieve data from it takes only a few lines of Python code.
While the solution presented works well for small projects, it is not practical for large-scale scraping. The problem is that Google has some of the most advanced anti-bot technology in the industry. So, it could block you with CAPTCHAs or IP bans. Additionally, scaling this process across multiple pages would increase infrastructure costs.
Does that mean that scraping Google efficiently and reliably is impossible? Not at all! You simply need an advanced solution that addresses these challenges, such as Bright Data’s Google Search API.
Google Search API provides an endpoint to retrieve data from Google SERP pages, including the “People also ask” section. With a simple API call, you can get the data you want in JSON or HTML format. See how to get started with it in the official documentation.
Sign up now and start your free trial!
No credit card required