Google Images is one of the more difficult sites to scrape on the web. They don’t explicitly block scrapers, but they really make you work for the data… You’ve got to want it!
From dynamic CSS selectors to Base64 encoding, scraping Google Images is a lot more like solving a puzzle than scraping regular HTML.
Prerequisites
To scrape Google Images with us, you should have a basic understanding of Python and Selenium. If you need a refresher, we suggest learning more about web scraping with Python and Selenium first.
First, make sure you’ve got Chrome and ChromeDriver installed. You can download the latest version of ChromeDriver here.
When downloading ChromeDriver, make sure you’re getting a version that matches your version of Chrome.
You can check your Chrome version with the following command.
```shell
google-chrome --version
```
The output should be similar to what you see below.
```text
Google Chrome 131.0.6778.139
```
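If you’d rather check the match from a script, here is a minimal sketch that compares the major version numbers reported by `google-chrome --version` and `chromedriver --version`. It assumes both binaries are on your `PATH` and print a version in the dotted four-part format shown above; the helper names `major_version` and `versions_match` are our own, not part of any library.

```python
import re
import shutil
import subprocess


def major_version(version_output: str) -> int:
    """Pull the major version out of output like 'Google Chrome 131.0.6778.139'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """ChromeDriver should share Chrome's major version number."""
    return major_version(chrome_output) == major_version(driver_output)


if __name__ == "__main__":
    # Assumes Linux-style binary names; adjust them for your platform
    if shutil.which("google-chrome") and shutil.which("chromedriver"):
        chrome = subprocess.run(["google-chrome", "--version"], capture_output=True, text=True).stdout
        driver = subprocess.run(["chromedriver", "--version"], capture_output=True, text=True).stdout
        print("Versions match:", versions_match(chrome, driver))
```

Only the major version (the first number) needs to line up; the remaining build numbers usually differ between Chrome and ChromeDriver.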
Once you’ve got these, you can install Selenium with `pip`.

```shell
pip install selenium
```
What To Scrape
We can’t just plunge headfirst into code. We need to get a better idea of what we’re scraping and how we’ll extract it. As we said earlier, scraping Google Images is like solving a puzzle.
Let’s examine one of the images from Google. The image is embedded in a custom HTML tag called `g-img`. We’ll need to find all of these `g-img` elements.
Once we’ve found all the `g-img` tags, we need to extract their `img` elements. You can see one of those below.
If you look closely at the `img`, you’ll notice something extremely strange: its `src` is a bizarre string of seemingly random characters.

The beginning of this string holds the key to everything: `data:image/jpeg;base64,`. `jpeg` tells us that this is a JPEG file, and `base64` tells us that it’s encoded using Base64. When we decode the rest of the string, we get the raw binary of the image. Because that binary is embedded directly in the web page, we can’t trace the image back to its original source. However, we can write the binary to a file and recreate the image.
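To make the structure of that string concrete, here is a minimal sketch that splits a `data:` URI into its MIME type and decoded bytes. It assumes the `data:<mime>;base64,<payload>` form described above; `parse_data_uri` is a hypothetical helper name, and the tiny payload `/9j/` is simply the Base64 encoding of the three bytes every JPEG file starts with, which is why JPEG strings like these tend to begin with `/9j/`.

```python
import base64


def parse_data_uri(src: str) -> tuple[str, bytes]:
    """Split a Base64 data URI into (mime_type, decoded_bytes).

    Assumes the form data:<mime>;base64,<payload>.
    """
    if not src.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, payload = src.partition(",")
    # header looks like "data:image/jpeg;base64"
    mime_type, _, encoding = header[len("data:"):].partition(";")
    if encoding != "base64":
        raise ValueError(f"unsupported encoding: {encoding!r}")
    return mime_type, base64.b64decode(payload)


mime, data = parse_data_uri("data:image/jpeg;base64,/9j/")
print(mime)  # image/jpeg
print(data)  # b'\xff\xd8\xff'
```

The file extension then falls out of the MIME type: everything after the `/`.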
Scraping Google Images With Python
Now that we know what we want, it’s time to start coding our scraper. In the next few sections, we’ll put the scraper together and go through exactly what the code does.
Getting Started
Go ahead and create a new Python file. We’ll start with just our basic imports and structure.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
import base64
from pathlib import Path

options = webdriver.ChromeOptions()

def scrape_images(keyword, batch_size, headless=True):
    """
    Our actual scraping logic will go here
    """

if __name__ == "__main__":
    scrape_images("linux penguin", 100)
```
- We import `webdriver` and `By` from Selenium. `webdriver` is used to control our browser. `By` is used for locating items on the page.
- We’ll use `sleep` to pause our scraper for a period of time. For example, if we want the scraper to wait for one second, we’d use `sleep(1)`.
- As you might have guessed, `base64` is going to decode our image binaries.
- `Path` will be used to create a results folder and build the paths of the image files we write into it.
- `options = webdriver.ChromeOptions()` allows us to use custom settings with Selenium. Primarily, this is to run Selenium in headless mode. Headless mode lets us run the scraper without opening a visible browser window, which saves valuable resources.
Scraping Google Images
Next, we’ll write our scraping function. The code below contains our entire scraper. Pay close attention to `scrape_images()`.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
import base64
from pathlib import Path

options = webdriver.ChromeOptions()

def scrape_images(keyword, batch_size, headless=True):
    if headless:
        options.add_argument("--headless")

    formatted_keyword = keyword.replace(" ", "+")
    folder_name = keyword.replace(" ", "-")
    output_folder = Path(f"results-{folder_name}")
    output_folder.mkdir(parents=True, exist_ok=True)
    result_count = 0
    # Every scroll returns all the images on the page again, so remember what we've saved
    seen_sources = set()

    driver = webdriver.Chrome(options=options)
    driver.get(f"https://www.google.com/search?q={formatted_keyword}")
    sleep(1)

    # The second list item is the Images tab
    list_items = driver.find_elements(By.CSS_SELECTOR, "div[role='listitem']")
    list_items[1].click()

    while result_count < batch_size:
        driver.execute_script("window.scrollBy(0, 300);")
        sleep(1)

        img_tags = driver.find_elements(By.CSS_SELECTOR, "g-img > img")
        for img_tag in img_tags:
            if result_count >= batch_size:
                break
            src = img_tag.get_attribute("src")
            # Skip missing sources, non-inline images, and images we've already saved
            if not src or not src.startswith("data:image/") or src in seen_sources:
                continue
            seen_sources.add(src)

            base64_binary = src.split("base64,")[-1]
            mime_type = src.split(";")[0].split(":")[1]
            file_extension = mime_type.split("/")[-1]
            if file_extension == "gif":
                continue

            # Keep only filename-safe characters from the alt text and cap its length
            alt_text = img_tag.get_attribute("alt") or "image"
            alt_text = "".join(c for c in alt_text if c.isalnum() or c in " -_")[:50].strip() or "image"
            filename = f"{alt_text}-{result_count}.{file_extension}"

            image_binary = base64.b64decode(base64_binary)
            output_path = output_folder.joinpath(filename)
            with open(output_path, "wb") as file:
                file.write(image_binary)
            result_count += 1
            print(f"Saved: {filename}")

    driver.quit()

if __name__ == "__main__":
    scrape_images("linux penguin", 100)
```
- We set `headless` to `True` by default. If the user sets it to `False`, the scraper launches a browser window that you can see on screen. This is useful for debugging.
- We create a `formatted_keyword` and a `folder_name` by replacing the spaces in our `keyword` with `+` (for the search URL) and `-` (for the folder name). This lets us build a valid URL and store the files without any issues.
- We launch our browser with `webdriver.Chrome(options=options)`. `driver.get(f"https://www.google.com/search?q={formatted_keyword}")` takes us to the Google search results for our `keyword`.
- Now we need to click on the Images tab. We do this by finding all `div` elements with the role `listitem`. `list_items[1].click()` clicks on the second item, which is the Images tab.
- We use a `while` loop to run our scraping code over and over until we’ve saved as many images as we want. `driver.execute_script("window.scrollBy(0, 300);")` runs JavaScript to scroll the page down by 300 pixels. After scrolling, we `sleep()` for one second while the content loads. `driver.find_elements(By.CSS_SELECTOR, "g-img > img")` finds all `img` tags nested directly inside a `g-img`.
- Next, we iterate through the `img` elements we found and pull each one’s `src` attribute. If the `src` is missing or doesn’t start with `data:image/`, we use `continue` to skip it.
- We use some basic string splitting to extract the encoded binary and the file extension (JPEG, PNG, etc.). If the extension is GIF, we skip the image. For some reason, GIFs don’t display when we write them to a file.
- `base64.b64decode(base64_binary)` decodes the Base64 text into raw image bytes, which we then write to a file in our output folder.
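Because each pass of the `while` loop calls `find_elements()` again, the same thumbnails come back after every scroll. The sketch below pulls the save step out of Selenium so it can be tried without a browser: it takes a list of `src` strings (standing in for the `img` attributes), skips non-Base64 sources, GIFs, and repeats, and stops at `batch_size`. `save_data_uri_images` and its `prefix` parameter are hypothetical names for illustration, not part of the scraper above.

```python
import base64
from pathlib import Path


def save_data_uri_images(sources, output_folder, batch_size, prefix="image"):
    """Decode Base64 data URIs and write them into output_folder.

    'sources' stands in for the src attributes of the g-img > img elements.
    Returns the paths that were written.
    """
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)
    seen = set()
    saved = []
    for src in sources:
        if len(saved) >= batch_size:
            break
        # Skip missing sources, non-inline images, and repeats from earlier scrolls
        if not src or not src.startswith("data:image/") or src in seen:
            continue
        seen.add(src)
        file_extension = src.split(";")[0].split(":")[1].split("/")[-1]
        if file_extension == "gif":
            continue
        image_binary = base64.b64decode(src.split("base64,")[-1])
        output_path = output_folder / f"{prefix}-{len(saved)}.{file_extension}"
        output_path.write_bytes(image_binary)
        saved.append(output_path)
    return saved


if __name__ == "__main__":
    import tempfile

    fake_sources = [
        "data:image/jpeg;base64,/9j/",
        "data:image/jpeg;base64,/9j/",        # duplicate from a later scroll
        "data:image/gif;base64,R0lGODlh",      # GIF, skipped
        "https://example.com/not-inline.jpg",  # not a data URI, skipped
        "data:image/png;base64,iVBORw==",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_data_uri_images(fake_sources, tmp, batch_size=10):
            print(path.name)  # image-0.jpeg, then image-1.png
```

Keeping the decode-and-write logic in a plain function like this also makes it easy to unit test separately from the browser automation.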
If you run the code, you’ll see a new folder pop up inside your project folder. It should be full of images.
Consider Using Bright Data
Our SERP API parses Google Images results so you don’t have to. It even extracts the image metadata, so your images will have actual names. The API is also fully scalable and can handle an enormous number of requests.
First, sign up for our SERP API.
When you’re ready, finish creating the zone.
Under Access Details, you’ll see your credentials.
Copy and paste the code below into a Python file. Replace the credentials in `proxy_auth` with your own and you’re good to go.
```python
import base64
from pathlib import Path

import requests

proxy = "brd.superproxy.io:33335"
proxy_auth = "brd-customer-<your-customer-id>-zone-<your-zone-name>:<your-zone-password>"
proxy_url = f"http://{proxy_auth}@{proxy}"

def scrape_images(keyword):
    formatted_keyword = keyword.replace(" ", "+")
    folder_name = keyword.replace(" ", "-")
    output_folder = Path(f"serp-results-{folder_name}")
    output_folder.mkdir(parents=True, exist_ok=True)

    # tbm=isch requests Google's image results; brd_json=1 asks for parsed JSON
    url = f"https://www.google.com/search?q={formatted_keyword}&tbm=isch&brd_json=1"
    response = requests.get(
        url,
        proxies={"http": proxy_url, "https": proxy_url},
        verify=False  # disables TLS certificate verification (requests will warn)
    )
    response.raise_for_status()

    images = response.json()["images"]
    result_count = 0
    for image in images:
        file_extension = image["source_logo"].split(";")[0].split(":")[1].split("/")[-1]
        if file_extension == "gif":
            continue
        image_binary = base64.b64decode(image["source_logo"].split("base64,")[-1])
        title = image["title"].replace(" ", "-").replace("/", "").strip(".")
        # Append a counter so images with the same title don't overwrite each other
        filename = f"{title}-{result_count}.{file_extension}"
        with open(output_folder.joinpath(filename), "wb") as file:
            file.write(image_binary)
        result_count += 1
        print(f"Saved: {filename}")

if __name__ == "__main__":
    scrape_images("linux penguin")
```
If you run the code, you’ll get a bunch of images again, but this time, they all have names.
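Pasting credentials straight into `proxy_auth` works, but it makes them easy to commit by accident, and a password containing characters such as `@` or `:` would break the URL. A minimal sketch, assuming you keep the values in environment variables of your own choosing (`BRD_USERNAME` and `BRD_PASSWORD` are hypothetical names), builds the same kind of `proxy_url` with the credentials percent-encoded. `build_proxy_url` is likewise our own helper name.

```python
import os
from urllib.parse import quote

PROXY = "brd.superproxy.io:33335"


def build_proxy_url(username: str, password: str, proxy: str = PROXY) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode reserved characters such as '@' and ':'
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{proxy}"


if __name__ == "__main__":
    # Hypothetical variable names; export them in your shell before running
    username = os.environ.get("BRD_USERNAME")
    password = os.environ.get("BRD_PASSWORD")
    if username and password:
        proxy_url = build_proxy_url(username, password)
        print("Proxy URL built for", username)
    else:
        print("Set BRD_USERNAME and BRD_PASSWORD first")
```

The resulting string can be passed in the `proxies` dictionary exactly where the original `proxy_url` is used.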
Conclusion
Scraping images from Google is a bit like trying to solve a puzzle without all the pieces. Our Google Images API finds the metadata for you and cuts out the need for Selenium!
If you need to scrape images from other sources, we also have an Instagram Image API, Shutterstock Scraper, and different structured datasets. Sign up now and find the perfect product for your needs, including a free trial!