Google Travel collects aggregator travel data from all over the web for all sorts of travel related categories like flights, vacation packages, and hotel rooms. Shopping for hotels is difficult, and one of the biggest pains is sorting through the mess of ad-sponsored listings and random rooms that just don’t apply to your search.
If you’re not interested in scraping, take a look at our premade travel datasets. With datasets, we do the scraping so you don’t have to. If you’re ready to scrape, read on!
Prerequisites
To scrape travel data, you’re going to need Python and either Selenium, Requests, or AIOHTTP. With Selenium, we’ll scrape hotel information straight from Google Travel. With Requests and AIOHTTP, we’ll use Bright Data’s Booking.com API.
If you’re using Selenium, you need to make sure you have webdriver installed. If you’re unfamiliar with Selenium, you can take a look at this guide to get acquainted quickly.
Install Selenium
pip install selenium
Install Requests
pip install requests
Install AIOHTTP
pip install aiohttp
Once you’ve installed your tool of choice, you’re ready to go.
What To Extract From Google Travel
If you’re choosing to scrape Google Travel manually, you need to get a better understanding of what data we’re trying to scrape. All of our hotel results come embedded in a custom c-wiz
element from Google Travel.
However, there are many c-wiz
elements on the page. Each of our hotel cards contains an a
element directly descended from a div
and this c-wiz
element. We can write a CSS selector to find all a
tags descended from these elements: c-wiz > div > a
.
The name of the listing comes embedded in an h2
.
Our price comes embedded in a span
.
Our amenities are embedded in li
(list) elements.
After finding our hotel card, we can extract all of the aforementioned data from it.
Extract The Data With Selenium
Extracting this data with Selenium is relatively straightforward once you know what to look for. However, Google Travel loads our results dynamically, which makes it a bit of a delicate process held together by preconfigured waits, mouse clicks, and custom windows. Without the custom window, your results will not load properly.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import json
from time import sleep
OPTIONS = webdriver.ChromeOptions()
OPTIONS.add_argument("--headless")
OPTIONS.add_argument("--window-size=1920,1080")
def scrape_hotels(location, pages=5):
driver = webdriver.Chrome(options=OPTIONS)
actions = ActionChains(driver)
url = f"https://www.google.com/travel/search?q={location}"
driver.get(url)
done = False
found_hotels = []
page = 1
result_number = 1
while page <= pages:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
sleep(5)
hotel_links = driver.find_elements(By.CSS_SELECTOR, "c-wiz > div > a")
print(f"-----------------PAGE {page}------------------")
print("FOUND ITEMS: ", len(hotel_links))
for hotel_link in hotel_links:
hotel_card = hotel_link.find_element(By.XPATH, "..")
try:
info = {}
info["url"] = hotel_link.get_attribute("href")
info["rating"] = 0.0
info["price"] = "n/a"
info["name"] = hotel_card.find_element(By.CSS_SELECTOR, "h2").text
price_holder = hotel_card.find_elements(By.CSS_SELECTOR, "span")
info["amenities"] = []
amenities_holders = hotel_card.find_elements(By.CSS_SELECTOR, "li")
for amenity in amenities_holders:
info["amenities"].append(amenity.text)
if "DEAL" in price_holder[0].text or "PRICE" in price_holder[0].text:
if price_holder[1].text[0] == "$":
info["price"] = price_holder[1].text
else:
info["price"] = price_holder[0].text
rating_holder = hotel_card.find_elements(By.CSS_SELECTOR, "span[role='img']")
if rating_holder:
info["rating"] = float(rating_holder[0].get_attribute("aria-label").split(" ")[0])
info["result_number"] = result_number
if info not in found_hotels:
found_hotels.append(info)
result_number+=1
except:
continue
print("Scraped Total:", len(found_hotels))
next_button = driver.find_elements(By.XPATH, "//span[text()='Next']")
if next_button:
print("next button found!")
sleep(1)
actions.move_to_element(next_button[0]).click().perform()
page+=1
sleep(5)
else:
done = True
driver.quit()
with open("scraped-hotels.json", "w") as file:
json.dump(found_hotels, file, indent=4)
if __name__ == "__main__":
PAGES = 2
scrape_hotels("miami", pages=PAGES)
- First, we create an instance of
ChromeOptions
. We use this to add our--headless
and--window-size=1920,1080
arguments.- Without our custom window size, the results don’t load properly, so we wind up scraping the same results over and over again.
- When we launch the browser, we use the keyword argument,
options=OPTIONS
. This launches Chrome with our custom options. ActionChains(driver)
gives us anActionChains
instance. We use this later in our script to move the cursor to theNext
button and then click on it.- We use a
while
loop to contain our runtime. Once the scrape has finished, we’ll exit this loop. hotel_links = driver.find_elements(By.CSS_SELECTOR, "c-wiz > div > a")
gives us all of the hotel links on the page. We find their parent elements using their xpath:hotel_card = hotel_link.find_element(By.XPATH, "..")
.- Next, we go through and extract all the individual bits of data we looked at earlier:
- url:
hotel_link.get_attribute("href")
- name:
hotel_card.find_element(By.CSS_SELECTOR, "h2").text
- When looking for the price, there are sometimes additional elements in the card such as
DEAL
andGREAT PRICE
. To ensure that we’re always getting the right price, we extract thespan
elements in an array. If the array contains these words, we take the second element (price_holder[1].text
) instead of the first (price_holder[0].text
) - We also use the
find_elements()
method when looking for the rating. If there is no rating present, we give it a default value ofn/a
. hotel_card.find_elements(By.CSS_SELECTOR, "li")
yields our amenity holders. We extract each of them using theirtext
attribute.
- url:
- We continue this loop until we’ve scraped all of our desired pages. Once we’ve got our data, we set
done
toTrue
and exit the loop. - We close the browser and use
json.dump()
to save all of our scraped data to a JSON file.
When scraping hotels from Google Travel, we didn’t run into any blocking issues, but anything is possible. If do run into any issues, we offer both residential proxies and a proxy integrated Scraping Browser to help get you past anything that might get in your way.
Scraping these results with Selenium is both tedious and delicate, but entirely doable.
Extract the Data With Bright Data’s Travel API
Sometimes you don’t want to depend on a scraper or spend all day dealing with selectors and locators. That’s fine! We offer several types of travel data. You can even extract hotel data using our Booking.com API. All you need to do is make a few HTTP requests. We handle all the rest so you can get on with your day.
Requests
The code below sets you up with the Booking.com API. Simply enter your API key, travel location, check-in date and check-out date. First, it makes a request to the API to generate the data. Then, it repeatedly checks on the data every 10 seconds until our report is ready. Once we’ve received our data, we save it conveniently in a JSON file.
import requests
import json
import time
def get_bookings(api_key, location, dates):
url = "https://api.brightdata.com/datasets/v3/trigger"
#booking.com dataset
dataset_id = "gd_m4bf7a917zfezv9d5"
endpoint = f"{url}?dataset_id={dataset_id}&include_errors=true"
auth_token = api_key
#
headers = {
"Authorization": f"Bearer {auth_token}",
"Content-Type": "application/json"
}
payload = [
{
"url": "https://www.booking.com",
"location": location,
"check_in": dates["check_in"],
"check_out": dates["check_out"],
"adults": 2,
"rooms": 1
}
]
response = requests.post(endpoint, headers=headers, json=payload)
if response.status_code == 200:
print("Request successful. Response:")
print(json.dumps(response.json(), indent=4))
return response.json()["snapshot_id"]
else:
print(f"Error: {response.status_code}")
print(response.text)
def poll_and_retrieve_snapshot(api_key, snapshot_id, output_file="snapshot-data.json"):
#create the snapshot url
snapshot_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
headers = {
"Authorization": f"Bearer {api_key}"
}
print(f"Polling snapshot for ID: {snapshot_id}...")
while True:
response = requests.get(snapshot_url, headers=headers)
if response.status_code == 200:
print("Snapshot is ready. Downloading...")
snapshot_data = response.json()
#write the snapshot to a new json file
with open(output_file, "w", encoding="utf-8") as file:
json.dump(snapshot_data, file, indent=4)
print(f"Snapshot saved to {output_file}")
break
elif response.status_code == 202:
print("Snapshot is not ready yet. Retrying in 10 seconds...")
else:
print(f"Error: {response.status_code}")
print(response.text)
break
time.sleep(10)
if __name__ == "__main__":
API_KEY = "your-bright-data-api-key"
LOCATION = "Miami"
CHECK_IN = "2025-02-01T00:00:00.000Z"
CHECK_OUT = "2025-02-02T00:00:00.000Z"
DATES = {
"check_in": CHECK_IN,
"check_out": CHECK_OUT
}
snapshot_id = get_bookings(API_KEY, LOCATION, DATES)
poll_and_retrieve_snapshot(API_KEY, snapshot_id)
get_bookings()
takes yourAPI_KEY
,LOCATION
andDATES
. It then makes a request for the data and returns thesnapshot_id
.- The
snapshot_id
is very important. We need it in order to retrieve the snapshot. - After the
snapshot_id
has been generated,poll_and_retrieve_snapshot()
checks every 10 seconds to see if the data is ready. - Once the data’s ready, we use
json.dump()
to save it to a JSON file.
When you run the code, you should see something similar to this in your terminal.
Request successful. Response:
{
"snapshot_id": "s_m5moyblm1wikx4ntot"
}
Polling snapshot for ID: s_m5moyblm1wikx4ntot...
Snapshot is not ready yet. Retrying in 10 seconds...
Snapshot is not ready yet. Retrying in 10 seconds...
Snapshot is not ready yet. Retrying in 10 seconds...
Snapshot is not ready yet. Retrying in 10 seconds...
Snapshot is ready. Downloading...
Snapshot saved to snapshot-data.json
Then you’ll get a JSON file full of objects like this.
{
"input": {
"url": "https://www.booking.com",
"location": "Miami",
"check_in": "2025-02-01T00:00:00.000Z",
"check_out": "2025-02-02T00:00:00.000Z",
"adults": 2,
"rooms": 1
},
"url": "https://www.booking.com/hotel/us/ramada-plaze-by-wyndham-marco-polo-beach-resort.html?checkin=2025-02-01&checkout=2025-02-02&group_adults=2&no_rooms=1&group_children=",
"location": "Miami",
"check_in": "2025-02-01T00:00:00.000Z",
"check_out": "2025-02-02T00:00:00.000Z",
"adults": 2,
"children": null,
"rooms": 1,
"id": "55989",
"title": "Ramada Plaza by Wyndham Marco Polo Beach Resort",
"address": "19201 Collins Avenue",
"city": "Sunny Isles Beach (Florida)",
"review_score": 6.2,
"review_count": "1788",
"image": "https://cf.bstatic.com/xdata/images/hotel/square600/414501733.webp?k=4c14cb1ec5373f40ee83d901f2dc9611bb0df76490f3673f94dfaae8a39988d8&o=",
"final_price": 217,
"original_price": 217,
"currency": "USD",
"tax_description": null,
"nb_livingrooms": 0,
"nb_kitchens": 0,
"nb_bedrooms": 0,
"nb_all_beds": 2,
"full_location": {
"description": "This is the straight-line distance on the map. Actual travel distance may vary.",
"main_distance": "11.4 miles from downtown",
"display_location": "Miami Beach",
"beach_distance": "Beachfront",
"nearby_beach_names": []
},
"no_prepayment": false,
"free_cancellation": true,
"property_sustainability": {
"is_sustainable": false,
"level_id": "L0",
"facilities": [
"436",
"490",
"492",
"496",
"506"
]
},
"timestamp": "2025-01-07T16:43:24.954Z"
},
AIOHTTP
With AIOHTTP, we can make this process quite a bit faster. We can actually trigger, poll, and download multiple datasets simultaneously. The code below builds on our concepts from the Requests example above, but instead uses the powerful aiohttp.ClientSession()
to make multiple requests asynchronously.
import aiohttp
import asyncio
import json
async def get_bookings(api_key, location, dates):
url = "https://api.brightdata.com/datasets/v3/trigger"
dataset_id = "gd_m4bf7a917zfezv9d5"
endpoint = f"{url}?dataset_id={dataset_id}&include_errors=true"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
payload = [
{
"url": "https://www.booking.com",
"location": location,
"check_in": dates["check_in"],
"check_out": dates["check_out"],
"adults": 2,
"rooms": 1
}
]
async with aiohttp.ClientSession(headers=headers) as session:
async with session.post(endpoint, json=payload) as response:
if response.status == 200:
response_data = await response.json()
print(f"Request successful for location: {location}. Response:")
print(json.dumps(response_data, indent=4))
return response_data["snapshot_id"]
else:
print(f"Error for location: {location}. Status: {response.status}")
print(await response.text())
return None
async def poll_and_retrieve_snapshot(api_key, snapshot_id, output_file):
snapshot_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
headers = {
"Authorization": f"Bearer {api_key}"
}
print(f"Polling snapshot for ID: {snapshot_id}...")
async with aiohttp.ClientSession(headers=headers) as session:
while True:
async with session.get(snapshot_url) as response:
if response.status == 200:
print(f"Snapshot for {output_file} is ready. Downloading...")
snapshot_data = await response.json()
# Save snapshot data to a file
with open(output_file, "w", encoding="utf-8") as file:
json.dump(snapshot_data, file, indent=4)
print(f"Snapshot saved to {output_file}")
break
elif response.status == 202:
print(f"Snapshot for {output_file} is not ready yet. Retrying in 10 seconds...")
else:
print(f"Error polling snapshot for {output_file}. Status: {response.status}")
print(await response.text())
break
await asyncio.sleep(10)
async def process_location(api_key, location, dates):
snapshot_id = await get_bookings(api_key, location, dates)
if snapshot_id:
output_file = f"snapshot-{location.replace(' ', '_').lower()}.json"
await poll_and_retrieve_snapshot(api_key, snapshot_id, output_file)
async def main():
api_key = "your-bright-data-api-key"
locations = ["Miami", "Key West"]
dates = {
"check_in": "2025-02-01T00:00:00.000Z",
"check_out": "2025-02-02T00:00:00.000Z"
}
# Process all locations in parallel
tasks = [process_location(api_key, location, dates) for location in locations]
await asyncio.gather(*tasks)
if __name__ == "__main__":
asyncio.run(main())
- Both
get_bookings()
andpoll_and_retrieve_snapshot()
now use ouraiohttp.ClientSession
object to create async requests to the server. process_location()
is used to process all data for a location.main()
allows us to callprocess_location()
on all locations simultaneously.
With AIOHTTP, you can trigger, poll, and download multiple datasets at the same time. This way, you don’t need to wait unnecessarily for one report to complete before generating the next.
Take a look at the output. As you can see, we trigger both reports. Then we download one report while still waiting for the other. At scale, this will save you an incredible amount of time.
Request successful for location: Miami. Response:
{
"snapshot_id": "s_m5mtmtv62hwhlpyazw"
}
Request successful for location: Key West. Response:
{
"snapshot_id": "s_m5mtmtv72gkkgxvdid"
}
Polling snapshot for ID: s_m5mtmtv62hwhlpyazw...
Polling snapshot for ID: s_m5mtmtv72gkkgxvdid...
Snapshot for snapshot-miami.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-key_west.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-key_west.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-miami.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-key_west.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-miami.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-miami.json is ready. Downloading...
Snapshot for snapshot-key_west.json is not ready yet. Retrying in 10 seconds...
Snapshot saved to snapshot-miami.json
Snapshot for snapshot-key_west.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-key_west.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-key_west.json is not ready yet. Retrying in 10 seconds...
Snapshot for snapshot-key_west.json is ready. Downloading...
Snapshot saved to snapshot-key_west.json
Bright Data’s Alternative Solutions
Beyond our powerful Web Scraper APIs, Bright Data provides ready-to-use datasets tailored to meet diverse needs. Among our most sought-after travel datasets are:
With Bright Data, you can choose between fully managed or self-managed custom datasets, allowing you to extract data from any public website and customize it to your exact specifications.
Conclusion
When scraping the web, you can find a treasure trove of hotel information from Google Travel. Whether you prefer the DIY model with Selenium, or you just want quick and convenient results with the Booking.com API, you can harvest this data to gain some really valuable insights. Whether you want to analyze historical prices, or just shop for a room efficiently, you’ve just added another useful skill to your tech stack!
Sign up now to try Bright Data’s products for free.
No credit card required