How To Scrape Booking.com With Python

Learn to build a Python scraper for Booking.com to extract hotel data, reviews, and prices efficiently.

In this tutorial, you will see:

  • The definition of a Booking scraper
  • What data you can extract with it
  • How to build a Booking.com scraping script with Python

Let’s dive in!

What Is a Booking Scraper?

A Booking.com scraper is a tool to automatically extract data from Booking.com pages. It enables you to retrieve information from property listing pages, such as hotel names, prices, reviews, ratings, amenities, and availability. This data can be used for various purposes, including market analysis, price comparison, and building travel-related datasets.

Data You Can Scrape From Booking.com

Below is a list of data points you can retrieve from Booking.com:

  • Property details: Hotel name, address, distance from landmarks (e.g., city center, downtown, etc.)
  • Pricing information: Regular price, discounted price (if available)
  • Reviews and ratings: Review score, number of reviews, guest feedback
  • Availability: Room types available, booking options (e.g., free cancellation, breakfast included), dates with availability
  • Media: Property images, room images
  • Amenities: Facilities offered (e.g., Wi-Fi, parking, pool), room-specific amenities
  • Promotions: Special offers or discounts, limited-time deals
  • Policies: Cancellation policy, check-in and check-out times
  • Additional details: Property description, nearby attractions, number of rooms available for specific dates
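To give a concrete picture, a single scraped listing could be modeled in Python as below. This is only an illustrative sketch: the field names are assumptions for this example, not an official Booking.com schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# illustrative shape of one scraped Booking.com listing
# (field names are assumptions, not an official data model)
@dataclass
class PropertyRecord:
    url: str
    title: str
    address: Optional[str] = None
    review_score: Optional[float] = None
    review_count: Optional[int] = None
    price: Optional[str] = None

record = PropertyRecord(
    url="https://www.booking.com/hotel/example.html",
    title="Example Hotel",
    review_score=8.4,
    review_count=120,
    price="$180",
)
print(asdict(record))
```

Optional fields default to `None`, since — as you will see later — not every listing exposes every data point.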

Scraping Booking.com in Python: Step-By-Step Guide

In this guided section, you will learn how to build a Booking.com scraper.

The objective is to create a Python script that automatically gathers data from the property listing page:

Property listing page example


Follow the steps below!

Step #1: Project Setup

Before getting started, make sure you have Python 3 installed on your machine. If not, download the installer, launch it, and follow the installation wizard.

Now, use the command below to create a folder for your project:

mkdir booking-scraper

The booking-scraper directory represents the project folder of your Python Booking.com scraping script.

Enter it, and initialize a virtual environment within it:

cd booking-scraper
python -m venv env

Load the project folder in your favorite Python IDE. Visual Studio Code with the Python extension and PyCharm Community Edition are both great choices.

Create a scraper.py file in the project folder. Your project should now have this file structure:

A scraper.py file in the project folder

scraper.py is now a blank Python script, but it will soon contain the scraping logic.

In the IDE’s terminal, activate the virtual environment. On Linux or macOS, execute this command:

source ./env/bin/activate

Equivalently, on Windows, run:

env\Scripts\activate

Amazing, you now have a Python environment for web scraping!

Step #2: Select the Scraping Library

It is time to determine whether Booking.com is a static or dynamic site and select the appropriate scraping library accordingly. This can be done by inspecting the site’s behavior. Start by opening Booking.com in your browser. Perform a search and navigate to the property page:

search and navigate to the property page

Notice that the page loads new data dynamically as you scroll down:

New data loaded dynamically as you scroll down


That pattern is known as infinite scrolling and is a hallmark of dynamic sites. Learn more on how to perform web scraping on dynamic sites.

Without even diving into the HTML code of the document returned by the server or inspecting the Network tab in DevTools (two common steps for understanding whether a site is static or not), we can already conclude that Booking.com is a dynamic site.

The best approach to scraping a dynamic-content site is to use a browser automation tool. These solutions let you control a browser and perform specific interactions on the page to extract data effectively.

One of the most powerful browser automation tools for Python is Selenium, making it an excellent choice for scraping Booking.com. Get ready to install it, as it will be the primary library for this task!

Step #3: Install and Configure Selenium

In Python, Selenium is available through the selenium pip package. In an activated Python virtual environment, install it with this command:

pip install selenium

For guidance on how to use the tool, read our tutorial on web scraping with Selenium.

Import Selenium in scraper.py and initialize a WebDriver object to control a Chrome instance:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# create a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

The code above initializes a Chrome WebDriver instance to control a Chrome browser. Note that Booking.com appears to use anti-scraping technology that blocks headless browsers. Therefore, avoid setting the --headless flag. As an alternative solution, read our guide on Playwright Stealth.

As the last line of your scraper, remember to close the web driver:

driver.quit()

Wonderful! You are now fully configured to start scraping Booking.com.

Step #4: Visit the Target Page

Booking.com pages offer numerous interactive features to refine your search:

interactive features to refine your search

Simulating all these interactions programmatically with Selenium would be complex and time-consuming. So, to simplify and speed things up, perform the interactions manually in your browser first.

Once you have configured a search query of interest, copy the resulting page URL from your browser’s address bar:

For example, the URL below represents a search for New York apartments from November 18 to December 18 for two adults.

Copy the URL and feed it to the get() method offered by Selenium:

driver.get("https://www.booking.com/searchresults.html?ss=New+York&ssne=New+York&ssne_untouched=New+York&label=gen173nr-1FCAEoggI46AdIM1gEaHGIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4Aof767kGwAIB0gIkNGE2MTI1MjgtZjJlNC00YWM4LWFlMmQtOGIxZjM3NWIyNDlm2AIF4AIB&sid=b91524e727f20006ae00489afb379d3a&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=20088325&dest_type=city&checkin=2024-11-18&checkout=2024-12-18&group_adults=2&no_rooms=1&group_children=0")

Your scraping script will automatically connect to the desired Booking.com page.
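If you prefer not to hardcode the long URL, you can assemble the core query parameters with the standard library. The parameters below are taken from the example URL above; the session-specific tracking parameters (`label`, `sid`, `aid`) are omitted, as they are not required to reproduce the search.

```python
from urllib.parse import urlencode

# core search parameters observed in the example URL
# (session-specific tracking parameters are omitted)
params = {
    "ss": "New York",
    "dest_id": "20088325",
    "dest_type": "city",
    "checkin": "2024-11-18",
    "checkout": "2024-12-18",
    "group_adults": 2,
    "no_rooms": 1,
    "group_children": 0,
}
url = "https://www.booking.com/searchresults.html?" + urlencode(params)
print(url)
```

You can then pass the resulting `url` string to `driver.get()`.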

The scraper.py file will now contain these lines of code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# create a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

# connect to the target page
driver.get("https://www.booking.com/searchresults.html?ss=New+York&ssne=New+York&ssne_untouched=New+York&label=gen173nr-1FCAEoggI46AdIM1gEaHGIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4Aof767kGwAIB0gIkNGE2MTI1MjgtZjJlNC00YWM4LWFlMmQtOGIxZjM3NWIyNDlm2AIF4AIB&sid=b91524e727f20006ae00489afb379d3a&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=20088325&dest_type=city&checkin=2024-11-18&checkout=2024-12-18&group_adults=2&no_rooms=1&group_children=0")

# scraping logic...

# close the web driver and release its resources
driver.quit()

Place a debugging breakpoint on the final line, and run the script. Below is what you should see:

Example of what you should see after you run the script


The “Chrome is being controlled by automated test software.” message certifies that Selenium is operating on Chrome as desired. Well done!

Step #5: Deal With the Login Alert

When you first visit Booking.com in a browser, the site often displays a sign-in alert within the first 20 seconds. That blocks access to the page’s content, making web scraping more difficult:

a sign-in alert within the first 20 seconds


Until you interact with it, you will not be able to access the content on the underlying page.

To handle the alert, close it using Selenium. Right-click on the close button and select the “Inspect” option from the context menu:

Right-click on the close button


Note that you can close the modal by selecting the button with the following CSS selector:

[role="dialog"] button[aria-label="Dismiss sign-in info."]

Now, instruct Selenium to wait up to 20 seconds for the alert to appear. Once it shows up, close it by clicking the dismiss button. Since the modal may not always appear, it makes sense to wrap this logic in a try...except block:

try:
    # wait up to 20 seconds for the sign-in alert to appear
    close_button = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "[role=\"dialog\"] button[aria-label=\"Dismiss sign-in info.\"]"))
    )
    # click the close button
    close_button.click()
except TimeoutException:
    print("Sign-in modal did not appear, continuing...")

WebDriverWait is a specialized Selenium class that pauses the script until a specified condition on the page is met. In the example above, it waits up to 20 seconds for the alert close button to appear on the page.
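Conceptually, WebDriverWait implements a polling loop: it repeatedly evaluates a condition until the condition holds or a deadline passes. The snippet below is a pure-Python sketch of that idea, not Selenium’s actual implementation:

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    # repeatedly evaluate the condition until it returns a truthy
    # value or the timeout elapses (the same idea behind WebDriverWait)
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(poll)
```

With Selenium, the condition is an expected-condition object such as `EC.presence_of_element_located(...)`, and the exception raised on timeout is `TimeoutException`.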

If the alert does not appear, Selenium raises a TimeoutException. Import it together with WebDriverWait, EC, and By as below:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

Great! The login alert is no longer a problem.

Step #6: Select the Booking.com Items

Note that the Booking.com page to scrape contains several items. Since you want to scrape them all, initialize a list in which to store the scraped data:

items = []

Now, you need to understand how to select the HTML elements associated with those items. Open Booking.com in your browser, perform a search, and inspect one of the property items:

Open Booking.com in your browser, perform a search, and inspect one of the property items

Notice that the classes of the HTML elements appear to be randomly generated. This means they are likely to change with every site deployment, making them unreliable for element selection. Instead, focus on more stable attributes like data-testid.

data-* attributes are excellent targets for web scraping.

Use the Selenium find_elements() method to apply a CSS selector on the page and select the elements of interest:

property_items = driver.find_elements(By.CSS_SELECTOR, "[data-testid=\"property-card\"]")

Iterate over the property items and prepare your Booking.com scraper to extract some data:

for property_item in property_items:
    # scraping logic...

Gorgeous! The next step is to scrape data from these elements.

Step #7: Scrape the Booking.com Items

Take a look at the property items on the page and notice that the elements they contain are inconsistent:

the property items on the page

Some have a review score, while others do not. Again, some have a discounted price, while others do not.

These differences make it difficult to write a consistent scraping logic for all property items. When you try to select an element that is not on the page, Selenium raises a NoSuchElementException. So, it makes sense to define a function to handle that scenario:

def handle_no_such_element_exception(data_extraction_task):
    try:
        return data_extraction_task()
    except NoSuchElementException:
        return None

The function above accepts a lambda function and attempts to execute it. If it raises a NoSuchElementException, it catches the exception and returns None. This allows your Booking.com scraping script to continue without breaking.
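The same pattern works with any exception type. Here is a self-contained demonstration of the idea, using a plain KeyError in place of NoSuchElementException so it runs without Selenium:

```python
def handle_exception(data_extraction_task, exception_type=Exception):
    # run the task; swallow the expected exception and return None
    try:
        return data_extraction_task()
    except exception_type:
        return None

# sample data standing in for a scraped property item
data = {"title": "Example Hotel"}

title = handle_exception(lambda: data["title"], KeyError)
price = handle_exception(lambda: data["price"], KeyError)
print(title, price)
```

The missing `price` key does not break the script; the helper simply returns `None`, just as the Selenium version does for missing page elements.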

Import NoSuchElementException:

from selenium.common import NoSuchElementException

Inspect a property item that contains all the elements (review score, discounted price, and so on):

Inspecting a property item

Note that you can extract:

  • The property link from a[data-testid="property-card-desktop-single-image"]
  • The property image from img[data-testid="image"]

In the for loop, apply the following logic to select those elements and extract data from them:

url = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "a[data-testid=\"property-card-desktop-single-image\"]").get_attribute("href"))
image = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "img[data-testid=\"image\"]").get_attribute("src"))

find_element() selects a single node on the page, while get_attribute() gets the content inside the specified HTML attribute. Note that the data extraction instructions are wrapped by handle_no_such_element_exception to handle NoSuchElementExceptions.

Similarly, focus on the information in the title section and right below it:

focus on the information in the title section and below it

Here, you can get:

  • The property title from [data-testid="title"]
  • The property address from [data-testid="address"]
  • The property distance from [data-testid="distance"]

Scrape them all with:

title = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"title\"]").text)
address = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"address\"]").text)
distance = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"distance\"]").text)

The text attribute contains the text inside the selected elements.

Next, focus on the review score node:

review score node

Select it with data-testid="review-score" and extract its text. Consider that the text has a special format, as in this example:

'Scored 8.4\n8.4\nVery Good\n120 reviews'

With some custom logic, you can extract the review score and review count from it:

review_score = None
review_count = None
review_text = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"review-score\"]").text)
if review_text is not None:
    # split the review string by \n
    parts = review_text.split("\n")

    # process each part
    for part in parts:
        part = part.strip()
        # check if this part is a number (potential review score)
        if part.replace(".", "", 1).isdigit():
            review_score = float(part)
        # check if it contains the "reviews" string
        elif "reviews" in part:
            # extract the number before "reviews"
            review_count = int(part.split(" ")[0].replace(",", ""))
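The parsing logic above can be packaged into a standalone function, which also makes it easy to verify against the sample string:

```python
def parse_review_text(review_text):
    # extract (score, count) from a string like
    # 'Scored 8.4\n8.4\nVery Good\n120 reviews'
    review_score = None
    review_count = None
    for part in review_text.split("\n"):
        part = part.strip()
        # a bare number is the review score
        if part.replace(".", "", 1).isdigit():
            review_score = float(part)
        # the part mentioning "reviews" carries the count
        elif "reviews" in part:
            review_count = int(part.split(" ")[0].replace(",", ""))
    return review_score, review_count

print(parse_review_text("Scored 8.4\n8.4\nVery Good\n120 reviews"))
# → (8.4, 120)
```

Note that `replace(",", "")` handles counts with thousands separators, such as "1,234 reviews".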

Target the description element:

The description element

Select it with data-testid="recommended-units" and scrape the description:

description = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"recommended-units\"]").text)

Lastly, focus on the price elements:

The price element

From the data-testid="availability-rate-information" element, select:

  • The original price from the node that has the aria-hidden="true" attribute and does not have the data-testid attribute
  • The discounted/current price from data-testid="price-and-discounted-price"

Write the price extraction logic as below:

# default the price variables to None so they are always defined,
# even when the price element is missing
original_price = None
price = None

price_element = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"availability-rate-information\"]"))
if price_element is not None:
    original_price = handle_no_such_element_exception(lambda: (
        price_element.find_element(By.CSS_SELECTOR, "[aria-hidden=\"true\"]:not([data-testid])").text.replace(",", "")
    ))
    price = handle_no_such_element_exception(lambda: (
        price_element.find_element(By.CSS_SELECTOR, "[data-testid=\"price-and-discounted-price\"]").text.replace(",", "")
    ))
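The prices scraped above are strings like "$5230". If you plan to compare or aggregate them, a small helper can normalize them to numbers. The sketch below assumes US-style price strings; adapt the regex for other locales:

```python
import re

def parse_price(price_text):
    # convert a price string like "$5,230" to a float;
    # return None for missing or unparsable values
    if price_text is None:
        return None
    match = re.search(r"[\d,]+(?:\.\d+)?", price_text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

print(parse_price("$5,230"))  # → 5230.0
```

Keeping `None` for missing prices (rather than 0) preserves the distinction between "no price shown" and "free" in the exported data.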

Wow! The Booking.com scraping logic is almost complete.

Step #8: Collect the Scraped Data

You now have the scraped data spread across several variables within the for loop. Create a new item object, populate it with that data, and append it to the items array:

item = {
  "url": url,
  "image": image,
  "title": title,
  "address": address,
  "distance": distance,
  "review_score": review_score,
  "review_count": review_count,
  "description": description,
  "original_price": original_price,
  "price": price
}
items.append(item)

After the for loop, items will contain all your scraped data. Verify that by printing it:

print(items)

This will produce an output as follows:

[{'url': 'https://www.booking.com/hotel/us/murray-hill-east-manhattan.html?label=gen173nr-1FCAEoggI46AdIM1gEaHGIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4Aof767kGwAIB0gIkNGE2MTI1MjgtZjJlNC00YWM4LWFlMmQtOGIxZjM3NWIyNDlm2AIF4AIB&sid=b91524e727f20006ae00489afb379d3a&aid=304142&ucfs=1&arphpl=1&checkin=2024-11-18&checkout=2024-12-18&dest_id=20088325&dest_type=city&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&hpos=1&hapos=1&sr_order=popularity&srpvid=c6926559ebaa0862&srepoch=1731939905&all_sr_blocks=5604802_204869446_2_0_0&highlighted_blocks=5604802_204869446_2_0_0&matching_block_id=5604802_204869446_2_0_0&sr_pri_blocks=5604802_204869446_2_0_0__523000&from=searchresults', 'image': 'https://cf.bstatic.com/xdata/images/hotel/square600/84564452.webp?k=ff50b7387e08e01ba7a400effa788e668f894cabe4a295f60d6cd018ec9ac4d0&o=', 'title': 'Murray Hill East Suites', 'address': 'Murray Hill, New York', 'distance': '1.3 miles from downtown', 'review_score': 8.2, 'review_count': 54, 'description': 'Studio\nEntire studio • 1 bathroom • 1 kitchen • 398 ft²\nMultiple bed types', 'original_price': None, 'price': '$5230'}, 
# omitted for brevity...
, {'url': 'https://www.booking.com/hotel/us/renaissance-times-square.html?label=gen173nr-1FCAEoggI46AdIM1gEaHGIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4Aof767kGwAIB0gIkNGE2MTI1MjgtZjJlNC00YWM4LWFlMmQtOGIxZjM3NWIyNDlm2AIF4AIB&sid=b91524e727f20006ae00489afb379d3a&aid=304142&ucfs=1&arphpl=1&checkin=2024-11-18&checkout=2024-12-18&dest_id=20088325&dest_type=city&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&hpos=12&hapos=12&sr_order=popularity&srpvid=c6926559ebaa0862&srepoch=1731939905&all_sr_blocks=2315604_274565698_0_2_0&highlighted_blocks=2315604_274565698_0_2_0&matching_block_id=2315604_274565698_0_2_0&sr_pri_blocks=2315604_274565698_0_2_0__1805400&from_sustainable_property_sr=1&from=searchresults', 'image': 'https://cf.bstatic.com/xdata/images/hotel/square600/437371642.webp?k=d1a06036e365573e326e6b0f1b045f8f43b6ad0d18e119cfb92d92cc81fa5c88&o=', 'title': 'Renaissance New York Times Square by Marriott', 'address': 'Manhattan, New York', 'distance': '0.6 miles from downtown', 'review_score': 8.4, 'review_count': 2209, 'description': 'King Room\n1 king bed', 'original_price': '$20060', 'price': '$18054'}]

Fantastic! It only remains to export this information to a human-readable file like CSV.

Step #9: Export to CSV

Import the csv package from the Python standard library:

import csv

Then, use it to export items to a CSV file:

# specify the name of the output CSV file
output_file = "properties.csv"

# export the items list to a CSV file
with open(output_file, mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object
    writer = csv.DictWriter(file, fieldnames=["url", "image", "title", "address", "distance", "review_score", "review_count", "description", "original_price", "price"])

    # write the header row
    writer.writeheader()

    # write each item as a row in the CSV
    writer.writerows(items)

This snippet populates a CSV file named properties.csv using data from the items list. Key functions utilized above are:

  • open(): Open the specified file in write mode with UTF-8 encoding.
  • csv.DictWriter(): Create a CSV writer with the given field names.
  • writeheader(): Write the header row to the CSV file based on the specified field names.
  • writerows(): Write each dictionary item as a row in the CSV.
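If you prefer JSON over CSV, the same list of dictionaries can be exported with Python’s standard json module. This is a minimal, self-contained sketch; the sample items and the properties.json filename are illustrative:

```python
import json

# sample items mirroring the structure built in the tutorial
items = [
    {"title": "Example Hotel", "price": "$180", "review_score": 8.4},
    {"title": "Another Hotel", "price": "$95", "review_score": 7.9},
]

output_file = "properties.json"
with open(output_file, mode="w", encoding="utf-8") as file:
    # ensure_ascii=False keeps any non-ASCII characters readable
    json.dump(items, file, indent=2, ensure_ascii=False)

# read the file back to verify the export round-trips
with open(output_file, encoding="utf-8") as file:
    loaded = json.load(file)
```

JSON preserves the `None` values as `null`, whereas the CSV export writes them as empty strings.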

Step #10: Put It All Together

scraper.py should now contain these lines:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.common import NoSuchElementException

import csv

def handle_no_such_element_exception(data_extraction_task):
    try:
        return data_extraction_task()
    except NoSuchElementException:
        return None

# create a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

# connect to the target page
driver.get("https://www.booking.com/searchresults.html?ss=New+York&ssne=New+York&ssne_untouched=New+York&label=gen173nr-1FCAEoggI46AdIM1gEaHGIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4Aof767kGwAIB0gIkNGE2MTI1MjgtZjJlNC00YWM4LWFlMmQtOGIxZjM3NWIyNDlm2AIF4AIB&sid=b91524e727f20006ae00489afb379d3a&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=20088325&dest_type=city&checkin=2024-11-18&checkout=2024-12-18&group_adults=2&no_rooms=1&group_children=0")

# handle the sign-in alert
try:
    # wait up to 20 seconds for the sign-in alert to appear
    close_button = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "[role=\"dialog\"] button[aria-label=\"Dismiss sign-in info.\"]"))
    )
    # click the close button
    close_button.click()
except TimeoutException:
    print("Sign-in modal did not appear, continuing...")

# where to store the scraped data
items = []

# select all property items on the page
property_items = driver.find_elements(By.CSS_SELECTOR, "[data-testid=\"property-card\"]")

# iterate over the property items and
# extract data from them
for property_item in property_items:
    # scraping logic...
    url = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "a[data-testid=\"property-card-desktop-single-image\"]").get_attribute("href"))
    image = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "img[data-testid=\"image\"]").get_attribute("src"))

    title = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"title\"]").text)
    address = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"address\"]").text)
    distance = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"distance\"]").text)

    review_score = None
    review_count = None
    review_text = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"review-score\"]").text)
    if review_text is not None:
        # split the review string by \n
        parts = review_text.split("\n")

        # process each part
        for part in parts:
            part = part.strip()
            # check if this part is a number (potential review score)
            if part.replace(".", "", 1).isdigit():
                review_score = float(part)
            # check if it contains the "reviews" string
            elif "reviews" in part:
                # extract the number before "reviews"
                review_count = int(part.split(" ")[0].replace(",", ""))

    description = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"recommended-units\"]").text)

    # default the price variables to None so they are always defined,
    # even when the price element is missing
    original_price = None
    price = None

    price_element = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid=\"availability-rate-information\"]"))
    if price_element is not None:
        original_price = handle_no_such_element_exception(lambda: (
            price_element.find_element(By.CSS_SELECTOR, "[aria-hidden=\"true\"]:not([data-testid])").text.replace(",", "")
        ))
        price = handle_no_such_element_exception(lambda: (
            price_element.find_element(By.CSS_SELECTOR, "[data-testid=\"price-and-discounted-price\"]").text.replace(",", "")
        ))

    # populate a new item with the scraped data
    item = {
      "url": url,
      "image": image,
      "title": title,
      "address": address,
      "distance": distance,
      "review_score": review_score,
      "review_count": review_count,
      "description": description,
      "original_price": original_price,
      "price": price
    }
    # add the new item to the list of scraped items
    items.append(item)

# specify the name of the output CSV file
output_file = "properties.csv"

# export the items list to a CSV file
with open(output_file, mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object
    writer = csv.DictWriter(file, fieldnames=["url", "image", "title", "address", "distance", "review_score", "review_count", "description", "original_price", "price"])

    # write the header row
    writer.writeheader()

    # write each item as a row in the CSV
    writer.writerows(items)

# close the web driver and release its resources
driver.quit()

Can you believe it? In around 110 lines of code, you built a Python Booking.com scraper.

Verify that it works by executing the scraping script. On Windows, run the scraper with:

python scraper.py

Equivalently, on Linux or macOS, execute:

python3 scraper.py

Wait for the script to finish running. A properties.csv file will appear in your project’s root directory. Open the file to view the extracted data:

A properties.csv file with all the extracted data

Congrats, mission complete!

Conclusion

In this tutorial, you learned what a Booking.com scraper is and how to build one using Python. As shown, creating a basic script to automatically retrieve data from Booking.com requires only a few lines of code.

However, the example presented here did not address many of the challenges you may encounter when scraping Booking.com. Issues like anti-bot measures that block headless browsers, user interactions required to generate search results, and infinite scrolling can quickly complicate your scraping operations.

Looking for an easier, full-featured, powerful scraping solution? Try Bright Data’s Booking Scraper API!

The Booking Scraper API provides powerful endpoints to scrape public hotel data, reviews, ratings, and more. With simple API calls, you can retrieve data in JSON or HTML formats.

Prefer pre-built solutions? Bright Data also offers ready-to-use Booking.com datasets!

Create a free Bright Data account today to try our scraper APIs or explore our datasets.

No credit card required