How To Scrape Tripadvisor With Python

Discover how to scrape Tripadvisor hotel data with Selenium, navigate blocking challenges, and efficiently save results in a CSV format.

For almost 25 years, Tripadvisor has been a great place to discover all sorts of travel destinations on the web. Today, we’re going to scrape hotel data from Tripadvisor. Tripadvisor employs a variety of techniques to block web scrapers, such as:

  • JavaScript Challenges
  • Browser Fingerprinting
  • Dynamic Page Content

Follow our guide below, and by the end, you’ll be scraping Tripadvisor with ease.


Prerequisites

Tripadvisor uses a variety of blocking techniques. For simplicity, we break them down in the list below.

  • JavaScript Challenge: Tripadvisor sends a simple JavaScript challenge to your browser, often in the form of a CAPTCHA. If your browser can’t solve it, it’s likely a bot.
  • Browser Fingerprinting: Tripadvisor sends a cookie to your browser and then uses it to track you.
  • Dynamic Content: The page initially loads blank. The browser then makes a series of API calls to fetch and render the data.

Python Requests and BeautifulSoup simply won’t get the job done. We need an actual browser. With Selenium, we use webdriver to control our browser from inside a Python script. Selenium comes prepacked with everything we need. Learn more about web scraping with Selenium here.

Let’s install Selenium. You should also make sure you have ChromeDriver installed. You can find the latest version of ChromeDriver here. Your version of ChromeDriver must match your version of Chrome.

You can check your Chrome version with the following command.

google-chrome --version

It should give an output similar to this.

Google Chrome 130.0.6723.116

Then we can install Selenium with the following command.

pip install selenium

With Selenium installed, we don’t need to install anything else. Selenium will handle all of our scraping-related needs. All of the other packages in this tutorial ship with Python’s standard library.
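Before moving on, it’s worth a quick sanity check to confirm that Selenium can actually drive your browser. Here’s a minimal sketch (we fetch example.com rather than Tripadvisor, since all we want to verify here is the setup):

from selenium import webdriver

# Launch a headless Chrome instance and print a page title.
# If this runs without errors, Selenium and ChromeDriver are working.
options = webdriver.ChromeOptions()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get("https://www.example.com")
print(driver.title)
driver.quit()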


What to Scrape from Tripadvisor

Let’s take a look at exactly how we’ll scrape hotels from Tripadvisor. When we perform a basic search for Miami on Tripadvisor, we get a page similar to what you see in the screenshot below. Notice that we’re not just getting results for hotels; we’re getting results for every category.

Results from all the categories

Take a closer look at the URL for this page: https://www.tripadvisor.com/Search?q=miami&geo=1&ssrc=a&searchNearby=false&searchSessionId=001f4b791e61703a.ssid&offset=0. Now, we’ll click Hotels and examine our URL: https://www.tripadvisor.com/Search?q=miami&geo=1&ssrc=h&searchNearby=false&searchSessionId=001f4b791e61703a.ssid&offset=0. The URLs still look really similar. Below, we’ll take a look at these URLs with the unnecessary portions removed.

  • All results: https://www.tripadvisor.com/Search?q=miami&geo=1&ssrc=a
  • Hotels: https://www.tripadvisor.com/Search?q=miami&geo=1&ssrc=h

ssrc is the query parameter we use to select our result category: ssrc=a is used for All results, and ssrc=h is used for Hotels. If you click this link https://www.tripadvisor.com/Search?q=miami&geo=1&ssrc=h, you should get a page similar to what you see in the screenshot below.

Hotel results on Tripadvisor
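If you’d like to build these search URLs programmatically, here’s a minimal sketch. The base URL and parameters come straight from the URLs above; build_search_url() is just a hypothetical helper for illustration:

from urllib.parse import urlencode

# ssrc selects the result category: "a" for all results, "h" for hotels.
def build_search_url(query: str, ssrc: str = "h") -> str:
    params = {"q": query, "geo": 1, "ssrc": ssrc}
    return f"https://www.tripadvisor.com/Search?{urlencode(params)}"

print(build_search_url("miami"))       # hotels only
print(build_search_url("miami", "a"))  # all result categories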

Now, we just need to figure out which elements we want to locate. If you inspect these elements, you should notice that each result has a data-test-attribute of "location-results-card". This is really important. We can use this to write our CSS selector: div[data-test-attribute='location-results-card']. When we scrape the actual page, we’ll look for all elements on the page that match this selector.

Inspecting one of the hotels from the search
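Before writing the full scraper, you can sanity-check this selector with a few lines of Selenium. This is just a sketch; expect it to get blocked sometimes, for the reasons we cover below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

driver = webdriver.Chrome()
driver.get("https://www.tripadvisor.com/Search?q=miami&geo=1&ssrc=h")
sleep(5)  # give the dynamic content time to load

# Count how many result cards match our selector.
cards = driver.find_elements(By.CSS_SELECTOR, "div[data-test-attribute='location-results-card']")
print(f"Found {len(cards)} result cards")
driver.quit()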

Scrape Tripadvisor With Vanilla Selenium

Now, we’re going to try scraping Tripadvisor using plain old Selenium. We’ll write a script that’s pretty simple overall. We really only need two functions. We need one to perform our scrape, and one to write our data to CSV. Once we’ve got these, we’ll put everything together into a fully functioning script.

Individual Functions

Take a look at write_to_csv(). It takes two arguments: data and page_number. data can be either a dict or a list of dict objects that we want to write. page_number is used to build our filename. We use Path(filename).exists() to check whether our file exists. mode is the mode we use to open the file. If the file exists, we set our mode to "a" (append). If the file doesn’t exist, we leave our mode as the default, "w" (write). Together, these two modes ensure that we always have a file and that an existing file won’t get overwritten.

def write_to_csv(data, page_number):
    if not isinstance(data, list):
        data = [data]
    print("Writing to CSV...")
    filename = f"tripadvisor-{page_number}.csv"
    mode = "w"
    if Path(filename).exists():
        mode = "a"
    print("Writing data to CSV File...")
    # newline="" prevents blank lines between rows on Windows
    with open(filename, mode, newline="") as file:
        writer = csv.DictWriter(file, fieldnames=data[0].keys())
        if mode == "w":
            writer.writeheader()
        writer.writerows(data)
    print(f"Successfully wrote {page} to CSV...")
  • At the beginning of the function, we check if our data is a list. If it is not, we convert it to one.
  • f"tripadvisor-{page_number}.csv" builds our filename.
  • Our default mode is "w", but if the file exists, we change our mode to "a".
  • csv.DictWriter(file, fieldnames=data[0].keys()) initializes our file writer.
  • If we’re in write mode, we use the keys of our first object for our headers. If we’re appending to the file, we don’t need to do this.
  • After the file has been set up, we use writer.writerows(data) to write our data to the CSV file.
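As a quick illustration of the write/append logic, here’s what happens when we call write_to_csv() twice for the same page (the sample dict below is made up for demonstration):

sample = {
    "name": "Example Hotel", "reviews": 1234, "score": 4.5,
    "location": "Miami Beach", "location_mentions": "10",
    "review_summary": "Great stay"
}

write_to_csv(sample, 0)  # creates tripadvisor-0.csv and writes the header
write_to_csv(sample, 0)  # the file now exists, so this appends without a header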

Now, let’s take a look at our scraping function. This function takes only one argument, page_number… pretty self-explanatory. We start by setting up some custom ChromeOptions: we add arguments to make our browser headless and to use a fake user agent. This should hopefully mask our browser well enough that Tripadvisor lets us through. We then use webdriver to launch our browser and navigate to the search results page. sleep(5) waits 5 seconds for the content to load, and it also makes us look more like a regular user. We use the CSS selector we mentioned earlier in the What to Scrape section. If we have no hotel_cards, we take a screenshot and exit the function early. If we do have hotel_cards, we extract their data and add it to our scraped_data list. Once we’re finished scraping, we close the browser and write everything to CSV.

def scrape_page(page_number: int): 
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument(f"--user-agent={USER_AGENT}")
    print("Connecting to Scraping Browser...")  
    scraped_data = []
    print("-------------------------------")
    driver = webdriver.Chrome(options=options)  
    driver.get(f"https://www.tripadvisor.com/Search?q=Miami&geo=1&ssrc=h&offset={page_number*30}")
    print("Connected! Scraping page...")  
    sleep(5)
        
    hotel_cards = driver.find_elements(By.CSS_SELECTOR, "div[data-test-attribute='location-results-card']")
    if not hotel_cards:
        driver.save_screenshot("error.png")
        driver.quit()
        return
    for index, card in enumerate(hotel_cards):
        score = None
        divs = card.find_elements(By.CSS_SELECTOR, "div")
        for div in divs:
            aria_label = div.get_attribute("aria-label")
            # The review score lives in an aria-label like "4.5 of 5 bubbles"
            if aria_label and "bubbles" in aria_label:
                score = aria_label
                break
        data_array = card.text.split("\n")
        hotel_dict = {
            "name": data_array[1],
            "reviews": int(data_array[2].replace(",", "")),
            # Use the saved score; fall back to None if no bubbles label was found
            "score": float(score[0:3]) if score else None,
            "location": data_array[3],
            "location_mentions": data_array[4].split(" ")[0],
            "review_summary": data_array[5]
        }
        scraped_data.append(hotel_dict)
        print(f"Successfully scraped card {index}")
        
    print(f"Scraped page {page_number}")
    write_to_csv(scraped_data, page_number)

Scrape Tripadvisor Data

When we put everything together, we get a script like this. Feel free to copy and paste the code below into your own Python file.

from selenium import webdriver 
from selenium.webdriver.common.by import By
from time import sleep
import csv
from pathlib import Path

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36"

def write_to_csv(data, page_number):
    if not isinstance(data, list):
        data = [data]
    print("Writing to CSV...")
    filename = f"tripadvisor-{page_number}.csv"
    mode = "w"
    if Path(filename).exists():
        mode = "a"
    print("Writing data to CSV File...")
    # newline="" prevents blank lines between rows on Windows
    with open(filename, mode, newline="") as file:
        writer = csv.DictWriter(file, fieldnames=data[0].keys())
        if mode == "w":
            writer.writeheader()
        writer.writerows(data)
    print(f"Successfully wrote {page} to CSV...")

  
def scrape_page(page_number: int): 
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument(f"--user-agent={USER_AGENT}")
    print("Connecting to Scraping Browser...")  
    scraped_data = []
    print("-------------------------------")
    driver = webdriver.Chrome(options=options)  
    driver.get(f"https://www.tripadvisor.com/Search?q=Miami&geo=1&ssrc=h&offset={page_number*30}")
    print("Connected! Scraping page...")  
    sleep(5)
        
    hotel_cards = driver.find_elements(By.CSS_SELECTOR, "div[data-test-attribute='location-results-card']")
    if not hotel_cards:
        driver.save_screenshot("error.png")
        driver.quit()
        return
    for index, card in enumerate(hotel_cards):
        score = None
        divs = card.find_elements(By.CSS_SELECTOR, "div")
        for div in divs:
            aria_label = div.get_attribute("aria-label")
            # The review score lives in an aria-label like "4.5 of 5 bubbles"
            if aria_label and "bubbles" in aria_label:
                score = aria_label
                break
        data_array = card.text.split("\n")
        hotel_dict = {
            "name": data_array[1],
            "reviews": int(data_array[2].replace(",", "")),
            # Use the saved score; fall back to None if no bubbles label was found
            "score": float(score[0:3]) if score else None,
            "location": data_array[3],
            "location_mentions": data_array[4].split(" ")[0],
            "review_summary": data_array[5]
        }
        scraped_data.append(hotel_dict)
        print(f"Successfully scraped card {index}")
        
    print(f"Scraped page {page_number}")
    write_to_csv(scraped_data, page_number)

  
if __name__ == '__main__':

    PAGES = 1
    for page in range(PAGES):
        scrape_page(page)

When we run this code, more often than not, we get a block screen or a CAPTCHA like you see in the following screenshot.

A CAPTCHA block screen

Advanced Techniques

Below are some of the more advanced techniques used in our script. Mainly, we’ll go over how pagination is handled and some techniques to help us avoid getting blocked.

Handling Pagination

Take a look at the URL we use: https://www.tripadvisor.com/Search?q=Miami&geo=1&ssrc=h&offset={page_number*30}. Our pagination is handled with the offset parameter. We get 30 results per page, so page_number*30 multiplies our page number by the results per page (30). Page 0 yields results 1 through 30, page 1 holds results 31 through 60, and so on.
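To make the mapping concrete, here’s a quick sketch that prints the offset and result range for the first few pages:

RESULTS_PER_PAGE = 30

for page_number in range(3):
    offset = page_number * RESULTS_PER_PAGE
    print(f"page {page_number}: offset={offset}, results {offset + 1}-{offset + RESULTS_PER_PAGE}")

# page 0: offset=0, results 1-30
# page 1: offset=30, results 31-60
# page 2: offset=60, results 61-90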

Take a closer look at our main as well. PAGES holds the number of pages that we’d like to scrape. If you’d like to scrape the first five pages of data, simply change PAGES = 1 to PAGES = 5.

if __name__ == '__main__':

    PAGES = 1
    for page in range(PAGES):
        scrape_page(page)

Mitigate Blocking

With Vanilla Selenium, we use a couple of techniques to help prevent us from getting blocked: a fake user agent and sleep(5). The sleep call gives the page time to load, and it also spaces out our requests when we’re scraping multiple pages.

Here is our user agent. It tells Tripadvisor that our browser is compatible with Chrome 130.0.0.0 and Safari 537.36. When Tripadvisor reads this, its server sends back a page compatible with these browsers.

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36
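If you want to vary this between runs, one common extension (not part of the script above; the second string below is just an assumed example) is to rotate through a small pool of real user agents:

import random
from selenium import webdriver

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
]

options = webdriver.ChromeOptions()
options.add_argument("--headless")
# Pick a different user agent on each run.
options.add_argument(f"--user-agent={random.choice(USER_AGENTS)}")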

However, it’s still possible to be spotted and for your scraper to get blocked. To consistently get past their blocking, we need something a little stronger than Vanilla Selenium.

A message that says that you have been blocked

Consider Using Bright Data

Bright Data has all sorts of solutions to get us past the blocks we ran into with Vanilla Selenium. Scraping Browser gives us the power to run a remote instance of Selenium using nothing but the best proxies from Bright Data. First, we’ll go over the process of getting signed up. Then, we’ll tweak our earlier script to run with Scraping Browser.

Creating An Account

First, head on over to our Scraping Browser page. Go ahead and click Start free trial. You can create an account using Google, GitHub, or your email address.

The Scraping Browser page on Bright Data's website

Once you’ve created your account, you’ll be taken to the dashboard. Go ahead and click on Add.

Click "add" at the Proxies and Scraping Infrastructure dashboard

You should see a dropdown similar to the image below. Click on Scraping Browser.

From the dropdown list, choose "Scraping Browser"

Now, you’ll be taken to the page where you set up Scraping Browser. We’re just going with the default settings. By default, Scraping Browser comes with a built-in CAPTCHA solver.

Basic settings of the Scraping Browser

Finally, you’ll be prompted to create your Scraping Browser zone. If you are ready to try Scraping Browser, click Yes.

Finishing and creating the new Scraping Browser zone

If you look at the Overview of your new Scraping Browser zone, you’ll be able to get your unique username and password. You’ll need these in order to access Scraping Browser from within your Python script.

Overview of the new Scraping Browser zone you added

Extract Our Data Using Bright Data Scraping Browser

Our code example below has been modified to use a Remote WebDriver with Scraping Browser. Make sure to replace YOUR_USERNAME, YOUR_ZONE_NAME, and YOUR_PASSWORD with your actual username, zone name, and password!

from selenium.webdriver import Remote, ChromeOptions  
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection  
from selenium.webdriver.common.by import By
from time import sleep
import csv
from pathlib import Path

AUTH = "brd-customer-YOUR_USERNAME-zone-YOUR_ZONE_NAME:YOUR_PASSWORD"

SBR_WEBDRIVER = f"https://{AUTH}@zproxy.lum-superproxy.io:9515"

def write_to_csv(data, page_number):
    if not isinstance(data, list):
        data = [data]
    print("Writing to CSV...")
    filename = f"tripadvisor-{page_number}.csv"
    mode = "w"
    if Path(filename).exists():
        mode = "a"
    print("Writing data to CSV File...")
    # newline="" prevents blank lines between rows on Windows
    with open(filename, mode, newline="") as file:
        writer = csv.DictWriter(file, fieldnames=data[0].keys())
        if mode == "w":
            writer.writeheader()
        writer.writerows(data)
    print(f"Successfully wrote {page} to CSV...")

  
def scrape_page(page_number: int):  
    print("Connecting to Scraping Browser...")  
    sbr_connection = ChromiumRemoteConnection(SBR_WEBDRIVER, "goog", "chrome")  
    scraped_data = []
    print("-------------------------------")
    with Remote(sbr_connection, options=ChromeOptions()) as driver:  
        driver.get(f"https://www.tripadvisor.com/Search?q=Miami&geo=1&ssrc=h&offset={page_number*30}")
        print("Connected! Scraping page...")  
        sleep(5)
        
        hotel_cards = driver.find_elements(By.CSS_SELECTOR, "div[data-test-attribute='location-results-card']")

        if not hotel_cards:
            print("No hotel cards found! Taking a screenshot and exiting.")
            driver.get_screenshot_as_file(f"./error_screenshot_page_{page_number}.png")
            return
        for index, card in enumerate(hotel_cards):
            score = None
            divs = card.find_elements(By.CSS_SELECTOR, "div")
            for div in divs:
                aria_label = div.get_attribute("aria-label")
                # The review score lives in an aria-label like "4.5 of 5 bubbles"
                if aria_label and "bubbles" in aria_label:
                    score = aria_label
                    break
            data_array = card.text.split("\n")
            hotel_dict = {
                "name": data_array[1],
                "reviews": int(data_array[2].replace(",", "")),
                # Use the saved score; fall back to None if no bubbles label was found
                "score": float(score[0:3]) if score else None,
                "location": data_array[3],
                "location_mentions": data_array[4].split(" ")[0],
                "review_summary": data_array[5]
            }
            scraped_data.append(hotel_dict)
            print(f"Successfully scraped card {index}")
        
    print(f"Scraped page {page_number}")
    write_to_csv(scraped_data, page_number)

if __name__ == '__main__':

    PAGES = 1
    for page in range(PAGES):
        scrape_page(page)

This example is pretty similar to our example with Vanilla Selenium, but there are a few small differences to look at here. They mainly involve the fact that we’re using a remote webdriver instead of the standard webdriver.

  • We set up a remote webdriver instance with our proxy connection: SBR_WEBDRIVER = f"https://{AUTH}@zproxy.lum-superproxy.io:9515".
  • Our error handling is changed slightly: driver.get_screenshot_as_file(f"./error_screenshot_page_{page_number}.png"). We now use driver.get_screenshot_as_file() instead of driver.save_screenshot().

Other than a few minor tweaks for our remote proxy connection, our code for Scraping Browser with Selenium is virtually the same as with Vanilla Selenium. The biggest difference: Scraping Browser gets our results easily.

When running this code, you may receive the error below. This can happen when dealing with a remote connection. If it does, retry the script. Sometimes, it takes multiple tries to establish a stable connection.

urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
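If you’d rather not rerun the script by hand, a simple retry loop around scrape_page() works too. This is just a sketch; adjust MAX_RETRIES to taste:

from urllib3.exceptions import ProtocolError

PAGES = 1
MAX_RETRIES = 3

for page in range(PAGES):
    for attempt in range(MAX_RETRIES):
        try:
            scrape_page(page)  # the function from the script above
            break
        except ProtocolError:
            print(f"Connection dropped on page {page}, retry {attempt + 1}/{MAX_RETRIES}...")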

If your script ran successfully, you should receive the output below.

Connecting to Scraping Browser...
-------------------------------
Connected! Scraping page...
Successfully scraped card 0
Successfully scraped card 1
Successfully scraped card 2
Successfully scraped card 3
Successfully scraped card 4
Successfully scraped card 5
Successfully scraped card 6
Successfully scraped card 7
Successfully scraped card 8
Successfully scraped card 9
Successfully scraped card 10
Successfully scraped card 11
Successfully scraped card 12
Successfully scraped card 13
Successfully scraped card 14
Successfully scraped card 15
Successfully scraped card 16
Successfully scraped card 17
Successfully scraped card 18
Successfully scraped card 19
Successfully scraped card 20
Successfully scraped card 21
Successfully scraped card 22
Successfully scraped card 23
Successfully scraped card 24
Successfully scraped card 25
Successfully scraped card 26
Successfully scraped card 27
Successfully scraped card 28
Successfully scraped card 29
Scraped page 0
Writing to CSV...
Writing data to CSV File...
Successfully wrote page 0 to CSV...

Here is a screenshot of our CSV data using ONLYOFFICE.

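If you’d rather verify the output from Python instead of a spreadsheet app, a few lines with csv.DictReader will do (a minimal sketch, assuming you scraped page 0):

import csv

with open("tripadvisor-0.csv") as file:
    for row in csv.DictReader(file):
        print(row["name"], row["score"])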

Alternative Approach: Datasets

If coding a scraper isn’t your preferred approach or you require larger-scale data, consider leveraging structured Tripadvisor datasets. Our datasets provide well-organized, high-quality data tailored to your needs, allowing you to analyze travel trends, monitor competitor pricing, and optimize customer experiences effortlessly.

With a Tripadvisor dataset, you can access key data points like hotel names, reviews, ratings, amenities, prices, and more—all delivered in flexible formats (e.g., JSON, CSV, Parquet) and updated on a schedule that fits your workflow. Best of all, these datasets are 100% compliant and scalable, saving you time and resources while ensuring accuracy.

Key Benefits:

  • Access all major Tripadvisor data points without dealing with blocks.
  • Tailor datasets to your specific needs with filters and custom formatting.
  • Automate data delivery to platforms like Snowflake, S3, or Azure.

Focus on analyzing the data, not collecting it—let us handle the hard part. Explore our Tripadvisor datasets today!


Conclusion

From JavaScript challenges to fully dynamic content, Tripadvisor can be a really difficult site to scrape. Now that you’ve finished our guide, it should be a little easier. By this point, you should understand that you can use Selenium to control a browser both locally and through a remote session. With a headless browser under Selenium’s control, you also get the ability to take a screenshot of your data, which makes it much easier to debug your scraper. You know how to extract hotel data, and you know how to write a CSV file using plain old Python, without having to install anything else!

If you’re looking to scrape at scale, Bright Data has a ton of products to help with that. Scraping Browser gives you all the best tools for any scraping-related task. You can control a real browser over a stable proxy connection, using the headless setup of your choice. You never need to worry about CAPTCHAs either!

Or, you can choose the best way of getting data – purchase a ready-to-use Tripadvisor dataset. Sign up now to start your free trial!

No credit card required