This tutorial will cover:
- Why scrape e-commerce data from the Web?
- eBay Scraping Libraries and Tools
- Scraping eBay product data with Beautiful Soup
Why Scrape E-Commerce Data From the Web?
Scraping e-commerce data allows you to retrieve useful information for different scenarios and activities. These include:
- Price monitoring: By tracking e-commerce websites, businesses can monitor the prices of products in real time. This helps them identify price fluctuations, spot trends, and adjust their pricing strategies accordingly. If you are a consumer, it also helps you find the best deals and save money.
- Competitor analysis: By gathering info about your competitors’ product offerings, prices, discounts, and promotions, you can make data-driven decisions about your own pricing strategies, product assortment, and marketing campaigns.
- Market research: E-commerce data provides valuable insights into market trends, consumer preferences, and demand patterns. You can use that info as the source of a data analysis process to study emerging trends and understand customer behavior.
- Sentiment analysis: By scraping customer reviews from e-commerce sites, you can gain insights into customer satisfaction, product feedback, and areas for improvement.
When it comes to e-commerce scraping, eBay is one of the most popular choices for at least three good reasons:
- It has a wide range of products.
- It is based on an auction and bidding system that lets you retrieve much more data than on Amazon and similar platforms.
- It lists several prices for the same product (the auction price plus the “Buy It Now” price).
By scraping eBay, you can access a wealth of information to support your price monitoring, comparison, or analysis strategy.
eBay Scraping Libraries and Tools
Python is considered one of the best languages for scraping thanks to its ease of use, simple syntax, and vast ecosystem of libraries. So, it is the programming language we will use to scrape eBay. Explore our in-depth guide on how to do web scraping with Python.
You now need to choose the right scraping libraries out of the many available. To make the right decision, explore eBay in the browser. By inspecting the AJAX calls made by the page, you will notice that most of the data on the site is embedded in the HTML document returned by the server.
This means that a simple HTTP client to replicate the request to the server and an HTML parser will be enough. For this reason, we recommend:
- Requests: The most popular HTTP client library for Python. It simplifies the process of sending HTTP requests and handling their responses, making it easier to retrieve page content from web servers.
- Beautiful Soup: A full-featured HTML and XML parsing Python library. It is mostly used for web scraping as it provides powerful methods for exploring the DOM and extracting data from its elements.
Thanks to Requests and Beautiful Soup, you will be able to scrape the target site with Python. Let’s see how!
Scraping eBay Product Data With Beautiful Soup
Follow this step-by-step tutorial and learn how to build an eBay web scraping Python script.
Step 1: Getting started
To implement price scraping, you need to meet these prerequisites:
- Python 3+ installed on your computer: Download the installer, launch it, and follow the installation wizard.
- A Python IDE of your choice: Visual Studio Code with the Python extension or PyCharm Community Edition are two great choices.
Next, initialize a Python project called ebay-scraper with a virtual environment by running the commands below:
mkdir ebay-scraper
cd ebay-scraper
python -m venv env
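Before installing dependencies, you may also want to activate the virtual environment so that packages are installed inside it. On Linux and macOS, run:
source env/bin/activate
On Windows, run instead:
env\Scripts\activate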
Enter the project folder and add a scraper.py file containing the following snippet:
print('Hello, World!')
This is a sample script that only prints “Hello, World!” but it will soon contain the logic to scrape eBay.
Verify that it works by executing it with:
python scraper.py
In the terminal, you should see:
Hello, World!
Great, you now have a Python project!
Step 2: Install the scraping libraries
It is time to add the libraries required to perform web scraping to your project’s dependencies. Launch the command below in the project folder to install the Beautiful Soup and Requests packages:
pip install beautifulsoup4 requests
Import the libraries in scraper.py and get ready to use them to extract data from eBay:
import requests
from bs4 import BeautifulSoup
# scraping logic...
Make sure your Python IDE does not report any error, and you are ready to implement price monitoring with scraping!
Step 3: Download the target web page
If you are an eBay user, you may have noticed that the product page URL follows the format below:
https://www.ebay.com/itm/<ITM_ID>
As you can see, it is a dynamic URL that changes based on the item ID.
For example, this is the URL of an eBay product:
https://www.ebay.com/itm/225605642071?epid=26057553242&hash=item348724e757:g:~ykAAOSw201kD1un&amdata=enc%3AAQAIAAAA4OMICjL%2BH6HBrWqJLiCPpCurGf8qKkO7CuQwOkJClqK%2BT2B5ioN3Z9pwm4r7tGSGG%2FI31uN6k0IJr0SEMEkSYRrz1de9XKIfQhatgKQJzIU6B9GnR6ZYbzcU8AGyKT6iUTEkJWkOicfCYI5N0qWL8gYV2RGT4zr6cCkJQnmuYIjhzFonqwFVdYKYukhWNWVrlcv5g%2BI9kitSz8k%2F8eqAz7IzcdGE44xsEaSU2yz%2BJxneYq0PHoJoVt%2FBujuSnmnO1AXqjGamS3tgNcK5Tqu36QhHRB0tiwUfAMrzLCOe9zTa%7Ctkp%3ABFBMmNDJgZJi
Here, 225605642071 is the unique identifier of the item. Note that the query parameters are not required to visit the page. You can remove them and eBay will still load the product page correctly.
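If you already have a full product URL copied from the browser, you can also derive the item ID from it. Below is a minimal sketch based on the URL format above, using Python’s built-in urllib.parse module (the sample URL is just an illustration):
from urllib.parse import urlparse

full_url = 'https://www.ebay.com/itm/225605642071?epid=26057553242'
# the path is '/itm/225605642071', so the item ID is its last segment
item_id = urlparse(full_url).path.split('/')[-1]
print(item_id)  # 225605642071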
Instead of hard-coding the target page in your script, you can make it read the item ID from a command line argument. This way, you could scrape data from any product page.
Achieve that by updating scraper.py as follows:
import requests
from bs4 import BeautifulSoup
import sys
# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Item ID argument missing!')
    sys.exit(2)
# read the item ID from a CLI argument
item_id = sys.argv[1]
# build the URL of the target product page
url = f'https://www.ebay.com/itm/{item_id}'
# scraping logic...
Assume you want to scrape the product 225605642071. You can launch your scraper with:
python scraper.py 225605642071
Thanks to sys, you can access the command-line arguments. The first element of sys.argv is the name of your script, scraper.py. To get the item ID, you then need to target the element with index 1.
If you forget the item ID in the CLI, the application will fail with the error below:
Item ID argument missing!
Otherwise, it will read the CLI parameter and use it in an f-string to generate the target URL of the product to scrape. In this case, url will contain:
https://www.ebay.com/itm/225605642071
Now, you can use requests to download that web page with the following line of code:
page = requests.get(url)
Behind the scenes, requests.get() performs an HTTP GET request to the URL passed as a parameter. page will store the response produced by the eBay server, including the HTML content of the target page.
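Note that requests does not raise an exception on its own when the server responds with an HTTP error code. If you want the script to fail fast, for example for an unknown item ID, you can add a check like the sketch below:
# stop early if eBay did not return a successful response
if page.status_code != 200:
    print(f'Request failed with status code {page.status_code}')
    sys.exit(1)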
Fantastic! Let’s learn how to retrieve data from it.
Step 4: Parse the HTML document
page.text contains the HTML document returned by the server. Pass it to the BeautifulSoup() constructor to parse it:
soup = BeautifulSoup(page.text, 'html.parser')
The second parameter specifies the parser used by Beautiful Soup. If you are not familiar with it, html.parser is the name of the Python built-in HTML parser.
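If you prefer, Beautiful Soup also supports third-party parsers such as lxml, which is generally faster. This is optional and assumes you install the extra package first with pip install lxml:
# same parsing step, but delegating the work to the lxml parser
soup = BeautifulSoup(page.text, 'lxml')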
The soup variable now stores a tree structure that exposes some useful methods for selecting elements from the DOM. The most popular ones are:
- find(): Returns the first HTML element that matches the selector condition passed as a parameter.
- find_all(): Returns a list of HTML elements matching the input selector strategy.
- select_one(): Returns the first HTML element matching the input CSS selector.
- select(): Returns a list of HTML elements matching the CSS selector passed as a parameter.
Use them to select HTML elements by tag, ID, CSS classes, and more. Then, you can extract data from their attributes and text content. See how!
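Before applying them to eBay, here is a quick, self-contained sketch of how these methods behave on a hard-coded HTML snippet (the markup below is purely illustrative):
from bs4 import BeautifulSoup

html = '<div id="box"><p class="msg">Hello</p><p class="msg">World</p></div>'
demo_soup = BeautifulSoup(html, 'html.parser')

print(demo_soup.find('p'))                # first <p> element
print(demo_soup.find_all('p'))            # list of all <p> elements
print(demo_soup.select_one('#box .msg'))  # first element matching the CSS selector
print(demo_soup.select('#box .msg'))      # all elements matching the CSS selector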
Step 5: Inspect the product page
If you want to structure an effective data scraping strategy, you must first get familiar with the structure of the target web pages. Open your browser and visit some eBay products.
You will first notice that, depending on the product category, the page contains different information. For electronics products, for example, you will have access to the technical specifications, while clothing products show the available sizes and colors. These inconsistencies in the structure of web pages make scraping a bit challenging. However, some information fields appear on every page, such as the product and shipping prices.
Familiarize yourself with your browser’s DevTools as well. Right-click on an HTML element containing interesting data and select “Inspect” to open the DevTools window.
Here, you can explore the DOM structure of the page and understand how to define effective selector strategies.
Spend some time inspecting the product pages with the DevTools.
Step 6: Extract the price data
First, you need a data structure in which to store the scraped data. Initialize a Python dictionary with:
item = {}
As you should have noticed in the previous step, the price data appears in the product summary section at the top of the page. Inspect the HTML price element with the DevTools.
You can get the product price with the CSS selector below:
.x-price-primary span[itemprop="price"]
And the currency with:
.x-price-primary span[itemprop="priceCurrency"]
Apply those selectors in Beautiful Soup and retrieve the desired data with:
price_html_element = soup.select_one('.x-price-primary span[itemprop="price"]')
price = price_html_element['content']
currency_html_element = soup.select_one('.x-price-primary span[itemprop="priceCurrency"]')
currency = currency_html_element['content']
This snippet selects the price and currency HTML elements and then collects the string contained in their content attribute.
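Note that select_one() returns None when nothing matches the selector, so accessing ['content'] directly will raise an error if eBay changes its markup. If you want a more defensive version, a sketch like this would work:
price_html_element = soup.select_one('.x-price-primary span[itemprop="price"]')
# read the attribute only if the element was actually found
price = price_html_element['content'] if price_html_element is not None else None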
Keep in mind that the price scraped above is only part of the full price you will have to pay to get the item you want. That also includes shipping costs.
Now, inspect the shipping element with the DevTools.
This time, extracting the desired data is a bit trickier, as there is no easy CSS selector to get the element. What you can do is iterate over each .ux-labels-values__labels div. When the current element contains the “Shipping:” string, you can access its next sibling in the DOM and extract the price from .ux-textspans--BOLD:
label_html_elements = soup.select('.ux-labels-values__labels')
for label_html_element in label_html_elements:
    if 'Shipping:' in label_html_element.text:
        shipping_price_html_element = label_html_element.next_sibling.select_one('.ux-textspans--BOLD')
        # if there is a shipping price HTML element
        if shipping_price_html_element is not None:
            # extract the float number of the price from
            # the text content
            shipping_price = re.findall(r"\d+[.,]\d+", shipping_price_html_element.text)[0]
        break
The shipping price element contains the desired data in the following format:
US $105.44
To extract the price, you can use a regex with the re.findall() method. Do not forget to add the following line in the import section of your script:
import re
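To see what the regular expression matches, you can test it on the raw text in isolation:
import re

raw_text = 'US $105.44'
# one or more digits, a dot or comma, then one or more digits
print(re.findall(r"\d+[.,]\d+", raw_text))  # ['105.44']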
Add the collected data to the item dictionary:
item['price'] = price
item['shipping_price'] = shipping_price
item['currency'] = currency
Print it with:
print(item)
And you will get:
{'price': '499.99', 'shipping_price': '72.58', 'currency': 'USD'}
This is enough to implement a price-tracking process in Python, as the short sketch below illustrates. Nevertheless, there is a lot of other useful information on the eBay product page, so it is also worth learning how to extract it!
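For example, assuming both prices were found and use a dot as the decimal separator, you could convert the scraped strings to numbers and compare the total cost against an arbitrary target value:
# convert the scraped strings to floats and compute the total cost
total_price = float(item['price']) + float(item['shipping_price'])

# the 550 threshold is just an illustrative target price
if total_price < 550:
    print(f'Deal found: {total_price} {item["currency"]}')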
Step 7: Retrieve the item details
If you take a look at the “About this item” tab, you will notice that it contains a lot of interesting data.
The sections and fields within them change from product to product, so you need to find a way to scrape them all with a smart approach.
In detail, the most important sections are “Item specifics” and “About this product.” These two are present on most products. Inspect one of the two and notice that you can select them with:
.section-title
Given a section, explore its DOM structure with the DevTools.
Note that it consists of several rows, each with some .ux-layout-section-evo__col elements. These contain two elements:
- .ux-labels-values__labels: The attribute name.
- .ux-labels-values__values: The attribute value.
You are now ready to scrape all the detail section info programmatically with:
section_title_elements = soup.select('.section-title')
for section_title_element in section_title_elements:
    if 'Item specifics' in section_title_element.text or 'About this product' in section_title_element.text:
        # get the parent element containing the entire section
        section_element = section_title_element.parent
        for section_col in section_element.select('.ux-layout-section-evo__col'):
            col_label = section_col.select_one('.ux-labels-values__labels')
            col_value = section_col.select_one('.ux-labels-values__values')
            # if both elements are present
            if col_label is not None and col_value is not None:
                item[col_label.text] = col_value.text
This code goes through each HTML detail field element and adds the key-value pair associated with each product attribute to the item dictionary.
At the end of the for loop, item will contain:
{'price': '499.99', 'shipping_price': '72.58', 'currency': 'USD', 'Condition': "New: A brand-new, unused, unopened, undamaged item in its original packaging (where packaging is applicable). Packaging should be the same as what is found in a retail store, unless the item is handmade or was packaged by the manufacturer in non-retail packaging, such as an unprinted box or plastic bag. See the seller's listing for full details. See all condition definitionsopens in a new window or tab ", 'Manufacturer Warranty': '1 Year', 'Item Height': '16.89"', 'Item Length': '18.5"', 'Item Depth': '6.94"', 'Item Weight': '15.17 lbs', 'UPC': '0711719558255', 'Brand': 'Sony', 'Type': 'Home Console', 'Region Code': 'Region Free', 'Platform': 'Sony PlayStation 5', 'Color': 'White', 'Model': 'Sony PlayStation 5 Blu-Ray Edition', 'Connectivity': 'HDMI', 'MPN': '1000032624', 'Features': '3D Audio Technology, Blu-Ray Compatible, Wi-Fi Capability, Internet Browsing', 'Storage Capacity': '825 GB', 'Resolution': '4K (UHD)', 'eBay Product ID (ePID)': '26057553242', 'Manufacturer Color': 'White', 'Edition': 'God of War Ragnarök Bundle', 'Release Year': '2022'}
Wonderful! You just achieved your data retrieval goal!
Step 8: Export scraped data to JSON
Right now, the scraped data is stored in a Python dictionary. To make it more easily shareable and readable, you can export it to JSON with:
import json
# scraping logic...
with open('product_info.json', 'w') as file:
    json.dump(item, file)
First, you need to initialize a product_info.json file with open(). Then, you can write the JSON representation of the item dictionary to the output file with json.dump(). Check out our article to learn more about how to parse and serialize data to JSON in Python.
The json package comes from the Python standard library, so you do not even need to install an extra dependency to achieve the objective.
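If you want to double-check the export, you can load the file back and inspect its content:
import json

with open('product_info.json') as file:
    saved_item = json.load(file)

print(saved_item['price'])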
Great! You started from raw data contained in a webpage and now have semi-structured JSON data. It is time to take a look at the entire eBay scraper.
Step 9: Put it all together
Here is the full scraper.py script:
import requests
from bs4 import BeautifulSoup
import sys
import re
import json

# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Item ID argument missing!')
    sys.exit(2)

# read the item ID from a CLI argument
item_id = sys.argv[1]

# build the URL of the target product page
url = f'https://www.ebay.com/itm/{item_id}'

# download the target page
page = requests.get(url)

# parse the HTML document returned by the server
soup = BeautifulSoup(page.text, 'html.parser')

# initialize the object that will contain
# the scraped data
item = {}

# price scraping logic
price_html_element = soup.select_one('.x-price-primary span[itemprop="price"]')
price = price_html_element['content']

currency_html_element = soup.select_one('.x-price-primary span[itemprop="priceCurrency"]')
currency = currency_html_element['content']

shipping_price = None
label_html_elements = soup.select('.ux-labels-values__labels')
for label_html_element in label_html_elements:
    if 'Shipping:' in label_html_element.text:
        shipping_price_html_element = label_html_element.next_sibling.select_one('.ux-textspans--BOLD')
        # if there is a shipping price HTML element
        if shipping_price_html_element is not None:
            # extract the float number of the price from
            # the text content
            shipping_price = re.findall(r"\d+[.,]\d+", shipping_price_html_element.text)[0]
        break

item['price'] = price
item['shipping_price'] = shipping_price
item['currency'] = currency

# product detail scraping logic
section_title_elements = soup.select('.section-title')
for section_title_element in section_title_elements:
    if 'Item specifics' in section_title_element.text or 'About this product' in section_title_element.text:
        # get the parent element containing the entire section
        section_element = section_title_element.parent
        for section_col in section_element.select('.ux-layout-section-evo__col'):
            col_label = section_col.select_one('.ux-labels-values__labels')
            col_value = section_col.select_one('.ux-labels-values__values')
            # if both elements are present
            if col_label is not None and col_value is not None:
                item[col_label.text] = col_value.text

# export the scraped data to a JSON file
with open('product_info.json', 'w') as file:
    json.dump(item, file, indent=4)
In less than 70 lines of code, you can build a web scraper to monitor data from eBay products.
As an example, launch it against the item identified by the ID 225605642071 with:
python scraper.py 225605642071
At the end of the scraping process, the product_info.json file below will appear in the root folder of your project:
{
    "price": "499.99",
    "shipping_price": "72.58",
    "currency": "USD",
    "Condition": "New: A brand-new, unused, unopened, undamaged item in its original packaging (where packaging is applicable). Packaging should be the same as what is found in a retail store, unless the item is handmade or was packaged by the manufacturer in non-retail packaging, such as an unprinted box or plastic bag. See the seller's listing for full details",
    "Manufacturer Warranty": "1 Year",
    "Item Height": "16.89\"",
    "Item Length": "18.5\"",
    "Item Depth": "6.94\"",
    "Item Weight": "15.17 lbs",
    "UPC": "0711719558255",
    "Brand": "Sony",
    "Type": "Home Console",
    "Region Code": "Region Free",
    "Platform": "Sony PlayStation 5",
    "Color": "White",
    "Model": "Sony PlayStation 5 Blu-Ray Edition",
    "Connectivity": "HDMI",
    "MPN": "1000032624",
    "Features": "3D Audio Technology, Blu-Ray Compatible, Wi-Fi Capability, Internet Browsing",
    "Storage Capacity": "825 GB",
    "Resolution": "4K (UHD)",
    "eBay Product ID (ePID)": "26057553242",
    "Manufacturer Color": "White",
    "Edition": "God of War Ragnarok Bundle",
    "Release Year": "2022"
}
Congrats! You just learned how to scrape eBay in Python!
Conclusion
In this guide, you learned why eBay is one of the best scraping targets for tracking product prices and how to achieve that. In detail, you followed a step-by-step tutorial to build a Python scraper that retrieves item data. As shown here, it is not complex and only requires a few lines of code.
At the same time, you understood how inconsistent the structure of eBay’s pages is. The scraper built here might therefore work for one product but not for another. Also, eBay’s UI changes often, which forces you to continually maintain the script. Fortunately, you can avoid this with our eBay scraper API!
Don’t want to deal with eBay web scraping at all but are interested in item data? Purchase an eBay dataset.
Interested in scraping other websites? Register now and try our Web Scraper API.
No credit card required
Note: This guide was thoroughly tested by our team at the time of writing, but as websites frequently update their code and structure, some steps may no longer work as expected.