In this tutorial, you will explore:
- The definition of ecommerce scraping and why it is useful
- The types of ecommerce scraper tools
- The data you can scrape from ecommerce platforms
- How to create an ecommerce scraping script with Python
- The challenges of scraping ecommerce websites
Let’s dive in!
What Is eCommerce Web Scraping?
Ecommerce web scraping is the process of extracting data from online retail platforms like Amazon, Walmart, eBay, and similar sites. While it can be done by manually copying the data, it is usually performed using automated tools or scripts.
The data extracted from ecommerce sites can help businesses, researchers, and developers:
- Analyze product price fluctuations
- Track review scores
- Identify market trends
- Study competitors
These insights enable informed decision-making and strategic planning.
Note that an ecommerce data scraping tool is commonly referred to as an ecommerce scraper.
Types of eCommerce Scrapers
Below is a list of some of the most popular types of ecommerce scraper tools:
- Custom scripts: Tailored scripts to extract specific ecommerce data using web scraping programming languages like Python or JavaScript.
- No-code scrapers: User-friendly tools allowing data extraction without coding, ideal for non-technical users. Discover the best no-code scrapers.
- Web scraping APIs: Interfaces that provide structured ecommerce data programmatically, often supporting real-time or large-scale extraction.
- Scraping extensions: Browser-based add-ons that simplify data collection directly from ecommerce web pages as you navigate them.
In this article, we will focus specifically on building a custom ecommerce web scraping bot.
Data to Scrape from eCommerce Sites
Ecommerce web scrapers typically help you retrieve the following data:
- Product details: Names, descriptions, specifications, and images.
- Pricing information: Current prices, discounts, and historical price trends.
- Customer reviews: Ratings, review content, and customer feedback.
- Categories and tags: Classification and categorization of products.
- Seller information: Seller names, ratings, and contact details.
- Shipping details: Costs, delivery times, and shipping policies.
- Stock availability: Inventory levels and out-of-stock notifications.
- Marketing data: Product listings, pricing strategies, promotions, and seasonal discounts.
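If you plan to collect several of these fields, it helps to settle on a record shape up front. The sketch below shows one possible model and assumes Python 3.9+. The `ProductRecord` class and its field names are hypothetical, not a standard schema. Optional fields default to `None` because not every product page exposes every field.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProductRecord:
    """Hypothetical schema for one scraped product (field names are illustrative)."""

    url: str
    name: str
    image: Optional[str] = None
    description: Optional[str] = None
    price: Optional[float] = None
    currency: Optional[str] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    seller: Optional[str] = None
    in_stock: Optional[bool] = None
    categories: list[str] = field(default_factory=list)


# Fields that a given page does not expose simply stay None (or empty)
record = ProductRecord(
    url="https://www.example.com/item/123",
    name="Wireless Mouse",
    price=19.99,
    currency="USD",
)
print(asdict(record))
```

Because `asdict()` returns a plain dictionary, records built this way can be passed directly to `json.dump()`.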
Now, learn how to build a Python ecommerce scraper!
How To Build an eCommerce Scraper
To build an ecommerce scraper by hand, you first need to familiarize yourself with the target site. Inspect the target page with your browser's DevTools to:
- Understand its structure
- Determine what data you can extract
- Decide which scraping libraries to use
For simpler ecommerce sites, the following two Python libraries are sufficient:
- Requests: For sending HTTP requests. It helps you get the raw HTML content of a webpage.
- Beautiful Soup: For parsing HTML and XML documents. It simplifies navigation and data extraction from a page’s HTML structure. Learn more in our guide on Beautiful Soup scraping.
You can install them both with:
```shell
pip install requests beautifulsoup4
```
For ecommerce platforms that load data dynamically or rely heavily on JavaScript rendering, you will need a browser automation tool like Selenium. For more information, see our tutorial on Selenium scraping.
You can install Selenium with:
```shell
pip install selenium
```
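A quick way to decide between the two approaches is to check whether text you can see in the browser already appears in the raw HTML returned by a plain HTTP request. The `needs_browser()` helper below is a hypothetical heuristic, not part of any library. It is a minimal sketch that only does a case-insensitive substring check after decoding HTML entities.

```python
import html


def needs_browser(raw_html: str, snippets: list[str]) -> bool:
    """Heuristic: True if any text seen in the browser is missing from the raw HTML.

    Missing text suggests the page renders it with JavaScript, so a plain
    HTTP client is not enough and a tool like Selenium is likely needed.
    """
    # Decode entities such as &amp; so "Water & Dust" matches "Water &amp; Dust"
    page_text = html.unescape(raw_html).lower()
    return any(snippet.lower() not in page_text for snippet in snippets)


static_page = "<li><h3>SteelSeries Apex 3 - Water &amp; Dust Resistant</h3></li>"
js_shell = '<div id="root"></div><script src="/app.js"></script>'

print(needs_browser(static_page, ["Water & Dust Resistant"]))
print(needs_browser(js_shell, ["Water & Dust Resistant"]))
```

In practice, you would pass it `requests.get(url, headers=headers).text` together with a product name copied from the rendered page. The check is rough. Text that is split across several tags, or a block page served instead of the real content, can both produce misleading results.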
Next, the web scraping process is as follows:
- Connect to the target site: Use Requests or Selenium to retrieve the HTML of the page. With Requests, you then parse it with Beautiful Soup.
- Select the elements of interest: Locate specific elements (e.g., product image, price, description) in the HTML structure and select them with CSS selectors or XPath expressions.
- Extract data: Pull the desired information from these HTML elements.
- Clean the data: Process the extracted data to remove unnecessary content or reformat it, if needed.
- Export the data: Save the cleaned data in a preferred format, such as JSON or CSV.
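The clean and export steps are often the easiest to get right in plain Python. The following minimal sketch assumes that each product is a flat dictionary of strings, like the ones built in the scripts that follow. The hypothetical `clean_product()` and `export_csv()` helpers collapse stray whitespace and write CSV instead of JSON.

```python
import csv
import io
import re


def clean_product(raw: dict) -> dict:
    """Normalize whitespace in every string field of a scraped product."""
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            # Collapse runs of spaces/newlines left over from the HTML layout
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[key] = value
    return cleaned


def export_csv(products: list[dict], stream) -> None:
    """Write products as CSV, using the keys of the first product as the header."""
    if not products:
        return
    writer = csv.DictWriter(stream, fieldnames=list(products[0].keys()))
    writer.writeheader()
    writer.writerows(products)


raw_products = [
    {
        "url": "https://www.example.com/p/1",
        "name": "  Wireless\n   Mouse  ",
        "image": "https://www.example.com/1.jpg",
    },
]
products = [clean_product(p) for p in raw_products]

# Write to an in-memory buffer for demonstration purposes
buffer = io.StringIO()
export_csv(products, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with `newline=""`, as the `csv` module documentation recommends: `with open("products.csv", "w", newline="", encoding="utf-8") as csv_file: export_csv(products, csv_file)`.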
The advantages of this approach include full control over the data extraction process and the ability to customize it to meet specific requirements. However, it requires technical expertise to design and maintain, and each ecommerce site needs its own script.
In the next sections, you will find Python ecommerce scraping script examples for extracting data from Amazon, Walmart, and eBay!
Amazon Scraping
- Target page: “laptop” search page on Amazon
- Target page URL: https://www.amazon.com/s?k=laptop&ref=nb_sb_noss
Amazon has anti-scraping measures designed to block requests that do not originate from a browser. To bypass these restrictions, you need to use a browser automation tool like Selenium:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json

# Initialize the WebDriver
driver = webdriver.Chrome(service=Service())

# Open the Amazon home page in the browser
driver.get("https://amazon.com/")

# Fill out the search form
search_input_element = driver.find_element(By.ID, "twotabsearchtextbox")
search_input_element.send_keys("laptop")

# Locate the search button and click it
search_button_element = driver.find_element(By.ID, "nav-search-submit-button")
search_button_element.click()

# You are now on the target page
# Wait up to 10 seconds for the product elements to be rendered
product_selector = '[role="listitem"][data-asin]'
WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, product_selector))
)

# Where to store the scraped data
products = []

# Select all product elements on the page
product_elements = driver.find_elements(By.CSS_SELECTOR, product_selector)

# Iterate over them
for product_element in product_elements:
    # Scraping logic
    url_element = product_element.find_element(By.CSS_SELECTOR, ".a-link-normal")
    url = url_element.get_attribute("href")

    name_element = product_element.find_element(By.CSS_SELECTOR, "h2")
    name = name_element.text

    image_element = product_element.find_element(By.CSS_SELECTOR, "img[data-image-load]")
    image = image_element.get_attribute("src")

    # Populate a new object with the scraped data
    product = {
        "url": url,
        "name": name,
        "image": image
    }
    # Add it to the list of scraped products
    products.append(product)

# Close the browser
driver.quit()

# Export data to a JSON file
with open("products.json", "w", encoding="utf-8") as json_file:
    json.dump(products, json_file, indent=4)
```
Run the above Amazon ecommerce scraper. If Amazon does not show a CAPTCHA, it will generate a `products.json` file similar to the following:
```json
[
    {
        "url": "https://www.amazon.com/A315-24P-R7VH-Display-Quad-Core-Processor-Graphics/dp/B0BS4BP8FB/ref=sr_1_3?crid=1W7R6D59KV9L1&dib=eyJ2IjoiMSJ9.iBCtzwnCm6CE8Bx8hKmQ8ez6PkzMg3asWNhAxvflBg3pKVi5IxQUSDpcaksihO-jEO1nyLGkdoGk_2hNyQ7EWOa6epS_hZHxqV7msqdtcEZv4irFZRnYHcP5YnEwKu17BjsYS_IPI1tFVDS65v_roSCu_IiBNfotAEHSx4zOwQ4u1CRKfvnLjIX4VlECydRjsKaAQ-mErT89tyBUCfEGjzKPPZxwHi3Y0MoieuPceL8.jIuIrqzxNYISYPLHifRJq289Vy9Z6hqT8vmMcUQw9HY&dib_tag=se&keywords=laptop&qid=1735572968&sprefix=l%2Caps%2C271&sr=8-3",
        "name": "Acer Aspire 3 A315-24P-R7VH Slim Laptop | 15.6\" Full HD IPS Display | AMD Ryzen 3 7320U Quad-Core Processor | AMD Radeon Graphics | 8GB LPDDR5 | 128GB NVMe SSD | Wi-Fi 6 | Windows 11 Home in S Mode",
        "image": "https://m.media-amazon.com/images/I/61gKkYQn6lL._AC_UY218_.jpg"
    },
    // omitted for brevity...
    {
        "url": "https://www.amazon.com/Lenovo-Newest-Flagship-Chromebook-HubxcelAccesory/dp/B0CBJ46QZX/ref=sr_1_8?crid=1W7R6D59KV9L1&dib=eyJ2IjoiMSJ9.iBCtzwnCm6CE8Bx8hKmQ8ez6PkzMg3asWNhAxvflBg3pKVi5IxQUSDpcaksihO-jEO1nyLGkdoGk_2hNyQ7EWOa6epS_hZHxqV7msqdtcEZv4irFZRnYHcP5YnEwKu17BjsYS_IPI1tFVDS65v_roSCu_IiBNfotAEHSx4zOwQ4u1CRKfvnLjIX4VlECydRjsKaAQ-mErT89tyBUCfEGjzKPPZxwHi3Y0MoieuPceL8.jIuIrqzxNYISYPLHifRJq289Vy9Z6hqT8vmMcUQw9HY&dib_tag=se&keywords=laptop&qid=1735572968&sprefix=l%2Caps%2C271&sr=8-8",
        "name": "Lenovo Newest Flagship Chromebook, 14'' FHD Touchscreen Slim Thin Light Laptop Computer, 8-Core MediaTek Kompanio 520 Processor, 4GB RAM, 64GB eMMC, WiFi 6,Chrome OS+HubxcelAccesory, Abyss Blue",
        "image": "https://m.media-amazon.com/images/I/61KlKRdsQ7L._AC_UY218_.jpg"
    }
]
```
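The scraped Amazon URLs carry long, session-specific tracking parameters. If you only need a stable link, one option is to rebuild it from the ASIN, Amazon's 10-character product ID that follows `/dp/` in the URLs above. The `canonical_amazon_url()` helper below is a hypothetical sketch that assumes this URL shape. It percent-decodes the link first, so that IDs nested inside encoded redirect parameters can also match.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Assumption: an ASIN is a 10-character uppercase alphanumeric ID after "/dp/"
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})")


def canonical_amazon_url(url: str) -> Optional[str]:
    """Return a short https://www.amazon.com/dp/<ASIN> link, or None if no ASIN is found."""
    # Decode first, so "%2Fdp%2F..." inside redirect parameters becomes "/dp/..."
    match = ASIN_PATTERN.search(unquote(url))
    if match is None:
        return None
    return f"https://www.amazon.com/dp/{match.group(1)}"


scraped_url = (
    "https://www.amazon.com/A315-24P-R7VH-Display-Quad-Core-Processor-Graphics"
    "/dp/B0BS4BP8FB/ref=sr_1_3?crid=1W7R6D59KV9L1&keywords=laptop&sr=8-3"
)
print(canonical_amazon_url(scraped_url))  # https://www.amazon.com/dp/B0BS4BP8FB
```

Shorter URLs also make it easier to deduplicate products that appear in several search results.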
Note that Amazon may still show a CAPTCHA and block your request, even when you go through Selenium. In that case, consider SeleniumBase as an alternative, or keep reading for a more robust solution.
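Before parsing, it can be worth checking whether you received a CAPTCHA page instead of search results. That way, the script can stop or retry instead of silently exporting an empty list. The markers below are assumptions about Amazon's CAPTCHA page and may change at any time. `looks_like_captcha()` is a hypothetical helper that you could call on Selenium's `driver.page_source`.

```python
# Markers assumed to appear on Amazon's CAPTCHA page; confirm them against the
# HTML you actually receive, since Amazon can change this page at any time.
CAPTCHA_MARKERS = (
    "/errors/validateCaptcha",
    "Enter the characters you see below",
)


def looks_like_captcha(page_source: str) -> bool:
    """Return True if the HTML contains any of the assumed CAPTCHA markers."""
    return any(marker in page_source for marker in CAPTCHA_MARKERS)


blocked_page = '<form method="get" action="/errors/validateCaptcha">'
results_page = '<div role="listitem" data-asin="B0BS4BP8FB"></div>'
print(looks_like_captcha(blocked_page))
print(looks_like_captcha(results_page))
```

For example, right after clicking the search button, you could call `driver.quit()` and exit when `looks_like_captcha(driver.page_source)` is true.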
For a comprehensive walkthrough, check out our detailed tutorial on Amazon web scraping.
Walmart Scraping
- Target page: “keyboard” search page on Walmart
- Target page URL: https://www.walmart.com/search?q=keyboard
Just like Amazon, Walmart uses anti-bot solutions to block requests that come from automated HTTP clients. For this reason, you can scrape it with Selenium as below:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json

# Initialize the WebDriver
driver = webdriver.Chrome(service=Service())

# Navigate to the target page
driver.get("https://www.walmart.com/search?q=keyboard")

# Wait up to 10 seconds for the product elements to be rendered
product_selector = '.carousel-4[data-testid="carousel-container"] li'
WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, product_selector))
)

# Where to store the scraped data
products = []

# Select all product elements on the page
product_elements = driver.find_elements(By.CSS_SELECTOR, product_selector)

# Iterate over them
for product_element in product_elements:
    # Scraping logic
    url_element = product_element.find_element(By.CSS_SELECTOR, "a")
    url = url_element.get_attribute("href")

    name_element = product_element.find_element(By.CSS_SELECTOR, "h3")
    name = name_element.get_attribute("innerText")

    image_element = product_element.find_element(By.CSS_SELECTOR, 'img[data-testid="productTileImage"]')
    image = image_element.get_attribute("src")

    # Populate a new object with the scraped data
    product = {
        "url": url,
        "name": name,
        "image": image
    }
    # Add it to the list of scraped products
    products.append(product)

# Close the browser
driver.quit()

# Export data to a JSON file
with open("products.json", "w", encoding="utf-8") as json_file:
    json.dump(products, json_file, indent=4)
```
Execute the Walmart ecommerce scraper, and you will get output similar to:
```json
[
    {
        "url": "https://www.walmart.com/sp/track?bt=1&eventST=click&plmt=sp-search-middle~desktop~Results%20for%20%22Electronics%22&pos=1&tax=3944_1089430_132959_1008621_7197407&rdf=1&rd=https%3A%2F%2Fwww.walmart.com%2Fip%2FLogitech-920-004536-Mk270-Keyboard-Mouse-USB-Wireless-Combo-Black%2F28540111%3FclassType%3DREGULAR%26adsRedirect%3Dtrue&adUid=094fb4ae-62f3-4954-ae99-b2938550d72c&mloc=sp-search-middle&pltfm=desktop&pgId=keyboard&pt=search&spQs=sAX_0l4wzWXzBji34bVpmheXU7_ETXGbDXcA9LhcshG_YbqBx24VWzt7yesHivpt1lpckuNhxQqbLidA-d8L4agqx_YPQVlj2EfM_TnEyfsSWiTEkvBaqgkaMzy6bgIZ4eC8t9-qqz7qtb7uXMz3cH92UCf5EEgQlfKwnxJ-SAF1EW1ouCjC10Ur3hELs3143xQPjxNUSUoN8FIF12fxJmTlSlTe4makoj1s2NoubYTqnlJLs3pohowJCRFT76Vl&storeId=3081&couponState=na&bkt=ace1_default%7Cace2_default%7Cace3_default%7Ccoldstart_off%7Csearch_default&classType=REGULAR",
        "name": "Logitech Wireless Combo MK270",
        "image": "https://i5.walmartimages.com/seo/Logitech-920-004536-Mk270-Keyboard-Mouse-USB-Wireless-Combo-Black_99591453-341e-4c5b-937e-b2ab9b321519.3860011d84a23ccd0732e46474590b15.jpeg?odnHeight=784&odnWidth=580&odnBg=FFFFFF"
    },
    {
        "url": "https://www.walmart.com/sp/track?bt=1&eventST=click&plmt=sp-search-middle~desktop~Results%20for%20%22Electronics%22&pos=2&tax=3944_1089430_132959_1008621_7197407&rdf=1&rd=https%3A%2F%2Fwww.walmart.com%2Fip%2FSteelSeries-Apex-3-TKL-RGB-Gaming-Keyboard-Tenkeyless-Water-Dust-Resistant-PC-and-USB-A%2F996783321%3FclassType%3DVARIANT%26adsRedirect%3Dtrue&adUid=094fb4ae-62f3-4954-ae99-b2938550d72c&mloc=sp-search-middle&pltfm=desktop&pgId=keyboard&pt=search&spQs=Dp3ons-xIcmPw9Ze7UUZuW3PD9Dto_vYCLjglme5vSy5Ze1p4NXg3uzApRy4mgfB-dGDchsq6FDoaZeMy6Dmeagqx_YPQVlj2EfM_TnEyfv_0r9GA9WwEd1cWbcx63Diahe72Zw6lw8suSf-OFKKH6UaiJl_8Qtpar-x0VhgrMsbqG7gDKh5DkQZql3HeMLncWSwburhSEjvpT1dXlDoWKxUrZwxZhOMry-uCqhuSb7Y6B-xZGrNPjYyel0nw11Z&storeId=3081&couponState=na&bkt=ace1_default%7Cace2_default%7Cace3_default%7Ccoldstart_off%7Csearch_default&classType=VARIANT",
        "name": "SteelSeries Apex 3 TKL RGB Gaming Keyboard - Tenkeyless - Water & Dust Resistant - PC and USB-A",
        "image": "https://i5.walmartimages.com/seo/SteelSeries-Apex-3-TKL-RGB-Gaming-Keyboard-Tenkeyless-Water-Dust-Resistant-PC-and-USB-A_876430c2-eed8-404a-aa55-1c66193daf8e.8c617e57ba48bc49d003f917f85cb535.jpeg?odnHeight=784&odnWidth=580&odnBg=FFFFFF"
    },
    // omitted for brevity...
    {
        "url": "https://www.walmart.com/ip/DEP-06-Portable-Digital-Piano-with-X-Stand/7598762909?classType=REGULAR",
        "name": "Donner Portable Digital Piano 88-key Synth Action Keyboard with X Stand, Pedal, Auto-accompaniment for Beginner, 128 Tones, 83 Rhythms, Support USB/MIDI/Melodics, Wireless Connection",
        "image": "https://i5.walmartimages.com/seo/DEP-06-Portable-Digital-Piano-with-X-Stand_1175fc1e-c191-4c71-9e9a-7e4a13274487.6673e0430c23d122744cfb63ccc8c155.jpeg?odnHeight=784&odnWidth=580&odnBg=FFFFFF"
    }
]
```
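Some of the Walmart URLs above are sponsored-result redirects (`/sp/track?...`) that carry the actual product URL, percent-encoded, in the `rd` query parameter. The sketch below assumes that this URL format holds. The hypothetical `unwrap_walmart_url()` helper extracts the destination using only the standard library and leaves regular product URLs untouched.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_walmart_url(url: str) -> str:
    """Return the destination of a Walmart /sp/track redirect, or the URL unchanged."""
    parts = urlsplit(url)
    if parts.path.startswith("/sp/track"):
        # parse_qs splits the query string and percent-decodes each value
        destination = parse_qs(parts.query).get("rd")
        if destination:
            return destination[0]
    return url


sponsored = (
    "https://www.walmart.com/sp/track?bt=1&eventST=click&pos=1&rdf=1"
    "&rd=https%3A%2F%2Fwww.walmart.com%2Fip%2FLogitech-920-004536-Mk270-Keyboard-Mouse-USB-Wireless-Combo-Black"
    "%2F28540111%3FclassType%3DREGULAR%26adsRedirect%3Dtrue&storeId=3081"
)
print(unwrap_walmart_url(sponsored))
```

Because the `&` inside the destination is encoded as `%26`, `parse_qs()` keeps `adsRedirect=true` as part of the `rd` value instead of treating it as a separate parameter.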
For more guidance, read our article on Walmart web scraping.
eBay Scraping
- Target page: “mouse” search page on eBay
- Target page URL: https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=mouse&_sacat=0
eBay includes its search results in the HTML returned by the server, without relying on JavaScript to render products or load data dynamically. Thus, it can be scraped with Requests and Beautiful Soup as follows:
```python
import requests
from bs4 import BeautifulSoup
import json

# Target page
url = "https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=mouse&_sacat=0"

# Send a GET request to the eBay search page
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
}
response = requests.get(url, headers=headers, timeout=30)
# Stop early on HTTP errors such as 403 or 503
response.raise_for_status()

# Parse the page content with BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")

# Where to store the scraped data
products = []

# Select all product elements on the page
product_elements = soup.select("li.s-item")

# Iterate over them
for product_element in product_elements:
    # Scraping logic
    url_element = product_element.select_one("a[data-interactions]")
    name_element = product_element.select_one("[role=\"heading\"]")
    image_element = product_element.select_one("img")

    # Skip items missing any expected element (e.g., promotional placeholders)
    if url_element is None or name_element is None or image_element is None:
        continue

    url = url_element["href"]
    name = name_element.text
    image = image_element.get("src")

    # Populate a new object with the scraped data
    product = {
        "url": url,
        "name": name,
        "image": image
    }
    # Add it to the list of scraped products
    products.append(product)

# Export data to a JSON file
with open("products.json", "w", encoding="utf-8") as json_file:
    json.dump(products, json_file, indent=4)
```
Launch the eBay ecommerce web scraping script, and it will produce output similar to:
```json
[
    {
        "url": "https://www.ebay.com/itm/193168148815?_skw=mouse&itmmeta=01JGC679WKT327K11R9YCGMQAN&hash=item2cf9b8094f:g:8F4AAOSw3B1drMr-&itmprp=enc%3AAQAJAAAAwHoV3kP08IDx%2BKZ9MfhVJKlr8NKoodwElhyHbl4CwcBMRqdGJme95%2F3tIll4uI7QYBk4%2BUBpwVvwiXdAl2%2BcILZ9axc%2BdHSZStWWMxWVyq4JdZ6r52PrRP2aS1jUoFoJ11vL4KyH2S8R5ha71xBtDFcGA2%2BtzhTzcR7J25kxuxbyd%2Frd4YnKbTPKwhn2Q0TP8qL30BJKcj4FnJYP0zhgO4WOGgOCHQhM21%2BanVk%2Fl0eg1H8mqCU91mkgKAt8KghFmw%3D%3D%7Ctkp%3ABlBMULSenYaDZQ",
        "name": "2.4GHz Wireless Optical Mouse Mice & USB Receiver For PC Laptop Computer DPI USA",
        "image": "https://i.ebayimg.com/images/g/8F4AAOSw3B1drMr-/s-l500.webp"
    },
    {
        "url": "https://www.ebay.com/itm/356159975164?_skw=mouse&itmmeta=01JGC679WKE9V782ZXT15SEPHP&hash=item52ecc9eefc:g:0ikAAOSwHStnD33Q&itmprp=enc%3AAQAJAAAAwHoV3kP08IDx%2BKZ9MfhVJKlZ7pO0lYrvftkZhnT7ja625fcsjcktK0eaub2HNzEgsmo3b2VehoA4tffYdt0xiTXwHb%2BzYU4NBZ5onBh68cyKWhhMJowbRvnCwuwy2IQIRlkeijpbRtJNJPuaaiDZdV0eabGGkps8433kCR6fcX1xEodUxujoeYUjp0VP81OWcl%2BbBGd70%2Fq45HC3SXg4k%2FlK0%2FqR80yJYexSEfzUq7%2BN3Sa6Y01uCo5XPWFLHzRoSw%3D%3D%7Ctkp%3ABlBMULSenYaDZQ",
        "name": "Ergonomics LED Screen Display Wireless Gaming Mouse Bluetooth 2.4G Wired support",
        "image": "https://i.ebayimg.com/images/g/0ikAAOSwHStnD33Q/s-l500.webp"
    },
    // omitted for brevity...
    {
        "url": "https://www.ebay.com/itm/116250548048?_skw=mouse&itmmeta=01JGC679WN076MJ17QJ9P4FA5J&hash=item1b11129750:g:gr8AAOSwsSFmkXG3&itmprp=enc%3AAQAJAAAAwHoV3kP08IDx%2BKZ9MfhVJKkArX38iC0VVXTpfv4BzqCegsh22yxmsDAwZAmd4RxM9JlEMfuVRoYGVZFVCeurJYwAjWd2YK3%2BNs6m5rQHZXISyWtev1lEvfVVKP4Rd5QeC2KzLgqXOvp1lWiK5b31kfujkmKjF%2BEaR1kplulwrgUvzMO%2F78F%2BFukgIAoL8dE4nRD9jo%2BieiAgIpLBUcs8AmCy5vk65gt1JGonUOncRksGYciF%2FJg6arB9%2FVOYYq7N8A%3D%3D%7Ctkp%3ABlBMULyenYaDZQ",
        "name": "Razer x Sanrio Kuromi DeathAdder Gaming Mouse and Mouse Pad Combo",
        "image": "https://i.ebayimg.com/images/g/gr8AAOSwsSFmkXG3/s-l500.webp"
    }
]
```
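The eBay script only scrapes the first page of results. To cover more listings, you can generate one URL per results page. The sketch below assumes that eBay reads the search keyword from `_nkw` (as in the target URL) and the page number from a `_pgn` query parameter. `ebay_search_urls()` is a hypothetical helper.

```python
from urllib.parse import urlencode

EBAY_SEARCH_URL = "https://www.ebay.com/sch/i.html"


def ebay_search_urls(keyword: str, pages: int) -> list[str]:
    """Build search-result URLs for the first `pages` pages of an eBay search.

    Assumes eBay reads the keyword from "_nkw" and the page number from "_pgn".
    """
    return [
        # urlencode() encodes spaces as "+" and escapes special characters
        f"{EBAY_SEARCH_URL}?{urlencode({'_nkw': keyword, '_sacat': 0, '_pgn': page})}"
        for page in range(1, pages + 1)
    ]


for search_url in ebay_search_urls("wireless mouse", 2):
    print(search_url)
```

You could then apply the same Requests and Beautiful Soup logic to each URL, pausing between requests (for example, with `time.sleep()`) to keep your traffic modest.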
Amazing! You just saw some examples of Python ecommerce data scraping scripts!
Challenges in eCommerce Web Scraping and How to Overcome Them
In the examples above, we focused on extracting basic details like product name, URL, and image URL from a few ecommerce sites. While this simplicity made ecommerce scraping seem straightforward, the reality is far more complex for several reasons:
- Dynamic page structures: eCommerce platforms frequently update their page designs, requiring constant script maintenance.
- Diverse product pages: Different products may display varying sets of data and use entirely different layouts.
- Dynamic pricing: Scraping accurate price data can be challenging due to temporary deals, discounts, or region-specific offers.
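Dynamic and region-specific pricing also means the same price can be displayed in different formats, such as `$1,299.99` on a US site or `1.299,99 €` on a European one. The hypothetical `parse_price()` helper below is a heuristic sketch, not a complete solution. It assumes that a final `.` or `,` followed by exactly two digits is the decimal separator, and it ignores currency symbols and price ranges.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Heuristically turn a displayed price like "$1,299.99" or "1.299,99 €" into a Decimal."""
    # Take the first run of digits and separators (e.g., "1,299.99")
    match = re.search(r"\d[\d.,]*", text)
    if match is None:
        return None
    number = match.group()
    last_sep = max(number.rfind("."), number.rfind(","))
    # Assumption: a final separator followed by exactly two digits marks the decimals
    if last_sep != -1 and len(number) - last_sep - 1 == 2:
        integer_part = re.sub(r"[.,]", "", number[:last_sep])
        return Decimal(f"{integer_part}.{number[last_sep + 1:]}")
    # Otherwise, treat every separator as a thousands separator
    return Decimal(re.sub(r"[.,]", "", number))


for displayed in ["$1,299.99", "1.299,99 €", "€1.299", "$25", "Price unavailable"]:
    print(displayed, "->", parse_price(displayed))
```

`Decimal` is used instead of `float` so that prices such as `1299.99` are stored exactly. Values like `$19.9`, with a single decimal digit, would be misread by this heuristic.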
Additionally, major ecommerce sites like Amazon employ advanced anti-scraping measures, such as CAPTCHAs and JavaScript challenges.
To overcome these blocks, you can:
- Learn advanced scraping techniques: Read our guide on bypassing CAPTCHA with Python and take a look at in-depth scraping tutorials for practical tips.
- Use advanced automation tools: Utilize robust tools like Playwright Stealth for scraping sites with anti-bot mechanisms.
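Independently of those tools, pacing your requests and retrying failures with a growing delay reduces the chance of triggering rate limits, although it will not solve a CAPTCHA by itself. The hypothetical `with_retries()` helper below sketches exponential backoff with random jitter around any fetch function. The delay values are arbitrary assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on any exception with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds, plus up to 1 s of random jitter
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))
    raise AssertionError("unreachable")


# Demo with a fake fetch function that fails twice before succeeding
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary block")
    return "<html>ok</html>"


delays: list[float] = []
result = with_retries(flaky_fetch, sleep=delays.append)
print(result, calls["count"], delays)
```

The demo passes `delays.append` as `sleep` so that it runs instantly. Note that `requests.get()` does not raise on responses such as 403 or 503, so a fetch function passed here should call `response.raise_for_status()` if those responses should trigger a retry.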
Still, the most efficient solution is to use a dedicated eCommerce Scraper API.
Bright Data’s eCommerce Scraper API is a reliable solution for extracting data from eCommerce platforms like Amazon, Target, Walmart, Lazada, Shein, Shopee, and more. Key benefits include:
- Retrieve structured details such as product title, seller name, brand, description, reviews, initial price, currency, availability, categories, and more.
- Eliminate concerns about managing servers, proxies, or avoiding website blocks.
- Avoid interruptions from CAPTCHAs or JavaScript challenges.
Streamline your ecommerce scraping process today!
Conclusion
In this article, you learned what an ecommerce scraper is, the types of data it can extract from ecommerce web pages, and how to build one in Python. However, no matter how sophisticated your ecommerce web scraping script is, most sites can still detect automated activity and block you.
The solution is a powerful eCommerce Scraper API specifically designed to retrieve ecommerce data reliably from various platforms. These APIs offer structured and comprehensive data, including:
- Amazon Scraper API: Scrape Amazon and collect data such as title, seller name, brand, description, reviews, initial price, currency, availability, categories, ASIN, number of sellers, and much more.
- eBay Scraper API: Collect data such as ASIN, seller name, merchant ID, URL, image URL, brand, product overview, description, sizes, colors, final price, and more.
- Walmart Scraper API: Collect data such as URL, SKU, price, image URL, related pages, delivery and pickup availability, brand, category, product ID and description, and more.
- Target Scraper API: Gather data such as URL, product ID, title, description, rating, review count, price, discount, currency, images, seller name, offers, shipping policy, and more.
- Lazada Scraper API: Scrape data such as URL, title, rating, reviews, initial and final price, currency, image, seller name, product description, SKU, colors, promotions, brand, and more.
- Shein Scraper API: Retrieve data such as product name, description, price, currency, color, in stock, size, review count, main image, country code, domain, and more.
- Shopee Scraper API: Scrape data such as URL, ID, title, rating, reviews, price, currency, stock, favorite, image, shop URL, ratings, date joined, followers, sold, brand, and more.
For scraping data from specific products, consider our Web Scraper API. If building a scraper is not your thing, explore our ready-to-use ecommerce datasets.
Create a free Bright Data account today (no credit card required) to try our scraper APIs or explore our datasets.