In this guide, you will learn:
- What an Alibaba scraper is and how it works
- The types of data you can automatically retrieve from Alibaba
- How to build an Alibaba scraping script using Python
Let’s dive in!
What Is an Alibaba Scraper?
An Alibaba scraper is a web scraping bot designed to automatically extract data from Alibaba’s pages. It works by simulating a user’s browsing behavior to navigate Alibaba pages. It handles interactions like pagination and retrieves structured information such as product details, prices, and company data.
Data You Can Scrape From Alibaba
Alibaba is a treasure trove of valuable information, such as:
- Product Details: Names, descriptions, images, price ranges, seller information, and more.
- Company Information: Company names, manufacturer details, contact information, and ratings.
- Customer Feedback: Ratings, product reviews, and more.
- Logistics and Availability: Stock status, minimum order quantities, shipping options, and more.
- Categories and Tags: Product categories, relevant tags, or labels.
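To make these categories concrete, here is a minimal sketch of how a single scraped record could be modeled in Python. The field names are illustrative assumptions, not an official Alibaba schema:
from dataclasses import dataclass
from typing import Optional

# illustrative model of a scraped Alibaba product record;
# the field names are assumptions, not an official schema
@dataclass
class Product:
    name: str
    description: str
    price_range: str  # e.g. "$250.00 - $300.00"
    company: str
    min_order_quantity: Optional[str] = None
    rating: Optional[float] = None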
See how to scrape them!
Scraping Alibaba in Python: Step-By-Step Guide
In this section, you will learn how to build an Alibaba scraper in a guided tutorial.
The objective is to guide you through creating a Python script that automatically extracts data from the Alibaba “laptop” page:
Ready? Follow the steps below!
Step #1: Project Setup
First of all, verify that you have Python 3 installed on your machine. Otherwise, download it and follow the installation wizard.
Now, use the command below to create a directory for your project:
mkdir alibaba-scraper
The alibaba-scraper folder is where you will place the Python Alibaba scraper.
Enter it in the terminal, and create a virtual environment inside it:
cd alibaba-scraper
python -m venv env
Load the project folder in your favorite Python IDE, such as Visual Studio Code with the Python extension or PyCharm Community Edition.
Create a scraper.py file in the project’s directory, which should now contain this file structure:
scraper.py is currently a blank Python script, but it will soon contain the desired scraping logic.
In the IDE’s terminal, activate the virtual environment. On Linux or macOS, execute this command:
source ./env/bin/activate
Equivalently, on Windows, run:
env\Scripts\activate
Amazing, your Python environment for Alibaba web scraping is ready!
Step #2: Select the Scraping Library
The goal now is to determine whether Alibaba uses dynamic or static pages. To do so, open the Alibaba target page in your browser in incognito mode. Then, right-click on the background, select “Inspect,” reach the “Network” tab, filter for “Fetch/XHR,” and reload the page:
In this section of the DevTools, observe whether the page makes any significant dynamic requests. In this case, it does, which indicates that the page is dynamic. Further analysis reveals that the page uses JavaScript for rendering.
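If you want to double-check that conclusion from code, you can fetch the raw HTML without a browser and look for the product nodes. Below is a quick sketch that assumes the requests package is installed (pip install requests) and uses a simplified version of the target URL; keep in mind that Alibaba’s anti-bot measures may block or alter this plain HTTP request:
import requests

# fetch the raw HTML returned by the server, with no JavaScript execution
response = requests.get("https://www.alibaba.com/trade/search?tab=all&SearchText=laptop")

# on a JavaScript-rendered page, the product nodes are unlikely to appear
# in the initial HTML, so this typically prints False
print(".m-gallery-product-item-v2" in response.text)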
In other words, you need a browser automation tool like Selenium to scrape Alibaba effectively. Learn more in our tutorial on Selenium web scraping.
Selenium allows you to programmatically control a web browser, simulating user interactions and enabling you to scrape content rendered by JavaScript. Time to install it and get started with it!
Step #3: Install and Configure Selenium
In an activated virtual environment, install Selenium with this command:
pip install -U selenium
Import Selenium in scraper.py and create a WebDriver object:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# initialize a Chrome web driver instance
driver = webdriver.Chrome(service=Service())
The code above initializes a WebDriver object to control a Chrome instance. Note that Alibaba has some anti-scraping measures in place that may block headless browsers. Thus, you should not set the --headless flag. As an alternative solution, consider exploring Playwright Stealth.
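That does not mean you cannot configure the browser at all. If needed, you can still pass options to Chrome, as in the minimal sketch below; the specific arguments are illustrative choices, not requirements of this tutorial:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

options = Options()
# give the browser window a fixed size so the page layout is predictable
options.add_argument("--window-size=1280,900")
# note: no "--headless" argument, for the reasons explained above

driver = webdriver.Chrome(service=Service(), options=options)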
As the last line of your scraper, remember to close the web driver:
driver.quit()
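Keep in mind that if your scraping logic raises an exception before that line, the browser window will stay open. As an optional hardening step, you could wrap the logic in a try/finally block, as in this sketch (the tutorial’s final script keeps the simpler linear structure):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service())
try:
    # scraping logic...
    pass
finally:
    # always close the browser, even if the scraping logic fails
    driver.quit()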
Wonderful! You are fully configured to start scraping Alibaba.
Step #4: Connect to the Target Page
Use the get() method exposed by the Selenium WebDriver object to visit the desired page:
url = "https://www.alibaba.com/trade/search?spm=a2700.product_home_newuser.home_new_user_first_screen_fy23_pc_search_bar.keydown__Enter&tab=all&SearchText=laptop"
driver.get(url)
The scraper.py file will now contain these lines of code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# initialize a Chrome web driver instance
driver = webdriver.Chrome(service=Service())
# the url of the target page
url = "https://www.alibaba.com/trade/search?spm=a2700.product_home_newuser.home_new_user_first_screen_fy23_pc_search_bar.keydown__Enter&tab=all&SearchText=laptop"
# connect to the target page
driver.get(url)
# scraping logic...
# close the browser
driver.quit()
Place a debugging breakpoint on the final line and launch the script with the debugger. Here is what you should be seeing:
The “Chrome is being controlled by automated test software.” message certifies that Selenium is controlling Chrome as expected. Well done!
Step #5: Select the Product Elements
Since the Alibaba product page contains several products, you first need to initialize a data structure to store the scraped data. A Python list will work perfectly for this purpose:
products = []
Next, inspect the HTML elements of the products on the page to understand:
- How to select them
- What data they contain
- How to extract that data
Here, you can see that each product element is a .m-gallery-product-item-v2 node.
Use Selenium to select all product elements:
product_elements = driver.find_elements(By.CSS_SELECTOR, ".m-gallery-product-item-v2")
find_elements() applies the given selector strategy to retrieve elements on the page. In the above case, the selector strategy is a CSS selector. Do not forget to import By:
from selenium.webdriver.common.by import By
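Since the page builds its content with JavaScript, the product nodes may not be in the DOM yet when find_elements() runs. If you run into that race condition, you can wait for the nodes explicitly with Selenium’s WebDriverWait, as in this sketch (the 10-second timeout is an arbitrary choice):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for at least one product node to be in the DOM
product_elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".m-gallery-product-item-v2"))
)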
Iterate over the selected elements and prepare to scrape data from each of them:
for product_element in product_elements:
    # scrape data from each product element
Terrific! You are one step closer to successfully scraping Alibaba.
Step #6: Scrape the Product Elements
Inspect a product element to understand its HTML structure:
Here you can see that you can scrape:
- The product image from .search-card-e-slider__img
- The product description from .search-card-e-title
- The product price range from .search-card-e-price-main
- The company/manufacturer from .search-card-e-company
In the for loop, translate that information into scraping logic:
img_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-slider__img")
img = img_element.get_attribute("src")
description_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-title")
description = description_element.text.strip()
price_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-price-main")
price = price_element.text.strip()
company_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-company")
company = company_element.text.strip()
find_element() retrieves the first element matching the given CSS selector. You can then access its text content with the text attribute. To get the value of a node’s HTML attribute, use the get_attribute() method.
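Note that find_element() raises NoSuchElementException when nothing matches, so a single incomplete product card would stop the whole loop. If you want the scraper to tolerate missing nodes, you could wrap the lookups in a small helper like the sketch below; safe_text is a hypothetical helper, not part of the tutorial’s final script:
from selenium.common.exceptions import NoSuchElementException

# hypothetical helper: return an element's text, or None if the node is missing
def safe_text(parent, css_selector):
    try:
        return parent.find_element(By.CSS_SELECTOR, css_selector).text.strip()
    except NoSuchElementException:
        return None

# example usage inside the loop:
description = safe_text(product_element, ".search-card-e-title")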
Use the scraped data to populate a product dictionary and add it to the products list:
product = {
    "img": img,
    "description": description,
    "price": price,
    "company": company
}
products.append(product)
Fantastic! The Alibaba data extraction logic is complete.
Step #7: Export the Scraped Data to CSV
Currently, your scraped data is stored in the products list. To make it accessible and shareable with others, you need to export it to a human-readable format like a CSV file.
Use the following code to create and populate a CSV file with the scraped data:
csv_file_name = "products.csv"
with open(csv_file_name, mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["img", "description", "price", "company"])
    # write the header row
    writer.writeheader()
    # write product data rows
    for product in products:
        writer.writerow(product)
Do not forget to import csv from the Python Standard Library:
import csv
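As an aside, since products is just a list of dictionaries, exporting it to JSON instead is equally straightforward. Here is a sketch using the standard json module; the products.json file name is an arbitrary choice:
import json

with open("products.json", mode="w", encoding="utf-8") as json_file:
    json.dump(products, json_file, indent=2, ensure_ascii=False)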
Wow! Your Alibaba scraper is complete.
Step #8: Put It All Together
Below is the final code of your Alibaba scraping script:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import csv

# initialize a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

# the URL of the target page
url = "https://www.alibaba.com/trade/search?spm=a2700.product_home_newuser.home_new_user_first_screen_fy23_pc_search_bar.keydown__Enter&tab=all&SearchText=laptop"

# connect to the target page
driver.get(url)

# where to store the scraped data
products = []

# select all product elements on the page
product_elements = driver.find_elements(By.CSS_SELECTOR, ".m-gallery-product-item-v2")

# iterate over the product nodes and scrape data from them
for product_element in product_elements:
    # extract the product details
    img_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-slider__img")
    img = img_element.get_attribute("src")
    description_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-title")
    description = description_element.text.strip()
    price_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-price-main")
    price = price_element.text.strip()
    company_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-company")
    company = company_element.text.strip()

    # create a product dictionary with the scraped data
    product = {
        "img": img,
        "description": description,
        "price": price,
        "company": company
    }

    # add the product data to the list
    products.append(product)

# define the output CSV file name
csv_file_name = "products.csv"

# open the file in write mode and create a CSV writer
with open(csv_file_name, mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["img", "description", "price", "company"])

    # write the header row
    writer.writeheader()

    # write product data rows
    for product in products:
        writer.writerow(product)

# close the browser
driver.quit()
In about 60 lines of code, you have built an Alibaba scraper in Python!
Launch the scraper with the following command:
python3 scraper.py
Or, on Windows:
python scraper.py
A products.csv file will appear in your project’s folder. Open it, and you will see:
Et voilà! Mission complete. The next steps? Handle pagination, deploy your script, automate its execution, and refine it further for optimal performance!
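Pagination, for example, could look something like the sketch below. Note that the ".pagination-next" selector is a hypothetical placeholder; inspect the live page to find the actual "next page" control:
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

MAX_PAGES = 3  # arbitrary page limit for this sketch

for page in range(MAX_PAGES):
    # ... scrape the current page as shown in the tutorial ...

    try:
        # ".pagination-next" is an assumed selector, not taken from the live page
        next_button = driver.find_element(By.CSS_SELECTOR, ".pagination-next")
        next_button.click()
    except NoSuchElementException:
        # no further pages to visit
        break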
Conclusion
In this step-by-step tutorial, you learned what an Alibaba scraper is and the types of data it can retrieve. You also saw how to build a Python script to scrape Alibaba products using less than 100 lines of code.
The problem is that scraping Alibaba comes with its own challenges. The platform employs strict anti-bot measures, and interactions like pagination make the scraping process more complex. Building a scalable and effective Alibaba scraping solution can be quite demanding.
Forget about those challenges with our Alibaba Scraper API! This dedicated solution lets you retrieve data from the target site through simple API calls—no risk of being blocked.
If web scraping is not your preferred approach, but you are still interested in product data, explore our ready-to-use Alibaba datasets!
Create a free Bright Data account today to try our scraper APIs or explore our datasets.
No credit card required