How to Scrape Alibaba With Python in 2024

Learn how to build a Python-based Alibaba scraper to extract product details, pricing, and more. Perfect for automating data collection efficiently.

In this guide, you will learn:

  • What an Alibaba scraper is and how it works
  • The types of data you can automatically retrieve from Alibaba
  • How to build an Alibaba scraping script using Python

Let’s dive in!

What Is an Alibaba Scraper?

An Alibaba scraper is a web scraping bot designed to automatically extract data from Alibaba’s pages. It works by simulating a user’s browsing behavior to navigate Alibaba pages. It handles interactions like pagination and retrieves structured information such as product details, prices, and company data.

Data You Can Scrape From Alibaba

Alibaba is a treasure trove of valuable information, such as:

  • Product Details: Names, descriptions, images, price ranges, seller information, and more.
  • Company Information: Company names, manufacturer details, contact information, and ratings.
  • Customer Feedback: Ratings, product reviews, and more.
  • Logistics and Availability: Stock status, minimum order quantities, shipping options, and more.
  • Categories and Tags: Product categories, relevant tags, or labels.

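For reference, the product-level data points listed above could be modeled with a small Python dataclass. This is just an illustrative sketch: the field names are our own, not Alibaba's, and the final tutorial script below uses plain dictionaries instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AlibabaProduct:
    # illustrative field names of our own, not Alibaba's terminology
    name: str
    description: str = ""
    price_range: str = ""
    company: str = ""
    rating: Optional[float] = None
    min_order_quantity: Optional[int] = None
    tags: List[str] = field(default_factory=list)
```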
See how to scrape them!

Scraping Alibaba in Python: Step-By-Step Guide

In this section, you will learn how to build an Alibaba scraper in a guided tutorial.

The objective is to guide you through creating a Python script that automatically extracts data from the Alibaba “laptop” page:

Laptop search results on Alibaba

Ready? Follow the steps below!

Step #1: Project Setup

First of all, verify that you have Python 3 installed on your machine. Otherwise, download it and follow the installation wizard.

Now, use the command below to create a directory for your project:

mkdir alibaba-scraper

The alibaba-scraper folder is where you will place the Python Alibaba scraper.

Navigate into it in the terminal, and create a virtual environment inside it:

cd alibaba-scraper
python -m venv env

Load the project folder in your favorite Python IDE, such as Visual Studio Code with the Python extension or PyCharm Community Edition.

Create a scraper.py file in the project’s directory, which should now contain this file structure:

the project’s directory

scraper.py is currently a blank Python script, but it will soon contain the desired scraping logic.

In the IDE’s terminal, activate the virtual environment. In Linux or macOS, execute this command:

source ./env/bin/activate

Equivalently, on Windows, run:

env\Scripts\activate

Amazing, your Python environment for Alibaba web scraping is ready!

Step #2: Select the Scraping Library

The goal now is to determine whether Alibaba uses dynamic or static pages. To do so, open the Alibaba target page in your browser in incognito mode. Then, right-click on the background, select “Inspect,” reach the “Network” tab, filter for “Fetch/XHR,” and reload the page:

The network tab on the chrome devtools

In this section of the DevTools, observe whether the page makes any significant dynamic requests. In this case, it does, which indicates that the page is dynamic. Further analysis reveals that the page uses JavaScript for rendering.

In other words, you need a browser automation tool like Selenium to scrape Alibaba effectively. Learn more in our tutorial on Selenium web scraping.

Selenium allows you to programmatically control a web browser, simulating user interactions and enabling you to scrape content rendered by JavaScript. Time to install it and get started with it!

Step #3: Install and Configure Selenium

In an activated virtual environment, install Selenium with this command:

pip install -U selenium

Import Selenium in scraper.py and create a WebDriver object:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# initialize a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

The code above initializes a WebDriver object to control a Chrome instance. Note that Alibaba has some anti-scraping measures in place that may block headless browsers.

Thus, you should not set the --headless flag. As an alternative solution, consider exploring Playwright Stealth.

As the last line of your scraper, remember to close the web driver:

driver.quit()

Wonderful! You are fully configured to start scraping Alibaba.

Step #4: Connect to the Target Page

Use the get() method exposed by the Selenium WebDriver object to visit the desired page:

url = "https://www.alibaba.com/trade/search?spm=a2700.product_home_newuser.home_new_user_first_screen_fy23_pc_search_bar.keydown__Enter&tab=all&SearchText=laptop"
driver.get(url)

The scraper.py file will now contain these lines of code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# initialize a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

# the url of the target page
url = "https://www.alibaba.com/trade/search?spm=a2700.product_home_newuser.home_new_user_first_screen_fy23_pc_search_bar.keydown__Enter&tab=all&SearchText=laptop"

# connect to the target page
driver.get(url)

# scraping logic...

# close the browser
driver.quit()

Place a debugging breakpoint on the final line and launch the script with the debugger. Here is what you should see:

What you will see after launching the script

The “Chrome is being controlled by automated test software.” message certifies that Selenium is controlling Chrome as expected. Well done!

Step #5: Select the Product Elements

Since the Alibaba product page contains several products, you first need to initialize a data structure to store the scraped data. An array will work perfectly for this purpose:

products = []

Next, inspect the HTML elements of the products on the page to understand:

  1. How to select them
  2. What data they contain
  3. How to extract that data

inspect the HTML elements of the products

Here, you can see that each product element is a .m-gallery-product-item-v2 node.

Use Selenium to select all product elements:

product_elements = driver.find_elements(By.CSS_SELECTOR, ".m-gallery-product-item-v2")

find_elements() applies the given selector strategy to retrieve elements on the page. In the above case, the selector strategy is a CSS selector.

Do not forget to import By:

from selenium.webdriver.common.by import By

Iterate over the selected elements and prepare to scrape data from each of them:

for product_element in product_elements:
    # scrape data from each product element

Terrific! You are one step closer to successfully scraping Alibaba.

Step #6: Scrape the Product Elements

Inspect a product element to understand its HTML structure:

Inspecting a product element on Alibaba

Here you can see that you can scrape:

  • The product image from .search-card-e-slider__img
  • The product description from .search-card-e-title
  • The product price range from .search-card-e-price-main
  • The company/manufacturer from .search-card-e-company

In the for loop, translate that information into scraping logic:

img_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-slider__img")
img = img_element.get_attribute("src")

description_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-title")
description = description_element.text.strip()

price_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-price-main")
price = price_element.text.strip()

company_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-company")
company = company_element.text.strip()

find_element() retrieves the only element matching the given CSS selector. Then, you can access its text content with the text attribute. To get the value of a node’s HTML attribute, use the get_attribute() method.
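Keep in mind that find_element() raises an exception when nothing matches the selector, and individual product cards may lack one of these nodes. As a defensive sketch (a helper of our own, not part of the tutorial's final script), you could wrap each lookup and fall back to a default value instead of crashing:

```python
def safe_extract(element, extractor, default=""):
    # Apply `extractor` (e.g. a lambda calling find_element(...).text) to
    # `element`, returning `default` if the lookup raises. Catching a broad
    # Exception keeps this sketch independent of Selenium; in real code,
    # catch selenium.common.exceptions.NoSuchElementException instead.
    try:
        return extractor(element)
    except Exception:
        return default

# usage inside the loop would look like (with By imported as in the tutorial):
# img = safe_extract(
#     product_element,
#     lambda e: e.find_element(By.CSS_SELECTOR, ".search-card-e-slider__img").get_attribute("src"),
# )
```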

Use the scraped data to populate a product dictionary and add it to the products array:

product = {
    "img": img,
    "description": description,
    "price": price,
    "company": company
}
products.append(product)

Fantastic! The Alibaba data extraction logic is complete.
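One note on the price field: it is scraped as raw text, typically a range such as "US$1,200.00-1,500.00" (the exact format is an assumption; verify it on the live page). If you need numeric values for analysis, a small parser sketch like this could split it into a min/max pair:

```python
import re

def parse_price_range(price_text):
    # Pull all numeric tokens (commas allowed) out of the scraped price
    # string and return them as (min, max) floats. A single price yields
    # the same value twice; unparseable text yields (None, None).
    numbers = [
        float(token.replace(",", ""))
        for token in re.findall(r"\d[\d,]*\.?\d*", price_text)
    ]
    if not numbers:
        return None, None
    return numbers[0], numbers[-1]
```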

Step #7: Export the Scraped Data to CSV

Currently, your scraped data is stored in the products array. To make it accessible and shareable with others, you need to export it to a human-readable format like a CSV file.

Utilize the following code to create and populate a CSV file with the scraped data:

csv_file_name = "products.csv"

with open(csv_file_name, mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["img", "description", "price", "company"])

    # write the header row
    writer.writeheader()

    # write product data rows
    for product in products:
        writer.writerow(product)

Do not forget to import csv from the Python Standard Library:

import csv

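If you prefer JSON over CSV, the same products list can be exported with the standard library. The sample data below is for illustration only; in the scraper, products is populated by the Selenium loop:

```python
import json

# sample data standing in for the scraped `products` array
products = [
    {
        "img": "https://example.com/laptop.jpg",
        "description": '15.6" laptop',
        "price": "US$250.00-300.00",
        "company": "Example Co.",
    }
]

# dump the list to a JSON file, keeping non-ASCII characters readable
with open("products.json", "w", encoding="utf-8") as f:
    json.dump(products, f, ensure_ascii=False, indent=2)
```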
Wow! Your Alibaba scraper is complete.

Step #8: Put It All Together

Below is the final code of your Alibaba scraping script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import csv

# initialize a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

# the URL of the target page
url = "https://www.alibaba.com/trade/search?spm=a2700.product_home_newuser.home_new_user_first_screen_fy23_pc_search_bar.keydown__Enter&tab=all&SearchText=laptop"

# connect to the target page
driver.get(url)

# where to store the scraped data
products = []

# select all product elements on the page
product_elements = driver.find_elements(By.CSS_SELECTOR, ".m-gallery-product-item-v2")

# iterate over the product nodes and scrape data from them
for product_element in product_elements:
    # extract the product details
    img_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-slider__img")
    img = img_element.get_attribute("src")

    description_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-title")
    description = description_element.text.strip()

    price_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-price-main")
    price = price_element.text.strip()

    company_element = product_element.find_element(By.CSS_SELECTOR, ".search-card-e-company")
    company = company_element.text.strip()

    # create a product dictionary with the
    # scraped data
    product = {
        "img": img,
        "description": description,
        "price": price,
        "company": company
    }

    # add the product data to the array
    products.append(product)

# define the output CSV file name
csv_file_name = "products.csv"

# open the file in write mode and create a CSV writer
with open(csv_file_name, mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["img", "description", "price", "company"])

    # write the header row
    writer.writeheader()

    # write product data rows
    for product in products:
        writer.writerow(product)

# close the browser
driver.quit()

In just over 60 lines of code, you built an Alibaba scraper in Python!

Launch the scraper with the following command:

python3 scraper.py

Or, on Windows:

python scraper.py

A products.csv file will appear in your project’s folder. Open it, and you will see:

The final products.csv file with all the data

Et voilà! Mission complete. The next steps? Handle pagination, deploy your script, automate its execution, and refine it further for optimal performance!
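As a starting point for pagination, here is a sketch of a URL helper. It assumes, unverified, that Alibaba’s search results accept a page query parameter; inspect the actual URLs in your browser before relying on it:

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

def page_url(base_url, page):
    # Rebuild `base_url` with its "page" query parameter set to `page`,
    # leaving every other parameter (e.g. SearchText) untouched.
    scheme, netloc, path, query, fragment = urlsplit(base_url)
    params = dict(parse_qsl(query))
    params["page"] = str(page)
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

# hypothetical usage in the scraper:
# for page in range(1, 4):
#     driver.get(page_url(url, page))
#     # ...run the same extraction loop on each page...
```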

Conclusion

In this step-by-step tutorial, you learned what an Alibaba scraper is and the types of data it can retrieve. You also saw how to build a Python script to scrape Alibaba products using less than 100 lines of code.

The problem is that scraping Alibaba comes with challenges. The platform employs strict anti-bot measures and relies on interactions like pagination, which make the scraping process more complex. Building a scalable and effective Alibaba scraping solution can be quite demanding.

Forget about those challenges with our Alibaba Scraper API! This dedicated solution lets you retrieve data from the target site through simple API calls—no risk of being blocked.

If web scraping is not your preferred approach, but you are still interested in product data, explore our ready-to-use Alibaba datasets!

Create a free Bright Data account today to try our scraper APIs or explore our datasets.

No credit card required