In this guide, you will learn the following:
- What an AliExpress scraper is and how it works
- The types of data you can automatically retrieve from AliExpress
- How to build an AliExpress scraping script using Python
Let’s dive in!
What Is an AliExpress Scraper?
An AliExpress scraper automatically retrieves specific data from AliExpress pages. It navigates AliExpress pages by mimicking a user's browsing habits, transforms web page content into a usable format—such as CSV or JSON—and controls interactions like pagination. Its end goal is to retrieve structured information such as product images, product details, customer feedback, pricing, and more.
If you want to learn more about building web scrapers, read our guide on how to build a scraping bot.
Data You Can Scrape From AliExpress
AliExpress contains a vast amount of information, such as:
- Product details: Names, descriptions, images, price ranges, seller information, and more.
- Customer feedback: Ratings, product reviews, and more.
- Categories and tags: Product categories, relevant tags, or labels.
Time to learn how to scrape them!
Scraping AliExpress in Python
This tutorial section provides a step-by-step guide on building an AliExpress scraper.
The goal is to walk you through writing a Python script that automatically pulls information from the AliExpress “ergonomic chair” page:
Step #1: Project Setup
Ensure you have Python 3 installed on your local computer. If not, download it from the official website and follow the installation wizard to set it up.
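To confirm that Python is available, run the following command in your terminal (on some systems, the executable is python3 instead):
python --version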
Next, use the command below to create your project directory:
mkdir aliexpress-scraper
This directory is going to contain your Python code.
Enter the directory in your terminal, and create a virtual environment inside it:
cd aliexpress-scraper
python -m venv env
Go ahead and load the project folder in your preferred Python IDE, such as Visual Studio Code with the Python extension.
In your IDE’s terminal, activate the virtual environment. Execute the following command if you are using macOS or Linux:
source env/bin/activate
Equivalently, on Windows, use this command:
env/Scripts/activate
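If activation succeeded, your terminal prompt should be prefixed with the environment name. You can also confirm which interpreter is active:
python -c "import sys; print(sys.prefix)"
The printed path should point inside the env folder.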
Good!
In your root project directory, create a scraper.py file. Your project should now have this folder structure:
Sweet! Your Python environment for AliExpress web scraping is ready.
Step #2: Select the Scraping Library
The current objective is to determine whether AliExpress employs dynamic or static pages. Navigate to your target AliExpress page in private or incognito mode in your browser. Then, right-click on an empty space on the background of the webpage, choose the “Inspect” option, navigate to the “Network” tab, apply the “Fetch/XHR” filter, and refresh the page:
Check to see if the page makes any dynamic queries in this DevTools section. After refreshing the page, you will notice multiple Fetch/XHR requests. This indicates that the page uses dynamic requests to load additional content. If you take a look at the page DOM compared to the HTML document returned by the server, you will also see that AliExpress uses JavaScript rendering.
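If you prefer to verify this programmatically, here is a minimal sketch that compares the raw HTML returned by the server with the DOM rendered by a real browser. It assumes the third-party requests package is installed, and keep in mind that AliExpress may block plain HTTP clients, so treat the result as a heuristic:
import requests
from selenium import webdriver

url = "https://www.aliexpress.com/w/wholesale-ergonomic-chair.html"

# Raw HTML as returned by the server (no JavaScript executed)
raw_html = requests.get(url).text

# DOM after the browser has executed the page's JavaScript
driver = webdriver.Chrome()
driver.get(url)
rendered_dom = driver.page_source
driver.quit()

# A large size difference hints at JavaScript rendering
print(len(raw_html), len(rendered_dom))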
To scrape AliExpress effectively, you will need a browser automation tool like Selenium, as the target page relies on JavaScript for rendering. Our blog on Selenium web scraping is an excellent resource for beginners.
With Selenium, you can manipulate a web browser, mimic user interactions, and scrape JavaScript-rendered content. Install it and start using it!
Step #3: Install and Configure Selenium
In the activated virtual environment, install Selenium with this command:
pip install -U selenium
In the scraper.py file, import WebDriver from Selenium and initialize it:
from selenium import webdriver
# Initialize Chrome driver
driver = webdriver.Chrome()
# scraping logic...
# Close the driver
driver.quit()
A WebDriver is initialized in the code above to handle a Chrome instance. It is worth noting that AliExpress has anti-scraping measures in place that could prevent headless browsers from accessing the site. It is therefore not advisable to set the --headless flag. Instead, consider an alternative option such as Playwright Stealth.
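For example, here is a minimal sketch of launching a headful Chrome instance with a couple of common options. The specific arguments are illustrative, not required, and do not guarantee a bypass of anti-bot systems:
from selenium import webdriver

options = webdriver.ChromeOptions()
# Keep the browser headful and maximized; do NOT add the --headless flag
options.add_argument("--start-maximized")
# Reduce obvious automation hints (illustrative; not a guaranteed bypass)
options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(options=options)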
Now that you are fully configured to begin scraping AliExpress, let us examine how to connect to the target page.
Step #4: Connect to the Target Page
Use the get() method exposed by the Selenium WebDriver object to visit the target page. The scraper.py file should now look like this:
from selenium import webdriver
# Initialize Chrome driver
driver = webdriver.Chrome()
# Url of the target page
url = "https://www.aliexpress.com/w/wholesale-ergonomic-chair.html?spm=a2g0o.productlist.search.0"
# Connect to the target page
driver.get(url)
# scraping logic...
# Close the driver
driver.quit()
Place a debugging breakpoint on the final line and launch the script with the debugger. The controlled Chrome browser should automatically open as shown below:
Great! The “Chrome is being controlled by automated test software” notification indicates that Selenium is successfully controlling Chrome as configured.
Step #5: Select the Product Elements
Since the AliExpress product page contains multiple products, you must first initialize a data structure to store the scraped data. For this purpose, an array will work perfectly:
products = []
To ensure your scraper keeps working even when the site’s layout changes, you should create a helper function that makes your selectors more resilient to those changes:
def find_element_smart(parent, by_list):
    """Try multiple selectors until a visible element is found"""
    for by_type, selector in by_list:
        try:
            element = parent.find_element(by_type, selector)
            if element.is_displayed():
                return element
        except Exception:
            continue
    return None
The find_element_smart() function iterates through the by_list selector strategies to locate an element within a given parent element. It tries each <by_type, selector> pair until it finds a visible element, returning it if successful. Otherwise, it returns None.
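As a quick illustration, here is how you might call the helper. The selectors below are hypothetical placeholders, not taken from AliExpress:
from selenium.webdriver.common.by import By

# Hypothetical usage: try a precise selector first, then a broader fallback
element = find_element_smart(driver, [
    (By.CSS_SELECTOR, "h1[class*='title']"),
    (By.XPATH, "//h1")
])
if element:
    print(element.text)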
Next, inspect the HTML elements of the products on the page to understand how to select them, identify the type of data they contain, and determine how to extract that data.
It is obvious that each product element is a .list--gallery--C2f2tvm node. Note that list--gallery--C2f2tvm could change at any time, as it contains a randomly generated string. So, you should not rely on that class for element selection. Instead, you should start by finding products based on their structure—like div elements that contain both images and links. If that does not work, try looking for products based on their content, or focus on more specific HTML elements.
Implement the product selection logic as below:
# Find products using structural patterns first, then fall back to class patterns
product_selectors = [
    (By.XPATH, "//div[.//img and .//a[contains(@href, 'item')]]"),
    (By.XPATH, "//div[.//img and .//*[contains(text(), '$')]]"),
    (By.CSS_SELECTOR, "div[class*='gallery']")
]
# Wait for and get products
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div[class*='gallery']")))
products_found = []
for selector_type, selector in product_selectors:
    try:
        elements = wait.until(EC.presence_of_all_elements_located((selector_type, selector)))
        if elements:
            products_found = elements
            break
    except Exception:
        continue
The code above applies each selector strategy in turn, retrieving the product elements with generic XPath and CSS selectors.
Include the following imports in your Python script:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Then, introduce a WebDriverWait instance right after initializing the WebDriver but before any page interactions:
wait = WebDriverWait(driver, 20)
Instead of immediately finding elements on a page when scraping dynamic websites like AliExpress, WebDriverWait tells the scraper to be patient and wait up to the specified amount of time (20 seconds in this case) for the elements to appear. This is important because web pages load elements at different speeds, and without proper waiting, the scraper might try to interact with elements that have not loaded yet, causing errors.
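If the elements never appear within the timeout, the wait raises a TimeoutException, which you can handle gracefully. A minimal sketch:
from selenium.common.exceptions import TimeoutException

try:
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div[class*='gallery']")))
except TimeoutException:
    print("Products did not load within 20 seconds")
    driver.quit()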
You are now one step closer to fully scraping AliExpress!
Step #6: Scrape the AliExpress Product Elements
Inspect a product element to understand its HTML structure:
It is evident that you can scrape the product image, URL, name or title, price, and discount.
Before scraping each product, verify if it is visible in the viewport:
wait.until(EC.visibility_of(product))
Now, set up selectors to scrape each product’s data. Instead of using specific class names that could break, use patterns like these:
# Get image - look for product images by source patterns
img_element = find_element_smart(product, [
    (By.XPATH, ".//img[contains(@src, 'item') or contains(@src, 'product')]"),
    (By.CSS_SELECTOR, "img[src*='item']"),
    (By.CSS_SELECTOR, "img[class*='image']")
])
# Get URL - look for product links
url_element = find_element_smart(product, [
    (By.CSS_SELECTOR, "a[href*='item']"),
    (By.XPATH, ".//a[contains(@href, 'product')]")
])
# Get title - look for the longest text element first
title_element = find_element_smart(product, [
    (By.XPATH, ".//div[string-length(text()) > 20]"),
    (By.XPATH, ".//*[contains(@class, 'title')]"),
    (By.CSS_SELECTOR, "[class*='name']")
])
# Get price - look for currency symbols/patterns
price_element = find_element_smart(product, [
    (By.XPATH, ".//*[contains(text(), '$') or contains(text(), 'US') or contains(text(), 'GHS')]"),
    (By.XPATH, ".//*[contains(@class, 'price')]")
])
# Try to get discount if available
discount_element = find_element_smart(product, [
    (By.XPATH, ".//*[contains(text(), '%') or contains(text(), 'OFF')]"),
    (By.CSS_SELECTOR, "[class*='discount']")
])
The find_element_smart() function returns the first visible element that matches any of the given selector strategies. You can then use the text attribute to extract its text content.
Populate a product dictionary with the scraped data and append it to the products array:
products.append({
    "image_url": img_element.get_attribute("src"),
    "product_url": url_element.get_attribute("href"),
    "product_title": title_element.text.strip(),
    "product_price": price_element.text.strip(),
    "product_discount": discount_element.text.strip() if discount_element else "N/A"
})
Your data extraction logic is now complete and ready for use.
Step #7: Export the Scraped Data to CSV
In your current setup, the scraped data is stored in the products array. To make it shareable and accessible to others, you need to export it into a human-readable format such as a CSV file. Here is how you can create and populate a CSV file with the scraped data:
# Write data to CSV
csv_file_name = "aliexpress_products.csv"
with open(csv_file_name, mode="w", newline="", encoding="utf-8") as csv_file:
    fieldnames = ["image_url", "product_url", "product_title", "product_price", "product_discount"]
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    for product in products:
        writer.writerow(product)
This code creates a CSV file that works like a spreadsheet: each product gets its own row, and each detail about the product (image, URL, title, price, and any discount) goes into a separate column. When you open the final aliexpress_products.csv file, you will see all your scraped AliExpress product information laid out neatly in columns.
Lastly, import the csv module from the Python Standard Library at the top of your script:
import csv
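Optionally, you can sanity-check the export by reading the file back with csv.DictReader:
# Optional check: read the exported CSV back and print the first row
with open("aliexpress_products.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print(next(reader, None))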
Step #8: Put It All Together
This is what your final scraping script should look like after putting all the code together:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import csv
def find_element_smart(parent, by_list):
    """Try multiple selectors until a visible element is found"""
    for by_type, selector in by_list:
        try:
            element = parent.find_element(by_type, selector)
            if element.is_displayed():
                return element
        except Exception:
            continue
    return None

# Initialize driver
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)

# Target URL
url = "https://www.aliexpress.com/w/wholesale-ergonomic-chair.html?spm=a2g0o.productlist.search.0"
driver.get(url)

# Wait for initial products to load
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div[class*='gallery']")))

# Where to store the scraped data
products = []

# Find products using structural patterns first, then fall back to class patterns
product_selectors = [
    (By.XPATH, "//div[.//img and .//a[contains(@href, 'item')]]"),
    (By.XPATH, "//div[.//img and .//*[contains(text(), '$')]]"),
    (By.CSS_SELECTOR, "div[class*='gallery']")
]

# Wait for and get products
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div[class*='gallery']")))
products_found = []
for selector_type, selector in product_selectors:
    try:
        elements = wait.until(EC.presence_of_all_elements_located((selector_type, selector)))
        if elements:
            products_found = elements
            break
    except Exception:
        continue

# Iterate over the products found and scrape data from them
for product in products_found:
    # Wait for the product to be visible and interactable
    wait.until(EC.visibility_of(product))

    # Get image - look for product images by source patterns
    img_element = find_element_smart(product, [
        (By.XPATH, ".//img[contains(@src, 'item') or contains(@src, 'product')]"),
        (By.CSS_SELECTOR, "img[src*='item']"),
        (By.CSS_SELECTOR, "img[class*='image']")
    ])

    # Get URL - look for product links
    url_element = find_element_smart(product, [
        (By.CSS_SELECTOR, "a[href*='item']"),
        (By.XPATH, ".//a[contains(@href, 'product')]")
    ])

    # Get title - look for the longest text element first
    title_element = find_element_smart(product, [
        (By.XPATH, ".//div[string-length(text()) > 20]"),
        (By.XPATH, ".//*[contains(@class, 'title')]"),
        (By.CSS_SELECTOR, "[class*='name']")
    ])

    # Get price
    price_element = find_element_smart(product, [
        (By.CSS_SELECTOR, "*[class*='price-sale']"),
        (By.CSS_SELECTOR, "*[class*='price']"),
        (By.XPATH, ".//*[contains(@class, 'price')]")
    ])

    if all([img_element, url_element, title_element, price_element]):
        # Get discount if available
        discount_element = find_element_smart(product, [
            (By.XPATH, ".//*[contains(text(), '%') or contains(text(), 'OFF')]"),
            (By.CSS_SELECTOR, "[class*='discount']")
        ])

        products.append({
            "image_url": img_element.get_attribute("src"),
            "product_url": url_element.get_attribute("href"),
            "product_title": title_element.text.strip(),
            "product_price": price_element.text.strip(),
            "product_discount": discount_element.text.strip() if discount_element else "N/A"
        })

# Save results
csv_file_name = "aliexpress_products.csv"
with open(csv_file_name, mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["image_url", "product_url", "product_title", "product_price", "product_discount"])
    writer.writeheader()
    writer.writerows(products)
driver.quit()
Now, launch the scraper with the following command:
python scraper.py
The script should run successfully, and the aliexpress_products.csv file should contain the extracted data as shown:
There are several additional steps you can take after assembling a functional scraping script. These include automating the execution process and implementing optimizations to ensure the scraper continues to deliver valuable data over time.
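For example, a simple way to automate execution is to re-run the script on a schedule. Below is a minimal sketch using only the standard library; in production, cron, Task Scheduler, or a job orchestrator is usually a better fit:
import subprocess
import time

# Re-run the scraper every 6 hours (interval chosen for illustration)
while True:
    subprocess.run(["python", "scraper.py"], check=False)
    time.sleep(6 * 60 * 60)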
Conclusion
In this guide, you explored what an AliExpress scraper is and the types of data it can extract. You also learned how to create a Python script for scraping AliExpress products with minimal code.
However, scraping AliExpress presents several challenges. The platform implements stringent anti-bot protections and uses features like pagination, which add complexity to the scraping process. Developing a robust AliExpress scraping solution can be quite challenging.
Our AliExpress Scraper API offers a specialized solution that enables you to eliminate those challenges. With straightforward API calls, you can seamlessly fetch data from the target site while mitigating the risk of being blocked. Need the data fast?
Want to try our scraper APIs or explore our datasets? Create a Bright Data account today and start your free trial!
No credit card required