At face value, Shopify stores represent one of the most difficult challenges in data extraction. The product below represents a typical Shopify listing. The data is just about as nested as it gets.
<div class="site-box-content product-holder"><a href="/collections/ready-to-ship/products/the-eira-straight-leg" class="product-item style--one alt color--light with-secondary-image " data-js-product-item="">
<div class="box--product-image primary" style="padding-top: 120.00048000192001%"><img src="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=640" alt="The Eira - Organic Ecru" srcset="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=360 360w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=420 420w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=480 480w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=640 640w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=840 840w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=1080 1080w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=1280 1280w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=1540 1540w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=1860 1860w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=2100 2100w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=2460 2460w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&width=2820 2820w" sizes="(max-width: 768px) 50vw, (max-width: 1024px) and (orientation: portrait) 50vw, 25vw " loading="lazy" class="lazy lazyloaded" data-ratio="0.8" width="3200" height="4000" onload="this.classList.add('lazyloaded')"><span class="lazy-preloader " aria-hidden="true"><svg class="circular-loader" viewBox="25 25 
50 50"><circle class="loader-path" cx="50" cy="50" r="20" fill="none" stroke-width="4"></circle></svg></span></div><div class="box--product-image secondary" style="padding-top: 120.00048000192001%"><img src="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=640" alt="The Eira - Organic Ecru" srcset="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=360 360w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=420 420w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=480 480w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=640 640w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=840 840w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=1080 1080w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=1280 1280w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=1540 1540w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=1860 1860w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=2100 2100w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=2460 2460w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&width=2820 2820w" sizes="(max-width: 768px) 50vw, (max-width: 1024px) and (orientation: portrait) 50vw, 25vw " loading="lazy" class="lazy lazyloaded" data-ratio="0.8" width="3200" height="4000" onload="this.classList.add('lazyloaded')"></div><div class="caption">
<div>
<span class="title"><span class="underline-animation">The Eira - Organic Ecru</span></span>
<span class="price text-size--smaller"><span style="display:flex;flex-direction:row">$285.00</span></span>
</div><quick-view-product class="quick-add-to-cart">
<div class="quick-add-to-cart-button">
<button class="product__add-to-cart" data-href="/products/the-eira-straight-leg" tabindex="-1">
<span class="visually-hidden">Add to cart</span>
<span class="add-to-cart__text" style="height:26px" role="img"><svg width="22" height="26" viewBox="0 0 22 26" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M6.57058 6.64336H4.49919C3.0296 6.64336 1.81555 7.78963 1.7323 9.25573L1.00454 22.0739C0.914352 23.6625 2.17916 25 3.77143 25H18.2286C19.8208 25 21.0856 23.6625 20.9955 22.0739L20.2677 9.25573C20.1844 7.78962 18.9704 6.64336 17.5008 6.64336H15.4294M6.57058 6.64336H15.4294M6.57058 6.64336V4.69231C6.57058 2.6531 8.22494 1 10.2657 1H11.7343C13.775 1 15.4294 2.6531 15.4294 4.69231V6.64336" stroke="var(--main-text)" style="fill:none!important" stroke-width="1.75"></path><path d="M10.0801 12H12.0801V20H10.0801V12Z" fill="var(--main-text)" style="stroke:none!important"></path><path d="M15.0801 15V17L7.08008 17L7.08008 15L15.0801 15Z" fill="var(--main-text)" style="stroke:none!important"></path></svg></span><span class="lazy-preloader add-to-cart__preloader" aria-hidden="true"><svg class="circular-loader" viewBox="25 25 50 50"><circle class="loader-path" cx="50" cy="50" r="20" fill="none" stroke-width="4"></circle></svg></span></button>
</div>
</quick-view-product></div><div class="product-badges-holder"></div></a></div>
It’s not impossible to extract data from the HTML above, but there’s an easier way.
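For comparison, here is a rough sketch of what the HTML route could look like using only Python's built-in html.parser. The ProductCardParser class is a hypothetical helper written for this illustration, and the markup is a trimmed copy of the caption block above. It assumes the title and price keep living in spans with the "title" and "price" classes, which a theme change could break at any time.

```python
from html.parser import HTMLParser


class ProductCardParser(HTMLParser):
    """Collect the text inside the title and price spans of one product card."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # filled as {"title": "...", "price": "..."}
        self._current = None  # field currently being read, if any
        self._depth = 0       # how many <span> tags deep we are inside that field

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current is not None:
            # a nested span inside the field we are already reading
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field in ("title", "price"):
            if field in classes:
                self._current = field
                self._depth = 1
                self.fields[field] = ""
                return

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                # the outer span closed, so the field is complete
                self.fields[self._current] = self.fields[self._current].strip()
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


# trimmed copy of the caption markup from the listing above
card_html = """
<div class="caption"><div>
<span class="title"><span class="underline-animation">The Eira - Organic Ecru</span></span>
<span class="price text-size--smaller"><span style="display:flex;flex-direction:row">$285.00</span></span>
</div></div>
"""

parser = ProductCardParser()
parser.feed(card_html)
print(parser.fields)
```

Even this small sketch needs depth tracking to cope with the nested spans, and it only recovers two fields from one card.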
Shopify Landing Pages
The landing page at https://hiutdenim.co.uk/ contains some product information, but it's relatively limited, and you need to scroll down quite far to reach it.
At first glance, it seems like you’ll need to scrape every link to every section, then subsequently get and parse all these different pages. Shopify stores don’t follow the traditional methods involved in eCommerce scraping due to unique page layouts. However, there’s another way.
Shopify JSON Pages
You read that headline correctly. We can get all of the store’s products as a JSON object by default. We don’t even need BeautifulSoup or Selenium.
We just need to add /products.json to our URL. Shopify storefronts serve their catalog from a products.json endpoint by default.
If we can request this content (which we can), we can get all the data we could possibly want. Once we've got it, we just need to decide which data we want to keep. You can verify this for the site we've been using at https://hiutdenim.co.uk/products.json.
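As a small illustration, the URL can be built with the standard library so that a missing trailing slash or an extra path doesn't break it. The products_json_url helper below is a hypothetical name for this sketch, and it assumes the store URL includes its scheme (https://).

```python
from urllib.parse import urljoin


def products_json_url(store_url):
    """Return the products.json URL at the root of the given store."""
    # the leading slash anchors the path at the site root,
    # whatever path the input URL happens to carry
    return urljoin(store_url, "/products.json")


print(products_json_url("https://hiutdenim.co.uk"))
print(products_json_url("https://hiutdenim.co.uk/collections/ready-to-ship"))
```

Pasting the resulting URL into a browser is a quick way to confirm that a given store actually answers with JSON before writing any scraper code.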
Scraping Shopify In Python
Now that we know what we’re looking for, this daunting task becomes far less difficult. Because we’re only dealing with JSON data, we have one dependency we need to install, Python Requests.
pip install requests
Individual Functions
Let’s take a look at the individual code pieces. We’ve got three separate chunks that make up the scraper.
Here’s our most important function. It actually performs the scraping logic.
def scrape_shopify(url, retries=2):
    """scrape a shopify store"""
    # url is expected to end with a trailing slash
    json_url = f"{url}products.json"
    items = []
    success = False
    while not success and retries > 0:
        # start fresh so a failed attempt can't leave duplicates behind
        items = []
        try:
            # the request sits inside the try so connection errors are retried too
            response = requests.get(json_url, timeout=30)
            response.raise_for_status()
            products = response.json()["products"]
            for product in products:
                product_data = {
                    "title": product["title"],
                    "tags": product["tags"],
                    "id": product["id"],
                    "variants": product["variants"],
                    "images": product["images"],
                    "options": product["options"]
                }
                items.append(product_data)
            success = True
        # checked first: in recent versions of requests, a JSON decoding
        # error is also a RequestException
        except json.JSONDecodeError as e:
            print(f"json error: {e}")
        except requests.RequestException as e:
            print(f"Error during request: {e}, failed to get {json_url}")
        except KeyError as key_error:
            print(f"Failed to parse json: {key_error}")
        except Exception as e:
            print(f"Unforeseen error: {e}")
        retries -= 1
        print(f"Retries left: {retries}")
    return items
- First, we append products.json to our url: json_url = f"{url}products.json".
- We initialize an empty array, items. As we scrape our items, we're going to append them to this array. Once the scrape is finished, we return the array of parsed items.
- As long as we receive a good response, we retrieve the "products" key to get all of our products.
- We pull various pieces of data from each product to create a dict, product_data.
- product_data gets appended to the array.
- This process repeats until we've parsed all the products from the page.
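The retry loop in the function above tries again immediately after a failure. If a store is rate limiting us, waiting a little longer between attempts can help. The following is a minimal sketch of that idea, not part of the scraper itself: with_backoff is a hypothetical helper, and the sleep parameter exists only so the waiting can be swapped out in a demo or test.

```python
import time


def with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or retries run out, doubling the wait each time."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except Exception as error:
            last_error = error
            # no point in waiting after the final attempt
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# demo: a stand-in for a request that fails twice, then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return ["product"]


delays = []  # records the waits instead of actually sleeping
result = with_backoff(flaky, retries=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

In real use, func would wrap a single request, and the default time.sleep would do the waiting.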
We now have a function that performs our scrape and returns an array of products. Now, we need one that takes this array of products and writes it to a file. We could use CSV here, but this structure gets pretty nested, so we'll use JSON. It supports more flexible data structures for later use and analysis.
def json2file(json_data, filename):
    """save json data to a file"""
    try:
        with open(filename, "w", encoding="utf-8") as file:
            json.dump(json_data, file, indent=4)
        print(f"Data successfully saved: {filename}")
    except Exception as e:
        print(f"failed to write json data to {filename}, ERROR: {e}")
That's the actual code that we're going to use. Now, we create a main block to run our scraper.
if __name__ == "__main__":
    shop_url = "https://hiutdenim.co.uk/"
    items = scrape_shopify(shop_url)
    json2file(items, "output.json")
Putting Everything Together
When we put it all together, our scraper looks like this. What seemed like an intricate parsing project is now a fully functional scraper that only takes up about 50 lines of code.
import requests
import json

def json2file(json_data, filename):
    """save json data to a file"""
    try:
        with open(filename, "w", encoding="utf-8") as file:
            json.dump(json_data, file, indent=4)
        print(f"Data successfully saved: {filename}")
    except Exception as e:
        print(f"failed to write json data to {filename}, ERROR: {e}")

def scrape_shopify(url, retries=2):
    """scrape a shopify store"""
    # url is expected to end with a trailing slash
    json_url = f"{url}products.json"
    items = []
    success = False
    while not success and retries > 0:
        # start fresh so a failed attempt can't leave duplicates behind
        items = []
        try:
            # the request sits inside the try so connection errors are retried too
            response = requests.get(json_url, timeout=30)
            response.raise_for_status()
            products = response.json()["products"]
            for product in products:
                product_data = {
                    "title": product["title"],
                    "tags": product["tags"],
                    "id": product["id"],
                    "variants": product["variants"],
                    "images": product["images"],
                    "options": product["options"]
                }
                items.append(product_data)
            success = True
        # checked first: in recent versions of requests, a JSON decoding
        # error is also a RequestException
        except json.JSONDecodeError as e:
            print(f"json error: {e}")
        except requests.RequestException as e:
            print(f"Error during request: {e}, failed to get {json_url}")
        except KeyError as key_error:
            print(f"Failed to parse json: {key_error}")
        except Exception as e:
            print(f"Unforeseen error: {e}")
        retries -= 1
    return items

if __name__ == "__main__":
    shop_url = "https://hiutdenim.co.uk/"
    items = scrape_shopify(shop_url)
    json2file(items, "output.json")
The Return Data
Our data gets returned in an array of JSON objects. Each product holds a list of variants and images. These would be pretty difficult to accurately represent in CSV. The snippet you see below is one single product from our scrape.
{
"title": "The Valerie - Organic Denim",
"tags": [
"The Valerie",
"Women"
],
"id": 14874183401848,
"variants": [
{
"id": 54902462808440,
"title": "UK10-29 / 30",
"option1": "UK10-29",
"option2": "30",
"option3": null,
"sku": null,
"requires_shipping": true,
"taxable": true,
"featured_image": null,
"available": true,
"price": "220.00",
"grams": 0,
"compare_at_price": null,
"position": 1,
"product_id": 14874183401848,
"created_at": "2025-01-21T14:04:58+00:00",
"updated_at": "2025-02-12T17:17:54+00:00"
},
{
"id": 54902462939512,
"title": "UK12-30 / 32",
"option1": "UK12-30",
"option2": "32",
"option3": null,
"sku": null,
"requires_shipping": true,
"taxable": true,
"featured_image": null,
"available": true,
"price": "220.00",
"grams": 0,
"compare_at_price": null,
"position": 2,
"product_id": 14874183401848,
"created_at": "2025-01-21T14:04:58+00:00",
"updated_at": "2025-02-12T17:17:54+00:00"
},
{
"id": 54902463070584,
"title": "UK14-32 / 28",
"option1": "UK14-32",
"option2": "28",
"option3": null,
"sku": null,
"requires_shipping": true,
"taxable": true,
"featured_image": null,
"available": true,
"price": "220.00",
"grams": 0,
"compare_at_price": null,
"position": 3,
"product_id": 14874183401848,
"created_at": "2025-01-21T14:04:58+00:00",
"updated_at": "2025-02-12T17:17:54+00:00"
},
{
"id": 54902463496568,
"title": "UK18-36 / 30",
"option1": "UK18-36",
"option2": "30",
"option3": null,
"sku": null,
"requires_shipping": true,
"taxable": true,
"featured_image": null,
"available": true,
"price": "220.00",
"grams": 0,
"compare_at_price": null,
"position": 4,
"product_id": 14874183401848,
"created_at": "2025-01-21T14:04:58+00:00",
"updated_at": "2025-02-12T17:17:54+00:00"
}
],
"images": [
{
"id": 31828166443078,
"created_at": "2024-06-17T12:05:49+01:00",
"position": 1,
"updated_at": "2024-06-17T12:05:50+01:00",
"product_id": 14874183401848,
"variant_ids": [],
"src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_45_3_c547ba8a-681b-4486-8cd7-884000e43302.jpg?v=1718622350",
"width": 4000,
"height": 4000
},
{
"id": 31828166541382,
"created_at": "2024-06-17T12:05:49+01:00",
"position": 2,
"updated_at": "2024-06-17T12:05:51+01:00",
"product_id": 14874183401848,
"variant_ids": [],
"src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_Back_2_5909adb3-c2ab-4810-8b66-a486e8d827a8.jpg?v=1718622351",
"width": 4000,
"height": 4000
},
{
"id": 31828166508614,
"created_at": "2024-06-17T12:05:49+01:00",
"position": 3,
"updated_at": "2024-06-17T12:05:51+01:00",
"product_id": 14874183401848,
"variant_ids": [],
"src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_Front_3_4316907a-9fd8-4649-894c-4028877370e1.jpg?v=1718622351",
"width": 4000,
"height": 4000
},
{
"id": 31828166475846,
"created_at": "2024-06-17T12:05:49+01:00",
"position": 4,
"updated_at": "2024-06-17T12:05:51+01:00",
"product_id": 14874183401848,
"variant_ids": [],
"src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_Side_2_ea21477b-c1ba-4c8a-b75e-75c6427b4977.jpg?v=1718622351",
"width": 4000,
"height": 4000
}
],
"options": [
{
"name": "Waist",
"position": 1,
"values": [
"UK10-29",
"UK12-30",
"UK14-32",
"UK18-36"
]
},
{
"name": "Leg Length",
"position": 2,
"values": [
"30",
"32",
"28"
]
}
]
},
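If a flat file is still needed later, one possible approach is to write one row per variant and repeat the product fields on each row. The sketch below is only an illustration under that assumption: variant_rows and products_to_csv are hypothetical helpers, the sample is cut down from the product above, and images and options are left out entirely.

```python
import csv
import io

FIELDNAMES = ["product_id", "product_title", "variant_id", "variant_title", "price", "available"]


def variant_rows(product):
    """Yield one flat dict per variant of a scraped product."""
    for variant in product.get("variants", []):
        yield {
            "product_id": product["id"],
            "product_title": product["title"],
            "variant_id": variant["id"],
            "variant_title": variant["title"],
            "price": variant["price"],
            "available": variant["available"],
        }


def products_to_csv(products):
    """Return CSV text with one row per variant."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for product in products:
        writer.writerows(variant_rows(product))
    return buffer.getvalue()


# cut-down version of the product shown above
sample = {
    "title": "The Valerie - Organic Denim",
    "id": 14874183401848,
    "variants": [
        {"id": 54902462808440, "title": "UK10-29 / 30", "price": "220.00", "available": True},
        {"id": 54902462939512, "title": "UK12-30 / 32", "price": "220.00", "available": True},
    ],
}

csv_text = products_to_csv([sample])
print(csv_text)
```

The nested JSON stays the source of truth. A view like this trades completeness for something a spreadsheet can open.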
Advanced Techniques
The world isn't perfect, and it's possible that you'll run into difficulty with the scraper above. You might need to scrape multiple pages, or your scraper might sometimes get blocked.
Pagination
When you're scraping larger stores, you'll often run into paginated results. To handle pagination, we first want the maximum number of results per page. We can then add the query param page=<PAGE_NUMBER> to control which page of results we get.
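We can slightly modify our scraping function to accept a page number and include it in the URL.
def scrape_shopify(url, retries=2, page=1):
    """scrape a shopify store"""
    # the page number is passed along as a query param
    json_url = f"{url}products.json?page={page}"
    # ... the rest of the function stays the same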
Then, we can adjust our main to reflect these changes.
if __name__ == "__main__":
    shop_url = "https://www.allbirds.com/"
    PAGES = 3
    # pages are numbered from 1, so the file names match the page numbers
    for page in range(1, PAGES + 1):
        items = scrape_shopify(shop_url, page=page)
        json2file(items, f"page{page}output.json")
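The main block above assumes we already know how many pages a store has. When we don't, one option is to keep requesting pages until one comes back empty. Here is a minimal sketch of that loop. scrape_all_pages and its max_pages safety cap are hypothetical, and the page fetcher is passed in as a function so the loop can be tried against fake data without touching the network.

```python
def scrape_all_pages(fetch_page, max_pages=50):
    """Collect products page by page until a page comes back empty."""
    all_items = []
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            # an empty page is treated as the end of the catalog
            break
        all_items.extend(items)
    return all_items


# demo with a fake two-page store instead of a live request
fake_store = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
collected = scrape_all_pages(lambda page: fake_store.get(page, []))
print(collected)
```

With the scraper from this article, the fetcher could be lambda page: scrape_shopify(shop_url, page=page). Keep in mind that scrape_shopify also returns an empty list when every retry fails, so this loop can't tell a failed page from the last page.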
Proxy Integration
Sometimes you might need to use a proxy service to prevent your scraper from getting blocked. With our Shopify Proxies, it’s as simple as creating a URL with your credentials.
PROXY_URL = "http://brd-customer-<YOUR-USERNAME>-zone-<YOUR-ZONE>:<YOUR-PASSWORD>@brd.superproxy.io:33335"
proxies = {
    "http": PROXY_URL,
    "https": PROXY_URL
}
response = requests.get(json_url, proxies=proxies, verify="brd.crt")
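To keep credentials out of the source file, the proxy URL can be assembled from environment variables instead. This is a sketch under a few assumptions: build_proxies is a hypothetical helper, the BRD_USERNAME, BRD_ZONE and BRD_PASSWORD variable names are made up for this example, and the host and port are simply the ones from the snippet above. The credentials are percent-encoded so that characters such as @ or : in a password don't break the URL.

```python
import os
from urllib.parse import quote


def build_proxies(username, zone, password, host="brd.superproxy.io", port=33335):
    """Build a proxies dict in the shape that requests expects."""
    user = f"brd-customer-{username}-zone-{zone}"
    # percent-encode both parts so special characters stay inside the credentials
    proxy_url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# hypothetical environment variable names, with placeholder fallbacks for the demo
proxies = build_proxies(
    os.environ.get("BRD_USERNAME", "demo-user"),
    os.environ.get("BRD_ZONE", "demo-zone"),
    os.environ.get("BRD_PASSWORD", "demo-password"),
)
print(sorted(proxies))
```

The resulting dict can be passed to the request in the same way as before: requests.get(json_url, proxies=proxies, verify="brd.crt").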
Other Solutions from Bright Data
Bright Data offers powerful turnkey alternatives that eliminate the need to build complex scrapers from scratch. Use our fully optimized Shopify Scraper for seamless data extraction or access our extensive library of pre-collected datasets available in multiple formats to jumpstart your projects immediately.
Conclusion
Scraping a Shopify store doesn't need to be an impossible task. By simply leveraging their API with products.json, you can harvest a large amount of detailed product data quickly. You don't even need to use an HTML parser! If you want, you can reduce development time with one of our premade scrapers, or you can get to work immediately with our datasets.
All our products come with a free trial, sign up now!