How To Scrape Shopify Stores With Python

Simplify Shopify data extraction by leveraging products.json and effective scraping methods.

At face value, Shopify stores look like one of the most difficult challenges in data extraction. The HTML below is a typical Shopify product listing, and the data is just about as nested as it gets.

<div class="site-box-content product-holder"><a href="/collections/ready-to-ship/products/the-eira-straight-leg" class="product-item style--one alt color--light   with-secondary-image " data-js-product-item="">

  <div class="box--product-image primary" style="padding-top: 120.00048000192001%"><img src="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=640" alt="The Eira - Organic Ecru" srcset="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=360 360w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=420 420w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=480 480w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=640 640w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=840 840w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=1080 1080w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=1280 1280w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=1540 1540w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=1860 1860w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=2100 2100w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=2460 2460w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-01_91c15dbe-7412-47b6-8f76-bdb434199203.jpg?v=1731517834&amp;width=2820 2820w" sizes="(max-width: 768px) 50vw, (max-width: 1024px) and (orientation: portrait) 50vw, 25vw " loading="lazy" class="lazy lazyloaded" data-ratio="0.8" width="3200" height="4000" onload="this.classList.add('lazyloaded')"><span class="lazy-preloader " aria-hidden="true"><svg class="circular-loader" viewBox="25 25 50 50"><circle class="loader-path" cx="50" cy="50" r="20" fill="none" stroke-width="4"></circle></svg></span></div><div class="box--product-image secondary" style="padding-top: 120.00048000192001%"><img src="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=640" alt="The Eira - Organic Ecru" srcset="//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=360 360w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=420 420w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=480 480w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=640 640w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=840 840w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=1080 1080w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=1280 1280w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=1540 1540w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=1860 1860w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=2100 2100w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=2460 2460w,//hiutdenim.co.uk/cdn/shop/files/Hiut-EiraEcru-02.jpg?v=1731517834&amp;width=2820 2820w" sizes="(max-width: 768px) 50vw, (max-width: 1024px) and (orientation: portrait) 50vw, 25vw " loading="lazy" class="lazy 
lazyloaded" data-ratio="0.8" width="3200" height="4000" onload="this.classList.add('lazyloaded')"></div><div class="caption">

    <div>
      <span class="title"><span class="underline-animation">The Eira - Organic Ecru</span></span>
      <span class="price text-size--smaller"><span style="display:flex;flex-direction:row">$285.00</span></span>

    </div><quick-view-product class="quick-add-to-cart">
          <div class="quick-add-to-cart-button">
            <button class="product__add-to-cart" data-href="/products/the-eira-straight-leg" tabindex="-1">
              <span class="visually-hidden">Add to cart</span>
              <span class="add-to-cart__text" style="height:26px" role="img"><svg width="22" height="26" viewBox="0 0 22 26" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M6.57058 6.64336H4.49919C3.0296 6.64336 1.81555 7.78963 1.7323 9.25573L1.00454 22.0739C0.914352 23.6625 2.17916 25 3.77143 25H18.2286C19.8208 25 21.0856 23.6625 20.9955 22.0739L20.2677 9.25573C20.1844 7.78962 18.9704 6.64336 17.5008 6.64336H15.4294M6.57058 6.64336H15.4294M6.57058 6.64336V4.69231C6.57058 2.6531 8.22494 1 10.2657 1H11.7343C13.775 1 15.4294 2.6531 15.4294 4.69231V6.64336" stroke="var(--main-text)" style="fill:none!important" stroke-width="1.75"></path><path d="M10.0801 12H12.0801V20H10.0801V12Z" fill="var(--main-text)" style="stroke:none!important"></path><path d="M15.0801 15V17L7.08008 17L7.08008 15L15.0801 15Z" fill="var(--main-text)" style="stroke:none!important"></path></svg></span><span class="lazy-preloader add-to-cart__preloader" aria-hidden="true"><svg class="circular-loader" viewBox="25 25 50 50"><circle class="loader-path" cx="50" cy="50" r="20" fill="none" stroke-width="4"></circle></svg></span></button>
          </div>
        </quick-view-product></div><div class="product-badges-holder"></div></a></div>

It’s not impossible to extract data from the HTML above, but there’s an easier way.

Shopify Landing Pages

The landing page at https://hiutdenim.co.uk/ contains some product information, but it’s relatively limited. Scroll down far enough, and you’ll reach the product listings.

Front Page of a Shopify Store

At first glance, it seems like you’ll need to scrape every link to every section, then fetch and parse each of those pages. Because every store uses its own theme and layout, Shopify stores don’t lend themselves to the traditional methods used in eCommerce scraping. However, there’s another way.

Shopify JSON Pages

You read that headline correctly. We can get all of the store’s products as a JSON object by default. We don’t even need BeautifulSoup or Selenium.

We just need to append /products.json to the store’s URL. Every Shopify store exposes its product catalog through this endpoint.

Shopify JSON Page

If we can request this content (and we can), we can get all the data we could possibly want. Once we’ve got it, we just need to decide which data we want to keep. You can verify this yourself at https://hiutdenim.co.uk/products.json.
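Before writing any real scraping logic, it’s worth sanity-checking the endpoint. Here’s a minimal sketch using the Requests library (installed in the next section):

import requests

# quick check of the products.json endpoint for the store used in this article
response = requests.get("https://hiutdenim.co.uk/products.json")
print(response.status_code)               # 200 means the endpoint is publicly available
print(len(response.json()["products"]))   # number of products returned on the first page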

Scraping Shopify In Python

Now that we know what we’re looking for, this daunting task becomes far less difficult. Because we’re only dealing with JSON data, we have just one dependency to install: Python Requests.

pip install requests

Individual Functions

Let’s take a look at the individual code pieces. We’ve got three separate chunks that make up the scraper.

Here’s our most important function. It actually performs the scraping logic.

def scrape_shopify(url, retries=2):
    """scrape a shopify store"""
    json_url = f"{url}products.json"
    items = []
    success = False
    while not success and retries > 0:
        try:
            # request the JSON endpoint; raise_for_status() turns bad status codes into exceptions
            response = requests.get(json_url)
            response.raise_for_status()
            products = response.json()["products"]
            for product in products:
                product_data = {
                    "title": product["title"],
                    "tags": product["tags"],
                    "id": product["id"],
                    "variants": product["variants"],
                    "images": product["images"],
                    "options": product["options"]
                }
                items.append(product_data)
            success = True
        except requests.RequestException as e:
            print(f"Error during request: {e}, failed to get {json_url}")
        except KeyError as key_error:
            print(f"Failed to parse json: {key_error}")
        except json.JSONDecodeError as e:
            print(f"json error: {e}")
        except Exception as e:
            print(f"Unforeseen error: {e}")
        retries -= 1
        print(f"Retries left: {retries}")
    return items

  • First, we append products.json to our URL: json_url = f"{url}products.json".
  • We initialize an empty array, items. As we scrape our items, we’re going to append them to this array. Once the scrape is finished, we return the array of parsed items.
  • As long as we receive a good response, we retrieve the "products" key to get all of our products.
  • We pull various pieces of data from each product to create a dict, product_data.
  • product_data gets appended to the array.
  • This process repeats until we’ve parsed all the products from the page.

We now have a function that performs our scrape and returns an array of products. Next, we need one that takes this array of products and writes it to a file. We could use CSV here, but this structure gets pretty nested, so we’ll use JSON instead. It supports more flexible data structures for later use and analysis.

def json2file(json_data, filename):
    """save json data to a file"""
    try:
        with open(filename, "w", encoding="utf-8") as file:
            json.dump(json_data, file, indent=4)
            print(f"Data successfully saved: {filename}")
    except Exception as e:
        print(f"failed to write json data to {filename}, ERROR: {e}")

That’s the actual code that we’re going to use. Now, we create a main block to run our scraper.

if __name__ == "__main__":
    shop_url = "https://hiutdenim.co.uk/"
    items = scrape_shopify(shop_url)

    json2file(items, "output.json")

Putting Everything Together

When we put it all together, our scraper looks like this. What seemed like an intricate parsing project is now a fully functional scraper that only takes up about 50 lines of code.

import requests
import json

def json2file(json_data, filename):
    """save json data to a file"""
    try:
        with open(filename, "w", encoding="utf-8") as file:
            json.dump(json_data, file, indent=4)
            print(f"Data successfully saved: {filename}")
    except Exception as e:
        print(f"failed to write json data to {filename}, ERROR: {e}")

def scrape_shopify(url, retries=2):
    """scrape a shopify store"""
    json_url = f"{url}products.json"
    items = []
    success = False
    while not success and retries > 0:
        try:
            # request the JSON endpoint; raise_for_status() turns bad status codes into exceptions
            response = requests.get(json_url)
            response.raise_for_status()
            products = response.json()["products"]
            for product in products:
                product_data = {
                    "title": product["title"],
                    "tags": product["tags"],
                    "id": product["id"],
                    "variants": product["variants"],
                    "images": product["images"],
                    "options": product["options"]
                }
                items.append(product_data)
            success = True
        except requests.RequestException as e:
            print(f"Error during request: {e}, failed to get {json_url}")
        except KeyError as key_error:
            print(f"Failed to parse json: {key_error}")
        except json.JSONDecodeError as e:
            print(f"json error: {e}")
        except Exception as e:
            print(f"Unforeseen error: {e}")
        retries -= 1
        print(f"Retries left: {retries}")
    return items


if __name__ == "__main__":
    shop_url = "https://hiutdenim.co.uk/"
    items = scrape_shopify(shop_url)

    json2file(items, "output.json")

The Return Data

Our data gets returned in an array of JSON objects. Each product holds a list of variants and images. These would be pretty difficult to accurately represent in CSV. The snippet you see below is one single product from our scrape.

{
        "title": "The Valerie - Organic Denim",
        "tags": [
            "The Valerie",
            "Women"
        ],
        "id": 14874183401848,
        "variants": [
            {
                "id": 54902462808440,
                "title": "UK10-29 / 30",
                "option1": "UK10-29",
                "option2": "30",
                "option3": null,
                "sku": null,
                "requires_shipping": true,
                "taxable": true,
                "featured_image": null,
                "available": true,
                "price": "220.00",
                "grams": 0,
                "compare_at_price": null,
                "position": 1,
                "product_id": 14874183401848,
                "created_at": "2025-01-21T14:04:58+00:00",
                "updated_at": "2025-02-12T17:17:54+00:00"
            },
            {
                "id": 54902462939512,
                "title": "UK12-30 / 32",
                "option1": "UK12-30",
                "option2": "32",
                "option3": null,
                "sku": null,
                "requires_shipping": true,
                "taxable": true,
                "featured_image": null,
                "available": true,
                "price": "220.00",
                "grams": 0,
                "compare_at_price": null,
                "position": 2,
                "product_id": 14874183401848,
                "created_at": "2025-01-21T14:04:58+00:00",
                "updated_at": "2025-02-12T17:17:54+00:00"
            },
            {
                "id": 54902463070584,
                "title": "UK14-32 / 28",
                "option1": "UK14-32",
                "option2": "28",
                "option3": null,
                "sku": null,
                "requires_shipping": true,
                "taxable": true,
                "featured_image": null,
                "available": true,
                "price": "220.00",
                "grams": 0,
                "compare_at_price": null,
                "position": 3,
                "product_id": 14874183401848,
                "created_at": "2025-01-21T14:04:58+00:00",
                "updated_at": "2025-02-12T17:17:54+00:00"
            },
            {
                "id": 54902463496568,
                "title": "UK18-36 / 30",
                "option1": "UK18-36",
                "option2": "30",
                "option3": null,
                "sku": null,
                "requires_shipping": true,
                "taxable": true,
                "featured_image": null,
                "available": true,
                "price": "220.00",
                "grams": 0,
                "compare_at_price": null,
                "position": 4,
                "product_id": 14874183401848,
                "created_at": "2025-01-21T14:04:58+00:00",
                "updated_at": "2025-02-12T17:17:54+00:00"
            }
        ],
        "images": [
            {
                "id": 31828166443078,
                "created_at": "2024-06-17T12:05:49+01:00",
                "position": 1,
                "updated_at": "2024-06-17T12:05:50+01:00",
                "product_id": 14874183401848,
                "variant_ids": [],
                "src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_45_3_c547ba8a-681b-4486-8cd7-884000e43302.jpg?v=1718622350",
                "width": 4000,
                "height": 4000
            },
            {
                "id": 31828166541382,
                "created_at": "2024-06-17T12:05:49+01:00",
                "position": 2,
                "updated_at": "2024-06-17T12:05:51+01:00",
                "product_id": 14874183401848,
                "variant_ids": [],
                "src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_Back_2_5909adb3-c2ab-4810-8b66-a486e8d827a8.jpg?v=1718622351",
                "width": 4000,
                "height": 4000
            },
            {
                "id": 31828166508614,
                "created_at": "2024-06-17T12:05:49+01:00",
                "position": 3,
                "updated_at": "2024-06-17T12:05:51+01:00",
                "product_id": 14874183401848,
                "variant_ids": [],
                "src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_Front_3_4316907a-9fd8-4649-894c-4028877370e1.jpg?v=1718622351",
                "width": 4000,
                "height": 4000
            },
            {
                "id": 31828166475846,
                "created_at": "2024-06-17T12:05:49+01:00",
                "position": 4,
                "updated_at": "2024-06-17T12:05:51+01:00",
                "product_id": 14874183401848,
                "variant_ids": [],
                "src": "https://cdn.shopify.com/s/files/1/0065/4242/files/HDC_0723_JapanInd_Valerie_Side_2_ea21477b-c1ba-4c8a-b75e-75c6427b4977.jpg?v=1718622351",
                "width": 4000,
                "height": 4000
            }
        ],
        "options": [
            {
                "name": "Waist",
                "position": 1,
                "values": [
                    "UK10-29",
                    "UK12-30",
                    "UK14-32",
                    "UK18-36"
                ]
            },
            {
                "name": "Leg Length",
                "position": 2,
                "values": [
                    "30",
                    "32",
                    "28"
                ]
            }
        ]
    },
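If you do eventually need a flat, tabular view of this data, you can always flatten it after the fact. Here’s a minimal sketch that reads the output.json file produced above and prints one row per variant:

import json

# load the products saved by json2file()
with open("output.json", "r", encoding="utf-8") as file:
    products = json.load(file)

# one row per variant: product title, variant title, price, availability
for product in products:
    for variant in product["variants"]:
        print(product["title"], variant["title"], variant["price"], variant["available"])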

Advanced Techniques

The world isn’t perfect, and it’s possible you’ll run into difficulty with the scraper above. You might need to scrape multiple pages, or your scraper might occasionally get blocked.

Pagination

When you’re scraping larger stores, you’ll often run into paginated results. products.json only returns a limited number of products per request, so to handle pagination we can add the page=<PAGE_NUMBER> query param to step through the catalog page by page. If you want the maximum results per page, you can also pass a limit param (Shopify caps it at 250 products per page).

We can slightly modify our scraping function to take a page number and add it to the URL.

def scrape_shopify(url, page=1, retries=2):
    """scrape a single page of a shopify store"""
    json_url = f"{url}products.json?page={page}"
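For reference, here’s a sketch of the full modified function. It’s identical to the version we built earlier apart from the new page parameter:

def scrape_shopify(url, page=1, retries=2):
    """scrape a single page of a shopify store"""
    json_url = f"{url}products.json?page={page}"
    items = []
    success = False
    while not success and retries > 0:
        try:
            # request the requested page of the JSON endpoint
            response = requests.get(json_url)
            response.raise_for_status()
            products = response.json()["products"]
            for product in products:
                product_data = {
                    "title": product["title"],
                    "tags": product["tags"],
                    "id": product["id"],
                    "variants": product["variants"],
                    "images": product["images"],
                    "options": product["options"]
                }
                items.append(product_data)
            success = True
        except requests.RequestException as e:
            print(f"Error during request: {e}, failed to get {json_url}")
        except KeyError as key_error:
            print(f"Failed to parse json: {key_error}")
        except json.JSONDecodeError as e:
            print(f"json error: {e}")
        except Exception as e:
            print(f"Unforeseen error: {e}")
        retries -= 1
    return items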

Then, we can adjust our main to reflect these changes.

if __name__ == "__main__":
    shop_url = "https://www.allbirds.com/"
    PAGES = 3

    for page in range(PAGES):
        items = scrape_shopify(shop_url, page=page+1)

        json2file(items, f"page{page}output.json")

Proxy Integration

Sometimes you might need to use a proxy service to prevent your scraper from getting blocked. With our Shopify Proxies, it’s as simple as creating a URL with your credentials.

PROXY_URL = "http://brd-customer-<YOUR-USERNAME>-zone-<YOUR-ZONE>:<YOUR-PASSWORD>@brd.superproxy.io:33335"
proxies = {
    "http": PROXY_URL,
    "https": PROXY_URL
}
response = requests.get(json_url, proxies=proxies, verify="brd.crt")
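Inside scrape_shopify, you would simply swap the existing requests.get(json_url) call for the proxied version above. As a standalone sanity check, the whole thing looks like this (a sketch: the credentials are placeholders, and it assumes you’ve downloaded Bright Data’s brd.crt certificate into your working directory):

import requests

# placeholder credentials: substitute your own Bright Data username, zone, and password
PROXY_URL = "http://brd-customer-<YOUR-USERNAME>-zone-<YOUR-ZONE>:<YOUR-PASSWORD>@brd.superproxy.io:33335"
proxies = {
    "http": PROXY_URL,
    "https": PROXY_URL
}

json_url = "https://hiutdenim.co.uk/products.json"

# route the request through the proxy; verify points at Bright Data's CA certificate
response = requests.get(json_url, proxies=proxies, verify="brd.crt")
response.raise_for_status()
print(len(response.json()["products"]), "products fetched through the proxy")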

Other Solutions from Bright Data

Bright Data offers powerful turnkey alternatives that eliminate the need to build complex scrapers from scratch. Use our fully optimized Shopify Scraper for seamless data extraction or access our extensive library of pre-collected datasets available in multiple formats to jumpstart your projects immediately.

Conclusion

Scraping a Shopify store doesn’t need to be an impossible task. By simply leveraging their API with products.json, you can harvest a large amount of detailed product data quickly. You don’t even need to use an HTML parser! If you want, you can reduce development time with one of our premade scrapers, or you can get to work immediately with our datasets.

All our products come with a free trial. Sign up now!

No credit card required