Web Scraping Using Bright Data’s Scraping Browser on Apify

Discover how to integrate Bright Data’s Scraping Browser with Apify to improve web scraping efficiency, reduce costs, and bypass anti-bot challenges.

In this blog post, you will learn:

  • What Apify is
  • Why using Scraping Browser with Apify is a win-win scenario
  • How to integrate Bright Data’s Scraping Browser into an Apify Python script
  • How to use Bright Data’s proxies on Apify

Let’s dive in!

What Is Apify?

Apify is a full-stack web scraping and data extraction platform. It allows you to create and run custom web scraping tools—known as Actors—in the cloud. These Actors automate tasks related to data collection, processing, and automation.

On Apify, you can monetize your scraping scripts by making them public and available to other users. Whether you plan to utilize your Actor privately or make it public, Bright Data’s scraping solutions will help make your scraper more reliable and effective.

Why Use Bright Data’s Scraping Browser on Apify

To appreciate the value of Bright Data’s Scraping Browser, you must understand what the tool is and what it offers.

The biggest limitation of browser automation tools is not their APIs, but rather the browsers they control. Scraping Browser is a next-generation web browser specifically designed for web scraping. In particular, it comes with the following key features:

  • Reliable TLS fingerprints to avoid detection
  • Unlimited scalability for large-scale data extraction
  • Built-in IP rotation powered by a 72-million IP proxy network
  • Automatic retries to handle failed requests
  • CAPTCHA-solving capabilities

Scraping Browser is compatible with all major browser automation frameworks—including Playwright, Puppeteer, and Selenium. So, you do not need to learn a new API or install third-party dependencies. You can simply integrate it directly into your existing browser automation scraping script.
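
For example, with Playwright, switching an existing script to Scraping Browser typically comes down to changing a single line, the browser connection (covered in detail in Step #3 below). The endpoint here is a placeholder:

browser = await playwright.chromium.connect_over_cdp("<YOUR_SCRAPING_BROWSER_CONNECTION_STRING>")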

Now, combining Scraping Browser with Apify brings even more advantages:

  • Reduced cloud costs: Browsers consume significant CPU and RAM. Since Scraping Browser is hosted in Bright Data’s cloud with guaranteed unlimited scalability, it offloads that workload from Apify’s servers during Actor runs. Because Apify charges by server usage, this setup can result in cost savings even after factoring in Scraping Browser fees.
  • All-in-one anti-bot bypass tool: Scraping Browser tackles IP bans, CAPTCHA challenges, browser fingerprint issues, and other anti-scraping barriers. That makes your scraping process more efficient and less prone to disruption.
  • Built-in proxy integration: Scraping Browser includes proxy management, so you no longer need to worry about maintaining and manually rotating proxies.
  • Apify benefits: Using Scraping Browser in a cloud-based Apify Actor (instead of a generic script) offers additional advantages, such as:
    • Easy deployment
    • Programmatic data access via API (see the sketch after this list)
    • Simple data export
    • Easy input argument configurations
    • Scalability for large projects
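
As an example of that programmatic access, here is a minimal sketch using the apify-client Python package. The Actor name, token, and input are placeholders, not values from this tutorial:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_API_TOKEN>")
# Start the Actor with a custom input and wait for the run to finish
run = client.actor("<YOUR_USERNAME>/amazon-scraper").call(run_input={"keyword": "laptop"})
# Iterate over the items the run pushed to its default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)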

The Bright Data + Apify integration not only simplifies your scraping workflow but also improves reliability. Also, it reduces the time and effort needed to bring your web scraping bot online.

How to Integrate Bright Data’s Scraping Browser on Apify: Step-by-Step Guide

The target site for this section will be Amazon, a platform rich in information but notorious for its strict anti-bot measures. Without the right tools, you are likely to encounter the infamous Amazon CAPTCHA, blocking your scraping attempts:

The Amazon CAPTCHA blocking your script

In this section, we will build a scraping Actor that leverages Bright Data’s Scraping Browser to extract data from a generic Amazon product search page:

The target Amazon product search page

Note: The Actor will be written in Python, but remember that Apify also supports JavaScript.

Follow the steps below to learn how to integrate Bright Data’s scraping tools with Apify!

Prerequisites

To follow this tutorial, you need to meet the following prerequisites:

  • Python 3 installed on your machine
  • Node.js installed (required by the Apify CLI)
  • An Apify account
  • A Bright Data account

Step #1: Project Setup

The easiest way to set up a new Apify Actor project is by using the Apify CLI. First, install it globally via Node.js with the following command:

npm install -g apify-cli

Then, create a new Apify project by running:

npx apify-cli create

You will be prompted to answer a few questions. Answer as follows:

✔ Name of your new Actor: amazon-scraper
✔ Choose the programming language of your new Actor: Python
✔ Choose a template for your new Actor. Detailed information about the template will be shown in the next step.
Playwright + Chrome

This way, the Apify CLI will create a new Python Actor in the amazon-scraper folder using the “Playwright + Chrome” template. If you are not familiar with those tools, read our guide on Playwright web scraping.

Note: A Selenium or Puppeteer template would also work, as Bright Data’s Scraping Browser integrates with any browser automation tool.
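
For reference, this is roughly what the Selenium equivalent looks like. The endpoint below is a placeholder for a Scraping Browser WebDriver URL, which differs from the CDP connection string used later in this guide:

from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection

# Placeholder: your Scraping Browser WebDriver endpoint
SBR_WEBDRIVER = "<YOUR_SCRAPING_BROWSER_WEBDRIVER_URL>"

# Connect Selenium to the remote cloud browser instead of a local one
connection = ChromiumRemoteConnection(SBR_WEBDRIVER, "goog", "chrome")
with Remote(connection, options=ChromeOptions()) as driver:
    driver.get("https://example.com")
    print(driver.title)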

Your Apify Actor project will have the following structure:

amazon-scraper
│── .dockerignore
│── .gitignore
│── README.md
│── requirements.txt
│
├── .venv/
│   └── ...
│
├── .actor/
│   │── actor.json
│   │── Dockerfile
│   └── input_schema.json
│
├── src/
│   │── main.py
│   │── __init__.py
│   │── __main__.py
│   └── py.typed
│
└── storage/
    └── ...

Load the amazon-scraper folder in your preferred Python IDE, such as Visual Studio Code with the Python extension or PyCharm Community Edition.

Now, keep in mind that to run the Actor locally, Playwright’s browsers must be installed. First, activate the virtual environment (.venv) inside your project directory. On Windows, run:

.venv\Scripts\activate

Equivalently, on Linux/macOS, launch:

source .venv/bin/activate

Then, install the required Playwright dependencies by executing:

playwright install --with-deps

Wonderful! You can now run your Actor locally with:

apify run

Your Apify project is now fully set up and ready to be integrated with Bright Data’s Scraping Browser!

Step #2: Connect to the Target Page

If you take a look at the URL of an Amazon search results page, you will notice it follows this format:

https://www.amazon.com/s?k=<keyword>

For example:

Note the URL of the target page

The target URL of your script should use that format, where <keyword> can be dynamically set using an Apify input argument. The input parameters that an Actor accepts are defined in the input_schema.json file, located in the .actor directory.

Defining the keyword argument makes the script customizable, allowing users to specify the search term they prefer. To define that parameter, replace the contents of input_schema.json with the following one:

{
    "title": "Amazon Scraper",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "keyword": {
            "title": "Search keyword",
            "type": "string",
            "description": "The keyword used to search products on Amazon",
            "editor": "textfield"
        }
    },
    "required": ["keyword"]
}

This configuration defines a required keyword parameter of type string.

To set the keyword argument when running the Actor locally, modify the INPUT.json file inside storage/key_value_stores/default as follows:

{
    "keyword": "laptop"
}

This way, the Actor will read the keyword input argument, using "laptop" as the search term.

Once the Actor is deployed to the Apify platform, you will see an input field where you can customize this parameter before running the Actor:

The configured keyword text field in the Apify Console

Keep in mind that the entry file of an Apify Actor is main.py, located in the src folder. Open this file and modify it to:

  1. Read the keyword parameter from the input arguments
  2. Construct the target URL for the Amazon search page
  3. Use Playwright to navigate to that page

By the end of this step, your main.py file should contain the Python logic below:

from apify import Actor
from playwright.async_api import async_playwright


async def main() -> None:
    # Enter the context of the Actor
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided
        actor_input = await Actor.get_input() or {}
        # Read the "keyword" argument from the input data, falling back to
        # "laptop" as a default value
        keyword = actor_input.get("keyword", "laptop")

        # Build the target URL
        target_url = f"https://www.amazon.com/s?k={keyword}"

        # Launch Playwright and open a new browser context
        async with async_playwright() as playwright:
            # Configure the browser to launch in headless mode as per Actor configuration
            browser = await playwright.chromium.launch(
                headless=Actor.config.headless,
                args=["--disable-gpu"],
            )
            context = await browser.new_context()

            try:
                # Open a new page in the browser context and navigate to the URL
                page = await context.new_page()
                await page.goto(target_url)

                # Scraping logic...

            except Exception:
                Actor.log.exception(f"Cannot extract data from {target_url}")

            finally:
                await page.close()

The above code:

  1. Initializes an Apify Actor to manage the script lifecycle
  2. Retrieves input arguments using Actor.get_input()
  3. Extracts the keyword argument from the input data, defaulting to “laptop”
  4. Constructs the target URL using a Python f-string
  5. Launches Playwright and starts a headless Chromium browser with GPU disabled
  6. Creates a new browser context, opens a page, and navigates to the target URL using page.goto()
  7. Logs any errors with Actor.log.exception()
  8. Ensures the Playwright page is closed after execution

Perfect! Your Apify Actor is ready to leverage Bright Data’s Scraping Browser for efficient web scraping.

Step #3: Integrate Bright Data’s Scraping Browser

Now, use the Playwright API to capture a screenshot after connecting to the target page:

await page.screenshot(path="screenshot.png")

Run your Actor locally, and it will generate a screenshot.png file in the project folder. Open it, and you will likely see something like this:

The Amazon CAPTCHA blocking your script

Alternatively, you might get the following Amazon error page:

The Amazon error page

As you can see, your web scraping bot has been blocked by Amazon’s anti-bot measures. This is just one of many challenges you may encounter when scraping Amazon or other popular websites.

Forget about those challenges by using Bright Data’s Scraping Browser—a cloud-based scraping solution that provides unlimited scalability, automatic IP rotation, CAPTCHA solving, and anti-scraping bypass.

To get started, if you have not already, create a Bright Data account. Then, log into the platform. In the “User Dashboard” section, click the “Get proxy products” button:

Pressing the "Get proxy products" button

In the “My Zones” table on the “Proxies & Scraping Infrastructure” page, select the “scraping_browser” row:

Selecting the Scraping Browser product

Enable the product by toggling the on/off switch:

Enabling Scraping Browser

Now, in the “Configuration” tab, verify that both “Premium domains” and “CAPTCHA Solver” options are enabled for maximum effectiveness:

Enabling the "Premium domains" and "CAPTCHA Solver" options

In the “Overview” tab, copy the Playwright Scraping Browser connection string:

Copying the Puppeteer / Playwright Scraping Browser connection string to the clipboard

Add the connection string to your main.py file as a constant:

SBR_WS_CDP = "<YOUR_PLAYWRIGHT_SCRAPING_BROWSER_CONNECTION_STRING>"

Replace <YOUR_PLAYWRIGHT_SCRAPING_BROWSER_CONNECTION_STRING> with the connection string you copied before.

Note: If you plan to make your Actor public on Apify, you should define SBR_WS_CDP as an Apify Actor input argument. That way, users adopting your Actor will be able to integrate their own Scraping Browser connection strings.
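
As a rough sketch of that approach, assuming you add an sbr_ws_cdp string field to input_schema.json, the Actor could read the connection string from its input like this:

# Assumes an "sbr_ws_cdp" field defined in .actor/input_schema.json
actor_input = await Actor.get_input() or {}
SBR_WS_CDP = actor_input.get("sbr_ws_cdp")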

Now, update the browser definition in main.py to use Scraping Browser with Playwright:

browser = await playwright.chromium.connect_over_cdp(SBR_WS_CDP, timeout=120000)

Note that the connection timeout should be set to a higher value than usual, as IP rotation through proxies and CAPTCHA solving can take some time.
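
Optionally, you can also raise Playwright’s per-page timeouts, so that slow proxied navigations do not trip the default 30-second limit. A minimal sketch:

page = await context.new_page()
# Raise the default timeouts for all subsequent operations on this page
page.set_default_navigation_timeout(120000)  # applies to page.goto()
page.set_default_timeout(120000)  # applies to locators and other actions
await page.goto(target_url)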

Done! You successfully integrated Scraping Browser into Playwright within an Apify Actor.

Step #4: Prepare to Scrape All Product Listings

To scrape product listings from Amazon, you first need to inspect the page to understand its HTML structure. To do so, right-click on one of the product elements on the page and select the “Inspect” option. The following DevTools section will appear:

Inspecting the product listing element

Here, you can see that each product listing element can be selected using this CSS selector:

[data-component-type="s-search-result"]

Targeting custom data-* attributes is ideal because these attributes are generally employed for testing or monitoring. Thus, they tend to remain consistent over time.

Now, use a Playwright locator to retrieve all product elements on the page:

product_elements = page.locator("[data-component-type=\"s-search-result\"]")

Next, iterate over the product elements, and prepare to extract data from them:

for product_element in await product_elements.all():
    # Data extraction logic...

Amazing! Time to implement the Amazon data extraction logic.

Step #5: Implement the Scraping Logic

First, inspect an individual product listing element:

The product image element

From this section, you can retrieve the product image from the src attribute of the .s-image element:

image_element = product_element.locator(".s-image").nth(0)
image = await image_element.get_attribute("src")

Note that nth(0) is required to get the first HTML element matching the locator.

Next, inspect the product title:

The product title element

You can gather the product URL and title from the <a> and <h2> elements inside the [data-cy="title-recipe"] element, respectively:

title_header_element = product_element.locator("[data-cy=\"title-recipe\"]").nth(0)

link_element = title_header_element.locator("a").nth(0)
url_text = await link_element.get_attribute("href")
url = None if url_text == "javascript:void(0)" else "https://amazon.com" + url_text

title_element = title_header_element.locator("h2").nth(0)
title = await title_element.get_attribute("aria-label")

Note the logic used to ignore “javascript:void(0)” URLs (which appear on special ad products) and how relative product URLs are converted into absolute ones.

Then, look at the review section:

The product review element

From [data-cy="reviews-block"], you can get the review rating from the aria-label of the <a> element:

rating_element = product_element.locator("[data-cy=\"reviews-block\"] a").nth(0)
rating_text = await rating_element.get_attribute("aria-label")
rating_match = re.search(r"(\d+(\.\d+)?) out of 5 stars", rating_text)
if rating_match:
    rating = rating_match.group(1)
else:
    rating = None

Since the rating text in aria-label is in the “X out of 5 stars” format, you can extract the rating value X with a simple regex. See how to use regex for web scraping.

Do not forget to import re from the Python Standard Library:

import re

Now, inspect the review count element:

The review count element

Extract the number of reviews from the <a> element within [data-component-type="s-client-side-analytics"]:

review_count_element = product_element.locator("[data-component-type=\"s-client-side-analytics\"] a").nth(0)
review_count_text = await review_count_element.text_content()
review_count = int(review_count_text.replace(",", ""))

Notice the straightforward logic to convert a string like “2,539” into a numerical value in Python.
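
Keep in mind that some listings have no reviews at all. If you want to guard against that case, you can mirror the count check used for the price element below. An optional, defensive variant:

review_count = None
review_count_locator = product_element.locator("[data-component-type=\"s-client-side-analytics\"] a")
# Only parse the count if the reviews element is present on the listing
if await review_count_locator.count() > 0:
    review_count_text = await review_count_locator.nth(0).text_content()
    review_count = int(review_count_text.replace(",", ""))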

Finally, inspect the product price node:

The product price element

Collect the product price from the .a-offscreen element inside [data-cy="price-recipe"]:

price_element_locator = product_element.locator("[data-cy=\"price-recipe\"] .a-offscreen")
# If the price element is on the product element
if await price_element_locator.count() > 0:
    price = await price_element_locator.nth(0).text_content()
else:
    price = None

Since not all products have a price element, you should handle that scenario by checking the count of the price element before attempting to retrieve its value.

Optionally, since proxied navigations can be slow, you can also import Playwright’s TimeoutError to catch navigation timeouts explicitly:

from playwright.async_api import async_playwright, TimeoutError

Beautiful! The Amazon product data scraping logic is complete.

Note that the goal of this article is not to dive deep into Amazon’s scraping logic. For more guidance, follow our guide on how to scrape Amazon product data in Python.

Step #6: Collect the Scraped Data

As the last instruction of the for loop, populate a product object with the scraped data:

product = {
    "image": image,
    "url": url,
    "title": title,
    "rating": rating,
    "review_count": review_count,
    "price": price
}

Then, push it to the Apify dataset:

await Actor.push_data(product)

push_data() stores the scraped record in the Actor’s dataset on Apify, allowing you to access it via API or export it in one of the many supported formats (CSV, JSON, Excel, JSONL, and more).
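
For instance, once you know a run’s dataset ID, you can download the results over the Apify API with a plain HTTP request. A minimal sketch, with the dataset ID and token as placeholders:

import requests

# Placeholders: the dataset ID comes from the Actor run, the token from your Apify account
url = "https://api.apify.com/v2/datasets/<DATASET_ID>/items"
response = requests.get(url, params={"format": "csv", "token": "<YOUR_APIFY_API_TOKEN>"})

with open("products.csv", "wb") as f:
    f.write(response.content)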

Step #7: Put It All Together

This is what your final Apify + Bright Data Actor main.py should contain:

from apify import Actor
from playwright.async_api import async_playwright, TimeoutError
import re

async def main() -> None:
    # Enter the context of the Actor
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided
        actor_input = await Actor.get_input() or {}
        # Read the "keyword" argument from the input data, falling back to
        # "laptop" as a default value
        keyword = actor_input.get("keyword", "laptop")

        # Build the target URL
        target_url = f"https://www.amazon.com/s?k={keyword}"

        # Launch Playwright and open a new browser context
        async with async_playwright() as playwright:
            # Your Bright Data Scraping Browser connection string
            SBR_WS_CDP = "<YOUR_PLAYWRIGHT_SCRAPING_BROWSER_CONNECTION_STRING>"

            # Configure Playwright to connect to Scraping Browser and open a new context
            browser = await playwright.chromium.connect_over_cdp(SBR_WS_CDP, timeout=120000)
            context = await browser.new_context()

            try:
                # Open a new page in the browser context and navigate to the URL
                page = await context.new_page()
                await page.goto(target_url)

                # Use a locator to select all product elements
                product_elements = page.locator("[data-component-type=\"s-search-result\"]")

                # Iterate over all product elements and scrape data from them
                for product_element in await product_elements.all():
                    # Product scraping logic
                    image_element = product_element.locator(".s-image").nth(0)
                    image = await image_element.get_attribute("src")

                    title_header_element = product_element.locator("[data-cy=\"title-recipe\"]").nth(0)

                    link_element = title_header_element.locator("a").nth(0)
                    url_text = await link_element.get_attribute("href")
                    url = None if url_text == "javascript:void(0)" else "https://amazon.com" + url_text

                    title_element = title_header_element.locator("h2").nth(0)
                    title = await title_element.get_attribute("aria-label")

                    rating_element = product_element.locator("[data-cy=\"reviews-block\"] a").nth(0)
                    rating_text = await rating_element.get_attribute("aria-label")
                    rating_match = re.search(r"(\d+(\.\d+)?) out of 5 stars", rating_text)
                    if rating_match:
                        rating = rating_match.group(1)
                    else:
                        rating = None

                    review_count_element = product_element.locator("[data-component-type=\"s-client-side-analytics\"] a").nth(0)
                    review_count_text = await review_count_element.text_content()
                    review_count = int(review_count_text.replace(",", ""))

                    price_element_locator = product_element.locator("[data-cy=\"price-recipe\"] .a-offscreen")
                    # If the price element is on the product element
                    if await price_element_locator.count() > 0:
                        price = await price_element_locator.nth(0).text_content()
                    else:
                        price = None

                    # Populate a new dictionary with the scraped data
                    product = {
                        "image": image,
                        "url": url,
                        "title": title,
                        "rating": rating,
                        "review_count": review_count,
                        "price": price
                    }
                    # Add it to the Actor dataset
                    await Actor.push_data(product)
            except Exception:
                Actor.log.exception(f"Cannot extract data from {target_url}")

            finally:
                await page.close()

As you can see, integrating Bright Data’s Scraping Browser with Apify’s “Playwright + Chrome” template is simple and requires only a few lines of code.

Step #8: Deploy to Apify and Run the Actor

To deploy your local Actor to Apify, run the following command in your project folder:

apify push

If you have not logged in yet, you will be prompted to authenticate via the Apify CLI.

Once the deployment is complete, you will be asked the following question:

✔ Do you want to open the Actor detail in your browser?

Respond with “Y” or “yes” to be redirected to the Actor page in your Apify Console:

The Apify page of your Actor

If you prefer, you can manually access the same page by:

  1. Logging into Apify in your browser
  2. Navigating to the Console
  3. Visiting the “Actors” page and selecting your Actor

Click the “Start Actor” button to launch your Amazon Scraper Actor. As expected, you will be asked to provide a keyword. Try something like “gaming chair”:

Filling out the keyword text field input element

Afterward, press “Save & Start” to run the Actor and scrape “gaming chair” product listings from Amazon.

Once the scraping is complete, you will see the retrieved data in the Output section:

The scraped data in tabular format

To export the data, go to the “Storage” tab, select the “CSV” option, and press the “Download” button:

Pressing the "Download" button

The downloaded CSV file will contain the following data:

The scraped data in CSV format

Et voilà! Bright Data’s Scraping Browser + Apify integration works like a charm. No more CAPTCHAs or blocks when scraping Amazon or any other site.

[Extra] Bright Data Proxy Integration on Apify

Using a scraping product like Scraping Browser or Web Unlocker directly on Apify is useful and straightforward.

At the same time, suppose you already have an Actor on Apify and just need to enhance it with proxies (e.g., to avoid IP bans). Well, remember that you can integrate Bright Data proxies directly into your Apify Actor, as described in our documentation or integration guide.
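
In short, assuming a residential proxy zone, the Playwright integration boils down to passing your Bright Data proxy credentials to the browser launch options. A minimal sketch with placeholder credentials:

import asyncio
from playwright.async_api import async_playwright

# Placeholders: replace with your Bright Data proxy zone credentials
BRD_PROXY = {
    "server": "http://brd.superproxy.io:22225",
    "username": "brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>",
    "password": "<ZONE_PASSWORD>",
}

async def main() -> None:
    async with async_playwright() as playwright:
        # Route all browser traffic through the Bright Data proxy
        browser = await playwright.chromium.launch(proxy=BRD_PROXY)
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await browser.close()

asyncio.run(main())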

Conclusion

In this tutorial, you learned how to build an Apify Actor that integrates with Scraping Browser in Playwright to programmatically gather data from Amazon. We started from scratch, walking through all the steps to build a local scraping script and then deploy it to Apify.

Now, you understand the benefits of using a professional scraping tool like Scraping Browser for your cloud scraping on Apify. Following the same or similar procedures, Apify supports all other Bright Data products:

  • Proxy Services: 4 different types of proxies to bypass location restrictions, including 72 million+ residential IPs
  • Web Scraper APIs: dedicated endpoints for extracting fresh, structured web data from over 100 popular domains
  • SERP API: an API that handles all the unlocking needed to extract search engine results pages

Sign up now to Bright Data and test our interoperable proxy services and scraping products for free!

No credit card required