In this blog post, you will learn:
- What Apify is
- Why using Scraping Browser with Apify is a win-win scenario
- How to integrate Bright Data’s Scraping Browser into an Apify Python script
- How to use Bright Data’s proxies on Apify
Let’s dive in!
What Is Apify?
Apify is a full-stack web scraping and data extraction platform. It allows you to create and run custom web scraping tools—known as Actors—in the cloud. These Actors automate tasks related to data collection, processing, and automation.
On Apify, you can monetize your scraping scripts by making them public and available to other users. Whether you plan to utilize your Actor privately or make it public, Bright Data’s scraping solutions will help make your scraper more reliable and effective.
Why Use Bright Data’s Scraping Browser on Apify
To appreciate the value of Bright Data’s Scraping Browser, you must understand what the tool is and what it offers.
The biggest limitation of browser automation tools is not their APIs, but rather the browsers they control. Scraping Browser is a next-generation web browser specifically designed for web scraping. In particular, it comes with the following key features:
- Reliable TLS fingerprints to avoid detection
- Unlimited scalability for large-scale data extraction
- Built-in IP rotation powered by a proxy network of over 72 million IPs
- Automatic retries to handle failed requests
- CAPTCHA-solving capabilities
Scraping Browser is compatible with all major browser automation frameworks—including Playwright, Puppeteer, and Selenium. So, you do not need to learn a new API or install third-party dependencies. You can simply integrate it directly into your existing browser automation scraping script.
Using Scraping Browser with Apify brings even more advantages:
- Reduced cloud costs: Browsers consume significant CPU and RAM. Since Scraping Browser is hosted in the cloud with guaranteed unlimited scalability, it offloads that resource usage from your Actor runs on Apify. Because Apify charges by server usage, even when factoring in Scraping Browser fees, this setup can result in cost savings.
- All-in-one anti-bot bypass tool: Scraping Browser tackles IP bans, CAPTCHA challenges, browser fingerprint issues, and other anti-scraping barriers. That makes your scraping process more efficient and less prone to disruption.
- Built-in Proxy Integration: Scraping Browser includes proxy management, so you no longer need to worry about maintaining and manually rotating proxies.
- Apify benefits: Using Scraping Browser in a cloud Apify Actor (instead of a generic script) offers additional benefits, such as:
  - Easy deployment
  - Programmatic data access via API
  - Simple data export
  - Easy input argument configuration
  - Scalability for large projects
The Bright Data + Apify integration not only simplifies your scraping workflow but also improves reliability, while reducing the time and effort needed to bring your web scraping bot online.
How to Integrate Bright Data’s Scraping Browser on Apify: Step-by-Step Guide
The target site for this section will be Amazon, a platform rich in information but notorious for its strict anti-bot measures. Without the right tools, you are likely to encounter the infamous Amazon CAPTCHA, blocking your scraping attempts:
In this section, we will build a scraping Actor that leverages Bright Data’s Scraping Browser to extract data from a generic Amazon product search page:
Note: The Actor will be written in Python, but remember that Apify also supports JavaScript.
Follow the steps below to learn how to integrate Bright Data’s scraping tools with Apify!
Prerequisites
To follow this tutorial, you need to meet the following prerequisites:
- Python 3.8+ installed locally: For developing and building the local Actor script.
- Node.js installed locally: For installing the Apify CLI.
- An Apify account: To deploy the local Actor to the Apify platform.
- A Bright Data account: To access the Scraping Browser.
Step #1: Project Setup
The easiest way to set up a new Apify Actor project is by using the Apify CLI. First, install it globally via Node.js with the following command:
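```bash
npm install -g apify-cli
```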
Then, create a new Apify project by running:
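```bash
apify create amazon-scraper
```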
You will be prompted to answer a few questions. Answer as follows:
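When asked for the programming language, select Python. When asked for a starting template, choose “Playwright + Chrome.”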
This way, the Apify CLI will create a new Python Actor in the `amazon-scraper` folder using the “Playwright + Chrome” template. If you are not familiar with those tools, read our guide on Playwright web scraping.
Note: A Selenium or Puppeteer template would also work, as Bright Data’s Scraping Browser integrates with any browser automation tool.
Your Apify Actor project will have the following structure:
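```
amazon-scraper/
├── .actor/
│   ├── actor.json
│   └── input_schema.json
├── src/
│   ├── __init__.py
│   ├── __main__.py
│   └── main.py
├── storage/
├── .venv/
├── requirements.txt
└── README.md
```
(The exact file list may vary slightly depending on the template version.)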
Load the `amazon-scraper` folder in your preferred Python IDE, such as Visual Studio Code with the Python extension or PyCharm Community Edition.
Now, keep in mind that to run the Actor locally, Playwright’s browsers must be installed. First, activate the virtual environment (`.venv`) inside your project directory. On Windows, run:
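```powershell
.venv\Scripts\activate
```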
Equivalently, on Linux/macOS, launch:
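```bash
source .venv/bin/activate
```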
Then, install the required Playwright dependencies by executing:
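```bash
playwright install
```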
Wonderful! You can now run your Actor locally with:
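```bash
apify run
```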
Your Apify project is now fully set up and ready to be integrated with Bright Data’s Scraping Browser!
Step #2: Connect to the Target Page
If you take a look at the URL of an Amazon search results page, you will notice it follows this format:
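```
https://www.amazon.com/s?k=<keyword>
```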
For example:
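```
https://www.amazon.com/s?k=laptop
```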
The target URL of your script should use that format, where `<keyword>` can be dynamically set using an Apify input argument. The input parameters that an Actor accepts are defined in the `input_schema.json` file, located in the `.actor` directory.
Defining the `keyword` argument makes the script customizable, allowing users to specify the search term they prefer. To define that parameter, replace the contents of `input_schema.json` with something like the following (the titles and descriptions below are illustrative):
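```json
{
    "title": "Amazon Scraper",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "keyword": {
            "title": "Search keyword",
            "type": "string",
            "description": "The search keyword to use on Amazon",
            "editor": "textfield"
        }
    },
    "required": ["keyword"]
}
```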
This configuration defines a required `keyword` parameter of type `string`.
To set the keyword argument when running the Actor locally, modify the `INPUT.json` file inside `storage/key_value_stores/default` as follows:
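```json
{
    "keyword": "laptop"
}
```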
This way, the Actor will read the `keyword` input argument, using `"laptop"` as the search term.
Once the Actor is deployed to the Apify platform, you will see an input field where you can customize this parameter before running the Actor:
Keep in mind that the entry file of an Apify Actor is `main.py`, located in the `src` folder. Open this file and modify it to:
- Read the keyword parameter from the input arguments
- Construct the target URL for the Amazon search page
- Use Playwright to navigate to that page
By the end of this step, your `main.py` file should contain Python logic along the lines of the sketch below (your template’s boilerplate may differ slightly):
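```python
from apify import Actor
from playwright.async_api import async_playwright


async def main() -> None:
    # Initialize the Actor to manage the script lifecycle
    async with Actor:
        # Retrieve the input arguments
        actor_input = await Actor.get_input() or {}
        # Extract the "keyword" argument from the input data
        keyword = actor_input.get("keyword")
        if not keyword:
            raise ValueError('Missing the required "keyword" input argument!')

        # Construct the target URL with a Python f-string
        url = f"https://www.amazon.com/s?k={keyword}"

        async with async_playwright() as playwright:
            # Launch a headless Chromium browser with GPU disabled
            browser = await playwright.chromium.launch(
                headless=True, args=["--disable-gpu"]
            )
            # Create a new browser context and open a page
            context = await browser.new_context()
            page = await context.new_page()
            try:
                # Navigate to the target page
                await page.goto(url)
                # Scraping logic will go here...
            except Exception:
                # Log any errors that occur during execution
                Actor.log.exception("Error during scraping")
            finally:
                # Ensure the Playwright page is closed after execution
                await page.close()
```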
The above code:
- Initializes an Apify `Actor` to manage the script lifecycle
- Retrieves input arguments using `Actor.get_input()`
- Extracts the `keyword` argument from the input data
- Constructs the target URL using a Python f-string
- Launches Playwright and starts a headless Chromium browser with GPU disabled
- Creates a new browser context, opens a page, and navigates to the target URL using `page.goto()`
- Logs any errors with `Actor.log.exception()`
- Ensures the Playwright page is closed after execution
Perfect! Your Apify Actor is ready to leverage Bright Data’s Scraping Browser for efficient web scraping.
Step #3: Integrate Bright Data’s Scraping Browser
Now, use the Playwright API to capture a screenshot after connecting to the target page:
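```python
# Right after page.goto(url), take a screenshot of the current page
await page.screenshot(path="screenshot.png")
```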
Run your Actor locally, and it will generate a `screenshot.png` file in the project folder. Open it, and you will likely see something like this:
Similarly, you might get the following Amazon error page:
As you can see, your web scraping bot has been blocked by Amazon’s anti-bot measures. This is just one of many challenges you may encounter when scraping Amazon or other popular websites.
Forget about those challenges by using Bright Data’s Scraping Browser—a cloud-based scraping solution that provides unlimited scalability, automatic IP rotation, CAPTCHA solving, and anti-scraping bypass.
To get started, if you have not already, create a Bright Data account. Then, log into the platform. In the “User Dashboard” section, click the “Get proxy products” button:
In the “My Zones” table on the “Proxies & Scraping Infrastructure” page, select the “scraping_browser” row:
Enable the product by toggling the on/off switch:
Now, in the “Configuration” tab, verify that both “Premium domains” and “CAPTCHA Solver” options are enabled for maximum effectiveness:
In the “Overview” tab, copy the Playwright Scraping Browser connection string:
Add the connection string to your `main.py` file as a constant:
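```python
SBR_WS_CDP = "<YOUR_PLAYWRIGHT_SCRAPING_BROWSER_CONNECTION_STRING>"
```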
Replace `<YOUR_PLAYWRIGHT_SCRAPING_BROWSER_CONNECTION_STRING>` with the connection string you copied before.
Note: If you plan to make your Actor public on Apify, you should define `SBR_WS_CDP` as an Apify Actor input argument. That way, users adopting your Actor will be able to integrate their own Scraping Browser connection strings.
Now, update the `browser` definition in `main.py` to use Scraping Browser with Playwright:
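```python
# Connect to the remote Scraping Browser instance over CDP
browser = await playwright.chromium.connect_over_cdp(SBR_WS_CDP, timeout=120000)
```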
Note that the connection timeout should be set to a higher value than usual, as IP rotation through proxies and CAPTCHA solving can take some time.
Done! You successfully integrated Scraping Browser into Playwright within an Apify Actor.
Step #4: Prepare to Scrape All Product Listings
To scrape product listings from Amazon, you first need to inspect the page to understand its HTML structure. To do so, right-click on one of the product elements on the page and select the “Inspect” option. The following DevTools section will appear:
Here, you can see that each product listing element can be selected using the CSS selector below (accurate at the time of writing; Amazon’s markup changes frequently):
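```css
[data-component-type="s-search-result"]
```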
Targeting custom `data-*` attributes is ideal because these attributes are generally employed for testing or monitoring. Thus, they tend to remain consistent over time.
Now, use a Playwright locator to retrieve all product elements on the page:
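```python
# Select all product listing elements on the page
product_elements = page.locator('[data-component-type="s-search-result"]')
```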
Next, iterate over the product elements, and prepare to extract data from them:
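```python
for i in range(await product_elements.count()):
    # Get the i-th product element on the page
    product_element = product_elements.nth(i)
    # The data extraction logic goes here...
```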
Amazing! Time to implement the Amazon data extraction logic.
Step #5: Implement the Scraping Logic
First, inspect an individual product listing element:
From this section, you can retrieve the product image from the `src` attribute of the `.s-image` element:
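```python
image_element = product_element.locator(".s-image").nth(0)
image = await image_element.get_attribute("src")
```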
Note that `nth(0)` is required to get the first HTML element matching the locator.
Next, inspect the product title:
You can gather the product URL and title from the `<a>` and `<h2>` elements inside the `[data-cy="title-recipe"]` element, respectively:
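```python
title_header_element = product_element.locator('[data-cy="title-recipe"]')

url_element = title_header_element.locator("a").nth(0)
product_url = await url_element.get_attribute("href")
if product_url is None or product_url == "javascript:void(0)":
    # Ignore special ad products with no real URL
    product_url = None
else:
    # Convert the relative product URL into an absolute one
    # (urljoin comes from urllib.parse; see the imports below)
    product_url = urljoin("https://www.amazon.com", product_url)

title_element = title_header_element.locator("h2").nth(0)
title = (await title_element.text_content() or "").strip()
```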
Note the logic used to ignore “javascript:void(0)” URLs (which appear on special ad products) and to convert the relative product URLs into absolute ones.
Then, look at the review section:
From `[data-cy="reviews-block"]`, you can get the review rating from the `aria-label` attribute of the `<a>` element:
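```python
rating = None
rating_element = product_element.locator('[data-cy="reviews-block"] a').nth(0)
rating_text = await rating_element.get_attribute("aria-label")
if rating_text:
    # Extract the "X" value from a string like "4.5 out of 5 stars"
    match = re.search(r"(\d+(?:\.\d+)?) out of 5 stars", rating_text)
    if match:
        rating = float(match.group(1))
```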
Since the rating text in `aria-label` is in the “X out of 5 stars” format, you can extract the rating value X with a simple regex. See how to use regex for web scraping.
Do not forget to import `re` from the Python standard library:
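```python
import re
```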
Now, inspect the review count element:
Extract the number of reviews from the `<a>` element within `[data-component-type="s-client-side-analytics"]`:
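```python
review_count = None
review_count_element = product_element.locator(
    '[data-component-type="s-client-side-analytics"] a'
).nth(0)
review_count_text = await review_count_element.text_content()
if review_count_text:
    # Convert a string like "2,539" into an integer
    review_count = int(review_count_text.strip().replace(",", ""))
```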
Notice the straightforward logic to convert a string like “2,539” into a numerical value in Python.
Finally, inspect the product price node:
Collect the product price from the `.a-offscreen` element inside `[data-cy="price-recipe"]`:
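```python
price = None
price_element = product_element.locator('[data-cy="price-recipe"] .a-offscreen')
# Not all products have a price element
if await price_element.count() > 0:
    price = (await price_element.nth(0).text_content() or "").strip()
```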
Since not all products have a price element, you should handle that scenario by checking the count of the price element before attempting to retrieve its value.
To make the script work, update the imports at the top of `main.py` so that they cover everything used in the snippets above:
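```python
import re
from urllib.parse import urljoin

from apify import Actor
from playwright.async_api import async_playwright
```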
Beautiful! The Amazon product data scraping logic is complete.
Note that the goal of this article is not to dive deep into Amazon’s scraping logic. For more guidance, follow our guide on how to scrape Amazon product data in Python.
Step #6: Collect the Scraped Data
As the last instruction of the `for` loop, populate a `product` object with the scraped data:
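```python
product = {
    "image": image,
    "url": product_url,
    "title": title,
    "rating": rating,
    "review_count": review_count,
    "price": price,
}
```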
Then, push it to the Apify dataset:
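```python
await Actor.push_data(product)
```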
`push_data()` guarantees that the scraped data is registered on Apify, allowing you to access it via API or export it in one of the many supported formats (e.g., CSV, JSON, Excel, JSONL, etc.).
Step #7: Put It All Together
This is what your final Apify + Bright Data Actor `main.py` should contain. Below is the complete script, assembled from the snippets above (keep in mind that the CSS selectors and parsing details may need adjustments as Amazon’s markup evolves):
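```python
import re
from urllib.parse import urljoin

from apify import Actor
from playwright.async_api import async_playwright

# Replace this with your Scraping Browser connection string
SBR_WS_CDP = "<YOUR_PLAYWRIGHT_SCRAPING_BROWSER_CONNECTION_STRING>"


async def main() -> None:
    # Initialize the Actor to manage the script lifecycle
    async with Actor:
        # Read the "keyword" input argument
        actor_input = await Actor.get_input() or {}
        keyword = actor_input.get("keyword")
        if not keyword:
            raise ValueError('Missing the required "keyword" input argument!')

        # Build the target Amazon search URL
        url = f"https://www.amazon.com/s?k={keyword}"

        async with async_playwright() as playwright:
            # Connect to the remote Scraping Browser instance over CDP, with a
            # high timeout since IP rotation and CAPTCHA solving can take time
            browser = await playwright.chromium.connect_over_cdp(
                SBR_WS_CDP, timeout=120000
            )
            context = await browser.new_context()
            page = await context.new_page()
            try:
                await page.goto(url)

                # Select all product listing elements on the page
                product_elements = page.locator('[data-component-type="s-search-result"]')
                for i in range(await product_elements.count()):
                    product_element = product_elements.nth(i)

                    # Product image
                    image_element = product_element.locator(".s-image").nth(0)
                    image = await image_element.get_attribute("src")

                    # Product URL and title
                    title_header_element = product_element.locator('[data-cy="title-recipe"]')
                    url_element = title_header_element.locator("a").nth(0)
                    product_url = await url_element.get_attribute("href")
                    if product_url is None or product_url == "javascript:void(0)":
                        # Ignore special ad products with no real URL
                        product_url = None
                    else:
                        # Convert the relative product URL into an absolute one
                        product_url = urljoin("https://www.amazon.com", product_url)
                    title_element = title_header_element.locator("h2").nth(0)
                    title = (await title_element.text_content() or "").strip()

                    # Review rating (in the "X out of 5 stars" format)
                    rating = None
                    rating_element = product_element.locator('[data-cy="reviews-block"] a').nth(0)
                    rating_text = await rating_element.get_attribute("aria-label")
                    if rating_text:
                        match = re.search(r"(\d+(?:\.\d+)?) out of 5 stars", rating_text)
                        if match:
                            rating = float(match.group(1))

                    # Review count (e.g., "2,539" -> 2539)
                    review_count = None
                    review_count_element = product_element.locator(
                        '[data-component-type="s-client-side-analytics"] a'
                    ).nth(0)
                    review_count_text = await review_count_element.text_content()
                    if review_count_text:
                        review_count = int(review_count_text.strip().replace(",", ""))

                    # Product price (not all products have one)
                    price = None
                    price_element = product_element.locator('[data-cy="price-recipe"] .a-offscreen')
                    if await price_element.count() > 0:
                        price = (await price_element.nth(0).text_content() or "").strip()

                    # Populate a product object with the scraped data
                    product = {
                        "image": image,
                        "url": product_url,
                        "title": title,
                        "rating": rating,
                        "review_count": review_count,
                        "price": price,
                    }
                    # Register the scraped data on Apify
                    await Actor.push_data(product)
            except Exception:
                Actor.log.exception("Error while scraping the page")
            finally:
                await page.close()
```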
As you can see, integrating Bright Data’s Scraping Browser with Apify’s “Playwright + Chrome” template is simple and requires only a few lines of code.
Step #8: Deploy to Apify and Run the Actor
To deploy your local Actor to Apify, run the following command in your project folder:
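```bash
apify push
```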
If you have not logged in yet, you will be prompted to authenticate via the Apify CLI.
Once the deployment is complete, the Apify CLI will ask whether you want to open the Actor page in your browser.
Respond with “Y” or “yes” to be redirected to the Actor page in your Apify Console:
If you prefer, you can manually access the same page by:
- Logging into Apify in your browser
- Navigating to the Console
- Visiting the “Actors” page
Click the “Start Actor” button to launch your Amazon Scraper Actor. As expected, you will be asked to provide a keyword. Try something like “gaming chair”:
Afterward, press “Save & Start” to run the Actor and scrape “gaming chair” product listings from Amazon.
Once the scraping is complete, you will see the retrieved data in the Output section:
To export the data, go to the “Storage” tab, select the “CSV” option, and press the “Download” button:
The downloaded CSV file will contain the following data:
Et voilà! Bright Data’s Scraping Browser + Apify integration works like a charm. No more CAPTCHAs or blocks when scraping Amazon or any other site.
[Extra] Bright Data Proxy Integration on Apify
Using a scraping product like Scraping Browser or Web Unlocker directly on Apify is useful and straightforward.
At the same time, suppose you already have an Actor on Apify and just need to enhance it with proxies (e.g., to avoid IP bans). In that case, remember that you can integrate Bright Data proxies directly into your Apify Actor, as described in our documentation or integration guide.
Conclusion
In this tutorial, you learned how to build an Apify Actor that integrates with Scraping Browser in Playwright to programmatically gather data from Amazon. We started from scratch, walking through all the steps to build a local scraping script and then deploy it to Apify.
Now you understand the benefits of using a professional scraping tool like Scraping Browser for cloud scraping on Apify. With the same or a similar procedure, you can integrate all other Bright Data products on Apify:
- Proxy Services: 4 different types of proxies to bypass location restrictions, including 72 million+ residential IPs
- Web Scraper APIs: Dedicated endpoints for extracting fresh, structured web data from over 100 popular domains.
- SERP API: An API that handles all the ongoing unlocking management required to extract data from search engine results pages
Sign up for Bright Data now and test our interoperable proxy services and scraping products for free!
No credit card required