In this guide, you will learn:
- What SeleniumBase is and why it is useful for web scraping
- How it compares to vanilla Selenium
- The features and benefits SeleniumBase offers
- How to use it to build a simple scraper
- How to utilize it for more complex use cases
Let’s dive in!
What Is SeleniumBase?
SeleniumBase is a Python framework for browser automation. Built on top of the Selenium/WebDriver APIs, it provides a professional-grade toolkit for web automation. It supports a wide range of tasks, from testing to scraping.
SeleniumBase is an all-in-one library for testing web pages, automating workflows, and scaling web-based operations. It comes equipped with advanced features such as CAPTCHA bypassing, bot-detection avoidance, and productivity-enhancing tools.
SeleniumBase vs Selenium: Feature and API Comparison
To better understand the why behind SeleniumBase, it makes sense to compare it directly with the vanilla version of Selenium—the tool it is built upon.
For a quick Selenium vs SeleniumBase comparison, take a look at the summary table below:
| Feature | SeleniumBase | Selenium |
| --- | --- | --- |
| Built-in test runners | Integrates with `pytest`, `pynose`, and `behave` | Requires manual setup for test integration |
| Driver management | Automatically downloads the browser driver matching the browser version | Requires manual driver download and configuration |
| Web automation logic | Combines multiple steps into a single method call | Requires multiple lines of code for similar functionality |
| Selector handling | Automatically detects CSS or XPath selectors | Requires explicitly defining selector types in method calls |
| Timeout handling | Applies default timeouts to prevent failures | Methods fail immediately if timeouts are not explicitly set |
| Error outputs | Provides clean, readable error messages for easier debugging | Generates verbose and less interpretable error logs |
| Dashboards and reports | Includes built-in dashboards, reports, and failure screenshots | No built-in dashboards or reporting capabilities |
| Desktop GUI applications | Offers visual tools for test running | Lacks desktop GUI tools for test execution |
| Test recorder | Built-in test recorder for creating scripts from manual browser actions | Requires manual script writing |
| Test case management | Provides CasePlans for organizing tests and documenting steps directly in the framework | No built-in test case management tools |
| Data app support | Includes ChartMaker for generating JavaScript from Python to create data apps | No additional tools for building data apps |
Time to dig into the differences!
Built-in Test Runners
SeleniumBase integrates with popular test runners like `pytest`, `pynose`, and `behave`. These tools provide an organized structure, seamless test discovery and execution, test state tracking (e.g., passed, failed, or skipped), and command-line options for customizing settings such as browser selection.
With vanilla Selenium, you would need to manually implement an options parser or rely on third-party tools for configuring tests from the command line.
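For illustration, here is a minimal sketch of what that manual setup might look like in vanilla Selenium, using Python's built-in `argparse` (the option name and script structure are hypothetical, not part of any Selenium API):

```python
# manual_options.py - hypothetical manual CLI setup for vanilla Selenium
import argparse

from selenium import webdriver

# Parse a custom --browser option by hand
parser = argparse.ArgumentParser()
parser.add_argument("--browser", default="chrome", choices=["chrome", "firefox"])
args = parser.parse_args()

# Instantiate the requested browser manually
if args.browser == "firefox":
    driver = webdriver.Firefox()
else:
    driver = webdriver.Chrome()

driver.get("https://quotes.toscrape.com/")
driver.quit()
```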
Enhanced Driver Management
By default, SeleniumBase downloads a compatible driver version that matches the major version of your browser. You can override this using the `--driver-version=VER` option in your `pytest` command. For example:

```bash
pytest my_script.py --driver-version=114
```
In contrast, Selenium requires you to manually download and configure the appropriate driver. In that case, you are responsible for ensuring compatibility with the browser version.
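As a rough sketch, manual driver configuration in Selenium 4 looks something like this (the driver path below is a placeholder):

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at a manually downloaded ChromeDriver binary
service = Service("/path/to/chromedriver")  # placeholder path
driver = webdriver.Chrome(service=service)

driver.get("https://quotes.toscrape.com/")
driver.quit()
```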
Multi-Action Methods
SeleniumBase combines multiple steps into single methods for simplified web automation. For example, the `driver.type(selector, text)` method performs the following:
- Waits for the element to be visible
- Waits for the element to be interactive
- Clears any existing text
- Types the provided text
- Submits if the text ends with `"\n"`
With raw Selenium, replicating the same logic would require several lines of code, as sketched below.
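Here is a minimal sketch of the vanilla Selenium equivalent, assuming an existing `driver` instance (the selector is hypothetical):

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Wait until the element is visible and interactive
element = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='q']"))  # hypothetical selector
)
# Clear any existing text, then type the new text
element.clear()
# A trailing "\n" submits the form, mimicking SeleniumBase's behavior
element.send_keys("seleniumbase\n")
```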
Simplified Selector Handling
SeleniumBase can automatically differentiate between CSS selectors and XPath expressions. That removes the need to explicitly specify selector types with `By.CSS_SELECTOR` or `By.XPATH`. However, you can still provide the type explicitly if preferred.
Example with SeleniumBase:
driver.click("button.submit") # Automatically detects as CSS Selector
driver.click("//button[@class='submit']") # Automatically detects as XPath
The vanilla Selenium equivalent code is:
```python
driver.find_element(By.CSS_SELECTOR, "button.submit").click()
driver.find_element(By.XPATH, "//button[@class='submit']").click()
```
Default and Custom Timeout Values
SeleniumBase automatically applies a default timeout of 10 seconds to methods, ensuring elements have time to load. That prevents immediate failures, which are common in raw Selenium.
You can also set custom timeout values directly in method calls, as in the example below:
driver.click("button", timeout=20)
The equivalent Selenium code would be much more verbose and complex:
```python
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button"))).click()
```
Clear Error Outputs
SeleniumBase provides clean, easy-to-read error messages when scripts fail. Raw Selenium, in contrast, often generates verbose and less interpretable error logs, requiring additional effort to debug.
Dashboards, Reports, and Screenshots
SeleniumBase includes features for generating dashboards and reports for test runs. It also saves screenshots of failures in the `./latest_logs/` folder for easy debugging. Raw Selenium lacks these features out of the box.
Extra Features
Compared to Selenium, SeleniumBase includes:
- Desktop GUI applications for running tests visually, such as SeleniumBase Commander for `pytest` and SeleniumBase Behave GUI for `behave`.
- A built-in Recorder / Test Generator for creating test scripts based on manual browser actions. This significantly reduces the effort required to write tests for complex workflows.
- Test case management software called CasePlans to organize tests and document step descriptions directly within the framework.
- Tools like ChartMaker to build data apps by generating JavaScript code from Python. That makes it a versatile solution beyond standard test automation.
SeleniumBase: Features, Methods, and CLI Options
See what makes SeleniumBase special by exploring its capabilities and API.
Features
This is a list of some of the most relevant SeleniumBase features:
- Includes Recorder Mode for instantly generating browser tests in Python.
- Supports multiple browsers, tabs, iframes, and proxies within the same test.
- Features Test Case Management Software with Markdown technology.
- Smart waiting mechanism automatically improves reliability and reduces flaky tests.
- Compatible with `pytest`, `unittest`, `nose`, and `behave` for test discovery and execution.
- Includes advanced logging tools for dashboards, reports, and screenshots.
- Can run tests in Headless Mode to hide the browser interface.
- Supports multithreaded test execution across parallel browsers.
- Allows tests to run using Chromium’s mobile device emulator.
- Supports running tests through a proxy server, even an authenticated one.
- Customizes the browser’s user-agent string for tests.
- Prevents detection by websites that block Selenium automation.
- Integrates with selenium-wire for inspecting browser network requests.
- Flexible command-line interface for custom test execution options.
- Global configuration file for managing test settings.
- Supports integrations with GitHub Actions, Google Cloud, Azure, S3, and Docker.
- Supports executing JavaScript from Python.
- Can interact with Shadow DOM elements by using `::shadow` in CSS selectors (see the sketch below).
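As a quick illustration of the `::shadow` syntax, here is a minimal sketch assuming a page that contains a custom element hosting a shadow root (the URL and selectors are hypothetical):

```python
from seleniumbase import SB

with SB() as sb:
    sb.open("https://example.com")  # placeholder URL
    # Pierce the shadow root of a hypothetical custom element with ::shadow
    sb.click("custom-widget::shadow button.accept")
```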
For the entire list, check out the documentation.
Methods
Below is a list of the most useful SeleniumBase methods:
- `driver.open(url)`: Navigate the browser window to the specified URL.
- `driver.go_back()`: Navigate back to the previous URL.
- `driver.type(selector, text)`: Update the field identified by the selector with the specified text.
- `driver.click(selector)`: Click the element identified by the selector.
- `driver.click_link(link_text)`: Click the link containing the specified text.
- `driver.select_option_by_text(dropdown_selector, option)`: Select an option from a dropdown menu by visible text.
- `driver.hover_and_click(hover_selector, click_selector)`: Hover over an element and click another.
- `driver.drag_and_drop(drag_selector, drop_selector)`: Drag an element and drop it onto another element.
- `driver.get_text(selector)`: Get the text of the specified element.
- `driver.get_attribute(selector, attribute)`: Get the specified attribute of an element.
- `driver.get_current_url()`: Get the current page’s URL.
- `driver.get_page_source()`: Get the HTML source of the current page.
- `driver.get_title()`: Get the title of the current page.
- `driver.switch_to_frame(frame)`: Switch into the specified iframe container.
- `driver.switch_to_default_content()`: Exit the iframe container and return to the main document.
- `driver.open_new_window()`: Open a new browser window in the same session.
- `driver.switch_to_window(window)`: Switch to the specified browser window.
- `driver.switch_to_default_window()`: Return to the original browser window.
- `driver.get_new_driver(OPTIONS)`: Open a new driver session with the specified options.
- `driver.switch_to_driver(driver)`: Switch to the specified browser driver.
- `driver.switch_to_default_driver()`: Return to the original browser driver.
- `driver.wait_for_element(selector)`: Wait until the specified element is visible.
- `driver.is_element_visible(selector)`: Check if the specified element is visible.
- `driver.is_text_visible(text, selector)`: Check if the specified text is visible within an element.
- `driver.sleep(seconds)`: Pause execution for the specified amount of time.
- `driver.save_screenshot(name)`: Save a screenshot in `.png` format with the given name.
- `driver.assert_element(selector)`: Verify that the specified element is visible.
- `driver.assert_text(text, selector)`: Verify that the specified text is present in the element.
- `driver.assert_exact_text(text, selector)`: Verify that the specified text matches exactly in the element.
- `driver.assert_title(title)`: Verify that the current page title matches the specified title.
- `driver.assert_downloaded_file(file)`: Verify that the specified file has been downloaded.
- `driver.assert_no_404_errors()`: Verify there are no broken links on the page.
- `driver.assert_no_js_errors()`: Verify there are no JavaScript errors on the page.
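To tie a few of these methods together, here is a minimal sketch against the Quotes to Scrape sandbox used later in this guide (the title assertion assumes the site's current page title):

```python
from seleniumbase import SB

with SB() as sb:
    sb.open("https://quotes.toscrape.com/")
    # Assert the page title, then grab the first quote's text
    sb.assert_title("Quotes to Scrape")
    first_quote = sb.get_text(".quote .text")
    print(first_quote)
    # Saved as homepage.png
    sb.save_screenshot("homepage")
```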
For the complete list, explore the documentation.
CLI Options
SeleniumBase extends `pytest` with the following command-line options:
- `--browser=BROWSER`: Set the web browser (default: “chrome”).
- `--chrome`: Shortcut for `--browser=chrome`.
- `--edge`: Shortcut for `--browser=edge`.
- `--firefox`: Shortcut for `--browser=firefox`.
- `--safari`: Shortcut for `--browser=safari`.
- `--settings-file=FILE`: Override default SeleniumBase settings.
- `--env=ENV`: Set the test environment, accessible via `driver.env`.
- `--account=STR`: Set account, accessible via `driver.account`.
- `--data=STRING`: Extra test data, accessible via `driver.data`.
- `--var1=STRING`: Extra test data, accessible via `driver.var1`.
- `--var2=STRING`: Extra test data, accessible via `driver.var2`.
- `--var3=STRING`: Extra test data, accessible via `driver.var3`.
- `--variables=DICT`: Extra test data, accessible via `driver.variables`.
- `--proxy=SERVER:PORT`: Connect to a proxy server.
- `--proxy=USERNAME:PASSWORD@SERVER:PORT`: Use an authenticated proxy server.
- `--proxy-bypass-list=STRING`: Hosts to bypass (e.g., “*.foo.com”).
- `--proxy-pac-url=URL`: Connect via PAC URL.
- `--proxy-pac-url=USERNAME:PASSWORD@URL`: Authenticated proxy with PAC URL.
- `--proxy-driver`: Use proxy for driver download.
- `--multi-proxy`: Allow multiple authenticated proxies in multi-threading.
- `--agent=STRING`: Modify the browser’s User-Agent string.
- `--mobile`: Enable mobile device emulator.
- `--metrics=STRING`: Set mobile metrics (e.g., “CSSWidth,CSSHeight,PixelRatio”).
- `--chromium-arg="ARG=N,ARG2"`: Set Chromium arguments.
- `--firefox-arg="ARG=N,ARG2"`: Set Firefox arguments.
- `--firefox-pref=SET`: Set Firefox preferences.
- `--extension-zip=ZIP`: Load Chrome Extension `.zip`/`.crx` files.
- `--extension-dir=DIR`: Load Chrome Extension directories.
- `--disable-features="F1,F2"`: Disable features.
- `--binary-location=PATH`: Set Chromium binary path.
- `--driver-version=VER`: Set driver version.
- `--headless`: Default headless mode.
- `--headless1`: Use Chrome’s old headless mode.
- `--headless2`: Use Chrome’s new headless mode.
- `--headed`: Enable GUI mode on Linux.
- `--xvfb`: Run tests with Xvfb on Linux.
- `--locale=LOCALE_CODE`: Set the browser’s language locale.
- `--reuse-session`: Reuse browser session for all tests.
- `--reuse-class-session`: Reuse session for class tests.
- `--crumbs`: Delete cookies between reused sessions.
- `--disable-cookies`: Disable cookies.
- `--disable-js`: Disable JavaScript.
- `--disable-csp`: Disable Content Security Policy.
- `--disable-ws`: Disable Web Security.
- `--enable-ws`: Enable Web Security.
- `--log-cdp`: Log Chrome DevTools Protocol (CDP) events.
- `--remote-debug`: Sync to Chrome Remote Debugger.
- `--visual-baseline`: Set visual baseline for layout tests.
- `--timeout-multiplier=MULTIPLIER`: Multiply default timeout values.
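These options can be combined freely. For example, a hypothetical invocation (the script name and proxy credentials are placeholders) might look like:

```bash
pytest my_test.py --firefox --headless --proxy=USERNAME:PASSWORD@SERVER:PORT
```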
See the full list of command-line option definitions in the documentation.
Using SeleniumBase for Web Scraping: Step-By-Step Guide
Follow this step-by-step tutorial to learn how to build a SeleniumBase scraper to retrieve data from the Quotes to Scrape sandbox:
For a similar tutorial using vanilla Selenium, check out our guide on web scraping with Selenium.
Step #1: Project Initialization
Before getting started, make sure you have Python 3 installed on your machine. If not, download and install it.
Open the terminal and launch the command below to create a directory for your project:
```bash
mkdir seleniumbase-scraper
```
The `seleniumbase-scraper` folder will contain your SeleniumBase scraper.

Navigate inside it and initialize a virtual environment:
```bash
cd seleniumbase-scraper
python -m venv env
```
Next, load the project folder in your favorite Python IDE. Visual Studio Code with the Python extension or PyCharm Community Edition will do.
Create a `scraper.py` file in the project’s directory, which should now contain this file structure:

```
seleniumbase-scraper/
├── env/
└── scraper.py
```

`scraper.py` will soon contain your scraping logic.
Activate the virtual environment in the IDE’s terminal. On Linux or macOS, do that with the command below:

```bash
source ./env/bin/activate
```
Equivalently, on Windows, run:

```bash
env\Scripts\activate
```
In the activated environment, launch this command to install SeleniumBase:

```bash
pip install seleniumbase
```
Wonderful! You have a Python environment for SeleniumBase web scraping.
Step #2: SeleniumBase Test Setup
While SeleniumBase supports `pytest` syntax for building tests, a web scraping bot is not a test script. You can still take advantage of all the SeleniumBase `pytest` command-line extension options by using the `SB` syntax:

```python
from seleniumbase import SB

with SB() as sb:
    # Scraping logic...
    pass
```
You can now execute your script with:

```bash
python3 scraper.py
```
Note: On Windows, replace `python3` with `python`.
To execute it in headless mode, run:

```bash
python3 scraper.py --headless
```
Keep in mind that you can combine multiple command-line options.
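For instance, assuming an authenticated proxy (the credentials below are placeholders), you could run:

```bash
python3 scraper.py --headless --proxy=USERNAME:PASSWORD@SERVER:PORT
```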
Step #3: Connect to the Target Page
Use the `open()` method to instruct the controlled browser to visit your target page:

```python
sb.open("https://quotes.toscrape.com/")
```
If you execute the scraping script in headed mode, this is what you will see for a fraction of a second:
Note that, compared to vanilla Selenium, you do not have to manually close the driver. SeleniumBase will take care of that for you.
Step #4: Select the Quote Elements
Open the target page in incognito mode in your browser and inspect a quote element:
Since the page contains multiple quotes, create a `quotes` array to store the scraped data:

```python
quotes = []
```

In the DevTools section above, you can see that all quotes can be selected using the `.quote` CSS selector. Use `find_elements()` to select them all:

```python
quote_elements = sb.find_elements(".quote")
```
Next, prepare to iterate over the elements and scrape data from each quote element, adding the scraped data to the array:

```python
for quote_element in quote_elements:
    # Scraping logic...
```
Great! The high-level scraping logic is now ready.
Step #5: Scrape Quote Data
Inspect a single quote element:
Note that you can scrape:
- The quote text from `.text`
- The quote author from `.author`
- The quote tags from `.tag`
Select each node and extract data from them with the `text` attribute:

```python
text_element = quote_element.find_element(By.CSS_SELECTOR, ".text")
text = text_element.text.replace("“", "").replace("”", "")

author_element = quote_element.find_element(By.CSS_SELECTOR, ".author")
author = author_element.text

tags = []
tag_elements = quote_element.find_elements(By.CSS_SELECTOR, ".tag")
for tag_element in tag_elements:
    tag = tag_element.text
    tags.append(tag)
```
Note that `find_elements()` returns vanilla Selenium `WebElement` objects. So, to select elements within them, you must use Selenium’s native methods. This is why you have to specify `By.CSS_SELECTOR` as the locator.

Make sure to import `By` at the beginning of your script:

```python
from selenium.webdriver.common.by import By
```
Notice how scraping the tags requires a loop, as a single quote can have one or more tags. Also, observe the use of the `replace()` method to remove the special double quotes surrounding the text.
Step #6: Populate the Quotes Array
Populate a new `quote` object with the scraped data and add it to `quotes`:

```python
quote = {
    "text": text,
    "author": author,
    "tags": tags
}
quotes.append(quote)
```
Amazing! The SeleniumBase scraping logic is complete.
Step #7: Implement Crawling Logic
Remember, the target site contains multiple pages. To navigate to the next page, click the “Next →” button at the bottom:
On the last page, this button will not be present.
To implement web crawling and scrape all pages, wrap your scraping logic in a loop that clicks the “Next →” button and stops when the button is no longer available. Note that the presence check must come after the scraping logic, so the last page also gets scraped:

```python
while True:
    # Scraping logic...

    # Stop when the "Next →" button is no longer present
    if not sb.is_element_present(".next"):
        break
    # Visit the next page
    sb.click(".next a")
```
Note the use of the special SeleniumBase `is_element_present()` method to check whether the button is present or not.
Perfect! Your SeleniumBase scraper will now go through the entire site.
Step #8: Export the Scraped Data
Export the scraped data in `quotes` to a CSV file as follows:

```python
with open("quotes.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["text", "author", "tags"])
    writer.writeheader()

    # Flatten the quote objects for CSV writing
    for quote in quotes:
        writer.writerow({
            "text": quote["text"],
            "author": quote["author"],
            "tags": ";".join(quote["tags"])
        })
```
Do not forget to import `csv` from the Python standard library:

```python
import csv
```
Step #9: Put It All Together
Your `scraper.py` file should now contain the following code:
```python
from seleniumbase import SB
from selenium.webdriver.common.by import By
import csv

with SB() as sb:
    # Connect to the target page
    sb.open("https://quotes.toscrape.com/")

    # Where to store the scraped data
    quotes = []

    # Iterate over all quote pages
    while True:
        # Select all quote elements on the page
        quote_elements = sb.find_elements(".quote")

        # Iterate over them and scrape data for each quote element
        for quote_element in quote_elements:
            # Data extraction logic
            text_element = quote_element.find_element(By.CSS_SELECTOR, ".text")
            text = text_element.text.replace("“", "").replace("”", "")

            author_element = quote_element.find_element(By.CSS_SELECTOR, ".author")
            author = author_element.text

            tags = []
            tag_elements = quote_element.find_elements(By.CSS_SELECTOR, ".tag")
            for tag_element in tag_elements:
                tag = tag_element.text
                tags.append(tag)

            # Populate a new quote object with the scraped data
            quote = {
                "text": text,
                "author": author,
                "tags": tags
            }
            # Add it to the list of scraped quotes
            quotes.append(quote)

        # Stop when the "Next →" button is no longer present
        if not sb.is_element_present(".next"):
            break
        # Visit the next page
        sb.click(".next a")

# Export the scraped data to CSV
with open("quotes.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["text", "author", "tags"])
    writer.writeheader()

    # Flatten the quote objects for CSV writing
    for quote in quotes:
        writer.writerow({
            "text": quote["text"],
            "author": quote["author"],
            "tags": ";".join(quote["tags"])
        })
```
Execute the SeleniumBase scraper in headless mode with:

```bash
python3 scraper.py --headless
```
After a few seconds, a `quotes.csv` file will appear in the project folder.
Open it, and you will see:
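The first rows should look something like this (the exact text comes from the Quotes to Scrape site):

```csv
text,author,tags
The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.,Albert Einstein,change;deep-thoughts;thinking;world
"It is our choices, Harry, that show what we truly are, far more than our abilities.",J.K. Rowling,abilities;choices
```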
Et voilà! Your SeleniumBase web scraping script works like a charm.
Advanced SeleniumBase Scraping Use Cases
Now that you have seen the basics of SeleniumBase, you are ready to explore some more complex scenarios.
Automate Form Filling and Submission
Note: Bright Data doesn’t scrape behind login.
SeleniumBase also allows you to interact with elements on a page as a human user would. For example, suppose you need to interact with a login form as shown below:
Your goal is to fill out the “Username” and “Password” fields, and then submit the form by clicking the “Login” button. You can achieve this with a SeleniumBase test as follows:
```python
# login.py
from seleniumbase import BaseCase

BaseCase.main(__name__, __file__)

class LoginTest(BaseCase):
    def test_submit_login_form(self):
        # Visit the target page
        self.open("https://quotes.toscrape.com/login")
        # Fill out the form
        self.type("#username", "test")
        self.type("#password", "test")
        # Submit the form
        self.click("input[type=\"submit\"]")
        # Verify you are on the right page
        self.assert_text("Top Ten tags")
```
This scenario lends itself well to being structured as a test, hence the use of the `BaseCase` class, which allows you to create `pytest` tests.
Execute the test with this command:

```bash
pytest login.py
```
You will see the browser open, load the login page, fill out the form, submit it, and then check for the given text to appear on the page.
The output in the terminal will look something like this:
```
login.py . [100%]
======================================== 1 passed in 11.20s =========================================
```
Bypass Simple Anti-Bot Technologies
Many sites implement advanced anti-scraping measures to prevent bots from accessing their data. These techniques include CAPTCHA challenges, rate limits, browser fingerprinting, and others. To effectively scrape websites without getting blocked, you need to bypass these protections.
SeleniumBase provides a special feature called UC Mode (Undetected-Chromedriver Mode), which helps scraping bots appear more like human users. This allows them to evade detection by anti-bot services, which might otherwise block the scraping bot directly or trigger CAPTCHAs.
UC Mode is built on `undetected-chromedriver` and comes with several updates, fixes, and improvements, such as:
- Automatic User-Agent rotation to avoid detection.
- Automatic configuration of Chromium arguments as needed.
- Special `uc_*()` methods for bypassing CAPTCHAs.
Now, let’s see how to use UC Mode in SeleniumBase to bypass anti-bot challenges.
For this demonstration, you will see how to access the anti-bot page from the Scraping Course site:
To bypass the anti-bot measures and handle the CAPTCHA, enable UC Mode and use the `uc_open_with_reconnect()` and `uc_gui_click_captcha()` methods:
```python
from seleniumbase import SB

with SB(uc=True) as sb:
    # Target page with anti-bot measures
    url = "https://www.scrapingcourse.com/antibot-challenge"
    # Open the URL using UC Mode with a reconnect time of 4 seconds
    # to avoid initial detection
    sb.uc_open_with_reconnect(url, reconnect_time=4)
    # Attempt to bypass the CAPTCHA
    sb.uc_gui_click_captcha()
    # Take a screenshot of the page
    sb.save_screenshot("screenshot.png")
```
Now, launch the script and verify that it works as expected. Since `uc_gui_click_captcha()` requires PyAutoGUI to work, SeleniumBase will install it for you on the first run:

```
PyAutoGUI required! Installing now...
```
You will see the browser automatically click the “Verify you are human” checkbox by moving the mouse. The `screenshot.png` file in your project folder will show:
Wow! Cloudflare has been bypassed.
Bypass Complex Anti-Bot Technologies
Anti-bot solutions are becoming increasingly sophisticated, and UC Mode may not always be effective. This is why SeleniumBase also offers a special CDP Mode (Chrome DevTools Protocol Mode).
CDP Mode operates within UC Mode and allows bots to appear more human-like by controlling the browser through the CDP-Driver. While regular UC Mode cannot perform WebDriver actions when the `driver` is disconnected from the browser, the CDP-Driver can still interact with the browser, overcoming this limitation.

CDP Mode is built on `python-cdp`, `trio-cdp`, and `nodriver`. It is designed to bypass advanced anti-bot solutions on real-world sites, as in the example below:
```python
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    # Target page with advanced anti-bot measures
    url = "https://gitlab.com/users/sign_in"
    # Visit the page in CDP Mode
    sb.activate_cdp_mode(url)
    # Handle the CAPTCHA
    sb.uc_gui_click_captcha()
    # Wait 2 seconds for the page to reload and the driver to retake control
    sb.sleep(2)
    # Take a screenshot of the page
    sb.save_screenshot("screenshot.png")
```
The result will be:
Here we go! You are now a SeleniumBase scraping master.
Conclusion
In this article, you learned about SeleniumBase, the features and methods it offers, and how to use it for web scraping. You started with basic scenarios and then explored more complex use cases.
While UC Mode and CDP Mode are effective for bypassing certain anti-bot measures, they are not foolproof.
Websites can still block your IP if you make too many requests, or challenge you with more complex CAPTCHAs that require multiple actions. A more effective solution is to use a browser automation tool like Selenium in combination with a scraping-dedicated, cloud-based, highly scalable browser like Scraping Browser from Bright Data.
Scraping Browser is a cloud browser that works with Playwright, Puppeteer, Selenium, and other automation tools. It automatically rotates exit IPs with every request and can handle browser fingerprinting, retries, CAPTCHA resolution, and much more. Forget about getting blocked and streamline your scraping operation.
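As a rough sketch, connecting vanilla Selenium to a remote browser endpoint of this kind might look like the following (the endpoint and credentials are placeholders; check the provider’s documentation for the exact connection details):

```python
from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection

# Placeholder remote WebDriver endpoint with embedded credentials
SBR_WEBDRIVER = "https://USERNAME:PASSWORD@HOST:PORT"

# Open a session on the remote, scraping-optimized browser
connection = ChromiumRemoteConnection(SBR_WEBDRIVER, "goog", "chrome")
with Remote(connection, options=ChromeOptions()) as driver:
    driver.get("https://quotes.toscrape.com/")
    print(driver.title)
```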
Sign up now and start your free trial!
No credit card required