In this guide, you will learn:
- What Ferret is and what it offers as a declarative web scraping library
- How to configure it for local use in a Go environment
- How to use it to collect data from a static website
- How to use it to scrape a dynamic site
- Ferret’s main limitations and how to work around them
Let’s dive in!
Introduction to Ferret for Web Scraping
Before seeing it in action, explore what Ferret is, how it works, what it offers, and when to use it.
What Is Ferret?
Ferret is an open-source web scraping library written in Go. Its goal is to simplify data extraction from web pages using a declarative approach. Specifically, it abstracts away the technical complexities of parsing and extraction by using its own custom declarative language: the Ferret Query Language (FQL).
With almost 6k stars on GitHub, Ferret is one of the most popular web scraping libraries for Go. It is embeddable and supports both static and dynamic web scraping.
FQL: The Ferret Query Language for Declarative Web Scraping
The Ferret Query Language (FQL) is a general-purpose query language, heavily inspired by ArangoDB’s AQL. While it is capable of more, FQL is primarily used for extracting data from web pages.
FQL follows a declarative approach, meaning it focuses on what data to retrieve rather than how to retrieve it. Like AQL, it shares similarities with SQL. Unlike AQL, however, FQL is strictly read-only; any form of data transformation must be done using dedicated built-in functions.
For more information on FQL syntax, keywords, constructs, and supported data types, refer to the FQL documentation page.
Use Cases
As highlighted on the official GitHub page, the main use cases of Ferret include:
- UI testing: Automate testing in web applications by simulating browser interactions and validating that page elements behave and render correctly across different scenarios.
- Machine learning: Extract structured data from web pages and use that to create high-quality datasets. Those can then be used to train or validate machine learning models more effectively. See how to use web scraping for machine learning.
- Analytics: Scrape and aggregate web data—such as prices, reviews, or user activity—for generating insights, tracking trends, or powering dashboards.
At the same time, keep in mind that the potential use cases for web scraping go far beyond these examples.
Get Started With Ferret
Now that you know what Ferret is, you are ready to see it in action on both static and dynamic web pages. If you are not familiar with the difference between the two, read our guide on static vs dynamic content in web scraping.
Let’s set up an environment to use Ferret for web scraping!
Prerequisites
Make sure you have the following installed on your local machine:
- Go
- Docker
To verify that Golang is installed and ready, run the following command in the terminal:
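```bash
go version
```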
You should see output similar to this:
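```
go version go1.24.1 linux/amd64
```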
If you get an error, install Golang and configure it for your operating system.
Similarly, verify that Docker is installed and properly configured for your system.
Create the Ferret Project
Now, create a folder for your Ferret web scraping project and navigate into it:
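```bash
mkdir ferret-web-scraping
cd ferret-web-scraping
```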
Download the Ferret CLI for your OS and unpack it directly into the `ferret-web-scraping/` folder. Verify that it works by running:
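```bash
# prints the CLI usage; on Windows, run .\ferret.exe help
./ferret help
```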
The output should be the Ferret CLI help text, listing the available commands and their flags.
Next, open the project folder in your favorite IDE, such as Visual Studio Code. Inside the project folder, create a file named `scraper.fql`. This file will contain your FQL declarative logic for web scraping.
Configure the Ferret Docker Setup
To use all Ferret features, you must have Chrome or Chromium installed locally or running inside Docker. The official docs recommend running Chrome/Chromium in a Docker container.
You can use any Chromium-based headless image, but the `montferret/chromium` one is recommended. Retrieve it with:
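```bash
docker pull montferret/chromium
```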
Then, launch that Docker image with this command:
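```bash
# expose the container's Chrome DevTools Protocol port 9222 on the host
docker run -d -p 9222:9222 montferret/chromium
```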
Note: If you want to see what is happening in the browser during the execution of your FQL scripts, launch Chrome on your host machine with remote debugging enabled with:
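```bash
# the binary name varies by OS (e.g., google-chrome, chromium, or chrome.exe)
google-chrome --remote-debugging-port=9222
```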
Scrape a Static Site with Ferret
Follow the steps below to learn how to use Ferret to scrape a static website. In this example, the target page will be the sandbox site “Books to Scrape”:
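```
https://books.toscrape.com/
```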
The goal is to extract key information from each book on the page using Ferret’s declarative approach via FQL.
Step #1: Connect to the Target Site
In `scraper.fql`, use the `DOCUMENT()` function to connect to the target page:
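```
LET doc = DOCUMENT("https://books.toscrape.com/")
```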
`LET` allows you to define a variable in FQL. After that instruction, `doc` will contain the HTML of the target page.
Step #2: Select All Book Elements
First, get familiar with the structure of the target web page by visiting it in your browser and inspecting it. In detail, right-click on a book element and select the “Inspect” option to open the DevTools.
Note that each book element is an `<article>` node inside the parent `<section>`. Select all the book elements with the `ELEMENTS()` function:
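```
LET book_elements = ELEMENTS(doc, "section article")
```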
`ELEMENTS()` applies the CSS selector passed as the second argument to the document. In other words, it selects the desired HTML elements on the page.
Iterate over the list of selected elements and prepare to apply the scraping logic to them:
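```
FOR book_element IN book_elements
    // placeholder; the extraction logic comes in the next step
    RETURN book_element
```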
Amazing! Time to iterate over each book element and extract data from each.
Step #3: Extract Data from Each Book
Now, inspect a single HTML book element.
Note that you can scrape:
- The image URL from the `src` attribute of the `.image_container img` element.
- The book title from the `title` attribute of the `h3 a` element.
- The URL to the book page from the `href` attribute of the `h3 a` node.
- The book price from the `.price_color` element's text.
- The availability info from the `.instock` element's text.
Implement this data parsing logic with:
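```
// a sketch: CONCAT() joins base_url with the relative src/href values
// to build absolute URLs
RETURN {
    image_url: CONCAT(base_url, ELEMENT(book_element, ".image_container img").attributes.src),
    title: ELEMENT(book_element, "h3 a").attributes.title,
    url: CONCAT(base_url, ELEMENT(book_element, "h3 a").attributes.href),
    price: TRIM(INNER_TEXT(book_element, ".price_color")),
    availability: TRIM(INNER_TEXT(book_element, ".instock"))
}
```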
Where `base_url` is a variable defined outside the `FOR` loop:
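```
LET base_url = "https://books.toscrape.com/"
```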
In the above code:
- `ELEMENT()` enables you to select a single element on the page using a CSS selector.
- `attributes` is a special attribute that all objects returned by `ELEMENT()` have. It contains the values of the HTML attributes of the current element.
- `INNER_TEXT()` returns the text contained in the current element.
- `TRIM()` removes leading and trailing whitespace.
Fantastic! Static scraping logic completed.
Step #4: Put It All Together
Your `scraper.fql` file should look like this:
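```
// scraper.fql: the complete static scraping script from the previous steps
LET base_url = "https://books.toscrape.com/"

LET doc = DOCUMENT(base_url)

LET book_elements = ELEMENTS(doc, "section article")

FOR book_element IN book_elements
    RETURN {
        image_url: CONCAT(base_url, ELEMENT(book_element, ".image_container img").attributes.src),
        title: ELEMENT(book_element, "h3 a").attributes.title,
        url: CONCAT(base_url, ELEMENT(book_element, "h3 a").attributes.href),
        price: TRIM(INNER_TEXT(book_element, ".price_color")),
        availability: TRIM(INNER_TEXT(book_element, ".instock"))
    }
```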
As you can see, the scraping logic focuses more on what data to extract rather than how to extract it. That is the power of declarative web scraping with Ferret!
Step #5: Execute the FQL Script
Execute your Ferret script with:
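```bash
# assumes the CLI's "exec" command and the binary unpacked in this folder
./ferret exec scraper.fql
```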
In the terminal, the output will be JSON similar to this (truncated here for brevity):
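```
[
  {
    "availability": "In stock",
    "image_url": "https://books.toscrape.com/media/cache/...",
    "price": "£51.77",
    "title": "A Light in the Attic",
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
  },
  ...
]
```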
This is a JSON string containing all the book data collected from the webpage as intended. For a non-declarative approach to data parsing, take a look at our guide on web scraping with Go.
Mission accomplished!
Scrape a Dynamic Site with Ferret
Ferret also supports scraping dynamic websites that require JavaScript execution. In this section of the guide, the target site will be the JavaScript-delayed version of the “Quotes to Scrape” site:
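```
https://quotes.toscrape.com/js-delayed/
```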
The page uses JavaScript to dynamically inject quote elements into the DOM after a short delay. That scenario requires executing JavaScript—hence, the need to render the page in a browser. (That is also why we previously set up a Chromium Docker container.)
Follow the steps below to learn how to handle dynamic web pages using Ferret!
Step #1: Connect to the Target Page in the Browser
Use the following lines to connect to the target page via a headless browser:
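```
// the "cdp" driver tells Ferret to load the page over a Chrome DevTools Protocol connection
LET doc = DOCUMENT("https://quotes.toscrape.com/js-delayed/", {
    driver: "cdp"
})
```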
Note the use of the `driver` field in the `DOCUMENT()` function. That is what tells Ferret to render the page in the headless Chromium instance configured via Docker.
Step #2: Wait for the Target Elements to be on the Page
Visit the target page in your browser, wait for the quote elements to load, and inspect one of them.
Notice how the quote elements can be selected using the `.quote` CSS selector. These quote elements will be rendered via JavaScript after a short delay, so you must wait for them.
Use the `WAIT_ELEMENT()` function in Ferret to wait for the quote elements to appear on the page:
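```
// wait up to 30 seconds (an illustrative timeout) for the first quote to appear
WAIT_ELEMENT(doc, ".quote", 30000)
```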
That is an essential construct to use when scraping dynamic web pages that rely on JavaScript to render content.
Step #3: Apply the Scraping Logic
Now, focus on the HTML structure of the info elements inside a `.quote` node.
Note that you can scrape:
- The quote text from the `.text` element.
- The author from the `.author` element.
Implement the Ferret web scraping logic with:
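```
FOR quote_element IN ELEMENTS(doc, ".quote")
    RETURN {
        quote: TRIM(INNER_TEXT(quote_element, ".text")),
        author: TRIM(INNER_TEXT(quote_element, ".author"))
    }
```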
Awesome! Parsing logic completed.
Step #4: Assemble Everything
The `scraper.fql` file should contain:
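```
// scraper.fql: load the page in the browser, wait for the quotes, extract them
LET doc = DOCUMENT("https://quotes.toscrape.com/js-delayed/", {
    driver: "cdp"
})

// wait up to 30 seconds (an illustrative timeout) for the quotes to be injected
WAIT_ELEMENT(doc, ".quote", 30000)

FOR quote_element IN ELEMENTS(doc, ".quote")
    RETURN {
        quote: TRIM(INNER_TEXT(quote_element, ".text")),
        author: TRIM(INNER_TEXT(quote_element, ".author"))
    }
```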
As you can see, this is not much different from the script for a static site. Again, the reason is that Ferret uses a declarative approach to web scraping.
Step #5: Run the FQL Code
Run your Ferret scraping script with:
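```bash
# same exec command; the Chromium Docker container must be listening on localhost:9222
./ferret exec scraper.fql
```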
This time, the result will be JSON similar to this (truncated here for brevity):
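```
[
  {
    "author": "Albert Einstein",
    "quote": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”"
  },
  ...
]
```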
Et voilà! That is exactly the structured content retrieved from the JavaScript-rendered page.
Limitations of the Ferret Declarative Web Scraping Approach
Ferret is undoubtedly a powerful tool and one of the few that takes a declarative approach to web scraping. Yet, it comes with at least three major drawbacks:
- Poor documentation and infrequent updates: While the official documentation includes helpful text, it lacks comprehensive API references. That makes it difficult to build complex scripts. Additionally, the project does not receive regular updates, which means it may lag behind modern scraping techniques.
- No support for anti-scraping bypass: Ferret does not offer built-in mechanisms to handle CAPTCHAs, rate limits, or other advanced anti-scraping defenses. This makes it unsuitable for scraping more protected sites.
- Limited expressiveness: FQL, the Ferret Query Language, is still under development and does not offer the same level of flexibility or control as more modern scraping tools like Playwright or Puppeteer.
These limitations cannot be easily addressed through simple integrations. Also, do not forget that Ferret’s core focus is on retrieving web data. So, the solution is to consider a more robust alternative.
Bright Data’s AI infrastructure includes a suite of advanced services tailored for reliable and intelligent web data extraction. These enable you to retrieve data from any website and at scale.
Conclusion
In this tutorial, you learned how to use Ferret for declarative web scraping in Go. As demonstrated, this library allows you to extract data from both static and dynamic pages by focusing on what to retrieve, rather than how to retrieve it.
The problem is that Ferret has several limitations, so it might not be the best solution out there. If you are looking for a more streamlined and scalable way to retrieve web data, consider adopting Web Scraper APIs—dedicated endpoints for extracting fresh, structured, and fully compliant web data from over 120 popular websites.
Sign up for a free Bright Data account today and test our powerful web scraping infrastructure!