Declarative Web Scraping With Ferret in 2025

Discover how Ferret makes declarative web scraping in Go easy, from setup to data extraction on static and dynamic web pages.

In this guide, you will learn:

  • What Ferret is and what it offers as a declarative web scraping library
  • How to configure it for local use in a Go environment
  • How to use it to collect data from a static website
  • How to use it to scrape a dynamic site
  • Ferret’s main limitations and how to work around them

Let’s dive in!

Introduction to Ferret for Web Scraping

Before seeing it in action, let’s explore what Ferret is, how it works, what it offers, and when to use it.

What Is Ferret?

Ferret is an open-source web scraping library written in Go. Its goal is to simplify data extraction from web pages using a declarative approach. Specifically, it abstracts away the technical complexities of parsing and extraction by using its own custom declarative language: the Ferret Query Language (FQL).

With almost 6k stars on GitHub, Ferret is one of the most popular web scraping libraries for Go. It is embeddable and supports both static and dynamic web scraping.

FQL: The Ferret Query Language for Declarative Web Scraping

The Ferret Query Language (FQL) is a general-purpose query language, heavily inspired by ArangoDB’s AQL. While it is capable of more, FQL is primarily used for extracting data from web pages.

FQL follows a declarative approach, meaning it focuses on what data to retrieve rather than how to retrieve it. Like AQL, it shares similarities with SQL. Unlike AQL, however, FQL is strictly read-only: any form of data transformation must happen through its built-in functions.

For more information on FQL syntax, keywords, constructs, and supported data types, refer to the FQL documentation page.
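To get a first taste of the syntax, here is a minimal FQL snippet (a sketch using the AQL-style FOR, FILTER, and RETURN constructs described in the docs):

// iterate over a static list, keep only values greater than 2,
// and project each surviving value into a new one
FOR n IN [1, 2, 3, 4, 5]
    FILTER n > 2
    RETURN n * 10

Executing it prints [30,40,50]: the script declares what to return, and the Ferret runtime figures out how to produce it.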

Use Cases

As highlighted on the official GitHub page, the main use cases of Ferret include:

  • UI testing: Automate testing in web applications by simulating browser interactions and validating that page elements behave and render correctly across different scenarios.
  • Machine learning: Extract structured data from web pages and use that to create high-quality datasets. Those can then be used to train or validate machine learning models more effectively. See how to use web scraping for machine learning.
  • Analytics: Scrape and aggregate web data—such as prices, reviews, or user activity—for generating insights, tracking trends, or powering dashboards.

At the same time, keep in mind that the potential use cases for web scraping go far beyond these examples.

Get Started With Ferret

Now that you know what Ferret is, you are ready to see it in action on both static and dynamic web pages. If you are not familiar with the difference between the two, read our guide on static vs dynamic content in web scraping.

Let’s set up an environment to use Ferret for web scraping!

Prerequisites

Make sure you have the following installed on your local machine:

  • Go
  • Docker

To verify that Golang is installed and ready, run the following command in the terminal:

go version

You should see output similar to this:

go version go1.24.3 windows/amd64

If you get an error, install Golang and configure it for your operating system.

Similarly, verify that Docker is installed and properly configured for your system.

Create the Ferret Project

Now, create a folder for your Ferret web scraping project and navigate into it:

mkdir ferret-web-scraping
cd ferret-web-scraping

Download the Ferret CLI for your OS and unpack it directly into the ferret-web-scraping/ folder. Verify that it works by running:

./ferret help

The output should be:

Usage:
  ferret [flags]
  ferret [command]

Available Commands:
  browser     Manage Ferret browsers
  config      Manage Ferret configs
  exec        Execute a FQL script or launch REPL
  help        Help about any command
  update
  version     Show the CLI version information

Flags:
  -h, --help               help for ferret
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")

Use "ferret [command] --help" for more information about a command.

Next, open the project folder in your favorite IDE, such as Visual Studio Code. Inside the project folder, create a file named scraper.fql:

ferret-web-scraping/
├── ferret
├── CHANGELOG.md
├── LICENSE
├── README.md
└── scraper.fql # <-- The FQL file for web scraping in Ferret

scraper.fql will contain your FQL declarative logic for web scraping.

Configure the Ferret Docker Setup

To use all Ferret features, you must have Chrome or Chromium installed locally or running inside Docker. The official docs recommend running Chrome/Chromium in a Docker container.

You can use any Chromium-based headless image, but the montferret/chromium one is recommended. Retrieve it with:

docker pull montferret/chromium

Then, launch that Docker image with this command:

docker run -d -p 9222:9222 montferret/chromium
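The container exposes Chromium’s DevTools Protocol on port 9222. To confirm that the browser is up and reachable from your host, query the standard CDP version endpoint:

curl http://localhost:9222/json/version

If the container is running correctly, this returns a JSON object describing the browser build.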

Note: If you want to see what is happening in the browser while your FQL scripts run, launch Chrome on your host machine with remote debugging enabled on port 9222 instead:

chrome.exe --remote-debugging-port=9222

Scrape a Static Site with Ferret

Follow the steps below to learn how to use Ferret to scrape a static website. In this example, the target page will be the sandbox site “Books to Scrape”:

The target site “Books to Scrape”

The goal is to extract key information from each book on the page using Ferret’s declarative approach via FQL.

Step #1: Connect to the Target Site

In scraper.fql, use the DOCUMENT function to connect to the target page:

LET doc = DOCUMENT("https://books.toscrape.com/")

LET allows you to define a variable in FQL. After that instruction, doc will contain the parsed HTML document of the target page.
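When no options are passed, Ferret fetches the page with its default static HTTP driver, which is exactly what a static site needs. If you prefer to be explicit, you can name the driver yourself (a sketch; see the Ferret docs for all supported DOCUMENT() options):

// equivalent, with the static HTTP driver made explicit
LET doc = DOCUMENT("https://books.toscrape.com/", {
    driver: "http"
})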

Step #2: Select All Book Elements

First, get familiar with the structure of the target web page by visiting it in your browser and inspecting it. Specifically, right-click on a book element and select the “Inspect” option to open the DevTools:

Inspecting a book element

Note that each book element is an <article> node inside the parent <section>. Select all book elements with the ELEMENTS() function:

LET book_elements = ELEMENTS(doc, "section article")

ELEMENTS() applies the CSS selector passed as the second argument to the document. In other words, it selects the desired HTML elements on the page.

Iterate over the list of selected elements and prepare to apply the scraping logic to them:

FOR book_element IN book_elements
    // book scraping logic...

Amazing! Time to extract data from each book element.

Step #3: Extract Data from Each Book

Now, inspect a single HTML book element:

Inspecting an HTML book element in detail

Note that you can scrape:

  • The image URL from the src attribute of the .image_container img element.
  • The book title from the title attribute of the h3 a element.
  • The URL to the book page from the href attribute of the h3 a node.
  • The book price from the .price_color element’s text.
  • The availability info from the .instock element’s text.

Implement this data parsing logic with:

LET image_element = ELEMENT(book_element, ".image_container img")
LET title_element = ELEMENT(book_element, "h3 a")
LET price_element = ELEMENT(book_element, ".price_color")
LET availability_element = ELEMENT(book_element, ".instock")

RETURN {
    image_url: base_url + image_element.attributes.src,
    title: title_element.attributes.title,
    book_url: base_url + title_element.attributes.href,
    price: TRIM(INNER_TEXT(price_element)),
    availability: TRIM(INNER_TEXT(availability_element))
}

Where base_url is a variable defined outside the FOR loop:

LET base_url = "https://books.toscrape.com/"

In the above code:

  • ELEMENT() enables you to select a single element on the page using a CSS selector.
  • attributes is a special attribute that all objects returned by ELEMENT() have. It contains the values of the HTML attributes of the current element.
  • INNER_TEXT() returns the text contained in the current element.
  • TRIM() removes leading and trailing whitespace.

Fantastic! Static scraping logic completed.

Step #4: Put It All Together

Your scraper.fql file should look like this:

// connect to the target site
LET doc = DOCUMENT("https://books.toscrape.com/")

// select the book HTML elements
LET book_elements = ELEMENTS(doc, "section article")

// the base URL of the target site
LET base_url = "https://books.toscrape.com/"

// iterate over each book element and apply the scraping logic
FOR book_element IN book_elements
    // select all info elements
    LET image_element = ELEMENT(book_element, ".image_container img")
    LET title_element = ELEMENT(book_element, "h3 a")
    LET price_element = ELEMENT(book_element, ".price_color")
    LET availability_element = ELEMENT(book_element, ".instock")

    // scrape the data of interest
    RETURN {
        image_url: base_url + image_element.attributes.src,
        title: title_element.attributes.title,
        book_url: base_url + title_element.attributes.href,
        price: TRIM(INNER_TEXT(price_element)),
        availability: TRIM(INNER_TEXT(availability_element))
    }

As you can see, the scraping logic focuses more on what data to extract rather than how to extract it. That is the power of declarative web scraping with Ferret!

Step #5: Execute the FQL Script

Execute your Ferret script with:

./ferret exec scraper.fql

In the terminal, the output will be:

[{"availability":"In stock","book_url":"catalogue/a-light-in-the-attic_1000/index.html","image_url":"https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg","price":"£51.77","title":"https://books.toscrape.com/A Light in the Attic"},{"availability":"In stock","book_url":"catalogue/tipping-the-velvet_999/index.html","image_url":"https://books.toscrape.com/media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg","price":"£53.74","title":"https://books.toscrape.com/Tipping the Velvet"},
// omitted for brevity...
,{"availability":"In stock","book_url":"catalogue/its-only-the-himalayas_981/index.html","image_url":"https://books.toscrape.com/media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg","price":"£45.17","title":"https://books.toscrape.com/It's Only the Himalayas"}]

This is a JSON string containing all the book data collected from the webpage as intended. For a non-declarative approach to data parsing, take a look at our guide on web scraping with Go.
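Since the script writes its JSON result to stdout, you can persist it by simply redirecting the output to a file:

./ferret exec scraper.fql > books.json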

Mission accomplished!

Scrape a Dynamic Site with Ferret

Ferret also supports scraping dynamic websites that require JavaScript execution. In this section of the guide, the target site will be the JavaScript-delayed version of the “Quotes to Scrape” site:

Note how the quote elements are added to the page after a delay

The page uses JavaScript to dynamically inject quote elements into the DOM after a short delay. Handling that scenario requires executing JavaScript, which means rendering the page in a browser. (That is also why we set up the Chromium Docker container earlier.)

Follow the steps below to learn how to handle dynamic web pages using Ferret!

Step #1: Connect to the Target Page in the Browser

Use the following lines to connect to the target page via a headless browser:

LET doc = DOCUMENT("https://quotes.toscrape.com/js-delayed/?delay=2000", {
    driver: "cdp"
})

Note the use of the driver field in the DOCUMENT() function. That is what tells Ferret to render the page in the headless Chromium instance configured via Docker.

Step #2: Wait for the Target Elements to be on the Page

Visit the target page in your browser, wait for the quote elements to load, and inspect one of them:

Inspecting a quote element

Notice how the quote elements can be selected using the .quote CSS selector. These quote elements will be rendered via JavaScript after a short delay, so you must wait for them.

Use the WAIT_ELEMENT() function in Ferret to wait for the quote elements to appear on the page:

// wait up to 5 seconds for the quote elements to be on the page
WAIT_ELEMENT(doc, ".quote", 5000)

That is an essential construct to use when scraping dynamic web pages that rely on JavaScript to render content.

Step #3: Apply the Scraping Logic

Now, focus on the HTML structure of the info elements inside a .quote node:

Inspecting a .quote element in detail

Note that you can scrape:

  • The quote text from the .text element
  • The author name from the .author element

Implement the Ferret web scraping logic with:

// select the quote HTML elements
LET quote_elements = ELEMENTS(doc, ".quote")

// iterate over each quote element and apply the scraping logic
FOR quote_element IN quote_elements
    // select all info elements
    LET text_element = ELEMENT(quote_element, ".text")
    LET author_element = ELEMENT(quote_element, ".author")

    // scrape the data of interest
    RETURN {
        quote: TRIM(INNER_TEXT(text_element)),
        author: TRIM(INNER_TEXT(author_element))
    }

Awesome! Parsing logic completed.

Step #4: Assemble Everything

The scraper.fql file should contain:

// connect to the target site via the Chromium headless instance
LET doc = DOCUMENT("https://quotes.toscrape.com/js-delayed/?delay=2000", {
    driver: "cdp"
})

// wait up to 5 seconds for the quote elements to be on the page
WAIT_ELEMENT(doc, ".quote", 5000)

// select the quote HTML elements
LET quote_elements = ELEMENTS(doc, ".quote")

// iterate over each quote element and apply the scraping logic
FOR quote_element IN quote_elements
    // select all info elements
    LET text_element = ELEMENT(quote_element, ".text")
    LET author_element = ELEMENT(quote_element, ".author")

    // scrape the data of interest
    RETURN {
        quote: TRIM(INNER_TEXT(text_element)),
        author: TRIM(INNER_TEXT(author_element))
    }

As you can see, this is not much different from the script for a static site. Again, the reason is that Ferret uses a declarative approach to web scraping.

Step #5: Run the FQL Code

Run your Ferret scraping script with:

./ferret exec scraper.fql

This time, the result will be:

[{"author":"Albert Einstein","quote":"“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”"},{"author":"J.K. Rowling","quote":"“It is our choices, Harry, that show what we truly are, far more than our abilities.”"},{"author":"Albert Einstein","quote":"“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”"},{"author":"Jane Austen","quote":"“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”"},{"author":"Marilyn Monroe","quote":"“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”"},{"author":"Albert Einstein","quote":"“Try not to become a man of success. Rather become a man of value.”"},{"author":"André Gide","quote":"“It is better to be hated for what you are than to be loved for what you are not.”"},{"author":"Thomas A. Edison","quote":"“I have not failed. I've just found 10,000 ways that won't work.”"},{"author":"Eleanor Roosevelt","quote":"“A woman is like a tea bag; you never know how strong it is until it's in hot water.”"},{"author":"Steve Martin","quote":"“A day without sunshine is like, you know, night.”"}]

Et voilà! That is exactly the structured content retrieved from the JavaScript-rendered page.
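WAIT_ELEMENT() is only one of the browser capabilities unlocked by the CDP driver. FQL also offers interaction functions such as INPUT(), CLICK(), and WAIT_NAVIGATION() (see the FQL documentation for the full list). As a sketch, a form-driven flow could look like the snippet below, where the URL and CSS selectors are hypothetical placeholders:

// load the page in the headless browser
LET doc = DOCUMENT("https://example.com/search", {
    driver: "cdp"
})

// type a query into the search box (the last argument is a typing delay in ms)
INPUT(doc, "input[name='q']", "web scraping", 25)
// submit the form
CLICK(doc, "button[type='submit']")
// wait for the results page to load
WAIT_NAVIGATION(doc)

// collect the trimmed text of each result
FOR result IN ELEMENTS(doc, ".result")
    RETURN TRIM(INNER_TEXT(result))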

Limitations of the Ferret Declarative Web Scraping Approach

Ferret is undoubtedly a powerful tool and one of the few that takes a declarative approach to web scraping. Yet, it comes with at least three major drawbacks:

  • Poor documentation and infrequent updates: While the official documentation includes helpful text, it lacks comprehensive API references. That makes it difficult to build complex scripts. Additionally, the project does not receive regular updates, which means it may lag behind modern scraping techniques.
  • No support for anti-scraping bypass: Ferret does not offer built-in mechanisms to handle CAPTCHAs, rate limits, or other advanced anti-scraping defenses. That makes it unsuitable for scraping more protected sites; the most you can tune are request-level details, as sketched after this list.
  • Limited expressiveness: FQL, the Ferret Query Language, is still under development and does not offer the same level of flexibility or control as more modern scraping tools like Playwright or Puppeteer.
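For example, DOCUMENT() options include a userAgent field to override the browser’s User-Agent header (a sketch; check the FQL docs for the options supported by your Ferret version):

// spoof the User-Agent header; this fools only the most naive defenses
LET doc = DOCUMENT("https://books.toscrape.com/", {
    driver: "cdp",
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
})

RETURN TRIM(INNER_TEXT(ELEMENT(doc, ".price_color")))

Tweaks like this do not amount to a real anti-bot toolkit, though.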

These limitations cannot be easily addressed through simple integrations. Remember also that Ferret’s core focus is retrieving web data, not evading anti-bot systems. So, the solution is to consider a more robust alternative.

Bright Data’s AI infrastructure includes a suite of advanced services tailored for reliable and intelligent web data extraction. These enable you to retrieve data from any website and at scale.

Conclusion

In this tutorial, you learned how to use Ferret for declarative web scraping in Go. As demonstrated, this library allows you to extract data from both static and dynamic pages by focusing on what to retrieve, rather than how to retrieve it.

The problem is that Ferret has several limitations, so it might not be the best solution out there. If you are looking for a more streamlined and scalable way to retrieve web data, consider adopting Web Scraper APIs—dedicated endpoints for extracting fresh, structured, and fully compliant web data from over 120 popular websites.

Sign up for a free Bright Data account today and test our powerful web scraping infrastructure!

Antonello Zanini

Technical Writer

Antonello Zanini is a technical writer, editor, and software engineer with 5M+ views. Expert in technical content strategy, web development, and project management.
