
Enhancing CrewAI Agents Using SERP Scraping APIs via RAG

Enhance CrewAI agents with fresh web data by integrating a SERP scraping API for real-time, accurate AI responses.
CrewAI + Bright Data's SERP API

In this tutorial, you will learn:

  • What CrewAI is and how it differs from other AI agent libraries.
  • Its biggest limitations and how to overcome them with a Retrieval-Augmented Generation (RAG) workflow.
  • How to integrate it with a scraping API to provide AI agents with SERP data for more accurate responses.

Let’s dive in!

What Is CrewAI?

CrewAI is an open-source Python framework for orchestrating and managing autonomous AI agents that collaborate to complete complex tasks. Unlike single-agent systems like Browser Use, CrewAI is built around “crews,” which are teams of agents working together.

In a crew, each agent has a defined role, goal, and set of tools. You can equip agents with custom tools for specialized tasks like web scraping, database connections, and more. This approach opens the door to specialized AI-powered problem-solving and more effective decision-making.

CrewAI’s multi-agent architecture promotes both efficiency and scalability. New features are regularly added—such as support for Qwen models and parallel function calls—making it a rapidly evolving ecosystem.

CrewAI Limitations and How to Overcome Them with Fresh Web Data

CrewAI is a feature-rich framework for building multi-agent systems. However, it inherits some key limitations from the LLMs it relies on. Since LLMs are typically pre-trained on static datasets, they lack real-time awareness and typically cannot access the latest news or live web content.

This can result in outdated answers—or worse, hallucinations. These issues are especially likely if agents are not grounded in up-to-date, trustworthy data through a RAG setup.

To address those limitations, you should supply agents (and by extension, their LLMs) with reliable external data. The Web is the most comprehensive and dynamic data source available, making it an ideal target. Therefore, one effective approach is enabling CrewAI agents to perform live search queries on platforms like Google or other search engines.

This can be done by building a custom CrewAI tool that lets agents retrieve relevant web pages to learn from. However, scraping SERPs (Search Engine Results Pages) is technically challenging—due to the need for JavaScript rendering, CAPTCHA solving, IP rotation, and ever-changing site structures.

Managing all of that in-house can be more complex than developing the CrewAI logic itself. A better solution is relying on top-tier SERP scraping APIs, such as Bright Data’s SERP API. These services handle the heavy lifting of extracting clean, structured data from the web.

By integrating such APIs into your CrewAI workflow, your agents gain access to fresh and accurate information without the operational overhead. The same strategy can also be applied to other domains by connecting agents to domain-specific scraping APIs.
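Conceptually, the retrieval step slots in right before generation. The sketch below illustrates that flow in plain Python; fetch_serp_results() and ask_llm() are hypothetical placeholders for the scraping API call and the LLM call you will wire up later in this tutorial:

# Hypothetical sketch of the RAG flow described above.
# fetch_serp_results() and ask_llm() are placeholder names,
# not functions from a real library.

def answer_with_fresh_data(user_question: str) -> str:
    # 1. Retrieve live search results for the user's question
    search_results = fetch_serp_results(user_question)

    # 2. Augment the prompt with the retrieved context
    prompt = (
        "Answer the question using only the search results below.\n\n"
        f"Search results:\n{search_results}\n\n"
        f"Question: {user_question}"
    )

    # 3. Generate a grounded, up-to-date answer
    return ask_llm(prompt)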

How to Integrate CrewAI with SERP APIs for Real-Time Data Access

In this guided section, you will see how to give your CrewAI-built agents the ability to fetch data directly from search engines via the Bright Data SERP API.

This RAG integration allows your CrewAI agents to deliver more contextual and up-to-date results, complete with real-world links for further reading.

Follow the steps below to build a supercharged crew with Bright Data’s SERP API integration!

Prerequisites

To follow along with this tutorial, make sure you have:

  • Python installed locally (check the CrewAI documentation for the currently supported versions).
  • A Bright Data account and API key.
  • An API key for a supported LLM provider, such as Gemini.

For more details, check the installation page of the CrewAI documentation, which contains up-to-date prerequisites.

Do not worry if you do not yet have a Bright Data API key, as you will be guided through creating one in the next steps. As for the LLM API key, if you do not have one, we recommend setting up a Gemini API key by following Google’s official guide.

Step #1: Install CrewAI

Start by installing CrewAI globally with the following terminal command:

pip install crewai

Note: This will download and configure several packages, so it may take a little while.

If you encounter issues during installation or usage, refer to the troubleshooting section in the official documentation.

Once installed, you will have access to the crewai CLI command. Verify it by running the following in your terminal:

crewai

You should see output similar to:

Usage: crewai [OPTIONS] COMMAND [ARGS]...

  Top-level command group for crewai.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  chat               Start a conversation with the Crew, collecting...
  create             Create a new crew, or flow.
  deploy             Deploy the Crew CLI group.
  flow               Flow related commands.
  install            Install the Crew.
  log-tasks-outputs  Retrieve your latest crew.kickoff() task outputs.
  login              Sign Up/Login to CrewAI+.
  replay             Replay the crew execution from a specific task.
  reset-memories     Reset the crew memories (long, short, entity,...
  run                Run the Crew.
  signup             Sign Up/Login to CrewAI+.
  test               Test the crew and evaluate the results.
  tool               Tool Repository related commands.
  train              Train the crew.
  update             Update the pyproject.toml of the Crew project to use...
  version            Show the installed version of crewai.

Great! You now have the CrewAI CLI ready to initialize your project.

Step #2: Project Setup

Run the following command to create a new CrewAI project called serp_agent:

crewai create crew serp_agent

During setup, you will be prompted to select your preferred LLM provider:

Select a provider to set up:
1. openai
2. anthropic
3. gemini
4. nvidia_nim
5. groq
6. huggingface
7. ollama
8. watson
9. bedrock
10. azure
11. cerebras
12. sambanova
13. other
q. Quit
Enter the number of your choice or 'q' to quit:

In this case, we are going to select option “3” for Gemini, since its API offers a free tier.

Next, select the specific Gemini model you would like to use:

Select a model to use for Gemini:
1. gemini/gemini-1.5-flash
2. gemini/gemini-1.5-pro
3. gemini/gemini-2.0-flash-lite-001
4. gemini/gemini-2.0-flash-001
5. gemini/gemini-2.0-flash-thinking-exp-01-21
6. gemini/gemini-2.5-flash-preview-04-17
7. gemini/gemini-2.5-pro-exp-03-25
8. gemini/gemini-gemma-2-9b-it
9. gemini/gemini-gemma-2-27b-it
10. gemini/gemma-3-1b-it
11. gemini/gemma-3-4b-it
12. gemini/gemma-3-12b-it
13. gemini/gemma-3-27b-it
q. Quit

In this example, the free gemini/gemini-1.5-flash model is sufficient, so you can select option “1”.

Then, you will be asked to enter your Gemini API key:

Enter your GEMINI API key from https://ai.dev/apikey (press Enter to skip):

Paste it and, if everything goes as expected, you should see an output like this:

API keys and model saved to .env file
Selected model: gemini/gemini-1.5-flash
  - Created serp_agent\.gitignore
  - Created serp_agent\pyproject.toml
  - Created serp_agent\README.md
  - Created serp_agent\knowledge\user_preference.txt
  - Created serp_agent\src\serp_agent\__init__.py
  - Created serp_agent\src\serp_agent\main.py
  - Created serp_agent\src\serp_agent\crew.py
  - Created serp_agent\src\serp_agent\tools\custom_tool.py
  - Created serp_agent\src\serp_agent\tools\__init__.py
  - Created serp_agent\src\serp_agent\config\agents.yaml
  - Created serp_agent\src\serp_agent\config\tasks.yaml
Crew serp_agent created successfully!

This procedure will generate the following project structure:

serp_agent/
├── .gitignore
├── pyproject.toml
├── README.md
├── .env
├── knowledge/
├── tests/
└── src/
    └── serp_agent/
        ├── __init__.py
        ├── main.py
        ├── crew.py
        ├── tools/
        │   ├── custom_tool.py
        │   └── __init__.py
        └── config/
            ├── agents.yaml
            └── tasks.yaml

Here:

  • main.py is the main entry point of your project.
  • crew.py is where you define your crew’s logic.
  • config/agents.yaml defines your AI agents.
  • config/tasks.yaml defines the tasks your agents will handle.
  • tools/custom_tool.py will let you add custom tools that your agents can use.
  • .env stores API keys and other environment variables (see the example below).
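For instance, right after the setup wizard completes, your .env file should contain entries similar to these (the key value is a placeholder):

MODEL=gemini/gemini-1.5-flash
GEMINI_API_KEY=<YOUR_GEMINI_API_KEY>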

Navigate into the project folder and install the CrewAI dependencies:

cd serp_agent
crewai install

The last command will create a local .venv virtual environment inside your project directory, allowing you to run your crew locally.

Perfect! You now have a fully initialized CrewAI project using the Gemini API. You are ready to build and run your intelligent SERP agent.

Step #3: Get Started With SERP API

As mentioned earlier, we will use Bright Data’s SERP API to fetch content from search engine results pages and feed it to our CrewAI agents. Specifically, we will run targeted Google searches based on the user’s input and use the live scraped data to improve the agents’ responses.

To set up the SERP API, you can refer to the official documentation. Alternatively, follow the steps below.

If you have not already, sign up for an account on Bright Data. Otherwise, just log in. Once logged in, reach the “My Zones” section and click on the “SERP API” row:

Selecting the “SERP API” row

If you do not see that row in the table, it means you have not configured a SERP API zone yet. In that case, scroll down and click on “Create zone” under the “SERP API” section:

Configuring the SERP API zone

On the SERP API product page, toggle the “Activate” switch to enable the product.

Next, follow the official guide to generate your Bright Data API key. Then, add it to your .env file as below:

BRIGHT_DATA_API_KEY=<YOUR_BRIGHT_DATA_API_KEY>

Replace the <YOUR_BRIGHT_DATA_API_KEY> placeholder with the actual value of your Bright Data API key.
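Before wiring the key into CrewAI, you can verify that it works with a one-off request to the SERP API. The following is a minimal sanity check, assuming your SERP API zone is named serp (adjust the zone value to match your actual zone name):

# sanity_check.py - quick standalone test of the Bright Data SERP API.
# Assumes BRIGHT_DATA_API_KEY is set in your environment and that
# your SERP API zone is named "serp".
import os
import requests

response = requests.post(
    "https://api.brightdata.com/request",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['BRIGHT_DATA_API_KEY']}",
    },
    json={
        "zone": "serp",
        "format": "json",
        "url": "https://www.google.com/search?q=pizza&brd_json=1",
    },
)
response.raise_for_status()
print(response.json()["status_code"])  # 200 means the zone is live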

This is it! You can now use Bright Data’s SERP API in your CrewAI integration.

Step #4: Create a CrewAI SERP Search Tool

Time to define a SERP search tool that your agents can use to interact with the Bright Data SERP API and retrieve search result data.

To achieve that, open the custom_tool.py file inside the tools/ folder and replace its contents with the following:

# src/serp_agent/tools/custom_tool.py

import os
import json
from typing import Type
from urllib.parse import quote
import requests
from pydantic import BaseModel, PrivateAttr
from crewai.tools import BaseTool


class SerpSearchToolInput(BaseModel):
    query: str


class SerpSearchTool(BaseTool):
    _api_key: str = PrivateAttr()

    name: str = "Bright Data SERP Search Tool"
    description: str = """
    Uses Bright Data's SERP API to retrieve real-time Google search results based on the user's query.
    This tool fetches organic search listings to support agent responses with live data.
    """
    args_schema: Type[BaseModel] = SerpSearchToolInput

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Read the Bright Data API key from the envs
        self._api_key = os.environ.get("BRIGHT_DATA_API_KEY")

        if not self._api_key:
            raise ValueError("Missing Bright Data API key. Please set BRIGHT_DATA_API_KEY in your .env file")

    def _run(self, query: str) -> str:
        url = "https://api.brightdata.com/request"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self._api_key}"
        }
        payload = {
            "zone": "serp", # Replace with the name of your actual Bright Data SERP API zone
            "format": "json",
            # URL-encode the user query to build a valid Google search URL
            "url": f"https://www.google.com/search?q={quote(query)}&brd_json=1"
        }

        try:
            response = requests.post(url, json=payload, headers=headers)
            # Raise exceptions in case of errors
            response.raise_for_status()

            # Parse the JSON response
            json_response = response.json()
            response_body = json.loads(json_response.get("body", "{}"))

            if "organic" not in response_body:
                return "The response did not include organic search results."

            # Return the SERP data as a JSON string
            return json.dumps(response_body["organic"], indent=4)

        except requests.exceptions.HTTPError as http_err:
            return f"HTTP error occurred while querying Bright Data SERP API: {http_err}"
        except requests.exceptions.RequestException as req_err:
            return f"Network error occurred while connecting to Bright Data: {req_err}"
        except (json.JSONDecodeError, KeyError) as parse_err:
            return f"Error parsing Bright Data SERP API response: {parse_err}"

This CrewAI tool defines a _run() method that takes a user query and fetches SERP results from the Bright Data SERP API via requests.

Note that when the brd_json=1 query parameter is used and the format is set to json, the SERP API responds with this structure:

{
  "status_code": 200,
  "headers": {
    "content-type": "application/json",
    // omitted for brevity...
  },
  "body": "{\"general\":{\"search_engine\":\"google\",\"query\":\"pizza\",\"results_cnt\":1980000000, ...}}"
}

In particular, after parsing the body field—which contains a JSON string—you will get the following data structure:

{
  "general": {
    "search_engine": "google",
    "query": "pizza",
    "results_cnt": 1980000000,
    "search_time": 0.57,
    "language": "en",
    "mobile": false,
    "basic_view": false,
    "search_type": "text",
    "page_title": "pizza - Google Search",
    "timestamp": "2023-06-30T08:58:41.786Z"
  },
  "input": {
    "original_url": "https://www.google.com/search?q=pizza&brd_json=1",
    "request_id": "hl_1a1be908_i00lwqqxt1"
  },
  "organic": [
    {
      "link": "https://www.pizzahut.com/",
      "display_link": "https://www.pizzahut.com",
      "title": "Pizza Hut | Delivery & Carryout - No One OutPizzas The Hut!",
      "rank": 1,
      "global_rank": 1
    },
    {
      "link": "https://www.dominos.com/en/",
      "display_link": "https://www.dominos.com",
      "title": "Domino's: Pizza Delivery & Carryout, Pasta, Chicken & More",
      "description": "Order pizza, pasta, sandwiches & more online...",
      "rank": 2,
      "global_rank": 3
    },
    // ...additional organic results omitted for brevity
  ]
}

So, you are mainly interested in the organic field. That is the field the code accesses, serializes back into a JSON string, and returns from the tool.

Awesome! Your CrewAI agent can now use this tool to retrieve fresh SERP data.
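If you want to try the tool in isolation before handing it to an agent, you can call its _run() method directly. Below is a quick, hypothetical smoke test, meant to run inside the project’s .venv (python-dotenv ships as a CrewAI dependency):

# smoke_test.py - hypothetical standalone check of the SERP tool
from dotenv import load_dotenv
from serp_agent.tools.custom_tool import SerpSearchTool

load_dotenv()  # pull BRIGHT_DATA_API_KEY from the .env file

tool = SerpSearchTool()
# Call the tool's implementation directly with a sample query
print(tool._run("best pizza in new york"))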

Step #5: Define the Agents

To accomplish this task, you will need two CrewAI agents, each with a distinct purpose:

  1. Researcher: Gathers search results from Google and filters useful insights.
  2. Reporting analyst: Assembles the findings into a structured and readable summary.

You can define them in your config/agents.yaml file by filling it out like this:

# src/serp_agent/config/agents.yaml

researcher:
  role: >
    Online Research Specialist
  goal: >
    Conduct smart Google searches and collect relevant, trustworthy details from the top results.
  backstory: >
    You have a knack for phrasing search queries that deliver the most accurate and insightful content.
    Your expertise lies in quickly identifying high-quality information from reputable sources.

reporting_analyst:
  role: >
    Strategic Report Creator
  goal: >
    Organize collected data into a clear, informative narrative that’s easy to understand and act on.
  backstory: >
    You excel at digesting raw information and turning it into meaningful analysis. Your work helps
    teams make sense of data by presenting it in a well-structured and strategic format.

Notice how this configuration captures what each agent is supposed to do—nothing more, nothing less. Just define their role, goal, and backstory. Very good!

Step #6: Specify the Tasks for Each Agent

Get ready to define specific tasks that clearly outline each agent’s role within the workflow. According to CrewAI’s documentation, the task definition matters more than the agent definition for achieving accurate results.

Thus, in config/tasks.yaml, you need to tell your agents exactly what they need to do, as below:

# src/serp_agent/config/tasks.yaml

research_task:
  description: >
    Leverage SerpSearchTool to perform a targeted search based on the user's {query}.
    Build API parameters like:
    - 'query': develop a short, Google-like, keyword-optimized search phrase for search engines.
    From the returned data, identify the most relevant and factual content.

  expected_output: >
    A file containing well-structured raw JSON content with the data from search results.
    Avoid rewriting, summarizing, or modifying any content.

  agent: researcher
  output_file: output/serp_data.json

report_task:
  description: >
    Turn the collected data into a digestible, insight-rich report.
    Address the user's {query} with fact-based findings. Add links for further reading. Do not fabricate or guess any information.

  expected_output: >
    A Markdown report with key takeaways and meaningful insights.
    Keep the content brief, clear, and visually structured.

  agent: reporting_analyst
  context: [research_task]
  output_file: output/report.md

In this setup, you’re defining two tasks—one for each agent:

  • research_task: Tells the researcher how to use the Bright Data SERP API via the tool, including how to build API parameters dynamically based on the query.
  • report_task: Specifies that the final output should be a readable, informative report built strictly from the collected data.

This tasks.yaml definition is all your CrewAI agents need to gather SERP data and produce a report grounded in real search results.

Time to integrate your CrewAI agents into your code and let them get to work!

Step #7: Create Your Crew

Now that all components are in place, connect everything in the crew.py file to create a fully functional crew. Specifically, this is how you can define your crew.py:

# src/serp_agent/crew.py

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from .tools.custom_tool import SerpSearchTool
from crewai.agents.agent_builder.base_agent import BaseAgent
from typing import List

@CrewBase
class SerpAgent():
    """SerpAgent crew"""

    agents: List[BaseAgent]
    tasks: List[Task]

    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config["researcher"],
            tools=[SerpSearchTool()],
            verbose=True
        )

    @agent
    def reporting_analyst(self) -> Agent:
        return Agent(
            config=self.agents_config["reporting_analyst"],
            verbose=True
        )

    @task
    def research_task(self) -> Task:
        return Task(
            config=self.tasks_config["research_task"],
            output_file="output/serp_data.json"
        )

    @task
    def report_task(self) -> Task:
        return Task(
            config=self.tasks_config["report_task"],
            output_file="output/report.md"
        )

    @crew
    def crew(self) -> Crew:
        """Creates the SerpAgent crew"""
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
        )

In crew.py, you need to use the CrewAI decorators (@agent, @task, @crew, in this case) to link the logic from your YAML files and wire up the actual functionality.

In this example:

  • The researcher agent is given access to the SerpSearchTool, enabling it to perform real Google search queries using Bright Data’s SERP API.
  • The reporting_analyst agent is configured to generate the final report, using the output from the researcher.
  • Each task corresponds to what was defined in your tasks.yaml and is explicitly tied to the appropriate output file.
  • The process is set to sequential, ensuring that researcher runs first and then passes its data to reporting_analyst.

Here we go! Your SerpAgent crew is now ready to execute.
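As a side note, Process.sequential is not the only orchestration mode. If you later grow the crew, CrewAI also offers a hierarchical process in which a manager LLM delegates work to the agents. A minimal sketch of that variant (the manager_llm value is an assumption; reuse whatever model you configured):

@crew
def crew(self) -> Crew:
    """Creates the SerpAgent crew with a manager-led process"""
    return Crew(
        agents=self.agents,
        tasks=self.tasks,
        process=Process.hierarchical,
        manager_llm="gemini/gemini-1.5-flash",  # assumption: reuse the configured model
        verbose=True,
    )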

Step #8: Create the Main Loop

In main.py, trigger the crew by passing the user’s query as input:

# src/serp_agent/main.py

import os
from serp_agent.crew import SerpAgent

# Create the output/ folder if it doesn't already exist
os.makedirs("output", exist_ok=True)

def run():
    try:
        # Read the user's input and pass it to the crew
        inputs = {"query": input("\nSearch for: ").strip()}

        # Start the SERP agent crew
        result = SerpAgent().crew().kickoff(
            inputs=inputs
        )
        return result
    except Exception as e:
        print(f"An error occurred: {str(e)}")

if __name__ == "__main__":
    run()

Mission complete! Your CrewAI + SERP API integration (using Gemini as the LLM) is now fully functional. Just run main.py, enter a search query, and watch the crew collect and analyze SERP data to produce a report.
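If you prefer to script the run (for example, in a quick experiment or a test) rather than typing the query interactively, you can pass a hard-coded input to kickoff(). A minimal sketch, assuming the same project layout:

# Hypothetical scripted run with a fixed query (no interactive input)
import os
from serp_agent.crew import SerpAgent

os.makedirs("output", exist_ok=True)

result = SerpAgent().crew().kickoff(
    inputs={"query": "What are the new AI protocols?"}
)
print(result.raw)  # the final report as plain text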

Step #9: Run Your AI Agent

In your project’s folder, run your CrewAI application with the following command:

crewai run

Now, enter a query such as:

"What are the new AI protocols?"

This is the kind of question a typical LLM might struggle to answer accurately. The reason is that most of the latest AI protocols, like MCP, A2A, AGP, and ACP, did not exist when the model was originally trained.

Here is how CrewAI handles the request in detail:

  1. The research agent is executed, which:
    1. Transforms the user input into a structured query like “new AI protocols”.
    2. Sends the query to Bright Data’s SERP API via the SerpSearchTool.
    3. Receives the results from the API and saves them to the output/serp_data.json file.
  2. The reporting_analyst agent is then triggered, which:
    1. Reads the structured data from the serp_data.json file.
    2. Uses that fresh information to generate a context-aware report in Markdown.
    3. Saves the final structured report to output/report.md.

If you open report.md using a Markdown viewer, you will see something like this:

The final Markdown report produced by CrewAI

The report includes relevant contextual information and even links to help you dive deeper.

Et voilà! You just implemented a RAG workflow in CrewAI powered by the integration with a SERP API.

Next Steps

The Bright Data SERP API tool integrated into the crew lets agents receive fresh search engine results. Given the URLs from those SERPs, you could then use them to call other scraping APIs to extract raw content from the linked pages—either in unprocessed form (to convert into Markdown and feed to the agent) or already parsed as JSON, as sketched below.
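For example, you could hand the crew a second tool that downloads one of the returned URLs and converts it into readable text. The snippet below is only an illustration (it assumes the html2text package is installed); in production, you would route the download through a scraping API to handle JavaScript rendering and anti-bot measures:

# Hypothetical follow-up tool: fetch a SERP result page and return
# readable, Markdown-like text for the agent to consume.
from typing import Type

import html2text  # assumption: pip install html2text
import requests
from pydantic import BaseModel
from crewai.tools import BaseTool


class PageFetchToolInput(BaseModel):
    url: str


class PageFetchTool(BaseTool):
    name: str = "Page Fetch Tool"
    description: str = "Downloads a web page and returns its content as readable text."
    args_schema: Type[BaseModel] = PageFetchToolInput

    def _run(self, url: str) -> str:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        # Convert the HTML into Markdown-like text for the LLM
        return html2text.html2text(response.text)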

This approach enables agents to automatically discover reliable sources and retrieve up-to-date information from them. Additionally, you could integrate a solution like Agent Browser to allow agents to interact dynamically with any live webpage.

These are just a few examples, but the potential scenarios and use cases are virtually limitless.

Conclusion

In this blog post, you learned how to make your CrewAI agents more context-aware by integrating a RAG setup using Bright Data’s SERP API.

As explained, this is just one of many possibilities you can explore by connecting your agents with external scraping APIs or automation tools. In particular, Bright Data’s solutions can serve as powerful building blocks for intelligent AI workflows.

Level up your AI infrastructure with Bright Data’s tools:

  • Autonomous AI agents: Search, access, and interact with any website in real-time using a powerful set of APIs.
  • Vertical AI apps: Build reliable, custom data pipelines to extract web data from industry-specific sources.
  • Foundation models: Access compliant, web-scale datasets to power pre-training, evaluation, and fine-tuning.
  • Multimodal AI: Tap into the world’s largest repository of images, videos, and audio—optimized for AI.
  • Data providers: Connect with trusted providers to source high-quality, AI-ready datasets at scale.
  • Data packages: Get curated, ready-to-use, structured, enriched, and annotated datasets.

For more information, explore our AI hub.

Create a Bright Data account and try all our products and services for AI agent development!

No credit card required