In this tutorial, you will learn:
- What CrewAI is and how it differs from other AI agent libraries.
- Its biggest limitations and how to overcome them with a RAG workflow.
- How to integrate it with a scraping API to provide AI agents with SERP data for more accurate responses.
Let’s dive in!
What Is CrewAI?
CrewAI is an open-source Python framework for orchestrating and managing autonomous AI agents that collaborate to complete complex tasks. Unlike single-agent systems such as Browser Use, CrewAI is built around “crews,” which are teams of agents.
In a crew, each agent has a defined role, goal, and set of tools. In detail, you can equip AI agents with custom tools for specialized tasks like web scraping, database connection, and more. This approach opens the door to specialized AI-powered problem-solving and effective decision-making.
CrewAI’s multi-agent architecture promotes both efficiency and scalability. New features are regularly added—such as support for Qwen models and parallel function calls—making it a rapidly evolving ecosystem.
CrewAI Limitations and How to Overcome Them with Fresh Web Data
CrewAI is a feature-rich framework for building multi-agent systems. However, it inherits some key limitations from the LLMs it relies on. Since LLMs are pre-trained on static datasets, they lack real-time awareness and typically cannot access the latest news or live web content.
This can result in outdated answers—or worse, hallucinations. These issues are especially likely if agents are not constrained or provided with up-to-date, trustworthy data in a Retrieval-Augmented Generation (RAG) setup.
To address those limitations, you should supply agents (and by extension, their LLMs) with reliable external data. The Web is the most comprehensive and dynamic data source available, making it an ideal target. Therefore, one effective approach is enabling CrewAI agents to perform live search queries on platforms like Google or other search engines.
This can be done by building a custom CrewAI tool that lets agents retrieve relevant web pages to learn from. However, scraping SERPs (Search Engine Results Pages) is technically challenging—due to the need for JavaScript rendering, CAPTCHA solving, IP rotation, and ever-changing site structures.
Managing all of that in-house can be more complex than developing the CrewAI logic itself. A better solution is relying on top-tier SERP scraping APIs, such as Bright Data’s SERP API. These services handle the heavy lifting of extracting clean, structured data from the web.
By integrating such APIs into your CrewAI workflow, your agents gain access to fresh and accurate information without the operational overhead. The same strategy can also be applied to other domains by connecting agents to domain-specific scraping APIs.
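To make the pattern concrete, here is a minimal, library-agnostic sketch of what a RAG-style integration does: search results retrieved at query time are injected into the model prompt as grounding context. The `fetch_search_snippets` stub and the prompt format are illustrative assumptions, not CrewAI or Bright Data APIs:

```python
# Minimal RAG-style prompt assembly: live search snippets are retrieved
# at query time and prepended to the LLM prompt as grounding context.
# fetch_search_snippets() is a stand-in for any SERP scraping API call.

def fetch_search_snippets(query: str) -> list[dict]:
    # In a real workflow, this would call a SERP API over HTTPS and
    # return parsed organic results. Stubbed here for illustration.
    return [
        {"title": "Example result", "link": "https://example.com", "description": "A fresh snippet."},
    ]

def build_grounded_prompt(query: str) -> str:
    snippets = fetch_search_snippets(query)
    # Format each result as a bullet the LLM can cite
    context = "\n".join(
        f"- {s['title']} ({s['link']}): {s.get('description', '')}" for s in snippets
    )
    return (
        "Answer using ONLY the web results below. Cite the links you use.\n"
        f"Web results:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_grounded_prompt("latest AI protocols")
```

The key idea is that the model never has to “remember” fresh facts: they arrive in the prompt at request time, which is exactly what the CrewAI tool built later in this tutorial automates.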
How to Integrate CrewAI with SERP APIs for Real-Time Data Access
In this guided section, you will see how to give your AI agent built with CrewAI the ability to fetch data directly from SERP engines via the Bright Data SERP API.
This RAG integration allows your CrewAI agents to deliver more contextual and up-to-date results, complete with real-world links for further reading.
Follow the steps below to build a supercharged crew with Bright Data’s SERP API integration!
Prerequisites
To follow along with this tutorial, make sure you have:
- A Bright Data API key.
- An API key to connect to an LLM (we will use Gemini in this tutorial).
- Python 3.10 or higher installed locally.
For more details, check the installation page of the CrewAI documentation, which contains up-to-date prerequisites.
Do not worry if you do not yet have a Bright Data API key, as you will be guided through creating one in the next steps. As for the LLM API key, if you do not have one, we recommend setting up a Gemini API key by following Google’s official guide.
Step #1: Install CrewAI
Start by installing CrewAI globally with the following terminal command:
pip install crewai
Note: This will download and configure several packages, so it may take a little while.
If you encounter issues during installation or usage, refer to the troubleshooting section in the official documentation.
Once installed, you will have access to the crewai CLI command. Verify it by running the following in your terminal:
crewai
You should see output similar to:
Usage: crewai [OPTIONS] COMMAND [ARGS]...
Top-level command group for crewai.
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
chat Start a conversation with the Crew, collecting...
create Create a new crew, or flow.
deploy Deploy the Crew CLI group.
flow Flow related commands.
install Install the Crew.
log-tasks-outputs Retrieve your latest crew.kickoff() task outputs.
login Sign Up/Login to CrewAI+.
replay Replay the crew execution from a specific task.
reset-memories Reset the crew memories (long, short, entity,...
run Run the Crew.
signup Sign Up/Login to CrewAI+.
test Test the crew and evaluate the results.
tool Tool Repository related commands.
train Train the crew.
update Update the pyproject.toml of the Crew project to use...
version Show the installed version of crewai.
Great! You now have the CrewAI CLI ready to initialize your project.
Step #2: Project Setup
Run the following command to create a new CrewAI project called serp_agent:
crewai create crew serp_agent
During setup, you will be prompted to select your preferred LLM provider:
Select a provider to set up:
1. openai
2. anthropic
3. gemini
4. nvidia_nim
5. groq
6. huggingface
7. ollama
8. watson
9. bedrock
10. azure
11. cerebras
12. sambanova
13. other
q. Quit
Enter the number of your choice or 'q' to quit:
In this case, we are going to select option “3” for Gemini, as its API offers a free tier.
Next, select the specific Gemini model you would like to use:
Select a model to use for Gemini:
1. gemini/gemini-1.5-flash
2. gemini/gemini-1.5-pro
3. gemini/gemini-2.0-flash-lite-001
4. gemini/gemini-2.0-flash-001
5. gemini/gemini-2.0-flash-thinking-exp-01-21
6. gemini/gemini-2.5-flash-preview-04-17
7. gemini/gemini-2.5-pro-exp-03-25
8. gemini/gemini-gemma-2-9b-it
9. gemini/gemini-gemma-2-27b-it
10. gemini/gemma-3-1b-it
11. gemini/gemma-3-4b-it
12. gemini/gemma-3-12b-it
13. gemini/gemma-3-27b-it
q. Quit
In this example, the free gemini/gemini-1.5-flash model is sufficient. So, you can select option “1”.
Then, you will be asked to enter your Gemini API key:
Enter your GEMINI API key from https://ai.dev/apikey (press Enter to skip):
Paste it and, if everything goes as expected, you should see an output like this:
API keys and model saved to .env file
Selected model: gemini/gemini-1.5-flash
- Created serp_agent/.gitignore
- Created serp_agent/pyproject.toml
- Created serp_agent/README.md
- Created serp_agent/knowledge/user_preference.txt
- Created serp_agent/src/serp_agent/__init__.py
- Created serp_agent/src/serp_agent/main.py
- Created serp_agent/src/serp_agent/crew.py
- Created serp_agent/src/serp_agent/tools/custom_tool.py
- Created serp_agent/src/serp_agent/tools/__init__.py
- Created serp_agent/src/serp_agent/config/agents.yaml
- Created serp_agent/src/serp_agent/config/tasks.yaml
Crew serp_agent created successfully!
This procedure will generate the following project structure:
serp_agent/
├── .gitignore
├── pyproject.toml
├── README.md
├── .env
├── knowledge/
├── tests/
└── src/
    └── serp_agent/
        ├── __init__.py
        ├── main.py
        ├── crew.py
        ├── tools/
        │   ├── custom_tool.py
        │   └── __init__.py
        └── config/
            ├── agents.yaml
            └── tasks.yaml
Here:
- main.py is the main entry point of your project.
- crew.py is where you define your crew’s logic.
- config/agents.yaml defines your AI agents.
- config/tasks.yaml defines the tasks your agents will handle.
- tools/custom_tool.py will let you add custom tools that your agents can use.
- .env stores API keys and other environment variables.
Navigate into the project folder and install the CrewAI dependencies:
cd serp_agent
crewai install
The last command will create a local virtual environment in a .venv folder inside your project directory. This will allow you to run your CrewAI project locally.
Perfect! You now have a fully initialized CrewAI project using the Gemini API. You are ready to build and run your intelligent SERP agent.
Step #3: Get Started With SERP API
As mentioned earlier, we will use Bright Data’s SERP API to fetch content from search engine results pages and feed it to our CrewAI agents. Specifically, we will run targeted Google searches based on the user’s input and use the live scraped data to improve the agent’s responses.
To set up the SERP API, you can refer to the official documentation. Alternatively, follow the steps below.
If you have not already, sign up for an account on Bright Data. Otherwise, just log in. Once logged in, reach the “My Zones” section and click on the “SERP API” row:
If you do not see that row in the table, it means you have not configured a SERP API zone yet. In that case, scroll down and click on “Create zone” under the “SERP API” section:
On the SERP API product page, toggle the “Activate” switch to enable the product:
Next, follow the official guide to generate your Bright Data API key. Then, add it to your .env file as below:
BRIGHT_DATA_API_KEY=<YOUR_BRIGHT_DATA_API_KEY>
Replace the <YOUR_BRIGHT_DATA_API_KEY> placeholder with the actual value of your Bright Data API key.
That’s it! You can now use Bright Data’s SERP API in your CrewAI integration.
Step #4: Create a CrewAI SERP Search Tool
Time to define a SERP search tool that your agents can use to interact with the Bright Data SERP API and retrieve search result data.
To achieve that, open the custom_tool.py file inside the tools/ folder and replace its contents with the following:
# src/serp_agent/tools/custom_tool.py
import os
import json
from typing import Type

import requests
from pydantic import BaseModel, PrivateAttr
from crewai.tools import BaseTool


class SerpSearchToolInput(BaseModel):
    query: str


class SerpSearchTool(BaseTool):
    _api_key: str = PrivateAttr()

    name: str = "Bright Data SERP Search Tool"
    description: str = """
    Uses Bright Data's SERP API to retrieve real-time Google search results based on the user's query.
    This tool fetches organic search listings to support agent responses with live data.
    """
    args_schema: Type[BaseModel] = SerpSearchToolInput

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # Read the Bright Data API key from the envs
        self._api_key = os.environ.get("BRIGHT_DATA_API_KEY")
        if not self._api_key:
            raise ValueError("Missing Bright Data API key. Please set BRIGHT_DATA_API_KEY in your .env file")

    def _run(self, query: str) -> str:
        url = "https://api.brightdata.com/request"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self._api_key}"
        }
        payload = {
            "zone": "serp",  # Replace with the name of your actual Bright Data SERP API zone
            "format": "json",
            # URL-encode the query so that spaces and special characters are handled correctly
            "url": f"https://www.google.com/search?q={requests.utils.quote(query)}&brd_json=1"
        }

        try:
            response = requests.post(url, json=payload, headers=headers)
            # Raise exceptions in case of errors
            response.raise_for_status()

            # Parse the JSON response
            json_response = response.json()
            response_body = json.loads(json_response.get("body", "{}"))

            if "organic" not in response_body:
                return "The response did not include organic search results."

            # Return the SERP data as a JSON string
            return json.dumps(response_body["organic"], indent=4)
        except requests.exceptions.HTTPError as http_err:
            return f"HTTP error occurred while querying Bright Data SERP API: {http_err}"
        except requests.exceptions.RequestException as req_err:
            return f"Network error occurred while connecting to Bright Data: {req_err}"
        except (json.JSONDecodeError, KeyError) as parse_err:
            return f"Error parsing Bright Data SERP API response: {parse_err}"
This CrewAI tool defines a function that takes a user query and fetches SERP results from the Bright Data SERP API via requests.
Note that when the brd_json=1 query parameter is used and the format is set to json, the SERP API responds with this structure:
{
    "status_code": 200,
    "headers": {
        "content-type": "application/json",
        // omitted for brevity...
    },
    "body": "{\"general\":{\"search_engine\":\"google\",\"query\":\"pizza\",\"results_cnt\":1980000000, ...}}"
}
In particular, after parsing the body field—which contains a JSON string—you will get the following data structure:
{
    "general": {
        "search_engine": "google",
        "query": "pizza",
        "results_cnt": 1980000000,
        "search_time": 0.57,
        "language": "en",
        "mobile": false,
        "basic_view": false,
        "search_type": "text",
        "page_title": "pizza - Google Search",
        "timestamp": "2023-06-30T08:58:41.786Z"
    },
    "input": {
        "original_url": "https://www.google.com/search?q=pizza&brd_json=1",
        "request_id": "hl_1a1be908_i00lwqqxt1"
    },
    "organic": [
        {
            "link": "https://www.pizzahut.com/",
            "display_link": "https://www.pizzahut.com",
            "title": "Pizza Hut | Delivery & Carryout - No One OutPizzas The Hut!",
            "rank": 1,
            "global_rank": 1
        },
        {
            "link": "https://www.dominos.com/en/",
            "display_link": "https://www.dominos.com",
            "title": "Domino's: Pizza Delivery & Carryout, Pasta, Chicken & More",
            "description": "Order pizza, pasta, sandwiches & more online...",
            "rank": 2,
            "global_rank": 3
        },
        // ...additional organic results omitted for brevity
    ]
}
So, you are mainly interested in the organic field. That is the field the code extracts, serializes back into a JSON string, and returns from the tool.
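If you want to sanity-check this double-parsing logic outside of CrewAI, the following standalone sketch mimics what the tool does with a hand-crafted sample response (the data is illustrative, not a live API result):

```python
import json

# A trimmed sample of what the SERP API returns: the "body" field is a JSON
# string that must be parsed a second time to reach the "organic" results.
sample_api_response = {
    "status_code": 200,
    "body": json.dumps({
        "general": {"search_engine": "google", "query": "pizza"},
        "organic": [
            {"link": "https://www.pizzahut.com/", "title": "Pizza Hut", "rank": 1},
            {"link": "https://www.dominos.com/en/", "title": "Domino's", "rank": 2},
        ],
    }),
}

# First parse: the outer envelope (already a dict here); second parse: the body string
response_body = json.loads(sample_api_response.get("body", "{}"))
organic_results = response_body.get("organic", [])

# This is the JSON string the tool hands back to the agent
print(json.dumps(organic_results, indent=4))
```

Running it prints the two organic entries as pretty-printed JSON, which is exactly the shape the researcher agent will receive.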
Awesome! Your CrewAI agent can now use this tool to retrieve fresh SERP data.
Step #5: Define the Agents
To accomplish this task, you will need two CrewAI agents, each with a distinct purpose:
- Researcher: Gathers search results from Google and filters useful insights.
- Reporting analyst: Assembles the findings into a structured and readable summary.
You can define them in your agents.yaml file by filling it out like this:
# src/serp_agent/config/agents.yaml
researcher:
  role: >
    Online Research Specialist
  goal: >
    Conduct smart Google searches and collect relevant, trustworthy details from the top results.
  backstory: >
    You have a knack for phrasing search queries that deliver the most accurate and insightful content.
    Your expertise lies in quickly identifying high-quality information from reputable sources.

reporting_analyst:
  role: >
    Strategic Report Creator
  goal: >
    Organize collected data into a clear, informative narrative that’s easy to understand and act on.
  backstory: >
    You excel at digesting raw information and turning it into meaningful analysis. Your work helps
    teams make sense of data by presenting it in a well-structured and strategic format.
Notice how this configuration captures what each agent is supposed to do—nothing more, nothing less. Just define their role, goal, and backstory. Very good!
Step #6: Specify the Tasks for Each Agent
Now define the specific tasks that clearly outline each agent’s role within the workflow. According to CrewAI’s documentation, task definitions matter more than agent definitions when it comes to achieving accurate results.
Thus, in tasks.yaml, you need to tell your agents exactly what they need to do, as below:
# src/serp_agent/config/tasks.yaml
research_task:
  description: >
    Leverage SerpSearchTool to perform a targeted search based on the user's {query}.

    Build API parameters like:
    - 'query': develop a short, Google-like, keyword-optimized search phrase for search engines.

    From the returned data, identify the most relevant and factual content.
  expected_output: >
    A file containing well-structured raw JSON content with the data from search results.
    Avoid rewriting, summarizing, or modifying any content.
  agent: researcher
  output_file: output/serp_data.json

report_task:
  description: >
    Turn the collected data into a digestible, insight-rich report.
    Address the user's {query} with fact-based findings. Add links for further reading.
    Do not fabricate or guess any information.
  expected_output: >
    A Markdown report with key takeaways and meaningful insights.
    Keep the content brief, clear, and visually structured.
  agent: reporting_analyst
  context: [research_task]
  output_file: output/report.md
In this setup, you’re defining two tasks—one for each agent:
- research_task: Tells the researcher how to use the Bright Data SERP API via the tool, including how to build API parameters dynamically based on the query.
- report_task: Specifies that the final output should be a readable, informative report built strictly from the collected data.
This tasks.yaml definition is all your CrewAI agents need to gather SERP data and produce a report grounded in real search results.
Time to integrate your CrewAI agents into your code and let them get to work!
Step #7: Create Your Crew
Now that all components are in place, connect everything in the crew.py file to create a fully functional crew. Specifically, this is how you can define your crew.py:
# src/serp_agent/crew.py
from typing import List

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.agents.agent_builder.base_agent import BaseAgent

from .tools.custom_tool import SerpSearchTool


@CrewBase
class SerpAgent():
    """SerpAgent crew"""

    agents: List[BaseAgent]
    tasks: List[Task]

    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config["researcher"],
            tools=[SerpSearchTool()],
            verbose=True
        )

    @agent
    def reporting_analyst(self) -> Agent:
        return Agent(
            config=self.agents_config["reporting_analyst"],
            verbose=True
        )

    @task
    def research_task(self) -> Task:
        return Task(
            config=self.tasks_config["research_task"],
            output_file="output/serp_data.json"
        )

    @task
    def report_task(self) -> Task:
        return Task(
            config=self.tasks_config["report_task"],
            output_file="output/report.md"
        )

    @crew
    def crew(self) -> Crew:
        """Creates the SerpAgent crew"""
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
        )
In crew.py, you need to use the CrewAI decorators (@agent, @task, and @crew, in this case) to link the logic from your YAML files and wire up the actual functionality.
In this example:
- The researcher agent is given access to the SerpSearchTool, enabling it to perform real Google search queries using Bright Data’s SERP API.
- The reporting_analyst agent is configured to generate the final report, using the output from the researcher.
- Each task corresponds to what was defined in your tasks.yaml, and is explicitly tied to the appropriate output file.
- The process is set to sequential, ensuring that researcher runs first and then passes its data to reporting_analyst.
Here we go! Your SerpAgent crew is now ready to execute.
Step #8: Create the Main Loop
In main.py, trigger the crew by passing the user’s query as input:
# src/serp_agent/main.py
import os

from serp_agent.crew import SerpAgent

# Create the output/ folder if it doesn't already exist
os.makedirs("output", exist_ok=True)

def run():
    try:
        # Read the user's input and pass it to the crew
        inputs = {"query": input("\nSearch for: ").strip()}

        # Start the SERP agent crew
        result = SerpAgent().crew().kickoff(
            inputs=inputs
        )
        return result
    except Exception as e:
        print(f"An error occurred: {str(e)}")

if __name__ == "__main__":
    run()
Mission complete! Your CrewAI + SERP API integration (using Gemini as the LLM) is now fully functional. Just run main.py, enter a search query, and watch the crew collect and analyze SERP data to produce a report.
Step #9: Run Your AI Agent
In your project’s folder, run your CrewAI application with the following command:
crewai run
Now, enter a query such as:
"What are the new AI protocols?"
This is the kind of question a typical LLM might struggle to answer accurately. The reason is that most of the latest AI protocols, like MCP, A2A, AGP, and ACP, did not exist when the model was originally trained.
Here is what will happen in detail:
As shown above, CrewAI handles the request as follows:
- The researcher agent is executed, which:
  - Transforms the user input into a structured query ("new AI protocols").
  - Sends the query to Bright Data’s SERP API via the SerpSearchTool.
  - Receives the results from the API and saves them to the output/serp_data.json file.
- The reporting_analyst agent is then triggered, which:
  - Reads the structured data from the serp_data.json file.
  - Uses that fresh information to generate a context-aware report in Markdown.
  - Saves the final structured report to output/report.md.
If you open report.md using a Markdown viewer, you will see something like this:
The report includes relevant contextual information and even links to help you dive deeper.
Et voilà! You just implemented a RAG workflow in CrewAI powered by the integration with a SERP API.
Next Steps
The Bright Data SERP API tool integrated into the Crew lets agents receive fresh search engine results. Given the URLs from those SERPs, you could then use them to call other scraping APIs to extract raw content from the linked pages—either in unprocessed form (to convert into Markdown and feed to the agent) or already parsed in JSON.
That idea enables agents to automatically discover reliable sources and retrieve up-to-date information from them. Additionally, you could integrate a solution like Agent Browser to allow agents to interact dynamically with any live webpage.
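As a starting point for that follow-up scraping step, here is a hedged sketch that pulls the organic links out of the serp_data.json file produced by the researcher. The file path and JSON shape match the tool built earlier; the scrape_page stub is an assumption standing in for whatever scraping API you choose:

```python
import json

def extract_serp_links(serp_data: list[dict], max_links: int = 5) -> list[str]:
    # The tool saves the "organic" array, so each entry should carry a "link" key
    return [item["link"] for item in serp_data[:max_links] if "link" in item]

def scrape_page(url: str) -> str:
    # Stand-in for a real scraping API call that would return page content
    # as Markdown or structured JSON for the agent to consume
    return f"<scraped content of {url}>"

# In the real workflow you would load the researcher's output:
# with open("output/serp_data.json") as f:
#     serp_data = json.load(f)
serp_data = [
    {"link": "https://example.com/a", "rank": 1},
    {"link": "https://example.com/b", "rank": 2},
]

pages = [scrape_page(url) for url in extract_serp_links(serp_data)]
```

You could then wrap this logic in a second custom CrewAI tool, so the researcher discovers sources and a downstream agent reads their full content.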
These are just a few examples, but the potential scenarios and use cases are virtually limitless.
Conclusion
In this blog post, you learned how to make your CrewAI agents more context-aware by integrating a RAG setup using Bright Data’s SERP API.
As explained, this is just one of many possibilities you can explore by connecting your agents with external scraping APIs or automation tools. In particular, Bright Data’s solutions can serve as powerful building blocks for intelligent AI workflows.
Level up your AI infrastructure with Bright Data’s tools:
- Autonomous AI agents: Search, access, and interact with any website in real-time using a powerful set of APIs.
- Vertical AI apps: Build reliable, custom data pipelines to extract web data from industry-specific sources.
- Foundation models: Access compliant, web-scale datasets to power pre-training, evaluation, and fine-tuning.
- Multimodal AI: Tap into the world’s largest repository of images, videos, and audio—optimized for AI.
- Data providers: Connect with trusted providers to source high-quality, AI-ready datasets at scale.
- Data packages: Get curated, ready-to-use, structured, enriched, and annotated datasets.
For more information, explore our AI hub.
Create a Bright Data account and try all our products and services for AI agent development!