In this tutorial, you will learn:
- What CrewAI is and how it differs from other AI agent libraries.
- Its biggest limitations and how to overcome them with a RAG workflow.
- How to integrate it with a scraping API to provide AI agents with SERP data for more accurate responses.
Let’s dive in!
What Is CrewAI?
CrewAI is an open-source Python framework for orchestrating and managing autonomous AI agents that collaborate to complete complex tasks. Unlike single-agent systems like Browser Use, CrewAI is built around “crews,” which are teams of agents working together.
In a crew, each agent has a defined role, goal, and set of tools. In detail, you can equip AI agents with custom tools for specialized tasks like web scraping, database connection, and more. This approach opens the door to specialized AI-powered problem-solving and effective decision-making.
CrewAI’s multi-agent architecture promotes both efficiency and scalability. New features are regularly added—such as support for Qwen models and parallel function calls—making it a rapidly evolving ecosystem.
CrewAI Limitations and How to Overcome Them with Fresh Web Data
CrewAI is a feature-rich framework for building multi-agent systems. However, it inherits some key limitations from the LLMs it relies on. Since LLMs are pre-trained on static datasets, they lack real-time awareness and generally cannot access the latest news or live web content.
This can result in outdated answers—or worse, hallucinations. These issues are especially likely if agents are not constrained or provided with up-to-date, trustworthy data in a Retrieval-Augmented Generation setup.
To address those limitations, you should supply agents (and by extension, their LLMs) with reliable external data. The Web is the most comprehensive and dynamic data source available, making it an ideal target. Therefore, one effective approach is enabling CrewAI agents to perform live search queries on platforms like Google or other search engines.
This can be done by building a custom CrewAI tool that lets agents retrieve relevant web pages to learn from. However, scraping SERPs (Search Engine Results Pages) is technically challenging—due to the need for JavaScript rendering, CAPTCHA solving, IP rotation, and ever-changing site structures.
Managing all of that in-house can be more complex than developing the CrewAI logic itself. A better solution is relying on top-tier SERP scraping APIs, such as Bright Data’s SERP API. These services handle the heavy lifting of extracting clean, structured data from the web.
By integrating such APIs into your CrewAI workflow, your agents gain access to fresh and accurate information without the operational overhead. The same strategy can also be applied to other domains by connecting agents to domain-specific scraping APIs.
How to Integrate CrewAI with SERP APIs for Real-Time Data Access
In this guided section, you will see how to give your AI agent built with CrewAI the ability to fetch fresh SERP data from search engines via the Bright Data SERP API.
This RAG integration allows your CrewAI agents to deliver more contextual and up-to-date results, complete with real-world links for further reading.
Follow the steps below to build a supercharged crew with Bright Data’s SERP API integration!
Prerequisites
To follow along with this tutorial, make sure you have:
- A Bright Data API key.
- An API key to connect to an LLM (we will use Gemini in this tutorial).
- Python 3.10 or higher installed locally.
For more details, check the installation page of the CrewAI documentation, which contains up-to-date prerequisites.
Do not worry if you do not yet have a Bright Data API key, as you will be guided through creating one in the next steps. As for the LLM API key, if you do not have one, we recommend setting up a Gemini API key by following Google’s official guide.
Step #1: Install CrewAI
Start by installing CrewAI globally by running the following command in your terminal:
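For example, with pip (the CrewAI docs also describe a uv-based setup, so use whichever installer you prefer):

```bash
pip install crewai
```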
Note: This will download and configure several packages, so it may take a little while.
If you encounter issues during installation or usage, refer to the troubleshooting section in the official documentation.
Once installed, you will have access to the crewai CLI command. Verify it by running the following in your terminal:
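For example (depending on your installed release, crewai version works as well):

```bash
crewai --version
```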
The output should show the installed CrewAI version number.
Great! You now have the CrewAI CLI ready to initialize your project.
Step #2: Project Setup
Run the following command to create a new CrewAI project called serp_agent:
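With the CrewAI CLI’s crew scaffolding command, that looks like:

```bash
crewai create crew serp_agent
```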
During setup, you will be prompted to select your preferred LLM provider:
In this case, we are going to select option “3” for Gemini, as its integration via API is free.
Next, select the specific Gemini model you would like to use:
In this example, the free gemini/gemini-1.5-flash model is sufficient. So, you can select option “1”.
Then, you will be asked to enter your Gemini API key:
Paste it and, if everything goes as expected, you should see an output like this:
This procedure will generate the following project structure:
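The exact layout can vary slightly between CrewAI versions, but it should look roughly like this:

```
serp_agent/
├── .env
├── .gitignore
├── pyproject.toml
├── README.md
└── src/
    └── serp_agent/
        ├── __init__.py
        ├── main.py
        ├── crew.py
        ├── config/
        │   ├── agents.yaml
        │   └── tasks.yaml
        └── tools/
            ├── __init__.py
            └── custom_tool.py
```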
Here:
- main.py is the main entry point of your project.
- crew.py is where you define your crew’s logic.
- config/agents.yaml defines your AI agents.
- config/tasks.yaml defines the tasks your agents will handle.
- tools/custom_tool.py will let you add custom tools that your agents can use.
- .env stores API keys and other environment variables.
Navigate into the project folder and install the CrewAI dependencies:
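With the CrewAI CLI, that is:

```bash
cd serp_agent
crewai install
```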
The last command will create a local virtual environment in a .venv folder inside your project directory, allowing you to run your CrewAI project locally.
Perfect! You now have a fully initialized CrewAI project using the Gemini API. You are ready to build and run your intelligent SERP agent.
Step #3: Get Started With SERP API
As mentioned earlier, we will use Bright Data’s SERP API to fetch content from search engine results pages and feed it to our CrewAI agents. Specifically, we will run targeted Google searches based on the user’s input and use the live scraped data to improve the agents’ responses.
To set up the SERP API, you can refer to the official documentation. Alternatively, follow the steps below.
If you have not already, sign up for an account on Bright Data. Otherwise, just log in. Once logged in, reach the “My Zones” section and click on the “SERP API” row:
If you do not see that row in the table, it means you have not configured a SERP API zone yet. In that case, scroll down and click on “Create zone” under the “SERP API” section:
On the SERP API product page, toggle the “Activate” switch to enable the product:
Next, follow the official guide to generate your Bright Data API key. Then, add it to your .env file as below:
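Assuming the tool sketched later reads the key from a variable named BRIGHT_DATA_API_KEY (the name is a convention used in this tutorial’s sketches, not a requirement), the entry looks like this:

```
BRIGHT_DATA_API_KEY="<YOUR_BRIGHT_DATA_API_KEY>"
```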
Replace the <YOUR_BRIGHT_DATA_API_KEY> placeholder with the actual value of your Bright Data API key.
This is it! You can now use Bright Data’s SERP API in your CrewAI integration.
Step #4: Create a CrewAI SERP Search Tool
Time to define a SERP search tool that your agents can use to interact with the Bright Data SERP API and retrieve search result data.
To achieve that, open the custom_tool.py file inside the tools/ folder and replace its contents with the following:
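Below is a minimal sketch of such a tool. The CrewAI BaseTool structure is standard; the Bright Data specifics (the api.brightdata.com/request endpoint, the serp_api zone name, and the BRIGHT_DATA_API_KEY variable) are assumptions based on Bright Data’s direct API access pattern and should be adjusted to your own zone and the official docs:

```python
# src/serp_agent/tools/custom_tool.py
import json
import os
from typing import Type
from urllib.parse import quote

import requests
from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class SerpSearchToolInput(BaseModel):
    query: str = Field(..., description="The search query to run on Google")


class SerpSearchTool(BaseTool):
    name: str = "SERP search tool"
    description: str = (
        "Runs a live Google search through Bright Data's SERP API "
        "and returns the organic results as JSON."
    )
    args_schema: Type[BaseModel] = SerpSearchToolInput

    def _run(self, query: str) -> str:
        # Build the Google search URL; brd_json=1 asks the SERP API for parsed JSON results
        search_url = f"https://www.google.com/search?q={quote(query)}&brd_json=1"

        # Call the SERP API through Bright Data's request endpoint
        response = requests.post(
            "https://api.brightdata.com/request",
            headers={
                "Authorization": f"Bearer {os.environ['BRIGHT_DATA_API_KEY']}",
                "Content-Type": "application/json",
            },
            json={
                "zone": "serp_api",  # assumed zone name: use the one configured in your account
                "url": search_url,
                "format": "json",
            },
            timeout=60,
        )
        response.raise_for_status()

        # The response wraps the SERP payload in a "body" field containing a JSON string
        body = json.loads(response.json()["body"])

        # Keep only the organic results, which is what the agents need
        return json.dumps(body["organic"], indent=2)
```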
This CrewAI tool defines a function that takes a user query and fetches SERP results from the Bright Data SERP API via requests.
Note that when the brd_json=1 query parameter is used and the format is set to json, the SERP API responds with this structure:
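The exact wrapper fields may vary, but the key point (shown here with placeholder values for illustration) is that the SERP payload arrives as a JSON string inside body:

```json
{
  "status_code": 200,
  "headers": {
    "content-type": "application/json; charset=UTF-8"
  },
  "body": "{\"general\": {\"search_engine\": \"google\"}, \"organic\": [{\"link\": \"...\", \"title\": \"...\"}]}"
}
```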
In particular, after parsing the body field (which contains a JSON string), you will get the following data structure:
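An illustrative, heavily trimmed example of that parsed payload; apart from organic, the field names and values below are placeholders:

```json
{
  "general": {
    "search_engine": "google",
    "query": "new AI protocols"
  },
  "organic": [
    {
      "rank": 1,
      "link": "https://example.com/first-result",
      "title": "Example result title",
      "description": "Example result snippet..."
    }
  ]
}
```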
So, you are mainly interested in the organic field. That is the field the code extracts, serializes back into a JSON string, and returns.
Awesome! Your CrewAI agent can now use this tool to retrieve fresh SERP data.
Step #5: Define the Agents
To accomplish this task, you will need two CrewAI agents, each with a distinct purpose:
- Researcher: Gathers search results from Google and filters useful insights.
- Reporting analyst: Assembles the findings into a structured and readable summary.
You can define them in your agents.yaml file by filling it out like this:
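A possible definition, assuming a {query} input variable is passed in from main.py (shown later); the roles, goals, and backstories below are only one way to phrase them:

```yaml
researcher:
  role: >
    Online Research Specialist
  goal: >
    Run Google searches about "{query}" through the SERP search tool and
    extract the most relevant, up-to-date findings.
  backstory: >
    You are an expert web researcher. You know how to query search engines
    effectively and how to separate useful insights from noise.

reporting_analyst:
  role: >
    Reporting Analyst
  goal: >
    Turn the researcher's findings about "{query}" into a clear, structured report.
  backstory: >
    You are a meticulous analyst. You write readable, well-organized reports
    grounded strictly in the data you are given.
```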
Notice how this configuration captures what each agent is supposed to do, nothing more, nothing less. Just define their role, goal, and backstory. Very good!
Step #6: Specify the Tasks for Each Agent
Get ready to define specific tasks that clearly outline each agent’s role within the workflow. According to CrewAI’s documentation—to achieve accurate results—the task definition is more important than the agent definition.
Thus, in tasks.yaml you need to tell your agents exactly what they need to do, as below:
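A possible definition, again assuming the {query} input variable used in the previous sketch:

```yaml
research_task:
  description: >
    Search Google for "{query}" using the SERP search tool. Build the search
    parameters dynamically from the user's query, then collect the organic
    results and keep only the most relevant entries.
  expected_output: >
    A JSON list of the most relevant organic results (title, link, description)
    for "{query}".
  agent: researcher

report_task:
  description: >
    Using only the data gathered by the researcher, write an informative,
    well-structured report about "{query}". Include links to the sources.
  expected_output: >
    A readable Markdown report grounded strictly in the collected SERP data.
  agent: reporting_analyst
```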
In this setup, you’re defining two tasks, one for each agent:
- research_task: Tells the researcher how to use the Bright Data SERP API via the tool, including how to build API parameters dynamically based on the query.
- report_task: Specifies that the final output should be a readable, informative report built strictly from the collected data.
This tasks.yaml definition is all your CrewAI agents need to gather SERP data and produce a report grounded in real search results.
Time to integrate your CrewAI agents into your code and let them get to work!
Step #7: Create Your Crew
Now that all components are in place, connect everything in the crew.py file to create a fully functional crew. Specifically, this is how you can define your crew.py:
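A sketch that mirrors the standard CrewAI scaffold; the output file paths match the ones referenced later in this tutorial:

```python
# src/serp_agent/crew.py
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task

from serp_agent.tools.custom_tool import SerpSearchTool


@CrewBase
class SerpAgent:
    """SERP-powered research crew"""

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    @agent
    def researcher(self) -> Agent:
        # The researcher gets the SERP tool so it can run live Google searches
        return Agent(
            config=self.agents_config["researcher"],
            tools=[SerpSearchTool()],
            verbose=True,
        )

    @agent
    def reporting_analyst(self) -> Agent:
        # The analyst only writes the report, so it needs no tools
        return Agent(
            config=self.agents_config["reporting_analyst"],
            verbose=True,
        )

    @task
    def research_task(self) -> Task:
        return Task(
            config=self.tasks_config["research_task"],
            output_file="output/serp_data.json",
        )

    @task
    def report_task(self) -> Task:
        return Task(
            config=self.tasks_config["report_task"],
            output_file="output/report.md",
        )

    @crew
    def crew(self) -> Crew:
        # Sequential process: researcher first, then reporting_analyst
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
        )
```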
In crew.py, you need to use the CrewAI decorators (@agent, @task, and @crew, in this case) to link the logic from your YAML files and wire up the actual functionality.
In this example:
- The researcher agent is given access to the SerpSearchTool, enabling it to perform real Google search queries using Bright Data’s SERP API.
- The reporting_analyst agent is configured to generate the final report, using the output from the researcher.
- Each task corresponds to what was defined in your tasks.yaml and is explicitly tied to the appropriate output file.
- The process is set to sequential, ensuring that researcher runs first and then passes its data to reporting_analyst.
Here we go! Your SerpAgent crew is now ready to execute.
Step #8: Create the Main Loop
In main.py, trigger the crew by passing the user’s query as input:
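A minimal sketch; the {query} input name and the interactive prompt are the same assumptions used in the YAML sketches above:

```python
#!/usr/bin/env python
# src/serp_agent/main.py
from serp_agent.crew import SerpAgent


def run():
    # Read the user's search query and pass it to the crew as input
    query = input("What do you want to research? ").strip()
    SerpAgent().crew().kickoff(inputs={"query": query})


if __name__ == "__main__":
    run()
```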
Mission complete! Your CrewAI + SERP API integration (using Gemini as the LLM) is now fully functional. Just run main.py, enter a search query, and watch the crew collect and analyze SERP data to produce a report.
Step #9: Run Your AI Agent
In your project’s folder, run your CrewAI application with the following command:
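If you installed the dependencies with crewai install, the command below runs the project inside its local virtual environment:

```bash
crewai run
```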
Now, enter a query such as:
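For instance, in line with the run described below, you could ask something like:

```
What are the new AI protocols I should know about?
```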
This is the kind of question a typical LLM might struggle to answer accurately. The reason is that most of the latest AI protocols, like MCP, A2A, AGP, and ACP, did not exist when the model was originally trained.
Here is what will happen in detail:
As you can see above, CrewAI handles the request this way:
- The researcher agent is executed first. It:
  - Transforms the user input into the structured query "new AI protocols".
  - Sends the query to Bright Data’s SERP API via the SerpSearchTool.
  - Receives the results from the API and saves them to the output/serp_data.json file.
- The reporting_analyst agent is then triggered. It:
  - Reads the structured data from the serp_data.json file.
  - Uses that fresh information to generate a context-aware report in Markdown.
  - Saves the final structured report to output/report.md.
If you open report.md using a Markdown viewer, you will see something like this:
The report includes relevant contextual information and even links to help you dive deeper.
Et voilà! You just implemented a RAG workflow in CrewAI powered by the integration with a SERP API.
Next Steps
The Bright Data SERP API tool integrated into the crew lets agents receive fresh search engine results. Given the URLs from those SERPs, you could then use them to call other scraping APIs to extract raw content from the linked pages, either in unprocessed form (to convert into Markdown and feed to the agent) or already parsed in JSON.
That idea enables agents to automatically discover reliable sources and retrieve up-to-date information from them. Additionally, you could integrate a solution like Agent Browser to allow agents to interact dynamically with any live webpage.
These are just a few examples, but the potential scenarios and use cases are virtually limitless.
Conclusion
In this blog post, you learned how to make your CrewAI agents more context-aware by integrating a RAG setup using Bright Data’s SERP API.
As explained, this is just one of many possibilities you can explore by connecting your agents with external scraping APIs or automation tools. In particular, Bright Data’s solutions can serve as powerful building blocks for intelligent AI workflows.
Level up your AI infrastructure with Bright Data’s tools:
- Autonomous AI agents: Search, access, and interact with any website in real-time using a powerful set of APIs.
- Vertical AI apps: Build reliable, custom data pipelines to extract web data from industry-specific sources.
- Foundation models: Access compliant, web-scale datasets to power pre-training, evaluation, and fine-tuning.
- Multimodal AI: Tap into the world’s largest repository of images, videos, and audio—optimized for AI.
- Data providers: Connect with trusted providers to source high-quality, AI-ready datasets at scale.
- Data packages: Get curated, ready-to-use, structured, enriched, and annotated datasets.
For more information, explore our AI hub.
Create a Bright Data account and try all our products and services for AI agent development!
No credit card required