In this guide, you will learn:
- What LlamaIndex is.
- Why AI agents built with LlamaIndex should be able to perform web searches.
- How to create a LlamaIndex AI agent with web search capabilities.
Let’s dive in!
What Is LlamaIndex?
LlamaIndex is an open-source Python framework for building applications fueled by LLMs. It serves as a bridge between unstructured data and LLMs. In particular, it makes it easy to orchestrate LLM workflows across a variety of data sources.
With LlamaIndex, you can craft production-ready AI workflows and agents. These can search for and retrieve relevant information, synthesize insights, generate detailed reports, take automated actions, and much more.
As of this writing, it is one of the fastest-growing libraries in the AI ecosystem, with over 42k stars on GitHub.
Why Integrate Web Searching Data into Your LlamaIndex AI Agent
Compared to other AI agent frameworks, LlamaIndex focuses on solving one of the biggest limitations of LLMs: their lack of up-to-date, real-world knowledge.
To address that issue, LlamaIndex provides integrations with several data connectors that let you ingest content from multiple sources. Now, you might wonder: which is the most valuable data source for an AI agent?
To answer that question, it helps to consider what data sources are used to train LLMs. Successful LLMs received most of their training data from the web, the largest and most diverse source of public data.
If you want your LlamaIndex AI agent to break past its static training data, the key capability it needs is the ability to search the web and learn from what it finds. In particular, your agent should be able to extract structured information from the resulting search engine results pages (SERPs) and then meaningfully process and learn from it.
The challenge is that SERP scraping has become much more difficult due to Google’s recent crackdowns on simple scraping scripts. This is why you need a tool that integrates with LlamaIndex and simplifies the process. That is where LlamaIndex’s Bright Data integration comes in!
Bright Data handles the complex work of SERP scraping. Through its search_engine tool, it lets your LlamaIndex agent perform search queries and receive structured results in Markdown or JSON format.
This is what your AI agent needs to stay prepared to answer questions, both now and in the future. See how this integration works in the next chapter!
Build a LlamaIndex Agent That Can Search the Web Using Bright Data Tools
In this step-by-step guide, you will see how to build a Python AI agent with LlamaIndex that can search the web.
By integrating with Bright Data, you will enable your agent to access fresh, contextually rich web search data. For more details, refer to our official documentation.
Follow the steps below to create your Bright Data-powered AI SERP agent using LlamaIndex!
Prerequisites
To follow along with this tutorial, you need the following:
- Python 3.9 or higher installed on your machine (we recommend using the latest version).
- A Bright Data API key to integrate with Bright Data’s SERP APIs.
- An API key from a supported LLM. (In this guide, we will use Gemini, which supports integration via API for free. At the same time, you can use any LLM provider supported by LlamaIndex.)
Do not worry if you do not have a Gemini or Bright Data API key yet. We will show you how to create both in the next steps.
Step #1: Initialize Your Python Project
Start by launching your terminal and creating a new folder for your LlamaIndex AI agent project:
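The folder-creation command (the original snippet is not shown here; this assumes the folder name used throughout this guide):

```shell
# Create the project folder
mkdir llamaindex-bright-data-serp-agent
```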
The llamaindex-bright-data-serp-agent/ folder will hold all the code for your AI agent with web searching capabilities powered by Bright Data.
Next, navigate into the project directory and create a Python virtual environment inside it:
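A sketch of those two commands (assuming a virtual environment named venv):

```shell
# Enter the project folder and create a virtual environment inside it
cd llamaindex-bright-data-serp-agent
python -m venv venv
```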
Now, open the project folder in your favorite Python IDE. We recommend Visual Studio Code with the Python extension or PyCharm Community Edition.
Create a new file named agent.py in the root of your project directory. Your project structure should look like this:
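Something like:

```
llamaindex-bright-data-serp-agent/
├── venv/
└── agent.py
```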
In the terminal, activate the virtual environment. On Linux or macOS, run:
Equivalently, on Windows, execute:
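The standard venv activation commands are:

```shell
# Linux / macOS
source venv/bin/activate

# Windows
venv\Scripts\activate
```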
In the next steps, you will be guided through installing the required packages. However, if you would like to install everything upfront, run:
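Based on the packages used throughout this tutorial, the upfront install command should look roughly like this:

```shell
pip install python-dotenv llama-index llama-index-tools-brightdata llama-index-llms-google-genai
```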
Note: We are installing llama-index-llms-google-genai because this tutorial uses Gemini as the LlamaIndex LLM provider. If you plan to use a different provider, be sure to install the corresponding LLM integration instead.
Good job! Your Python development environment is ready to build an AI agent with Bright Data’s SERP integration using LlamaIndex.
Step #2: Set Up Environment Variable Reading
Your LlamaIndex agent will connect to external services like Gemini and Bright Data via API. For security, you should never hardcode API keys directly into your Python code. Instead, use environment variables to keep them private.
Install the python-dotenv library to make managing environment variables easier. In your activated virtual environment, launch:
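That is:

```shell
pip install python-dotenv
```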
Next, open your agent.py file and add the following lines at the top to load environment variables from a .env file:
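Using the standard python-dotenv API:

```python
from dotenv import load_dotenv

# Load the variables defined in .env into the process environment
load_dotenv()
```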
load_dotenv() looks for a .env file in your project’s root directory and loads its values into the environment.
Now, create a .env file alongside your agent.py file. Your new project file structure should look like this:
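That is:

```
llamaindex-bright-data-serp-agent/
├── venv/
├── .env
└── agent.py
```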
Awesome! You just set up a secure way to manage sensitive API credentials for third-party services.
Continue the initial setup by populating your .env file with the required environment variables!
Step #3: Configure Bright Data
To connect to the Bright Data SERP APIs in LlamaIndex via the official integration package, you first have to:
- Enable the Web Unlocker solution in your Bright Data dashboard.
- Retrieve your Bright Data API token.
Follow the steps below to complete the setup!
If you do not already have a Bright Data account, [create one](). If you already have an account, log in. In the dashboard, click the “Get proxy products” button:
You will be taken to the “Proxies & Scraping Infrastructure” page:
If you already see an active Web Unlocker API zone (like in the image above), you are all set. Make note of the zone name (for example, unlocker), as you will use it in your code later.
If you do not have a Web Unlocker zone yet, scroll down to the “Web Unlocker API” section and press the “Create zone” button:
Why use the Web Unlocker API instead of the dedicated SERP API?
Bright Data’s LlamaIndex SERP integration operates through the Web Unlocker API. Specifically, when configured properly, Web Unlocker functions the same way as the dedicated SERP APIs. In short, by setting up a Web Unlocker API zone with the LlamaIndex Bright Data integration, you automatically gain access to the SERP APIs as well.
Give your new zone a name, such as unlocker, enable any advanced features for better performance, and click “Add”:
Once created, you will be redirected to the zone’s configuration page:
Make sure the activation toggle is set to the “Active” status. This confirms that your zone is ready for use.
Next, follow the official Bright Data guide to generate your API key. Once you have your key, store it securely in your .env file like this:
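Assuming the variable is named BRIGHT_DATA_API_KEY (the name just has to match what your code reads later):

```
BRIGHT_DATA_API_KEY="<YOUR_BRIGHT_DATA_API_KEY>"
```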
Replace the <YOUR_BRIGHT_DATA_API_KEY> placeholder with your actual API key value.
Awesome! Next, configure the Bright Data SERP tool in your LlamaIndex agent script.
Step #4: Access the Bright Data LlamaIndex SERP Tool
In agent.py, start by loading your Bright Data API key from the environment:
Make sure to import os from the Python standard library:
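A minimal sketch, assuming the BRIGHT_DATA_API_KEY variable name from your .env file:

```python
import os

# Read the Bright Data API key loaded from the .env file
BRIGHT_DATA_API_KEY = os.getenv("BRIGHT_DATA_API_KEY")
```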
In your activated virtual environment, install the LlamaIndex Bright Data tools package:
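That package is installed with:

```shell
pip install llama-index-tools-brightdata
```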
Next, import the BrightDataToolSpec class in your agent.py file:
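Per the package’s module layout, the import should look like this:

```python
from llama_index.tools.brightdata import BrightDataToolSpec
```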
Create an instance of BrightDataToolSpec, providing your API key and the name of the Web Unlocker zone:
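A sketch, assuming the zone is named unlocker as in the setup above:

```python
# Configure the Bright Data tool spec with your key and zone
brightdata_tool_spec = BrightDataToolSpec(
    api_key=BRIGHT_DATA_API_KEY,
    zone="unlocker",  # your Web Unlocker API zone name
    verbose=True,     # print request logs while developing
)
```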
Replace the zone value with the name of the Web Unlocker API zone you set up earlier (in this case, it is unlocker).
Note that setting verbose=True is useful while developing. That way, the library will print helpful logs when your LlamaIndex agent makes requests through Bright Data.
Now, BrightDataToolSpec provides several tools, but here we are focusing on the search_engine tool. This can query Google, Bing, Yandex, and more, returning results in Markdown or JSON.
To extract just that tool, write:
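Along these lines:

```python
# Keep only the "search_engine" tool from the spec
tools = brightdata_tool_spec.to_tool_list(["search_engine"])
```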
The array passed to to_tool_list() acts as a filter, including only the tool named search_engine.
Note: By default, LlamaIndex will pick the most appropriate tool for a given user request, so tool filtering is not strictly required. Since this tutorial is specifically about integrating Bright Data’s SERP capabilities, it makes sense to limit the agent to the search_engine tool for clarity.
Terrific! Bright Data is now integrated and ready to power your LlamaIndex agent with web searching capabilities.
Step #5: Connect an LLM Model
The instructions in this step use Gemini as the LLM provider for this integration. A good reason for choosing Gemini is that it offers free API access to some of its models.
To get started with Gemini in LlamaIndex, install the required integration package:
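That is:

```shell
pip install llama-index-llms-google-genai
```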
Next, import the GoogleGenAI class in agent.py:
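The import, per the integration package’s module layout:

```python
from llama_index.llms.google_genai import GoogleGenAI
```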
Now, initialize the Gemini LLM like this:
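A sketch, using the model name mentioned just below:

```python
# Initialize Gemini as the agent's LLM
llm = GoogleGenAI(model="gemini-2.5-flash")
```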
In this example, we are using the gemini-2.5-flash model. Feel free to choose any other supported Gemini model.
Behind the scenes, the GoogleGenAI class automatically looks for an environment variable named GEMINI_API_KEY and uses that key to connect to the Gemini APIs.
Configure it by opening your .env file and adding:
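That is:

```
GEMINI_API_KEY="<YOUR_GEMINI_API_KEY>"
```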
Replace the <YOUR_GEMINI_API_KEY> placeholder with your actual Gemini API key. If you do not have one yet, you can get it for free by following the official Gemini API retrieval guide.
Note: If you want to use a different LLM provider, LlamaIndex supports many options. Just refer to the official LlamaIndex docs for setup instructions.
Well done! You now have all the core pieces in place to build a LlamaIndex AI agent that can search the web.
Step #6: Define the LlamaIndex Agent
First, install the main LlamaIndex package:
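That is:

```shell
pip install llama-index
```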
Next, in your agent.py file, import the FunctionAgent class:
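In recent LlamaIndex versions, FunctionAgent lives in the core workflow module:

```python
from llama_index.core.agent.workflow import FunctionAgent
```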
FunctionAgent is a specialized LlamaIndex AI agent that can interact with external tools, such as the Bright Data SERP tool you set up earlier.
Initialize the agent with your LLM and Bright Data SERP tool like this:
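A sketch; the system_prompt wording here is illustrative, not the article’s original:

```python
agent = FunctionAgent(
    tools=tools,
    llm=llm,
    verbose=True,
    system_prompt=(
        "You are a helpful assistant that can search the web to answer "
        "questions with fresh, up-to-date information."
    ),
)
```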
This creates an AI agent that processes user input through your LLM and can call the Bright Data SERP tools to perform real-time web searches when needed. Note the system_prompt argument, which defines the agent’s role and behavior. Again, the verbose=True flag is useful for inspecting internal activity.
Wonderful! The LlamaIndex + Bright Data SERP integration is complete. The next step is to implement the REPL for interactive use.
Step #7: Build the REPL
REPL, short for “Read-Eval-Print Loop,” is an interactive programming pattern where you enter commands, have them evaluated, and see the results.
In this context, the REPL works as follows:
- You describe the task you want the AI agent to handle.
- The AI agent performs the task, making online searches if required.
- You see the response printed in the terminal.
This loop continues indefinitely until you type "exit".
In agent.py, add this asynchronous function to handle the REPL logic:
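A sketch of the REPL, assuming the agent object defined earlier:

```python
async def main():
    print("Agent ready! Type 'exit' to quit.\n")
    while True:
        # Read: get the user's request from the command line
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            print("Goodbye!")
            break
        # Eval: run the agent, which may call the Bright Data SERP tool
        response = await agent.run(user_input)
        # Print: show the agent's answer
        print(f"Agent: {response}\n")
```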
This REPL function:
- Accepts user input from the command line via input().
- Processes the input using the LlamaIndex agent powered by Gemini and Bright Data through agent.run().
- Displays the response back to the console.
Because agent.run() is asynchronous, the REPL logic must be inside an async function. Run it like this at the bottom of your file:
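The standard entry-point pattern:

```python
if __name__ == "__main__":
    asyncio.run(main())
```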
Do not forget to import asyncio:
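At the top of the file:

```python
import asyncio
```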
Here we go! The LlamaIndex AI agent with SERP scraping tools is ready.
Step #8: Put It All Together and Run the AI Agent
This is what your agent.py file should contain:
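Assembled from the snippets in the previous steps (the zone name and system prompt are the illustrative values used earlier, not necessarily the article’s originals):

```python
import asyncio
import os

from dotenv import load_dotenv
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.google_genai import GoogleGenAI
from llama_index.tools.brightdata import BrightDataToolSpec

# Load environment variables from the .env file
load_dotenv()

# Read the Bright Data API key from the environment
BRIGHT_DATA_API_KEY = os.getenv("BRIGHT_DATA_API_KEY")

# Configure the Bright Data tool spec with your key and zone
brightdata_tool_spec = BrightDataToolSpec(
    api_key=BRIGHT_DATA_API_KEY,
    zone="unlocker",  # your Web Unlocker API zone name
    verbose=True,
)

# Keep only the "search_engine" tool
tools = brightdata_tool_spec.to_tool_list(["search_engine"])

# Initialize Gemini as the agent's LLM (reads GEMINI_API_KEY from the environment)
llm = GoogleGenAI(model="gemini-2.5-flash")

# Create the agent with the LLM and the Bright Data SERP tool
agent = FunctionAgent(
    tools=tools,
    llm=llm,
    verbose=True,
    system_prompt=(
        "You are a helpful assistant that can search the web to answer "
        "questions with fresh, up-to-date information."
    ),
)

async def main():
    print("Agent ready! Type 'exit' to quit.\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            print("Goodbye!")
            break
        response = await agent.run(user_input)
        print(f"Agent: {response}\n")

if __name__ == "__main__":
    asyncio.run(main())
```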
Run your LlamaIndex SERP agent with:
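That is:

```shell
python agent.py
```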
When the script starts, you will see a prompt like this in your terminal:
Try asking your agent for something that requires fresh information, for example:
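Judging from the Google search query the agent issues in the breakdown that follows, the prompt was along these lines (a reconstruction, not the article’s exact wording):

```
What are the new AI protocols?
```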
To perform this task effectively, the AI agent needs to search the web for up-to-date information.
The result will be:
That was quite fast, so let’s break down what happened:
- The agent detects the need to search for “new AI protocols” and calls the Bright Data SERP API via the search_engine tool using this input URL: https://www.google.com/search?q=new%20AI%20protocols&num=10&brd_json=1.
- The tool asynchronously fetches SERP data in JSON format from Bright Data’s Google Search API.
- The agent passes the JSON response to the Gemini LLM.
- Gemini processes the fresh data and generates a clear, accurate Markdown report with relevant links.
In this case, the AI agent returned:
Notice that the AI agent’s response includes recent protocols and up-to-date links published after Gemini’s last training update. This highlights the value of integrating live web search capabilities.
More specifically, the response includes contextual links that closely match what you would find by searching “new AI protocols” on Google yourself (at the time of writing, at least).
Et voilà! You now have a LlamaIndex AI agent with search engine scraping capabilities, powered by Bright Data.
Step #9: Next Steps
The current LlamaIndex SERP AI agent is just a simple example that uses only the search_engine tool from Bright Data.
In more advanced scenarios, you probably do not want to restrict your agent to a single tool. Instead, it is better to give your agent access to all available tools and write a clear system prompt that helps the LLM decide which ones to use for each goal.
For example, you could extend your system prompt so that the agent will:
- Perform multiple search queries.
- Select the top N links from the SERP results.
- Visit those pages and scrape their content in Markdown.
- Learn from that info to produce a richer, more detailed output.
For more guidance on integrating with all available tools, see our tutorial on building AI agents with LlamaIndex and Bright Data.
Conclusion
In this article, you learned how to use LlamaIndex to build an AI agent capable of searching the web via Bright Data. This integration allows your agent to run search queries on major search engines, including Google, Bing, Yandex, and many others.
Keep in mind that the example covered here is just a starting point. If you plan to develop more advanced agents, you’ll need robust tools for retrieving, validating, and transforming live web data. That is exactly what Bright Data’s AI infrastructure for agents provides.
Create a free Bright Data account and start exploring our agentic AI data tools today!