
Using LangChain and Bright Data for Web Search

Discover how to supercharge your AI app with integrated web search using LangChain and Bright Data for up-to-date results and smarter agents.

Building AI agents is getting easier by the day. In this piece, we’ll go through the process of using LangChain’s new BrightDataSERP tool. If you’re not familiar with the acronym, SERP stands for “Search Engine Results Page.”

This tutorial is beginner-friendly. All you need is a basic understanding of Python. By the time you finish this guide, you’ll be able to add the following skills to your toolbox.

  • Perform a basic search using BrightDataSERP
  • Customize your SERP output
  • Clean output for LLM-friendly usage
  • Create an AI agent with search capabilities

Intro: Knowledge Limitations of AI

If you’re familiar enough with LLMs, you already know that their knowledge is static. By the time they’re released to the public, they are done with training and fine-tuning — no more knowledge can be added.

Before OpenAI added search capabilities, ChatGPT had a knowledge cutoff date. LLMs still have cutoff dates based on their last round of training and fine-tuning. That said, models are capable of zero-shot inference. You can learn more about the overall training process here.

AI models get deployed with a static knowledge base. Through zero-shot inference, models can make sense of new data, but they will not retain the information permanently.

How LangChain Addresses Limitations

LangChain allows us to create tools and connect them to different LLMs. If you can write Python functions, you can let LLMs call those functions at their own discretion. You give the LLM access to the tool, and it does everything else. If you ask it a question it can answer from its pretraining, it won’t use the tool. If you ask it a question it doesn’t know, it will use its tools to try to find the answer.
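Here’s a minimal sketch of what that looks like in practice. The function and its name are our own invention for illustration; only the @tool decorator and bind_tools() come from LangChain.

from langchain_core.tools import tool

@tool
def get_order_status(order_id: str) -> str:
    """Look up the shipping status for an order."""
    #in a real app, this would hit your database or an internal API
    return f"Order {order_id} is out for delivery."

#once bound, the LLM decides on its own when to call the tool
#llm_with_tools = llm.bind_tools([get_order_status])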

LangChain even offers prebuilt tools for all of the following needs.

  • Search
  • Code
  • Productivity
  • Web Browsing
  • Databases
  • Finance

You can view LangChain’s full list of integrated tools here. We’ve got even better news. Bright Data is one of them!

Using LangChain With Bright Data

Now that we’ve gone over what it does, let’s take a look at how to actually use LangChain with Bright Data. We’ll assume you’ve got a basic familiarity with Python. We’ll walk through what it takes to get your API keys from OpenAI and Bright Data. Before continuing, make sure to go over our web scraping with LangChain and Bright Data guide first.

Prerequisites

For starters, you need to install LangChain’s Bright Data tools. The pip install command below does exactly that.

pip install langchain-brightdata

Next, you need a Bright Data API key and a SERP zone named serp. You can sign up for a free trial of our SERP API here. Ensure that your zone is named exactly serp; a name like serp1 will not work. When you’re ready, click the “Add” button and finish setting up the tool.

Adding the SERP Zone

Next, you can get your API key from the dashboard of your new SERP zone.

Finding Your Bright Data API Key

To get your OpenAI key, open their API keys page and click the “Create new secret key” button.

Getting a New OpenAI Key

A Basic Example

We’ll start with a simple example of how the tooling works. Swap the API key below for your own. The BrightDataSERP class does the heavy lifting here; we just set the configuration and print the results. You don’t normally need .encode("utf-8"), but we ran into some printing issues on Windows and this resolved them.

from langchain_brightdata import BrightDataSERP

api_key = "your-bright-data-api-key"

tool = BrightDataSERP(bright_data_api_key=api_key)

results = tool.invoke("Latest AI News")

print(results.encode("utf-8"))

Here’s a snippet of sample output. If you see this (or something similar), you’re on the right track.

https://api.brightdata.com/request {'zone': 'serp', 'url': 'https://www.google.com/search?q=Latest%20AI%20News&gl=us&hl=en&num=10', 'format': 'raw'} {'Authorization': 'Bearer your-api-key', 'Content-Type': 'application/json'}
b'<!doctype html><html itemscope="" itemtype="http://schema.org/SearchResultsPage" lang="en-MX"><head><meta charset="UTF-8"><meta content="origin" name="referrer"><link href="//www.gstatic.com/images/branding/searchlogo/ico/favicon.ico" rel="icon"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Latest AI News - Google Search</title><script nonce="IBYZiM7epIs5U67-92qXVg">window._hst=Date.now();</script><script nonce="IBYZiM7epIs5U67-92qXVg">
...
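Quick aside: hardcoding the key is fine for a local experiment, but for anything you plan to commit, consider reading it from the environment instead. Here’s a minimal sketch that assumes you’ve exported a variable called BRIGHT_DATA_API_KEY (the name is our own choice).

import os
from langchain_brightdata import BrightDataSERP

#read the key from an environment variable so it never lands in version control
api_key = os.environ["BRIGHT_DATA_API_KEY"]

tool = BrightDataSERP(bright_data_api_key=api_key)
print(tool.invoke("Latest AI News").encode("utf-8"))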

Advanced Usage

In the example below, we use kwargs to set a custom configuration with BrightDataSERP. You can view the full documentation on customization here. We set our search type to shop so we get more relevant shopping results.

from langchain_brightdata import BrightDataSERP


api_key = "your-bright-data-api-key"


#initialize the tool
serp_tool = BrightDataSERP(
    bright_data_api_key=api_key,
    search_engine="google",
    country="us",
    language="en",
    results_count=10,
    parse_results=True
)

#perform the search
results = serp_tool.invoke(
    {
        "query": "best electric vehicles",
        "country": "us",
        "language": "en",
        "search_type": "shop",
        "device_type": "mobile",
        "results_count": 15,
    }
)

print(results)

You can customize any of the following parameters to refine your search results; there’s a quick sketch using a different configuration after the list.

  • query
  • country
  • language
  • search_type
  • device_type
  • results_count
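For instance, here’s a quick sketch reusing the serp_tool instance from above to pull news results instead of shopping results. We’re assuming "news" is an accepted search_type value; double-check the customization docs linked above for the exact options.

#same tool, different configuration--aimed at recent news coverage
news_results = serp_tool.invoke(
    {
        "query": "best electric vehicles",
        "country": "us",
        "language": "en",
        "search_type": "news",
        "results_count": 10,
    }
)

print(news_results)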

Creating an AI Agent With Bright Data And OpenAI

Now that you’ve got a basic understanding of how to use BrightDataSERP, let’s see how a real AI agent uses it. We’ll go through the code pieces and then show how it all works as a whole.

The Pieces

There are a few more things you’ll need to install before we get started.

Install LangChain itself.

pip install langchain

Install OpenAI support for LangChain.

pip install langchain-openai

Install LangGraph to create agents.

pip install langgraph

This one might be a bit of a shock in the age of AI, but we’ll install BeautifulSoup as well. You’ll see why soon enough.

pip install beautifulsoup4
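If you’d rather grab everything at once, the single command below installs all four packages in one shot.

pip install langchain langchain-openai langgraph beautifulsoup4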

Creating A Search Function

The function below retrieves our search results, much like the example from earlier. After receiving those results, we use BeautifulSoup to pull out just the text. This way, we use far fewer tokens when passing the results into our LLM; all it sees is the site text. We keep the \n (newline) characters so the agent can better understand the page layout.

Once we’ve extracted the text, we return it.

#create a function to return only the text from search results
def get_cleaned_search_results(query):

    #initialize the tool
    serp_tool = BrightDataSERP(
        bright_data_api_key=bright_data_api_key,
        search_engine="google",
        country="us",
        language="en",
        results_count=5,
        parse_results=False,
    )

    #get the results
    results = serp_tool.invoke({
        "query": query,
        "country": "us",
        "language": "en",
        "results_count": 5,
    })

    #parse the text the old-fashioned way to save on input tokens
    soup = BeautifulSoup(results, "html.parser")

    #return the results but keep the newlines, this lets the model see the layout without all the extra code
    return soup.get_text(separator="\n")
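Before handing this function to an agent, it’s worth a quick sanity check on its own. The 500-character slice below is an arbitrary choice just to keep the terminal readable.

#quick test of the cleaning function by itself
preview = get_cleaned_search_results("latest AI news")
print(preview[:500])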

Turning The Function Into a Tool

Now, we’ll use LangChain’s Tool class to wrap the function. This allows our agent to call it as a tool. As you see below, it’s pretty simple. We give it a name and description. We also point the tool to a function with the func argument.

#turn the function into a langchain tool
cleaned_search_tool = Tool.from_function(
    name="CleanedBrightDataSearch",
    func=get_cleaned_search_results,
    description=(
        "Use this tool to retrieve up-to-date Google search results when answering "
        "questions that require recent information, product details, or current events. "
        "Pass in the user's natural-language query."
    ),
)
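If you want to confirm the wrapper behaves just like the raw function, you can invoke the tool directly before giving it to an agent. This is optional; it’s just a quick test sketch.

#optional: call the wrapped tool directly--output should match the raw function
print(cleaned_search_tool.invoke("latest SpaceX news")[:500])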

Creating The Agent

The code below creates our agent. ChatOpenAI creates an LLM instance. We pass our LLM and our tool into create_react_agent() to create the actual agent.

#start the llm
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key=openai_api_key,
    streaming=False,
    #set the token limit arbitrarily, we used 512 because it's a small task
    max_tokens=512
)


#give the llm access to the tool
agent = create_react_agent(llm, tools=[cleaned_search_tool])
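Before building any kind of interface, you can sanity-check the agent with a single invoke() call. The result is a dict whose messages list ends with the model’s final reply.

#one-off test call, no streaming
response = agent.invoke({"messages": [{"role": "user", "content": "give me the latest spacex news"}]})
print(response["messages"][-1].content)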

A Boring But Functional UI

Every program needs an entry point, no matter how primitive. Here, we just create a basic terminal interface for the user to interact with the agent. The user enters a prompt, the prompt gets passed into messages, and then we stream the agent’s output.

#the user can ask the agent anything--like the chatgpt webapp
user_prompt = input("Ask me anything: ")
messages = [{"role": "user", "content": user_prompt}]

#stream the model output, the model should perform searches when necessary
for step in agent.stream({"messages": messages}, stream_mode="values"):
    step["messages"][-1].pretty_print()

Putting It All Together

The Full Code

Here’s our full code example.

from langchain_openai import ChatOpenAI
from langchain_brightdata import BrightDataSERP
from langgraph.prebuilt import create_react_agent
from langchain.tools import Tool
from bs4 import BeautifulSoup

#put your creds here
openai_api_key = "your-openai-api-key"
bright_data_api_key = "your-bright-data-api-key"

#create a function to return only the text from search results
def get_cleaned_search_results(query):

    #initialize the tool
    serp_tool = BrightDataSERP(
        bright_data_api_key=bright_data_api_key,
        search_engine="google",
        country="us",
        language="en",
        results_count=5,
        parse_results=False,
    )

    #get the results
    results = serp_tool.invoke({
        "query": query,
        "country": "us",
        "language": "en",
        "results_count": 5,
    })

    #parse the text the old-fashioned way to save on input tokens
    soup = BeautifulSoup(results, "html.parser")

    #return the results but keep the newlines, this lets the model see the layout without all the extra code
    return soup.get_text(separator="\n")

#turn the function into a langchain tool
cleaned_search_tool = Tool.from_function(
    name="CleanedBrightDataSearch",
    func=get_cleaned_search_results,
    description=(
        "Use this tool to retrieve up-to-date Google search results when answering "
        "questions that require recent information, product details, or current events. "
        "Pass in the user's natural-language query."
    ),
)

#start the llm
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key=openai_api_key,
    temperature=0.7,
    streaming=False,
    max_tokens=512
)

#give the llm access to the tool
agent = create_react_agent(llm, tools=[cleaned_search_tool])

#the user can ask the agent anything--like the chatgpt webapp
user_prompt = input("Ask me anything: ")
messages = [{"role": "user", "content": user_prompt}]

#stream the model output, the model should perform searches when necessary
for step in agent.stream({"messages": messages}, stream_mode="values"):
    step["messages"][-1].pretty_print()

What Our Agent Sees

The snippet below shows what the agent sees. It contains our prompt and the page the agent fetched for reference.

python bd-agent-example.py
Ask me anything: give me the latest spacex news
================================ Human Message =================================

give me the latest spacex news
================================== Ai Message ==================================
Tool Calls:
  CleanedBrightDataSearch (call_IKoaponXVrNfVSRTfonU4ewo)
 Call ID: call_IKoaponXVrNfVSRTfonU4ewo
  Args:
    __arg1: latest SpaceX news
https://api.brightdata.com/request {'zone': 'serp', 'url': 'https://www.google.com/search?q=latest%20SpaceX%20news&gl=us&hl=en&num=5', 'format': 'raw'} {'Authorization': 'Bearer your-api-key', 'Content-Type': 'application/json'}
================================= Tool Message =================================
Name: CleanedBrightDataSearch

latest SpaceX news - Google Search

Please click
here
 if you are not redirected within a few seconds.
Accessibility Links
Skip to main content
Accessibility help
Accessibility feedback


Press
/
 to jump to the search box
latest SpaceX news
Sign in
Filters and Topics
AI Mode
All
News
Videos
Images
Short videos
Forums
More
About 85,800,000 results
 (0.38 seconds)


Search Results
SpaceX - Updates
SpaceX
https://www.spacex.com
 › updates
SpaceX
https://www.spacex.com
 › updates
As early as this year,
Falcon 9 will launch Dragon's sixth commercial astronaut mission, Fram2
, which will be the first human spaceflight mission to explore ...
Videos
12:03
YouTube
 ·
 GREAT SPACEX
SpaceX's Solution to Launch Starship Again after Test Site ...
YouTube
 ·
 GREAT SPACEX
2 days ago
47:39
YouTube
 ·
 GREAT SPACEX
COPV Destroyed Starship S36, What next? Honda's Hopper ...
YouTube
 ·
 GREAT SPACEX
1 day ago
3:10
YouTube
 ·
 CBS News
Watch: SpaceX Starship explodes, causes massive fiery burst ...
YouTube
 ·
 CBS News
3 days ago
Feedback
View all
Top stories
USA Today
SpaceX Starship exploded again. What's next for Elon Musk's company after latest setback?
3 days ago
Soap Central
Everything to know about Elon Musk's latest SpaceX starship explosion during static fire test in Texas
3 days ago
The Guardian
SpaceX Starship breaks up over Indian Ocean in latest bumpy test
4 weeks ago
CBS News
SpaceX loses contact with its Starship on 9th test flight after last 2 went down in flames
4 weeks ago
More news
Twitter Results
SpaceX (@SpaceX) · X
X (Twitter)
https://x.com/SpaceX
Watch Falcon 9 launch Dragon and Ax-4 to the @Space_Station x.com/i/broadcasts/1YpJ…
2 hours ago
Falcon 9 delivers 27 @Starlink satellites to orbit from Florida
9 hours ago
Deployment of 27 @Starlink satellites confirmed
9 hours ago
Elon Musk promises more risky launches after sixth ...
Space
https://www.space.com
 › ... › Private Spaceflight
Space
https://www.space.com
 › ... › Private Spaceflight
1 day ago
 —
Until
last
 year, the FAA allowed
SpaceX
 to try up to five Starship launches a year. This month, the figure was increased to 25. A lot can go ...
People also search for
Latest spacex news
nasa
Latest spacex news
live
SpaceX
launch today live
SpaceX
Starship
 news
today
SpaceX
Starship launch date
SpaceX
launch tonight
SpaceX
launch today live countdown
SpaceX
recent landing
Page Navigation
1
2
3
4
5
6
7
8
9
10
Next
Footer Links
Google apps

Model Output

In the snippet below, our model has finished reviewing the results. As you can see, we have a clean summary of the search results.

================================== Ai Message ==================================

Here are some of the latest updates on SpaceX:

1. **Falcon 9 Launches**: SpaceX's Falcon 9 recently launched 27 Starlink satellites into orbit from Florida. The deployment of these satellites was confirmed about 9 hours ago.

2. **Starship Setbacks**: SpaceX's Starship program has faced some challenges recently. A Starship exploded during a test, which has been a setback for the company. Despite this, Elon Musk has indicated plans for more risky launches following the sixth astronaut mission.

3. **Increased Launch Capacity**: The FAA has increased the number of Starship launches SpaceX is permitted to conduct per year from 5 to 25, allowing for more frequent test launches.

These developments highlight ongoing progress and challenges within SpaceX's operations.

Conclusion

AI development gets easier all the time. With LangChain and Bright Data, you can use some of the best search engines around: Google, Bing, and more! Our example here was pretty minimal: an automated search assistant.

You can take this project to the next level by adding multiple tools to your agent, as sketched below. You now know how to create tools, trim SERP results, and feed them to an AI agent for enhanced output. Take your new skills and go build something.
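For example, assuming you’ve already built a second tool (my_other_tool below is purely hypothetical), registering it with the agent is a one-line change.

#my_other_tool is a placeholder--any Tool or @tool-decorated function works
agent = create_react_agent(llm, tools=[cleaned_search_tool, my_other_tool])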

LangChain also offers integrations with many other tools; you can browse the full list linked earlier.

Here at Bright Data, we offer products of every shape and size to fit your data collection needs. Sign up for a free trial and get started today!

Jake Nulty

Technical Writer

Jacob Nulty is a Detroit-based software developer and technical writer exploring AI and human philosophy, with experience in Python, Rust, and blockchain.