Dify & Bright Data to Search the Web: Build a Scraping AI Agent

In this guide, you will learn:

Why Dify is a powerful platform for building AI agents.
Why web-searching capabilities are indispensable for AI agents.
How to create an AI agent in Dify that can search the Web.

Let’s dive in!

Unlocking Agentic Workflow Automation Development with Dify

Dify is a low-code/no-code platform designed to simplify the creation of LLM applications. You can use it as a cloud service or self-host the open-source version, and it supports agentic workflows.

It provides an intuitive visual editor, enabling you to easily build and manage complex AI logic with drag-and-drop functionality. Dify works with a wide range of LLMs, from proprietary to open-source, giving you the flexibility to choose the best model for your project.

Functioning as a BaaS (Backend-as-a-Service), it handles the AI infrastructure for you. Plus, it comes with support for extensions and plugins to further improve its capabilities. That opens the door to expanded functionalities within your AI applications via third-party integrations.

Why AI Agents Should Be Able to Search the Web

The ability to search the Web is fundamental for AI agents that need to give intelligent, up-to-date responses. Early versions of LLM-based assistants like ChatGPT and Gemini often struggled to provide current or niche information. That is because they were limited by the static nature of their training data.

An important leap in their accuracy occurred precisely when they were equipped with the ability to search the web.

ChatGPT while searching the web to respond to a prompt

This capability allows LLMs to pull information on demand, expanding their knowledge base and reducing hallucinations.

At the same time, built-in web-searching features for LLMs are typically exclusive to paid models. Plus, simply “searching the web” is not enough. The reason is that the sheer volume and unverified nature of Internet data can still lead to inaccuracies or irrelevant results.

The true power lies in having access to trusted and verified SERP (Search Engine Results Page) data directly from reliable search engines like Google, Bing, DuckDuckGo, and similar. That data is shaped by sophisticated ranking algorithms that include rigorous quality checks.

As a result, SERP data offers a far more reliable foundation for AI agents to synthesize information and generate well-informed responses. This is why a common use case in AI is building a RAG-based chatbot that leverages SERP data.

To provide a Dify AI agent workflow with SERP data, you can use the Bright Data Dify plugin. Among the tools it offers is one called “Search Engine.” This delivers real-time search results from Google, Bing, Yandex, and other major search engines by connecting to the Bright Data SERP API.

Thanks to this integration, your no-code AI agents can tap into the vastness of the Web while benefiting from the credibility of trusted search engines.
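To make the SERP-based RAG idea more concrete, here is a minimal Python sketch of how ranked search results could be turned into a numbered context block for an LLM. The field names title, link, and description, as well as the build_grounded_prompt function, are assumptions made for illustration. They do not reflect the exact output of the Bright Data plugin or of any specific SERP API.

```python
from typing import TypedDict


class SerpResult(TypedDict):
    # Hypothetical field names; real SERP payloads may differ.
    title: str
    link: str
    description: str


def build_grounded_prompt(
    question: str, results: list[SerpResult], max_results: int = 5
) -> str:
    """Turn ranked search results into a numbered context block for an LLM."""
    blocks = []
    # Keep the search engine's ranking order and number each source.
    for rank, item in enumerate(results[:max_results], start=1):
        blocks.append(
            f"[{rank}] {item['title']} ({item['link']})\n{item['description']}"
        )
    context = "\n\n".join(blocks)
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the sources lets the model cite them, which makes it easier to check an answer against the original pages.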

Building an AI Agent That Can Search the Web in Dify: Step-by-Step Tutorial

In this guided section, you will build an AI agent workflow that:

Accepts a keyphrase as input.
Uses the “Search Engine” tool from the Bright Data plugin to search Google using that keyphrase.
Processes the search results with an LLM.

This entire process is fully visual, with no coding required. You will connect each node through a simple drag-and-drop interface to bring your AI agent to life.

Let’s now build your no-code, Bright Data-powered web-searching AI workflow in Dify!

Prerequisites

To follow this tutorial on building a web-searching AI agent in Dify, you will need the following:

A Dify account (a free plan is enough).
A Bright Data API key.

If you do not have these yet, create a Dify account and generate a Bright Data API key before continuing.

Note: For production use, you will also need an API key from an LLM provider (such as OpenAI, Anthropic, or Gemini).

Step #1: Set Up an LLM Integration in Dify

To use an LLM in Dify, you first need to set up the LLM integration. Start by clicking your profile image and selecting “Settings”:

Next, navigate to the “Model Provider” page. Here, for example, you can install the OpenAI provider plugin:

By default, you get 200 free message credits. To remove this limitation, after installing the plugin, configure your OpenAI settings by adding your OpenAI API key:

Alternatively, for a free option that does not depend on message credits, consider using the Gemini LLM provider. Some Gemini models, such as Gemini 2.0 Flash, can be used for free even via the API, within the limits set by the provider.

Great! You are now ready to start building your Dify AI workflow with web-searching capabilities.

Step #2: Install the Bright Data Plugin

Visit the releases page on the GitHub repository for the Bright Data plugin and download the file named brightdata_plugin.difypkg.

To install it in Dify, click on “PLUGINS” to open the plugin marketplace, then select “Install from Local Package File”:

Loading the latest Bright Data plugin for Dify

Choose the local .difypkg file you downloaded earlier and click the “Install” button:

Installing the Bright Data Web Scraper plugin

That is it! The Bright Data plugin is now successfully installed in Dify.

Step #3: Devise Your New Dify Application

Now that everything is set up, you are ready to start building your AI agent. From the Dify workspace homepage, create a new application by selecting “Create from Blank” as shown below:

Selecting the “Create from Blank” option

Then, choose “Workflow” as the application type, give your AI application a name, and click “Create”:

This will generate a new, blank workflow canvas:

Before jumping into building your no-code AI agent, take a moment to outline what the agent should do and which nodes you will need. For this tutorial, you can achieve the goal via a simple four-node workflow:

A “Start” node to define the input variable (the keyphrase).
A “Search Engine” node to search the web using that keyphrase.
An “LLM” node to analyze the search results and extract useful insights using a custom prompt.
An “End” node to display the final AI-generated report.
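If it helps to reason about the four nodes above as a data flow, the following Python sketch models them as plain functions that read from and write to a shared state. This is only a conceptual illustration, not how Dify is implemented. All function names and state keys except search_topic are hypothetical, and the search and LLM steps are placeholders that make no network calls.

```python
from typing import Callable

# Each node reads the shared state and returns new keys to merge into it.
Node = Callable[[dict], dict]


def start_node(state: dict) -> dict:
    # Expose the user-supplied keyphrase under the name used in this tutorial.
    return {"search_topic": state["search_topic"].strip()}


def search_engine_node(state: dict) -> dict:
    # Placeholder: a real node would query a search engine here.
    return {"search_text": f"(results for: {state['search_topic']})"}


def llm_node(state: dict) -> dict:
    # Placeholder: a real node would send a prompt to a model here.
    return {"llm_text": f"Report based on {state['search_text']}"}


def end_node(state: dict) -> dict:
    # Select the single variable that the workflow returns.
    return {"output": state["llm_text"]}


def run_workflow(nodes: list[Node], inputs: dict) -> dict:
    """Run the nodes in order, merging each node's output into the state."""
    state = dict(inputs)
    for node in nodes:
        state.update(node(state))
    return state


PIPELINE = [start_node, search_engine_node, llm_node, end_node]
```

Calling run_workflow(PIPELINE, {"search_topic": "new AI protocols"}) walks the chain from left to right, just like the connected nodes on the canvas.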

Awesome! Time to implement your web-searching AI workflow in Dify.

Step #4: Configure the “Start” Node

Begin by clicking on the “Start” node, then select “INPUT FIELD”:

Set the “Field Type” to “Short Text”, since you will be entering a short text query as input. Name the input field search_topic. This represents the keyphrase that the AI agent will use to perform the web search.

Click “Save” to confirm:

Configuring the search_topic input variable

Good! The “Start” node is now properly configured.
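Dify handles the input field for you, but if you later feed search_topic from another system, a small check like the one below can catch empty or overly long keyphrases early. This is a sketch under stated assumptions: validate_search_topic is a hypothetical helper, and the default max_length of 48 is an assumed limit for illustration, so adjust it to the value configured on your own input field.

```python
def validate_search_topic(value: str, max_length: int = 48) -> str:
    """Return a cleaned keyphrase or raise ValueError.

    max_length=48 is an assumed limit for illustration only.
    """
    # Trim the ends and collapse runs of internal whitespace.
    cleaned = " ".join(value.split())
    if not cleaned:
        raise ValueError("search_topic must not be empty")
    if len(cleaned) > max_length:
        raise ValueError(f"search_topic exceeds {max_length} characters")
    return cleaned
```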

Step #5: Integrate the “Search Engine” Node

Continue by clicking the “+” icon from the “Start” node. Then go to “Tools” > “Bright Data Web Scraper” > “Search Engine”:

This Bright Data plugin node serves as the bridge between your Dify workflow and the Bright Data AI infrastructure. Specifically, the “Search Engine” tool enables your AI agent to retrieve real-time search results directly from the web.

Now, click on “Authorize” and enter your Bright Data API token:

Once authorized, the Bright Data plugin will be connected to your account.

Now, pass the input variable you configured earlier. In the “Search Query” field, type “/” to view available variables, and select search_topic. The “Search Engine” node will perform a live web search based on user input:

Setting the input of the “Search Engine” node

Finally, in the “SEARCH ENGINE” dropdown, choose the search engine you would like to use (for this tutorial, we will go with Google):

Choosing a search engine among the listed ones

Terrific! The Bright Data “Search Engine” node is now in place.
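For intuition, the sketch below shows how a keyphrase and a search engine choice could map to a public search URL. It only illustrates the idea of combining the two inputs. The SEARCH_ENDPOINTS table and the build_search_url helper are hypothetical, and how the Bright Data plugin actually forwards the query to the SERP API is not shown here.

```python
from urllib.parse import urlencode

# Public search URL patterns; the query parameter name differs per engine.
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def build_search_url(engine: str, query: str) -> str:
    """Build the search URL for the chosen engine and keyphrase."""
    try:
        base_url, param = SEARCH_ENDPOINTS[engine.lower()]
    except KeyError:
        raise ValueError(f"unsupported search engine: {engine!r}") from None
    # urlencode escapes spaces and special characters in the keyphrase.
    return f"{base_url}?{urlencode({param: query})}"
```

For example, build_search_url("google", "new AI protocols") yields https://www.google.com/search?q=new+AI+protocols.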

Step #6: Add the “LLM” Node

From the “Search Engine” node, click the “+” icon and select the “LLM” node:

In the “MODEL” section, click “Configure model” and choose an LLM from the list (for example, gpt-4):

In the “SYSTEM” section, enter a prompt like the following one:

You are an expert SEO analyst. You have been given data containing the results from a Google search.

Based on this information, please report on the following:
- Common Talking Points: What are the most common themes and keywords you see repeated in the search result titles and descriptions?
- Dominant Content Types: Based on the titles, what kind of articles seem to be ranking? (e.g., "What is...", "Top 10...", "Beginner's Guide...", "Vs...").

Also, report the URLs.

Search Results Data:
{{Search_engine.text}}

This prompt instructs the LLM to:

Analyze the search results returned by the “Search Engine” node.
Extract recurring themes, popular content formats, and associated URLs, acting as an SEO analyst.

Note: The variable {{Search_engine.text}} passes the text output from the “Search Engine” node directly into the LLM prompt. In other words, the LLM has access to the real-time web search data returned by the “Search Engine” node.
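Conceptually, this kind of variable reference works like template substitution. The Python sketch below shows the idea with a small regular expression that replaces {{node.variable}} placeholders with values from earlier nodes. It is a simplified illustration, not Dify's actual template engine, and render_prompt and the node_outputs structure are hypothetical.

```python
import re

# Matches placeholders such as {{Search_engine.text}}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*\}\}")


def render_prompt(template: str, node_outputs: dict[str, dict[str, str]]) -> str:
    """Replace each {{node.variable}} placeholder with the matching output."""

    def substitute(match: re.Match) -> str:
        node, variable = match.group(1), match.group(2)
        try:
            return node_outputs[node][variable]
        except KeyError:
            # Fail loudly instead of sending an unresolved placeholder to the LLM.
            raise KeyError("no value for " + match.group(0)) from None

    return PLACEHOLDER.sub(substitute, template)
```

Raising an error on a missing variable is a deliberate choice here: a prompt that still contains a raw placeholder would silently produce a report based on no data.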

Below is what the “LLM” node configuration will look like:

Fantastic! All that remains is to add the last node to the workflow.

Step #7: Finalize the AI Workflow with an “End” Node

Complete your workflow by adding an “End” node:

This node will return the final output generated by the LLM. To configure that behavior, click on the “OUTPUT VARIABLE” section and select the text variable from the LLM node:

This setup ensures that the final response from your LLM (based on live search engine results) is returned as the output of the entire workflow.

Step #8: Run the AI Web Searching Workflow

This is your final web searching AI workflow in Dify, powered by Bright Data’s “Search Engine” tool:

To run the workflow, click on the “Run” button. In the input field for search_topic, type in the topic you want to research (e.g., “new AI protocols”). Then, press “Start Run” to launch the agent:

The workflow will now start. The Bright Data node will perform a live Google search, and the LLM node will receive the results and generate the summary as instructed.

The final output will appear in the “Result” tab. It might look something like this:

Below is the result as text:

Common Talking Points: The most frequently mentioned themes and keywords in the search results are "AI protocols", "Model Context Protocol (MCP)", "Agent2Agent (A2A) protocol", "Agent Communication Protocol (ACP)", "AI integration", "AI agent communications", "non-deterministic behavior", "secure, two-way connections", "data sources", and "AI-powered tools". These terms suggest a focus on new methodologies and standards in AI technology, particularly in terms of communication and integration.

Dominant Content Types: The search results seem to include a mix of explanatory articles, guides, and news updates. There are multiple "What is..." type articles, explaining terms like MCP, A2A, and ACP. "A developer's guide to AI protocols..." and "What Every AI Engineer Should Know About A2A, MCP &..." are examples of guide-type articles, while titles like "Introducing the Model Context Protocol Anthropic" and "AI Will Be Governed by Protocols No One Has Agreed on yet" suggest news updates or announcements.
URLs:
1. https://www.anthropic.com/news/model-context-protocol
2. https://www.infoworld.com/article/4007686/a-developers-guide-to-ai-protocols-mcp-a2a-and-acp.html
3. https://www.businessinsider.com/ai-protocol-rules-future-2025-6
4. https://www.cio.com/article/3991302/ai-protocols-set-standards-for-scalable-results.html
5. https://www.forbes.com/sites/craigsmith/2025/04/07/how-a-simple-protocol-is-changing-everything-about-ai/
6. https://hackernoon.com/mcp-a2a-agp-acp-making-sense-of-the-new-ai-protocols
7. https://www.youtube.com/watch?v=rmphqjsc4Po
8. https://www.youtube.com/watch?v=CQywdSdi5iA
9. https://www.youtube.com/watch?v=TQXG4r0U2PQ
10. https://techstrong.ai/aiops/model-context-protocol-the-new-standard-for-ai-interoperability/
11. https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
12. https://www.axios.com/2025/04/17/model-context-protocol-anthropic-open-source

As instructed in the prompt, the LLM:

Identified common talking points such as “Model Context Protocol (MCP)” and “Agent2Agent (A2A) protocol.”
Highlighted dominant content types, including developer guides and informational articles.
Listed relevant URLs for further reading.
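If you want to reuse the URLs from a report like the one above, a short script can pull them out of the text. The following Python sketch assumes the report lists URLs on numbered lines, as in this sample output. Since LLM output formats can vary between runs, treat the pattern as a starting point. The helper names extract_urls and count_domains are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches lines such as "3. https://example.com/page" in the report text.
NUMBERED_URL = re.compile(
    r"^[ \t]*\d+\.[ \t]+(https?://\S+)[ \t]*$", re.MULTILINE
)


def extract_urls(report: str) -> list[str]:
    """Return the URLs from numbered lines, in the order they appear."""
    return NUMBERED_URL.findall(report)


def count_domains(urls: list[str]) -> Counter:
    """Count how many listed URLs belong to each domain."""
    return Counter(urlparse(url).netloc.removeprefix("www.") for url in urls)
```

Counting domains gives a quick view of which sites dominate the results for a keyphrase, such as the three YouTube links in the sample above.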

Et voilà! You have successfully built an AI agent that can search the web for real-time information and provide custom insights.
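Once a workflow is published, you may also want to trigger it from your own code instead of the “Run” button. The Python sketch below only builds the HTTP request and does not send it. The base URL, the /workflows/run path, the body fields response_mode and user, and the response shape are assumptions based on Dify's workflow API as commonly documented, so verify them on the API access page of your own app. The build_run_request and extract_outputs helpers are hypothetical.

```python
import json
import urllib.request

# Assumed base URL for Dify Cloud; self-hosted instances use their own host.
DIFY_BASE_URL = "https://api.dify.ai/v1"


def build_run_request(
    api_key: str, search_topic: str, user: str = "demo-user"
) -> urllib.request.Request:
    """Build (but do not send) a request that runs the workflow."""
    body = {
        "inputs": {"search_topic": search_topic},  # matches the "Start" node field
        "response_mode": "blocking",  # assumed: wait for the full result
        "user": user,  # assumed: an identifier for the caller
    }
    return urllib.request.Request(
        url=f"{DIFY_BASE_URL}/workflows/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_outputs(response_body: dict) -> dict:
    # Assumed response shape: {"data": {"outputs": {...}}}.
    return response_body.get("data", {}).get("outputs", {})
```

To actually send the request, you could pass it to urllib.request.urlopen and decode the JSON response before calling extract_outputs. Keep the API key in an environment variable rather than in the source code.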

Conclusion

In this article, you learned how to use Dify to build a no-code AI workflow capable of searching the web. This functionality is made possible thanks to the Bright Data Dify plugin, which provides a “Search Engine” tool that retrieves real-time SERP data from major search engines.

While this was just one example, many other use cases are possible. Regardless of your specific AI workflow goals, effective agents need access to tools for retrieving, validating, and transforming web data. That is precisely what Bright Data’s AI infrastructure provides.

Create a free Bright Data account and start experimenting with our AI-ready data tools today!


Federico Trotta

Technical Writer

3 years experience

Federico Trotta is a technical writer, editor, and data scientist. Expert in technical content management, data analysis, machine learning, and Python development.
