In this guide, we’ll walk through building a local MCP server in Python to scrape Amazon product data on demand. You’ll learn the fundamentals of MCP, how to write and run your own server, and how to connect it to developer tools like Claude Desktop and Cursor IDE. We’ll wrap up with a real-world Bright Data MCP integration for real-time, AI-ready web data.
Let’s dive in.
The Bottleneck: Why LLMs Struggle with Real-World Interaction (and How MCP Solves It)
Large Language Models (LLMs) are incredibly powerful at processing and generating text from massive training datasets. But they come with a key limitation—they can’t natively interact with the real world. That means no access to local files, no running custom scripts, and no fetching live data from the web.
Take a simple example: ask Claude to pull product details from a live Amazon page, and it won’t be able to. Why? Because it lacks the built-in ability to browse the web or trigger external actions.
Without external tooling, LLMs can’t perform practical tasks that rely on real-time data or integration with outside systems.
This is where Anthropic’s Model Context Protocol (MCP) comes in. It lets LLMs talk to external tools—like scrapers, APIs, or scripts—in a secure and standardized way.
Here’s the difference in action. After integrating a custom MCP server, we were able to extract structured Amazon product data directly through Claude:
Don’t worry about how it works just yet—we’ll walk through everything step by step later in the guide.
Why Does MCP Matter?
- Standardization: MCP provides a standardized interface for LLM-based systems to connect with external tools and data—similar to how APIs standardized web integrations. This drastically reduces the need for custom integrations, speeding up development.
- Flexibility and Scalability: Developers can swap out LLMs or hosting platforms without rewriting tool integrations. MCP supports multiple communication transports (such as `stdio`), making it adaptable to different setups.
- Enhanced LLM Capabilities: By connecting LLMs to live data and external tools, MCP allows them to go beyond static responses. They can now return current, relevant information and trigger real-world actions based on context.
Analogy: Think of MCP as a USB interface for LLMs. Just like USB allows different devices (keyboards, printers, external drives) to plug into any compatible machine without needing special drivers, MCP lets LLMs connect to a wide range of tools using a standardized protocol—no need for custom integration each time.
What Is Model Context Protocol (MCP)?
Model Context Protocol (MCP) is an open standard developed by Anthropic that lets large language models (LLMs) interact with external tools, APIs, and data sources in a consistent, secure way. It acts as a universal connector, allowing LLMs to perform real-world tasks like scraping websites, querying databases, or triggering scripts.
While Anthropic introduced it, MCP is open and extensible, meaning anyone can implement or contribute to the standard. If you’ve worked with Retrieval-Augmented Generation (RAG), you’ll appreciate the concept. MCP builds on that idea by standardizing interactions through a lightweight JSON-RPC interface so models can access live data and take action.
MCP Architecture: How It Works
At its core, MCP standardizes communication between an AI model and external capabilities.
Core Idea: A standardized interface (usually JSON-RPC 2.0 over transports like `stdio`) allows an LLM (via a client) to discover and invoke tools exposed by external servers.
MCP operates through a client-server architecture with three key components:
- MCP Host: The environment or application that initiates and manages interactions between the LLM and external tools. Examples include AI assistants like Claude Desktop or IDEs like Cursor.
- MCP Client: A component within the host that establishes and maintains connections with MCP Servers, handling the communication protocols and managing data exchange.
- MCP Server: A program (which we developers create) that implements the MCP protocol and exposes a specific set of capabilities. An MCP server might interface with a database, a web service, or, in our case, a website (Amazon). Servers expose their functionality in standardized ways:
- Tools: Callable functions (e.g. `scrape_amazon_product`, `get_weather_data`)
- Resources: Read-only endpoints for retrieving static data (e.g. fetch a file, return a JSON record)
- Prompts: Predefined templates to guide LLM interaction with tools and resources
Here’s the MCP architecture diagram:
Image Source: Model Context Protocol
In this setup, the host (Claude Desktop or Cursor IDE) spawns an MCP client, which then connects to an external MCP server. That server exposes tools, resources, and prompts, allowing the AI to interact with them as needed.
In short, the workflow operates as follows:
- The user sends a message like “Fetch product info from this Amazon link.”
- The MCP client checks for a registered tool that can handle that task
- The client sends a structured request to the MCP server
- The MCP server executes the appropriate action (e.g., launching a headless browser)
- The server returns structured results to the MCP client
- The client forwards the results to the LLM, which presents them to the user
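Under the hood, steps 3 to 5 are ordinary JSON-RPC 2.0 messages. A hypothetical `tools/call` request from the client to the server might look like this (the tool name and URL are illustrative placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scrape_amazon_product",
    "arguments": { "url": "https://www.amazon.com/dp/B09C13PZX7" }
  }
}
```

The server replies with a result message of the same shape, carrying the tool's structured output back to the client.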
Building a Custom MCP Server
Let’s build a Python MCP server to scrape Amazon product pages.
This server will expose two tools: one to download HTML and another to extract structured information. You’ll interact with the server via an LLM client in Cursor or Claude Desktop.
Step 1: Setting Up the Environment
First, ensure you have Python 3.10 or later installed. Then, create and activate a virtual environment:
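On macOS or Linux, the standard commands are:

```shell
# Create a virtual environment in the ./venv directory
python3 -m venv venv

# Activate it (on Windows, use: venv\Scripts\activate)
source venv/bin/activate
```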
Install the required libraries: the MCP Python SDK, Playwright, and LXML.
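With the virtual environment active, something like the following should work:

```shell
# Install the MCP SDK, Playwright, and lxml
pip install mcp playwright lxml

# Download the browser binaries Playwright drives
python -m playwright install
```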
This installs:
- mcp: Python SDK for Model Context Protocol servers and clients that handles all the JSON-RPC communication details
- playwright: Browser automation library that provides headless browser capabilities for rendering and scraping JavaScript-heavy websites
- lxml: Fast XML/HTML parsing library that makes it easy to extract specific data elements from web pages using XPath queries
In short, the MCP Python SDK (`mcp`) handles all protocol details, letting you expose tools that Claude or Cursor can call via natural-language prompts. Playwright allows us to render web pages completely (including JavaScript content), and lxml gives us powerful HTML parsing capabilities.
Step 2: Initialize the MCP Server
Create a Python file named `amazon_scraper_mcp.py`. Start by importing the necessary modules and initializing the `FastMCP` server:
This creates an instance of the MCP server. We’ll now add tools to it.
Step 3: Implement the `fetch_page` Tool
This tool will take a URL as input, use Playwright to navigate to the page, wait for the content to load, download the HTML, and save it to our temporary file.
This asynchronous function uses Playwright to handle potential JavaScript rendering on Amazon pages. The `@mcp.tool()` decorator registers this function as a callable tool within our server.
Step 4: Implement the `extract_info` Tool
This tool reads the HTML file saved by `fetch_page`, parses it using LXML and XPath selectors, and returns a dictionary containing the extracted product details.
This function uses LXML's `fromstring` to parse the HTML and robust XPath selectors to find the desired elements.
Step 5: Run the Server
Finally, add the following lines to the end of your `amazon_scraper_mcp.py` script to start the server using the `stdio` transport mechanism, which is standard for local MCP servers communicating with clients like Claude Desktop or Cursor.
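Assuming the `mcp` instance from Step 2 is defined earlier in the same file, the entry point looks like this:

```python
if __name__ == "__main__":
    print("Starting MCP Server for Amazon Product Scraper...")
    mcp.run(transport="stdio")
```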
Complete Code (`amazon_scraper_mcp.py`)
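Putting the steps together, a complete sketch of `amazon_scraper_mcp.py` might look like the following. The XPath selectors, timeouts, and the temporary-file path are illustrative assumptions; adjust them to what you actually observe on Amazon pages:

```python
import os
from lxml import html as lxml_html
from mcp.server.fastmcp import FastMCP
from playwright.async_api import async_playwright

mcp = FastMCP("Amazon Product Scraper")

# Temporary file used to hand HTML from fetch_page to extract_info
HTML_FILE = os.path.join(os.path.expanduser("~"), "amazon_product_page.html")


@mcp.tool()
async def fetch_page(url: str) -> str:
    """Download an Amazon product page and save the rendered HTML."""
    try:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()
            await page.goto(url, timeout=30000, wait_until="domcontentloaded")
            await page.wait_for_timeout(5000)  # let client-side scripts finish
            html = await page.content()
            await browser.close()
        with open(HTML_FILE, "w", encoding="utf-8") as f:
            f.write(html)
        return f"HTML content saved to {HTML_FILE}"
    except Exception as e:
        return f"Error fetching page: {e}"


def _first(tree, xpath):
    """Return the stripped text of the first XPath match, or None."""
    results = tree.xpath(xpath)
    return results[0].strip() if results else None


@mcp.tool()
def extract_info() -> dict:
    """Parse the saved HTML and return structured product details."""
    if not os.path.exists(HTML_FILE):
        return {"error": f"{HTML_FILE} not found. Run fetch_page first."}
    with open(HTML_FILE, "r", encoding="utf-8") as f:
        tree = lxml_html.fromstring(f.read())
    # Illustrative selectors; update them as Amazon's markup changes
    return {
        "title": _first(tree, '//span[@id="productTitle"]/text()'),
        "price": _first(tree, '(//span[@class="a-offscreen"])[1]/text()'),
        "rating": _first(tree, '//span[@id="acrPopover"]/@title'),
        "features": [
            t.strip()
            for t in tree.xpath('//div[@id="feature-bullets"]//li/span/text()')
            if t.strip()
        ],
    }


if __name__ == "__main__":
    print("Starting MCP Server for Amazon Product Scraper...")
    mcp.run(transport="stdio")
```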
Integrating Your Custom MCP Server
Now that the server script is ready, let’s connect it to MCP clients like Claude Desktop and Cursor.
Connecting to Claude Desktop
Step 1: Open Claude Desktop.
Step 2: Navigate to Settings -> Developer -> Edit Config. This will open the `claude_desktop_config.json` file in your default text editor.
Step 3: Add an entry for your server under the `mcpServers` key. Make sure to replace the path in `args` with the absolute path to your `amazon_scraper_mcp.py` file.
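The entry might look like this (the server name and paths are placeholders; point `command` at the Python inside your virtual environment if you created one):

```json
{
  "mcpServers": {
    "amazon_product_scraper": {
      "command": "python",
      "args": ["/absolute/path/to/amazon_scraper_mcp.py"]
    }
  }
}
```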
Step 4: Save the `claude_desktop_config.json` file and completely close and reopen Claude Desktop for the changes to take effect.
Step 5: In Claude Desktop, you should now see a small tools icon (like a hammer 🔨) in the chat input area.
Step 6: Clicking it should list your “Amazon Product Scraper” with its `fetch_page` and `extract_info` tools.
Step 7: Send a prompt, for example: “Get the current price, original price, and rating for this Amazon product: https://www.amazon.com/dp/B09C13PZX7”.
Step 8: Claude will detect that this requires external tools and prompt you for permission to run `fetch_page` first and then `extract_info`. Click “Allow for this chat” for each tool.
Step 9: After granting permissions, the MCP server will execute the tools. Claude will then receive the structured data and present it in the chat.
🔥 Great, you’ve successfully built and integrated your first MCP server!
Connecting to Cursor
The process for Cursor (an AI-first IDE) is similar.
Step 1: Open Cursor.
Step 2: Go to Settings ⚙️ and navigate to the `MCP` section.
Step 3: Click “+Add a new global MCP Server”. This will open the `mcp.json` configuration file. Add an entry for your server, again using the absolute path to your script.
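Cursor's `mcp.json` uses the same shape as Claude Desktop's config; an entry like the following (names and paths are placeholders) should work:

```json
{
  "mcpServers": {
    "amazon_product_scraper": {
      "command": "python",
      "args": ["/absolute/path/to/amazon_scraper_mcp.py"]
    }
  }
}
```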
Step 4: Save the `mcp.json` file, and you should see your “amazon_product_scraper” listed, hopefully with a green dot indicating it’s running and connected.
Step 5: Use Cursor’s chat feature (Cmd+L or Ctrl+L).
Step 6: Send a Prompt, for example: “Extract all available product data from this Amazon URL: https://www.amazon.com/dp/B09C13PZX7. Format the output as a structured JSON object”.
Step 7: Similar to Claude Desktop, Cursor will ask for permission to run the `fetch_page` and `extract_info` tools. Approve these requests (“Run Tool”).
Step 8: Cursor will display the interaction flow, showing the calls to your MCP tools and finally presenting the structured JSON data returned by your `extract_info` tool.
Here’s an example of JSON output from Cursor:
This demonstrates the flexibility of MCP: the same server works seamlessly with different client applications.
Integrating Bright Data’s MCP for AI-Driven Web Data Extraction
Custom MCP servers offer full control but come with challenges, such as managing proxy infrastructure, handling sophisticated anti-bot mechanisms, and ensuring scalability. Bright Data addresses these issues with its production-grade, pre-built MCP solution, designed for seamless integration with AI agents and LLMs.
The Model Context Protocol (MCP) integration with Bright Data provides LLMs and AI Agents with seamless, real-time access to public web data—tailored for AI workflows. By connecting to Bright Data’s MCP, your apps and models can retrieve SERP results from all major search engines, and seamlessly unlock access to hard-to-reach websites.
Bright Data’s Model Context Protocol (MCP) solution connects your application to a suite of powerful web data extraction tools—including the Web Unlocker, SERP API, Web Scraper API, and Scraping Browser—providing a comprehensive infrastructure that:
- Delivers AI-Ready Data: Automatically fetches and formats web content, reducing extra pre-processing steps.
- Ensures Scalability & Reliability: Leverages a robust infrastructure to handle high volumes of requests without compromising performance.
- Bypasses Blocks & CAPTCHAs: Uses advanced anti-bot strategies to navigate and retrieve content from even the most protected websites.
- Offers Global IP Coverage: Uses a vast proxy network spanning 195 countries to access geo-restricted content.
- Simplifies Integration: Minimizes configuration effort by working seamlessly with any MCP client.
Prerequisites for Bright Data MCP
Before you begin integrating Bright Data MCP, ensure you have the following:
- Bright Data Account: Sign up at brightdata.com. New users receive free credits for testing.
- API Token: Obtain your API token from your Bright Data account settings (User Settings Page).
- Web Unlocker Zone: Create a Web Unlocker proxy zone in your Bright Data control panel. Name it something memorable, like `mcp_unlocker` (you can override this later via environment variables if needed).
- (Optional) Scraping Browser Zone: If you need advanced browser automation capabilities (e.g., for complex JavaScript interactions or screenshots), create a Scraping Browser zone. Note the authentication details (Username and Password) provided for this zone (within the Overview tab), usually in the format `brd-customer-ACCOUNT_ID-zone-ZONE_NAME:PASSWORD`.
Quickstart: Configuring Bright Data MCP for Claude Desktop
Step 1: The Bright Data MCP server is typically run using `npx`, which comes with Node.js. Install Node.js if you haven’t already from the official website.
Step 2: Open Claude Desktop -> Settings -> Developer -> Edit Config (`claude_desktop_config.json`).
Step 3: Add the Bright Data server configuration under `mcpServers`. Replace placeholders with your actual credentials.
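An entry along these lines should work; the package name and the exact environment variable names here are assumptions based on Bright Data's MCP setup, so check their documentation for the current values:

```json
{
  "mcpServers": {
    "Bright Data": {
      "command": "npx",
      "args": ["-y", "@brightdata/mcp"],
      "env": {
        "API_TOKEN": "YOUR_BRIGHTDATA_API_TOKEN",
        "WEB_UNLOCKER_ZONE": "mcp_unlocker",
        "BROWSER_AUTH": "brd-customer-ACCOUNT_ID-zone-ZONE_NAME:PASSWORD"
      }
    }
  }
}
```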
Step 4: Save the configuration file and restart Claude Desktop.
Step 5: Hover over the hammer icon (🔨) in Claude Desktop. You should now see multiple MCP tools.
Let’s try extracting data from Zillow, a site known for potentially blocking scrapers. Prompt Claude with: “Extract key property data in JSON format from this Zillow URL: https://www.zillow.com/apartments/arverne-ny/the-tides-at-arverne-by-the-sea/ChWHPZ/”.
Allow Claude to use the necessary Bright Data MCP tools. Bright Data’s MCP server will handle the underlying complexities (proxy rotation, JavaScript rendering via Scraping Browser if needed).
Bright Data’s server performs the extraction and returns structured data, which Claude presents.
Here’s a snippet of the potential output:
🔥 This is awesome!
Another Example: Hacker News Headlines
A simpler query: “Give me the titles of the latest 5 news articles from Hacker News”.
This showcases how Bright Data’s MCP server simplifies accessing even dynamic or heavily protected web content directly within your AI workflow.
Conclusion
As we’ve explored throughout this guide, Anthropic’s Model Context Protocol represents a fundamental shift in how AI systems interact with the external world. You can build custom MCP servers for specific tasks, such as our Amazon scraper, while Bright Data’s MCP integration takes this further by offering enterprise-grade web scraping capabilities that bypass anti-bot protections and deliver AI-ready structured data.
We’ve also handpicked some of the best resources on AI and large language models (LLMs). Make sure to check them out to learn more in-depth:
- Top Sources for Finding LLM Training Data
- Web Scraping with LLaMA 3: Turn Any Website into Structured JSON
- Web Scraping With LangChain and Bright Data
- How To Create a RAG Chatbot With GPT-4o Using SERP Data