Build a Local Research MCP Server with Web Scraping
Create a private, free research assistant for Claude that searches the web and scrapes content using DuckDuckGo and a local Python MCP server
What You'll Learn
- How to build an MCP server using FastMCP
- Web search with DuckDuckGo and content scraping with Trafilatura
Time & Difficulty
Time: 15 minutes
Level: Intermediate
Prerequisites
- Python 3.10+
- Claude Desktop installed
- Basic command line knowledge
Have you ever wished Claude could access up-to-the-minute information from the web to help you write articles, research topics, or analyze recent events? While Claude’s knowledge is vast, it’s frozen in time. This guide will show you how to build your own simple, 100% free, and completely private research tool for Claude Desktop.
We’ll create a local Model Context Protocol (MCP) server that gives Claude the ability to:
- Perform web searches for any topic
- Read content from top search results
- Use fresh information to answer your questions
Best of all, this runs entirely on your computer, keeping your research private with no API fees.
What You’ll Build
By the end of this guide, you’ll have a working MCP server that:
- Uses DuckDuckGo’s free search API to find relevant web results
- Scrapes and extracts clean text content from web pages
- Integrates seamlessly with Claude Desktop
- Runs locally for complete privacy
- Requires no external API keys or subscriptions
Step 1: Set Up Your Project Environment
First, create a dedicated workspace for your research server.
Open your terminal and create a new project directory:
mkdir local-research-server
cd local-research-server
We’ll use uv, a modern Python package manager, to handle dependencies and virtual environments:
uv init
uv venv
Activate the virtual environment to isolate your project dependencies:
On macOS/Linux:
source .venv/bin/activate
On Windows:
.venv\Scripts\activate
Step 2: Install Required Dependencies
Install the three key libraries that power our research server:
uv add "mcp[cli]" duckduckgo-search trafilatura
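To confirm the installation, you can run a quick import check before writing any server code (purely a sanity check, not part of the server itself):
uv run python -c "import trafilatura; from duckduckgo_search import DDGS; print('dependencies OK')"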
Here’s what each library provides:
- mcp[cli]: Official MCP Python SDK with FastMCP framework for simplified server development
- duckduckgo-search: Free web search API without requiring authentication keys
- trafilatura: Intelligent content extraction that removes ads, navigation, and clutter from web pages
Step 3: Create the Research Server
Create a new file called research_server.py and add the following implementation:
# research_server.py
import asyncio
import sys

from mcp.server.fastmcp import FastMCP
from duckduckgo_search import DDGS
import trafilatura

# Initialize the MCP server with a descriptive name
mcp = FastMCP("LocalResearcher")

@mcp.tool(
    title="Web Research and Content Scraper",
    description="Performs a web search for a query, scrapes the clean text from the top results, and returns it as a single block of text.",
)
async def research_and_scrape(query: str, num_results: int = 3) -> str:
    """
    Searches the web and scrapes content from top results.

    Args:
        query: The topic or question to research.
        num_results: The number of top search results to read (default is 3).
    """
    # Log to stderr: a stdio MCP server uses stdout for protocol messages,
    # so printing to stdout would corrupt the connection to Claude.
    print(f"INFO: Starting research for: '{query}'", file=sys.stderr)
    scraped_texts = []
    try:
        # Use DuckDuckGo to get search results
        with DDGS() as ddgs:
            results = list(ddgs.text(query, max_results=num_results))
        if not results:
            return "Apologies, I couldn't find any web search results for that query."
        # Loop through each result and scrape its content
        for i, result in enumerate(results):
            url = result["href"]
            print(f"INFO: Scraping ({i + 1}/{num_results}): {url}", file=sys.stderr)
            try:
                # Trafilatura downloads the page and extracts the main article text
                downloaded_page = trafilatura.fetch_url(url)
                if downloaded_page:
                    main_text = trafilatura.extract(downloaded_page, favor_precision=True)
                    if main_text:
                        # Format the output for Claude
                        scraped_texts.append(f"--- Source {i + 1}: {url} ---\n\n{main_text}\n\n")
                # Polite delay between requests
                await asyncio.sleep(0.5)
            except Exception as e:
                # Log errors but continue processing other URLs
                print(f"WARN: Failed to scrape {url}. Reason: {e}", file=sys.stderr)
                continue
    except Exception as e:
        return f"An unexpected error occurred during the research process: {e}"
    if not scraped_texts:
        return "I found search results, but was unable to extract content from any of them."
    # Combine all scraped content and return to Claude
    print("INFO: Research complete. Returning content to Claude.", file=sys.stderr)
    return "".join(scraped_texts)

# Allow the server to be run directly from the command line
if __name__ == "__main__":
    mcp.run()
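Before wiring the server into Claude Desktop, you can optionally smoke-test it with the MCP Inspector that ships with the mcp[cli] extra. This opens a browser-based UI where you can invoke research_and_scrape by hand:
uv run mcp dev research_server.py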
Step 4: Configure Claude Desktop
Now we need to tell Claude Desktop how to connect to your research server.
Locate your Claude Desktop configuration file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Add the following configuration to the file. If the file already contains content, add the “local-researcher” entry inside the existing mcpServers object:
{
  "mcpServers": {
    "local-researcher": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/local-research-server",
        "run",
        "python",
        "research_server.py"
      ]
    }
  }
}
Important: Replace /path/to/your/local-research-server with the full absolute path to your project directory. You can get this path by running pwd (macOS/Linux) or cd with no arguments (Windows Command Prompt) in your project folder.
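If Claude Desktop reports that it can't find the uv command (it doesn't always inherit your shell's PATH), a common workaround is to point the config directly at your virtual environment's interpreter instead. The paths below are illustrative; substitute your own (on Windows, the interpreter lives at .venv\Scripts\python.exe):
{
  "mcpServers": {
    "local-researcher": {
      "command": "/path/to/your/local-research-server/.venv/bin/python",
      "args": [
        "/path/to/your/local-research-server/research_server.py"
      ]
    }
  }
}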
Step 5: Test Your Research Server
You’re ready to use your new research capabilities!
- Completely quit and restart Claude Desktop to load the new server configuration
- When Claude reopens, you should see an MCP tools indicator in the chat input area (a hammer or plug icon, depending on your version), confirming your server is connected
- Start a new conversation and ask Claude to research something current
Try asking questions like:
- “What are the latest developments in AI research this week?”
- “Research the current state of renewable energy adoption in Europe”
- “Find recent news about cybersecurity threats”
Claude will automatically use your local server to fetch real-time information and provide well-sourced, up-to-date answers.
How It Works
Your MCP server follows these key patterns:
FastMCP Framework: Uses the official MCP Python SDK’s FastMCP framework, which handles all the protocol complexity and lets you focus on tool implementation.
Async Operations: Declares the tool as async so the polite delay between requests runs on the event loop without blocking the server.
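One caveat worth noting: trafilatura.fetch_url is a synchronous call, so as written each page download briefly blocks the event loop. That's harmless for a single-user local server, but if you want fully non-blocking fetches, one option (on Python 3.9+) is to push the download onto a worker thread:
# Inside research_and_scrape, swap the direct call
#     downloaded_page = trafilatura.fetch_url(url)
# for a thread offload so the event loop stays responsive:
downloaded_page = await asyncio.to_thread(trafilatura.fetch_url, url)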
Error Handling: Gracefully handles network failures, parsing errors, and individual URL scraping issues without stopping the entire research process.
Content Extraction: Uses Trafilatura’s precision mode to extract clean, readable text while filtering out navigation, ads, and other page clutter.
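If you want to tune the extraction, trafilatura.extract accepts several options besides favor_precision. A minimal sketch with illustrative values:
import trafilatura

downloaded = trafilatura.fetch_url("https://example.com/article")  # placeholder URL
text = trafilatura.extract(
    downloaded,
    favor_precision=True,    # prefer clean text over completeness
    include_comments=False,  # drop user-comment sections
    include_tables=False,    # skip HTML tables
    output_format="txt",     # plain text; "json" or "xml" give structured output
)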
Privacy-First: All processing happens locally on your machine - no data is sent to third-party services except for the public web searches.
Extending Your Server
Once you have the basic version working, consider these enhancements:
Add More Sources: Integrate RSS feeds, academic databases, or industry-specific APIs alongside web search.
Implement Caching: Store recent search results to avoid re-scraping the same content (a sketch follows this list).
Content Filtering: Add domain allowlists/blocklists or content type filters for specific use cases.
Output Formatting: Structure the scraped content with metadata like publication dates, authors, or content types.
Rate Limiting: Add more sophisticated delays and retry logic for production use.
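As a sketch of the caching idea above, a small in-memory cache keyed on the query avoids re-scraping when the same question comes up twice in a session. The cache name and 15-minute TTL here are arbitrary choices, not part of the original server:
import time

_CACHE: dict[str, tuple[float, str]] = {}
CACHE_TTL_SECONDS = 15 * 60  # entries expire after 15 minutes

def cache_get(query: str) -> str | None:
    """Return a cached result if it exists and hasn't expired."""
    entry = _CACHE.get(query)
    if entry and time.monotonic() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    return None

def cache_put(query: str, result: str) -> None:
    """Store a result with the current timestamp."""
    _CACHE[query] = (time.monotonic(), result)
Inside research_and_scrape, check cache_get(query) before searching and call cache_put(query, combined_text) just before returning.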
Security Considerations
When building research tools, keep these security practices in mind:
- Validate Inputs: Always sanitize search queries (trim whitespace, cap the length, strip control characters) to reject malformed or malicious input; a minimal sketch follows this list
- Rate Limiting: Implement delays between requests to be respectful to target websites
- Error Boundaries: Handle network failures gracefully without exposing sensitive information
- Local Operation: Keep processing local to maintain privacy and avoid external dependencies
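A minimal sketch of the input-validation point above (the 200-character cap is an arbitrary example):
def sanitize_query(query: str, max_length: int = 200) -> str:
    """Trim whitespace, strip non-printable characters, and cap the query length."""
    cleaned = "".join(ch for ch in query.strip() if ch.isprintable())
    if not cleaned:
        raise ValueError("Query is empty after sanitization.")
    return cleaned[:max_length]
Call sanitize_query(query) at the top of research_and_scrape and return a friendly error message if it raises.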
Business Applications
This research server pattern has numerous business applications:
Market Research: Automated competitor analysis and industry trend monitoring
Content Creation: Research-backed article writing with real-time fact checking
Due Diligence: Company research for investment or partnership decisions
Regulatory Monitoring: Track regulatory changes and compliance requirements
The potential ROI includes a substantial reduction in manual research time, access to real-time information rather than a static knowledge cutoff, and freedom from expensive research subscriptions.
Next Steps
You now have a powerful, private research assistant integrated directly into Claude Desktop. This server demonstrates core MCP concepts including tool definitions, async operations, and client integration.
Consider exploring other MCP server patterns like document servers for your local files, database connectors for structured data access, or API integrations for business-specific data sources.
The Model Context Protocol opens up endless possibilities for extending AI capabilities while maintaining full control over your data and processes.
Additional Resources
Related Guides
Building Your First MCP Server with Python
A step-by-step tutorial on how to create and run a basic Model Context Protocol (MCP) server using the Python SDK, FastMCP.
Connect Claude to Your Business Files with MCP
Step-by-step guide to setting up Claude AI to read, analyze, and work with your business documents and spreadsheets automatically.
Set Up Your First MCP Email Assistant
Create an AI assistant that can read emails, analyze content, and help you respond faster. Perfect for managing customer inquiries and business communications.