Difficulty: 3/5
Published: 6/28/2025
By: UnlockMCP Team

Build a Local Research MCP Server with Web Scraping

Create a private, free research assistant for Claude that searches the web and scrapes content using DuckDuckGo and a local Python MCP server

What You'll Learn

  • How to build an MCP server using FastMCP
  • Web scraping with DuckDuckGo Search API

Time & Difficulty

Time: 15 minutes

Level: Intermediate


Prerequisites

  • Python 3.10+
  • Claude Desktop installed
  • Basic command line knowledge
Tags: mcp-server, web-scraping, research, python, fastmcp, local-development

Have you ever wished Claude could access up-to-the-minute information from the web to help you write articles, research topics, or analyze recent events? While Claude’s knowledge is vast, it’s frozen in time. This guide will show you how to build your own simple, 100% free, and completely private research tool for Claude Desktop.

We’ll create a local Model Context Protocol (MCP) server that gives Claude the ability to:

  • Perform web searches for any topic
  • Read content from top search results
  • Use fresh information to answer your questions

Best of all, this runs entirely on your computer, keeping your research private with no API fees.

What You’ll Build

By the end of this guide, you’ll have a working MCP server that:

  • Uses DuckDuckGo’s free search API to find relevant web results
  • Scrapes and extracts clean text content from web pages
  • Integrates seamlessly with Claude Desktop
  • Runs locally for complete privacy
  • Requires no external API keys or subscriptions

Step 1: Set Up Your Project Environment

First, create a dedicated workspace for your research server.

Open your terminal and create a new project directory:

mkdir local-research-server
cd local-research-server

We’ll use uv, a modern Python package manager, to handle dependencies and virtual environments:

uv init
uv venv

Activate the virtual environment to isolate your project dependencies:

On macOS/Linux:

source .venv/bin/activate

On Windows:

.venv\Scripts\activate

Step 2: Install Required Dependencies

Install the three key libraries that power our research server:

uv add "mcp[cli]" duckduckgo-search trafilatura

Here’s what each library provides:

  • mcp[cli]: Official MCP Python SDK with FastMCP framework for simplified server development
  • duckduckgo-search: Free web search with no API key or authentication required
  • trafilatura: Intelligent content extraction that removes ads, navigation, and clutter from web pages

Step 3: Create the Research Server

Create a new file called research_server.py and add the following implementation:

# research_server.py
import asyncio
import sys

from mcp.server.fastmcp import FastMCP
from duckduckgo_search import DDGS
import trafilatura

# Initialize the MCP server with a descriptive name
mcp = FastMCP("LocalResearcher")


def log(message: str) -> None:
    # Log to stderr: over the stdio transport, stdout carries the MCP
    # JSON-RPC messages and must stay clean
    print(message, file=sys.stderr)


@mcp.tool(
    title="Web Research and Content Scraper",
    description="Performs a web search for a query, scrapes the clean text from the top results, and returns it as a single block of text."
)
async def research_and_scrape(query: str, num_results: int = 3) -> str:
    """
    Searches the web and scrapes content from top results.

    Args:
        query: The topic or question to research.
        num_results: The number of top search results to read (default is 3).
    """
    log(f"INFO: Starting research for: '{query}'")
    scraped_texts = []

    try:
        # DuckDuckGo's client is blocking, so run the search in a worker
        # thread to keep the event loop responsive
        def search() -> list:
            with DDGS() as ddgs:
                return list(ddgs.text(query, max_results=num_results))

        results = await asyncio.to_thread(search)

        if not results:
            return "Apologies, I couldn't find any web search results for that query."

        # Loop through each result and scrape its content
        for i, result in enumerate(results):
            url = result["href"]
            log(f"INFO: Scraping ({i + 1}/{num_results}): {url}")
            try:
                # Trafilatura downloads the page and extracts the main article
                # text; fetch_url is also blocking, so run it in a thread too
                downloaded_page = await asyncio.to_thread(trafilatura.fetch_url, url)
                if downloaded_page:
                    main_text = trafilatura.extract(downloaded_page, favor_precision=True)
                    if main_text:
                        # Format the output for Claude
                        scraped_texts.append(f"--- Source {i + 1}: {url} ---\n\n{main_text}\n\n")

                # Polite delay between requests
                await asyncio.sleep(0.5)
            except Exception as e:
                # Log errors but continue processing other URLs
                log(f"WARN: Failed to scrape {url}. Reason: {e}")
                continue

    except Exception as e:
        return f"An unexpected error occurred during the research process: {e}"

    if not scraped_texts:
        return "I found search results, but was unable to extract content from any of them."

    # Combine all scraped content and return to Claude
    log("INFO: Research complete. Returning content to Claude.")
    return "".join(scraped_texts)


# Allow the server to be run directly from the command line
if __name__ == "__main__":
    mcp.run()

Step 4: Configure Claude Desktop

Now we need to tell Claude Desktop how to connect to your research server.

Locate your Claude Desktop configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Add the following configuration to the file. If the file already contains content, add the “local-researcher” entry inside the existing mcpServers object:

{
  "mcpServers": {
    "local-researcher": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/local-research-server",
        "run",
        "python",
        "research_server.py"
      ]
    }
  }
}

Important: Replace /path/to/your/local-research-server with the full absolute path to your project directory. You can get this path by running pwd (macOS/Linux) or cd with no arguments (Windows) in your project folder.
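If you prefer not to hand-edit JSON, the merge can be scripted. The sketch below is illustrative rather than official tooling: add_server_entry is a hypothetical helper, and the project directory you pass in must be your own absolute path.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, name: str, project_dir: str) -> dict:
    """Merge one MCP server entry into a Claude Desktop config file,
    preserving any servers already registered there."""
    config = {}
    if config_path.exists():
        text = config_path.read_text()
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "python", "research_server.py"],
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Because the script round-trips the existing file through json.loads, any servers you already have configured are left untouched.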

Step 5: Test Your Research Server

You’re ready to use your new research capabilities!

  1. Completely quit and restart Claude Desktop to load the new server configuration
  2. When Claude reopens, you should see a plug icon (🔌) in the chat input, confirming your server is connected
  3. Start a new conversation and ask Claude to research something current

Try asking questions like:

  • “What are the latest developments in AI research this week?”
  • “Research the current state of renewable energy adoption in Europe”
  • “Find recent news about cybersecurity threats”

Claude will automatically use your local server to fetch real-time information and provide well-sourced, up-to-date answers.

How It Works

Your MCP server follows these key patterns:

FastMCP Framework: Uses the official MCP Python SDK’s FastMCP framework, which handles all the protocol complexity and lets you focus on tool implementation.

Async Operations: Implements proper async/await patterns for non-blocking web requests and content processing.

Error Handling: Gracefully handles network failures, parsing errors, and individual URL scraping issues without stopping the entire research process.

Content Extraction: Uses Trafilatura’s precision mode to extract clean, readable text while filtering out navigation, ads, and other page clutter.

Privacy-First: All processing happens locally on your machine; nothing is sent to third-party services beyond the public web searches themselves.
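The delimited "Source N" format the tool returns is easy to factor out and test on its own. Here is a minimal, stdlib-only sketch; format_sources is a hypothetical helper name, not a function in the server above:

```python
def format_sources(sources: list[tuple[str, str]]) -> str:
    """Join (url, extracted_text) pairs into the delimited blocks
    the research tool returns to Claude."""
    return "".join(
        f"--- Source {i}: {url} ---\n\n{text}\n\n"
        for i, (url, text) in enumerate(sources, start=1)
    )
```

Keeping the format in one pure function makes it trivial to change the delimiter later (for example, to add publication dates) without touching the scraping logic.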

Extending Your Server

Once you have the basic version working, consider these enhancements:

Add More Sources: Integrate RSS feeds, academic databases, or industry-specific APIs alongside web search.

Implement Caching: Store recent search results to avoid re-scraping the same content.
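A minimal sketch of such a cache, assuming a simple time-to-live policy keyed on (query, num_results); TTLCache is a hypothetical class written for illustration, not a library type:

```python
import time


class TTLCache:
    """Minimal time-based cache, e.g. (query, num_results) -> scraped text."""

    def __init__(self, ttl_seconds: float = 600.0):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())
```

Inside the tool, you would check the cache before searching and store the combined text after a successful run; a ten-minute TTL is a reasonable default for news-style queries.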

Content Filtering: Add domain allowlists/blocklists or content type filters for specific use cases.

Output Formatting: Structure the scraped content with metadata like publication dates, authors, or content types.

Rate Limiting: Add more sophisticated delays and retry logic for production use.
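As a starting point, retry-with-backoff takes only a few lines of asyncio. In this sketch, fetch_with_retry is a hypothetical wrapper around any async fetch callable, and the delay and jitter values are arbitrary defaults:

```python
import asyncio
import random


async def fetch_with_retry(fetch, url: str, max_attempts: int = 3,
                           base_delay: float = 0.5):
    """Call a possibly flaky async fetch function, retrying failed
    attempts with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return await fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

The jitter spreads retries out so several concurrent failures don't hammer the same site in lockstep.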

Security Considerations

When building research tools, keep these security practices in mind:

  • Validate Inputs: Always sanitize search queries to prevent injection attacks
  • Rate Limiting: Implement delays between requests to be respectful to target websites
  • Error Boundaries: Handle network failures gracefully without exposing sensitive information
  • Local Operation: Keep processing local to maintain privacy and avoid external dependencies
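Input validation along these lines can be sketched in a few lines; validate_query is a hypothetical helper and the 200-character limit is an arbitrary choice:

```python
def validate_query(query: str, max_length: int = 200) -> str:
    """Basic sanity checks before passing a query to the search backend."""
    # Collapse runs of whitespace (including newlines) into single spaces
    cleaned = " ".join(query.split())
    if not cleaned:
        raise ValueError("Query must not be empty.")
    if len(cleaned) > max_length:
        raise ValueError(f"Query too long ({len(cleaned)} > {max_length} chars).")
    # Drop non-printable characters that could confuse logs or downstream parsers
    return "".join(ch for ch in cleaned if ch.isprintable())
```

Calling this at the top of the tool keeps malformed input from ever reaching the search or scraping layers.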

Business Applications

This research server pattern has numerous business applications:

Market Research: Automated competitor analysis and industry trend monitoring

Content Creation: Research-backed article writing with real-time fact checking

Due Diligence: Company research for investment or partnership decisions

Regulatory Monitoring: Track regulatory changes and compliance requirements

The potential ROI includes substantial reductions in manual research time, access to real-time information instead of a static knowledge cutoff, and savings on paid research subscriptions.

Next Steps

You now have a powerful, private research assistant integrated directly into Claude Desktop. This server demonstrates core MCP concepts including tool definitions, async operations, and client integration.

Consider exploring other MCP server patterns like document servers for your local files, database connectors for structured data access, or API integrations for business-specific data sources.

The Model Context Protocol opens up endless possibilities for extending AI capabilities while maintaining full control over your data and processes.
