Your Documents, Your Machine: Building a Local RAG with MCP

By Nino, Senior Tech Editor

Retrieval-Augmented Generation (RAG) has traditionally been viewed as a complex, production-grade pipeline. In most enterprise settings, standing up RAG means orchestrating vector databases like Milvus or Pinecone, managing embedding APIs, and fine-tuning chunking strategies. That infrastructure is necessary when you are scaling to millions of documents, but it is overkill for developers who simply want their LLM to search their local notes, documentation, or codebases.

With the advent of the Model Context Protocol (MCP), the barrier to entry has vanished. MCP allows you to expose local tools and data sources directly to an LLM like Claude without uploading your sensitive data to the cloud. In this guide, we will demonstrate how to build a high-speed Local RAG with MCP using Python and the Gantz CLI. For developers seeking the most stable and high-performance access to the models powering these systems, using an aggregator like n1n.ai provides the necessary reliability for production-grade local deployments.

Why Local RAG with MCP?

The primary advantage of a Local RAG with MCP is data sovereignty. Your documents stay on your machine. The LLM (via MCP) searches them, finds relevant content, and uses that context to answer. This eliminates the latency and cost associated with cloud-based vector stores.

The Architecture

The workflow of our Local RAG with MCP is straightforward:

  1. User: Asks a question (e.g., "How do I configure the database?").
  2. Claude: Recognizes the need for local context and calls an MCP tool.
  3. MCP Server: Executes a local Python script to search your files.
  4. Local Files: The script reads relevant snippets from your drive.
  5. Claude: Receives the snippets and generates a grounded response.

Prerequisites

To follow this tutorial, you will need:

  • Python 3.10+ installed.
  • The Gantz CLI for managing MCP servers.
  • Access to an LLM API (we recommend n1n.ai for its unified access to Claude and GPT-4o).
  • A directory of markdown, text, or code files to index.

Step 1: Building the Simple Search Tools

We start with a "grep-style" search. This is incredibly effective for repositories or document sets under 1,000 files where keyword matching is the primary requirement.

search_docs.py

This script performs a case-insensitive search and returns context snippets.

import os, sys
from pathlib import Path

DOCS_DIR = os.environ.get('DOCS_DIR', './docs')

def search(query, max_results=5):
    """Case-insensitive keyword search that returns snippets with surrounding context."""
    results = []
    query_lower = query.lower()
    for path in Path(DOCS_DIR).rglob('*'):
        if not (path.is_file() and path.suffix in ['.md', '.txt', '.py', '.js']):
            continue
        try:
            content = path.read_text(encoding='utf-8')
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable or binary files
        if query_lower not in content.lower():
            continue
        lines = content.split('\n')
        for i, line in enumerate(lines):
            if query_lower in line.lower():
                # Return the match plus five lines of context on either side
                start, end = max(0, i - 5), min(len(lines), i + 6)
                results.append({
                    'file': str(path.relative_to(DOCS_DIR)),
                    'line': i + 1,
                    'snippet': '\n'.join(lines[start:end])
                })
                if len(results) >= max_results:
                    return results
    return results

if __name__ == '__main__':
    query = ' '.join(sys.argv[1:])
    for r in search(query):
        print(f"## {r['file']} (L{r['line']})\n{r['snippet']}\n---\n")

Step 2: Configuring the MCP Server

Using Gantz, we can define our tools in a gantz.yaml file. This tells the LLM exactly how to interact with our local scripts.

name: local-rag
description: 'Search and read local documents'
tools:
  - name: search_docs
    description: 'Search documents for a query.'
    parameters:
      - name: query
        type: string
        required: true
    script:
      command: python3
      args: ['./scripts/search_docs.py', '{{query}}']
      working_dir: '${HOME}/rag-tools'
    environment:
      DOCS_DIR: '${HOME}/Documents/notes'

Step 3: Implementing Semantic Search (Vector RAG)

For conceptual queries (e.g., "How do I handle errors?" when the word "error" might not be explicitly used), we need vector embeddings. We will use sentence-transformers and faiss for a purely local vector store.

indexing_logic.py

from sentence_transformers import SentenceTransformer
import faiss, pickle, numpy as np

# ... Load docs and split them into `chunks` (a list of strings) ...

# Embed every chunk locally; all-MiniLM-L6-v2 is small enough to run on CPU
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunks)

# A flat L2 index gives exact nearest-neighbour search with no tuning
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.array(embeddings).astype('float32'))

# Persist the index and the chunks (filenames are arbitrary; the query tool must match)
faiss.write_index(index, 'docs.index')
with open('chunks.pkl', 'wb') as f:
    pickle.dump(chunks, f)
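
Indexing is only half the job: the MCP tool also needs a query-time script that loads the index, embeds the question, and returns the nearest chunks. Here is a minimal sketch, assuming the docs.index and chunks.pkl filenames used above (both names are arbitrary choices, not FAISS conventions):

query_docs.py

import sys, pickle
import faiss
from sentence_transformers import SentenceTransformer

def semantic_search(query, k=5):
    index = faiss.read_index('docs.index')
    with open('chunks.pkl', 'rb') as f:
        chunks = pickle.load(f)
    # Embed the query with the same model used at indexing time
    model = SentenceTransformer('all-MiniLM-L6-v2')
    query_vec = model.encode([query]).astype('float32')
    distances, indices = index.search(query_vec, k)
    return [chunks[i] for i in indices[0] if i != -1]

if __name__ == '__main__':
    for chunk in semantic_search(' '.join(sys.argv[1:])):
        print(chunk, '\n---')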

Step 4: Connecting the LLM via n1n.ai

To put this into production, use the Anthropic SDK or a unified client. By routing your requests through n1n.ai, you ensure that your Local RAG with MCP remains functional even if a specific provider experiences downtime.

import anthropic

client = anthropic.Anthropic(api_key="YOUR_N1N_API_KEY", base_url="https://api.n1n.ai/v1")

# Call Claude with the MCP toolset exposed by our local server
response = client.beta.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    tools=[{"type": "mcp_toolset", "mcp_server_name": "local-rag"}],
    messages=[{"role": "user", "content": "Summarize my auth logic."}]
)
print(response.content)
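
If your client or gateway does not understand the MCP toolset parameter, the same local script can be wired in as a standard Anthropic tool call instead. The fallback sketch below is my own illustration, not part of the MCP setup; the tool schema and the subprocess wiring are assumptions:

import subprocess
import anthropic

client = anthropic.Anthropic(api_key="YOUR_N1N_API_KEY", base_url="https://api.n1n.ai/v1")

search_tool = {
    "name": "search_docs",
    "description": "Search local documents for a keyword and return snippets.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

messages = [{"role": "user", "content": "How do I configure the database?"}]
response = client.messages.create(
    model="claude-3-5-sonnet-latest", max_tokens=1024,
    tools=[search_tool], messages=messages,
)

# If Claude asked to search, run the local script and hand the snippets back
for block in response.content:
    if block.type == "tool_use" and block.name == "search_docs":
        snippets = subprocess.run(
            ["python3", "./scripts/search_docs.py", block.input["query"]],
            capture_output=True, text=True,
        ).stdout
        messages += [
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [
                {"type": "tool_result", "tool_use_id": block.id, "content": snippets}
            ]},
        ]
        final = client.messages.create(
            model="claude-3-5-sonnet-latest", max_tokens=1024,
            tools=[search_tool], messages=messages,
        )
        print(final.content[0].text)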

Keyword vs. Vector Search

| Feature                | Keyword Search                             | Vector Search             |
|------------------------|--------------------------------------------|---------------------------|
| Best for               | Exact IDs, error codes, specific functions | Concepts, themes, FAQs    |
| Speed                  | Instant                                    | Depends on index size     |
| Setup                  | Zero (scripts only)                        | Requires an indexing step |
| Local RAG with MCP fit | High (small sets)                          | High (large sets)         |

Pro Tips for Local RAG with MCP

  1. Dynamic Re-indexing: Use watchdog in Python to monitor your DOCS_DIR. Whenever a file changes, re-run the Step 3 indexing script automatically so your Local RAG with MCP is always up to date (a watcher sketch follows this list).
  2. Context Window Management: Claude 3.5 Sonnet has a massive context window, but don't waste it. Limit your search_docs output to the top 5 most relevant snippets to keep costs low and responses fast.
  3. Security: Since MCP runs scripts on your machine, ensure the working_dir is restricted and the scripts do not have write access to sensitive system directories.
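
Here is a minimal watcher for tip 1, assuming the watchdog package is installed (pip install watchdog) and that the Step 3 indexing code is saved as indexing_logic.py; both the watch_docs.py name and that script path are illustrative:

watch_docs.py

import os, subprocess, time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

DOCS_DIR = os.environ.get('DOCS_DIR', './docs')

class ReindexHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        if event.is_directory:
            return
        # Rebuild the local index after every file change
        subprocess.run(['python3', './scripts/indexing_logic.py'], check=False)

if __name__ == '__main__':
    observer = Observer()
    observer.schedule(ReindexHandler(), DOCS_DIR, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()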

Conclusion

Building a Local RAG with MCP is the most efficient way to bring the power of state-of-the-art LLMs to your private data. It bypasses the complexity of enterprise cloud infrastructure while maintaining strict privacy. By utilizing n1n.ai as your API gateway, you gain the stability needed to run these tools reliably every day.

Get a free API key at n1n.ai