Your Documents, Your Machine: Building a Local RAG with MCP
By Nino, Senior Tech Editor
Retrieval-Augmented Generation (RAG) has traditionally been treated as a complex, production-grade pipeline. In most enterprise settings, setting up RAG means orchestrating vector databases such as Milvus or Pinecone, managing embedding APIs, and fine-tuning chunking strategies. That infrastructure is necessary for scaling to millions of documents, but it is overkill for developers who simply want their LLM to search their local notes, documentation, or codebases.
With the advent of the Model Context Protocol (MCP), that barrier has largely disappeared. MCP lets you expose local tools and data sources directly to an LLM like Claude without uploading your sensitive data to the cloud. In this guide, we will build a fast Local RAG with MCP using Python and the Gantz CLI. For stable, high-performance access to the models powering these systems, an aggregator like n1n.ai provides the reliability needed for production-grade local deployments.
Why Local RAG with MCP?
The primary advantage of a Local RAG with MCP is data sovereignty. Your documents stay on your machine. The LLM (via MCP) searches them, finds relevant content, and uses that context to answer. This eliminates the latency and cost associated with cloud-based vector stores.
The Architecture
The workflow of our Local RAG with MCP is straightforward:
- User: Asks a question (e.g., "How do I configure the database?").
- Claude: Recognizes the need for local context and calls an MCP tool.
- MCP Server: Executes a local Python script to search your files.
- Local Files: The script reads relevant snippets from your drive.
- Claude: Receives the snippets and generates a grounded response.
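To make the middle of that flow concrete: MCP is JSON-RPC 2.0 under the hood, so when Claude decides it needs local context, its client sends a `tools/call` request to the MCP server. You never construct this payload yourself (the client and Gantz handle the envelope), but it is useful to see its rough shape. The field values below are purely illustrative:

```python
# Illustrative shape of the MCP tools/call request sent on Claude's behalf.
# The protocol is JSON-RPC 2.0; the tool name and arguments are examples.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",                             # tool defined in Step 2
        "arguments": {"query": "configure the database"},  # derived from the user's question
    },
}
```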
Prerequisites
To follow this tutorial, you will need:
- Python 3.10+ installed.
- The Gantz CLI for managing MCP servers.
- Access to an LLM API (we recommend n1n.ai for its unified access to Claude and GPT-4o).
- A directory of markdown, text, or code files to index.
Step 1: Building the Simple Search Tools
We start with a "grep-style" search. This is incredibly effective for repositories or document sets under 1,000 files where keyword matching is the primary requirement.
search_docs.py
This script performs a case-insensitive search and returns context snippets.
```python
import os
import sys
from pathlib import Path

DOCS_DIR = os.environ.get('DOCS_DIR', './docs')

def search(query, max_results=5):
    results = []
    query_lower = query.lower()
    for path in Path(DOCS_DIR).rglob('*'):
        if path.is_file() and path.suffix in ['.md', '.txt', '.py', '.js']:
            try:
                content = path.read_text(encoding='utf-8')
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            if query_lower not in content.lower():
                continue
            lines = content.split('\n')
            for i, line in enumerate(lines):
                if query_lower in line.lower():
                    # Return the matching line with 5 lines of context on each side
                    start, end = max(0, i - 5), min(len(lines), i + 6)
                    results.append({
                        'file': str(path.relative_to(DOCS_DIR)),
                        'line': i + 1,
                        'snippet': '\n'.join(lines[start:end])
                    })
                    if len(results) >= max_results:
                        return results
    return results

if __name__ == '__main__':
    query = ' '.join(sys.argv[1:])
    for r in search(query):
        print(f"## {r['file']} (L{r['line']})\n{r['snippet']}\n---\n")
```
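Search returns snippets, but the server we configure in Step 2 is described as one that can "search and read" local documents, and sometimes Claude will want an entire file. A companion script along these lines covers that case; `read_doc.py` is a hypothetical name (register it in gantz.yaml the same way as search_docs), and the path guard keeps reads confined to DOCS_DIR:

```python
# read_doc.py -- hypothetical companion tool: return one file's full contents.
# Resolves paths against DOCS_DIR and refuses anything outside it.
import os
import sys
from pathlib import Path

DOCS_DIR = Path(os.environ.get('DOCS_DIR', './docs')).resolve()

def read_doc(relative_path):
    target = (DOCS_DIR / relative_path).resolve()
    # Guard against path traversal: the file must live under DOCS_DIR
    if DOCS_DIR not in target.parents:
        return f"Refused: {relative_path} is outside the documents directory."
    if not target.is_file():
        return f"Not found: {relative_path}"
    return target.read_text(encoding='utf-8', errors='replace')

if __name__ == '__main__':
    print(read_doc(sys.argv[1]))
```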
Step 2: Configuring the MCP Server
Using Gantz, we can define our tools in a gantz.yaml file. This tells the LLM exactly how to interact with our local scripts.
```yaml
name: local-rag
description: 'Search and read local documents'
tools:
  - name: search_docs
    description: 'Search documents for a query.'
    parameters:
      - name: query
        type: string
        required: true
    script:
      command: python3
      args: ['./scripts/search_docs.py', '{{query}}']
      working_dir: '${HOME}/rag-tools'
      environment:
        DOCS_DIR: '${HOME}/Documents/notes'
```
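If you prefer not to use Gantz, the same tool can be exposed with the official MCP Python SDK instead. The sketch below is a minimal stdio server, assuming `pip install mcp` and that search_docs.py from Step 1 is importable; the server name and file layout are illustrative, not prescriptive:

```python
# minimal_server.py -- a minimal stdio MCP server exposing the same tool.
# Assumes `pip install mcp` and that search_docs.py is on the import path.
from mcp.server.fastmcp import FastMCP

from search_docs import search  # keyword search from Step 1

mcp = FastMCP("local-rag")

@mcp.tool()
def search_docs(query: str) -> str:
    """Search local documents for a query and return matching snippets."""
    results = search(query)
    if not results:
        return "No matches found."
    return "\n---\n".join(
        f"## {r['file']} (L{r['line']})\n{r['snippet']}" for r in results
    )

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport that desktop MCP clients expect
```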
Step 3: Implementing Semantic Search (Vector RAG)
For conceptual queries (e.g., "How do I handle errors?" when the word "error" might not be explicitly used), we need vector embeddings. We will use sentence-transformers and faiss for a purely local vector store.
indexing_logic.py
```python
from pathlib import Path
import pickle
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Load docs and chunk them (naive paragraph split shown here; adapt to your corpus)
docs = [p.read_text(encoding='utf-8') for p in Path('./docs').rglob('*.md')]
chunks = [c.strip() for d in docs for c in d.split('\n\n') if c.strip()]

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunks)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.array(embeddings).astype('float32'))

# Save index and chunk metadata to local disk
faiss.write_index(index, 'docs.index')
with open('chunks.pkl', 'wb') as f:
    pickle.dump(chunks, f)
```
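Indexing is only half of the vector path; you also need a query-side script that the MCP server can call, mirroring search_docs.py. This sketch assumes the index and chunk list written above (docs.index and chunks.pkl are the filenames used in the previous snippet):

```python
# semantic_search.py -- query the local FAISS index built by the indexing script.
import pickle
import sys

import faiss
from sentence_transformers import SentenceTransformer

def semantic_search(query, top_k=5):
    index = faiss.read_index('docs.index')
    with open('chunks.pkl', 'rb') as f:
        chunks = pickle.load(f)
    model = SentenceTransformer('all-MiniLM-L6-v2')
    query_vec = model.encode([query]).astype('float32')
    distances, ids = index.search(query_vec, top_k)
    # ids has shape (1, top_k); -1 marks empty slots when the index is small
    return [chunks[i] for i in ids[0] if i != -1]

if __name__ == '__main__':
    for chunk in semantic_search(' '.join(sys.argv[1:])):
        print(chunk, '\n---')
```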
Step 4: Connecting the LLM via n1n.ai
To put this into production, use the Anthropic SDK or a unified client. By routing your requests through n1n.ai, you ensure that your Local RAG with MCP remains functional even if a specific provider experiences downtime.
```python
import anthropic

client = anthropic.Anthropic(api_key="YOUR_N1N_API_KEY", base_url="https://api.n1n.ai/v1")

# Call Claude with the MCP toolset exposed by our local server
response = client.beta.messages.create(
    model="claude-3-5-sonnet",
    max_tokens=1024,
    tools=[{"type": "mcp_toolset", "mcp_server_name": "local-rag"}],
    messages=[{"role": "user", "content": "Summarize my auth logic."}]
)
```
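A single messages.create call only gets you the model's decision to use a tool. If your client executes tools itself (rather than delegating execution to a desktop MCP client or connector), the round trip looks roughly like the sketch below; it follows the standard Anthropic tool-use block structure, and run_local_search is a hypothetical wrapper around the Step 1 search function:

```python
# Minimal sketch of a client-side tool loop (standard Anthropic tool-use blocks).
# run_local_search() is a hypothetical wrapper around search_docs.search().
if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    tool_output = run_local_search(tool_block.input["query"])  # executed locally

    followup = client.beta.messages.create(
        model="claude-3-5-sonnet",
        max_tokens=1024,
        tools=[{"type": "mcp_toolset", "mcp_server_name": "local-rag"}],
        messages=[
            {"role": "user", "content": "Summarize my auth logic."},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_block.id,
                "content": tool_output,
            }]},
        ],
    )
    print(followup.content[0].text)
```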
Comparison: Keyword vs. Vector Search
| Feature | Keyword Search | Vector Search |
|---|---|---|
| Best For | Exact IDs, Error codes, Specific functions | Concepts, Themes, FAQs |
| Speed | Instant | Depends on Index Size |
| Setup | Zero (Scripts only) | Requires Indexing Step |
| Local RAG with MCP Fit | High (Small sets) | High (Large sets) |
Pro Tips for Local RAG with MCP
- Dynamic Re-indexing: Use `watchdog` in Python to monitor your `DOCS_DIR`. Whenever a file changes, trigger the Step 3 indexing script automatically so your Local RAG with MCP is always up to date (see the watcher sketch after this list).
- Context Window Management: Claude 3.5 Sonnet has a massive context window, but don't waste it. Limit your `search_docs` output to the top 5 most relevant snippets to keep costs low and responses fast.
- Security: Since MCP runs scripts on your machine, ensure the `working_dir` is restricted and the scripts do not have write access to sensitive system directories.
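A minimal re-indexing watcher might look like the following. It assumes `pip install watchdog` and shells out to the Step 3 indexing script; the script path is a placeholder to adapt to your layout:

```python
# reindex_watcher.py -- re-run the indexing script whenever DOCS_DIR changes.
# Assumes `pip install watchdog`; the script path below is a placeholder.
import os
import subprocess
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

DOCS_DIR = os.environ.get('DOCS_DIR', './docs')

class ReindexHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        if not event.is_directory:
            # Rebuild the FAISS index when any tracked file is created or changed
            subprocess.run(['python3', './scripts/indexing_logic.py'], check=False)

if __name__ == '__main__':
    observer = Observer()
    observer.schedule(ReindexHandler(), DOCS_DIR, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

In practice you would debounce these events, since editors often emit several filesystem events per save.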
Conclusion
Building a Local RAG with MCP is one of the most efficient ways to bring state-of-the-art LLMs to your private data. It sidesteps the complexity of enterprise cloud infrastructure while keeping your documents strictly on your machine. By using n1n.ai as your API gateway, you gain the stability needed to run these tools reliably every day.
Get a free API key at n1n.ai