# MCP RAG Server
A production-grade, fully local MCP server for LM Studio. Semantic document retrieval plus live web search, all on your machine, no cloud needed.
## Features

| Capability | Details |
|---|---|
| 12 embedding models | From fast MiniLM to Stella-1.5B / E5-Mistral, switchable with one flag |
| 4 chunking strategies | paragraph · sentence · semantic · fixed |
| Hybrid retrieval | BM25+ sparse + dense cosine, fused via Reciprocal Rank Fusion |
| Cross-encoder reranking | MiniLM-L6 / L12 / Electra; dramatically improves precision |
| HyDE | Hypothetical Document Embedding; better recall on abstract queries |
| 10 file formats | Including PDF · MD · TXT · HTML · DOCX · EPUB · CSV · JSON |
| 4 web search providers | DuckDuckGo · Brave · Tavily · Serper |
| Live re-indexing | Add documents and re-index without restarting the server |
| Thread-safe TTL cache | Dedup repeated queries; configurable window (see the sketch below) |
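The TTL cache in the last row is small enough to sketch here. This is an illustrative version only; the class name and eviction details in `src/utils/cache.py` may differ.

```python
# Minimal sketch of a thread-safe TTL cache for deduplicating repeated queries.
import threading
import time

class TTLCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._lock = threading.Lock()
        self._store = {}                      # key -> (expires_at, value)

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
            if entry and entry[0] > time.time():
                return entry[1]
            self._store.pop(key, None)        # expired or missing
            return None

    def set(self, key, value):
        with self._lock:
            self._store[key] = (time.time() + self.ttl, value)
```

Repeated identical queries inside the configured window are served from the cache instead of re-running retrieval.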
## Embedding models

| Key | Tier | Dim | Highlight |
|---|---|---|---|
| `minilm-l6` | fast | 384 | Smallest, lowest RAM |
| `minilm-l12` | fast | 384 | Better than L6, same size |
| `mpnet` | balanced | 768 | Strong general-purpose |
| `gte-base` | balanced | 768 | Top BEIR at this size |
| `bge-base` | balanced | 768 | Default; top MTEB |
| `bge-large` | powerful | 1024 | Best English BGE |
| `gte-large` | powerful | 1024 | Great on long docs |
| `e5-large` | powerful | 1024 | Instruction-aware, top BEIR |
| `nomic-v1.5` | powerful | 768 | 8192-token context |
| `jina-v3` | powerful | 1024 | Multilingual; code + text |
| `stella-en-1.5b` | powerful | 1024 | MTEB SOTA; ~6 GB RAM |
| `e5-mistral` | powerful | 4096 | Highest quality; needs GPU |
Switching models is safe: each model gets its own ChromaDB collection automatically. Delete `.chroma_db/` to force a full re-index.
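Below is a hedged sketch of how that per-model isolation can work with ChromaDB. The key-to-model mapping and the collection naming scheme are assumptions for illustration, not the actual contents of `config.py`.

```python
import chromadb

# Assumed key -> Hugging Face model mapping; the real registry lives in config.py.
EMBEDDING_MODELS = {
    "bge-base": "BAAI/bge-base-en-v1.5",
    "bge-large": "BAAI/bge-large-en-v1.5",
}

def get_collection(client, prefix, model_key):
    # One collection per embedding model, so vectors with different
    # dimensions or embedding spaces never mix.
    return client.get_or_create_collection(f"{prefix}_{model_key}")

client = chromadb.PersistentClient(path=".chroma_db")
collection = get_collection(client, "docs", "bge-base")
```

Because the collection name includes the model key, changing `--embedding-model` never mixes vectors of different dimensions.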
## Chunking strategies

| Strategy | Best for | Notes |
|---|---|---|
| `paragraph` | General documents | Default; fast, structure-aware |
| `sentence` | Dense prose, QA | Requires `nltk` |
| `semantic` | Mixed-topic documents | Slowest ingest, highest quality |
| `fixed` | Last resort | No structure awareness |
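For a sense of what the default strategy does, here is a minimal word-budget paragraph chunker. The function and its details are illustrative; the server's `chunker.py` may handle edge cases differently.

```python
def paragraph_chunks(text, chunk_size=512, overlap=64):
    # Group paragraphs into chunks of roughly chunk_size words, carrying an
    # overlap of trailing words into the next chunk. Oversized single
    # paragraphs are not split further in this simplified version.
    chunks, current = [], []
    for para in text.split("\n\n"):
        words = para.split()
        if current and len(current) + len(words) > chunk_size:
            chunks.append(" ".join(current))
            current = current[-overlap:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```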
## Rerankers

| Key | Speed | Quality |
|---|---|---|
| `ms-marco-MiniLM-L-6-v2` | Fastest | Good |
| `ms-marco-MiniLM-L-12-v2` | Balanced | Better |
| `ms-marco-electra-base` | Slowest | Best |
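Reranking rescores the fused candidate list with a cross-encoder that sees the query and the chunk together. A minimal sketch using `sentence-transformers`; the Hugging Face ID shown is the usual home of the L-6 model, and the server's `reranker.py` wiring is not shown here.

```python
from sentence_transformers import CrossEncoder

def rerank(query, chunks, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2", top_k=5):
    model = CrossEncoder(model_name)
    # One relevance score per (query, chunk) pair.
    scores = model.predict([(query, c) for c in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```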
## MCP tools

| Tool | Purpose |
|---|---|
| `rag_search` | Search the local document knowledge base |
| `web_search` | Search the live web (requires `--web-search`) |
| `reindex_documents` | Re-scan and index new documents without restarting |
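For orientation, this is roughly how such a tool is exposed over MCP, assuming the official Python SDK's `FastMCP` helper; the actual modules under `src/tools/` may be structured differently, and the function body below is a placeholder.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag")

@mcp.tool()
def rag_search(query: str, top_k: int = 5, min_score: float = 0.0) -> str:
    """Search the local document knowledge base."""
    # Placeholder: the real tool calls into the retrieval engine and
    # formats chunks, sources, and scores for the client.
    return f"No results for {query!r} in this sketch."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```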
## Troubleshooting

| Problem | Fix |
|---|---|
| `ModuleNotFoundError: frontmatter` | Wrong package. Install `python-frontmatter`, not `frontmatter`. |
| `ValueError: Unknown embedding model` | Run `--list-models` to see valid keys. |
| HyDE does nothing | LM Studio not running at port 1234. Check `--hyde-url` or disable `--hyde`. |
| Reranker is slow | Expected on CPU with large candidate pools. Use a smaller `--reranker-model`. |
| No results returned | Lower `--min-score` to 0.0 or check that ingest completed at startup. |
| EPUB / DOCX not extracted | Install optional deps: `pip install ebooklib python-docx` |
Embeddings persist in `.chroma_db/`. HyDE calls LM Studio at `http://localhost:1234` (default) and silently falls back to the original query if the LLM is unreachable. Pick the embedding model with `--embedding-model`.
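A hedged sketch of that HyDE step, assuming LM Studio's OpenAI-compatible chat completions endpoint; the prompt and payload used by the server's `hyde.py` may differ.

```python
import requests

def hyde_passage(query, url="http://localhost:1234/v1/chat/completions"):
    # Some LM Studio versions also expect a "model" field; add one if yours does.
    payload = {
        "messages": [
            {"role": "user",
             "content": f"Write a short passage that would answer: {query}"}
        ],
        "max_tokens": 200,
    }
    try:
        resp = requests.post(url, json=payload, timeout=30)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    except Exception:
        return query  # LLM unreachable: silently fall back to the original query
```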
## Project structure

```text
mcp-rag-server/
├── server.py            # Entry point; run this
├── config.py            # All constants + embedding model registry
├── requirements.txt
├── README.md
└── src/
    ├── ingestion/
    │   ├── extractors.py   # PDF, MD, HTML, DOCX, EPUB, CSV, JSON
    │   └── chunker.py      # paragraph / sentence / semantic / fixed
    ├── retrieval/
    │   ├── engine.py       # Unified RAG pipeline orchestrator
    │   ├── bm25.py         # BM25+ sparse index (zero dependencies)
    │   ├── fusion.py       # RRF + CombSUM result fusion
    │   ├── reranker.py     # Cross-encoder reranking
    │   └── hyde.py         # Hypothetical Document Embedding
    ├── search/
    │   └── providers.py    # DuckDuckGo / Brave / Tavily / Serper
    ├── tools/
    │   ├── rag_tool.py     # rag_search MCP tool
    │   ├── web_tool.py     # web_search MCP tool
    │   └── ingest_tool.py  # reindex_documents MCP tool
    └── utils/
        ├── cache.py        # Thread-safe TTL cache
        └── logging.py      # Structured stderr logging
```
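The `bm25.py` entry above keeps the sparse index dependency-free. As a rough illustration, here is a minimal BM25+ scorer (the delta-lower-bounded variant of BM25); parameter values and tokenization are illustrative, not necessarily what the server uses.

```python
import math
from collections import Counter

def bm25_plus_scores(query_terms, docs, k1=1.5, b=0.75, delta=1.0):
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    df = Counter(term for d in tokenized for term in set(d))  # document frequency
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            idf = math.log((n_docs + 1) / (df.get(term, 0) + 0.5))  # smoothed IDF
            freq = tf.get(term, 0)
            norm = k1 * (1 - b + b * len(d) / avgdl) + freq
            # The "+ delta" lower bound is what makes this BM25+ rather than plain BM25.
            score += idf * (freq * (k1 + 1) / norm + delta)
        scores.append(score)
    return scores
```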
## Quick start

```bash
pip install -r requirements.txt
python server.py --docs /path/to/your/documents
```

Everything enabled:

```bash
python server.py \
    --docs /path/to/your/documents \
    --embedding-model bge-large \
    --chunk-strategy sentence \
    --reranker \
    --hyde \
    --web-search \
    --search-provider tavily \
    --tavily-api-key YOUR_KEY
```

List the available embedding models and rerankers:

```bash
python server.py --list-models
python server.py --list-rerankers
```
## Retrieval pipeline

```text
User Query
     │
     ▼
┌─[HyDE]─────────────────────────────────────────────────────────────┐
│ LLM generates a hypothetical passage that would answer the query.  │
│ We embed that passage instead of the raw query.                    │
│ (Greatly improves recall for abstract / jargon-heavy queries)      │
└─────────────────────────────────────────────────────────────────────┘
     │
     ├────────────────────────────────────┐
     ▼                                    ▼
┌─────────────────────┐     ┌────────────────────────────┐
│ Dense Retrieval     │     │ BM25+ Sparse Retrieval     │
│ ChromaDB cosine sim │     │ Keyword index (in-memory)  │
│ top K×4 candidates  │     │ top K×4 candidates         │
└──────────┬──────────┘     └──────────────┬─────────────┘
           │                               │
           └───────────────┬───────────────┘
                           ▼
               ┌───────────────────────┐
               │ Reciprocal Rank       │
               │ Fusion (RRF)          │
               │ Combines both lists   │
               └───────────┬───────────┘
                           ▼
               ┌───────────────────────┐
               │ Cross-Encoder         │
               │ Reranking (optional)  │
               │ (query, chunk) pairs  │
               └───────────┬───────────┘
                           ▼
               ┌───────────────────────┐
               │ Top-K Results         │
               │ with source + scores  │
               └───────────────────────┘
```
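Reciprocal Rank Fusion, the merge step in the middle of the diagram, is only a few lines. Here is a minimal sketch using the commonly used constant k = 60; `fusion.py` may parameterize this differently and also offers CombSUM.

```python
# Merge dense and sparse result lists by summing 1 / (k + rank) per document.
def rrf(dense_ids, sparse_ids, k=60):
    scores = {}
    for ranked in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf(["d3", "d1", "d7"], ["d1", "d9", "d3"])
print(fused)  # ['d1', 'd3', 'd9', 'd7'] -- d1 and d3 appear in both lists, so they rank first
```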
## Tool call examples

`rag_search`:

```json
{
  "query": "What is the return policy?",
  "top_k": 5,
  "min_score": 0.3
}
```

`web_search`:

```json
{
  "query": "latest Python 3.13 features",
  "max_results": 5
}
```

`reindex_documents` takes no arguments:

```json
{}
```
## LM Studio configuration

Register the server in LM Studio's MCP configuration. Minimal setup:

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-rag-server/server.py",
        "--docs", "/absolute/path/to/your/documents"
      ]
    }
  }
}
```

All features enabled:

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-rag-server/server.py",
        "--docs", "/absolute/path/to/your/documents",
        "--embedding-model", "bge-large",
        "--chunk-strategy", "sentence",
        "--reranker",
        "--hyde",
        "--web-search",
        "--search-provider", "duckduckgo"
      ]
    }
  }
}
```

Long-context documents (nomic-v1.5 with larger chunks):

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-rag-server/server.py",
        "--docs", "/absolute/path/to/your/documents",
        "--embedding-model", "nomic-v1.5",
        "--chunk-size", "1024",
        "--chunk-overlap", "128",
        "--reranker"
      ]
    }
  }
}
```
## Command-line reference

```text
Documents:
  --docs PATH              Folder to index (required)
  --collection NAME        ChromaDB collection prefix

Chunking:
  --chunk-strategy         paragraph | sentence | semantic | fixed
  --chunk-size INT         Target words per chunk (default: 512)
  --chunk-overlap INT      Overlap between chunks (default: 64)

Embedding:
  --embedding-model KEY    Model key (default: bge-base)
  --list-models            Print all models and exit

Retrieval:
  --top-k INT              Results per query (default: 5)
  --min-score FLOAT        Similarity threshold (default: 0.0)
  --no-hybrid              Dense-only; disable BM25+

Reranking:
  --reranker               Enable cross-encoder reranking
  --reranker-model KEY     Reranker key (default: ms-marco-MiniLM-L-6-v2)
  --list-rerankers         Print all rerankers and exit

HyDE:
  --hyde                   Enable Hypothetical Document Embedding
  --hyde-url URL           Override LM Studio completions URL

Web search:
  --web-search             Enable the web_search tool
  --search-provider        duckduckgo | brave | tavily | serper
  --brave-api-key KEY
  --tavily-api-key KEY
  --serper-api-key KEY
  --web-max-results INT    Default results per web query (default: 5)
```
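Each provider behind `--search-provider` only needs a thin wrapper. A hedged sketch of a DuckDuckGo backend, assuming the `duckduckgo_search` package; `src/search/providers.py` may use a different client and result shape.

```python
from duckduckgo_search import DDGS

def duckduckgo_results(query, max_results=5):
    with DDGS() as ddgs:
        hits = ddgs.text(query, max_results=max_results)
    # Normalize to the fields a web_search tool would return.
    return [{"title": h["title"], "url": h["href"], "snippet": h["body"]} for h in hits]
```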