# MCP RAG Server
A production-grade, fully local MCP server for LM Studio. Semantic document retrieval plus live web search, all on your machine, no cloud needed.
## Features

| Capability | Details |
|---|---|
| 12 embedding models | From fast MiniLM to Stella-1.5B / E5-Mistral, switchable with one flag |
| 4 chunking strategies | paragraph · sentence · semantic · fixed |
| Hybrid retrieval | BM25+ sparse + dense cosine, fused via Reciprocal Rank Fusion |
| Cross-encoder reranking | MiniLM-L6 / L12 / Electra; dramatically improves precision |
| HyDE | Hypothetical Document Embedding; better recall on abstract queries |
| 10 file formats | Including PDF · MD · TXT · HTML · DOCX · EPUB · CSV · JSON |
| 4 web search providers | DuckDuckGo · Brave · Tavily · Serper |
| Live re-indexing | Add documents and re-index without restarting the server |
| Thread-safe TTL cache | Dedup repeated queries; configurable window (see the sketch below) |
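The TTL cache in the last row is small enough to sketch here. This is an illustrative version only; the class name and eviction details in `src/utils/cache.py` may differ.

```python
# Minimal sketch of a thread-safe TTL cache for deduplicating repeated queries.
import threading
import time

class TTLCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._lock = threading.Lock()
        self._store = {}                      # key -> (expires_at, value)

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
            if entry and entry[0] > time.time():
                return entry[1]
            self._store.pop(key, None)        # expired or missing
            return None

    def set(self, key, value):
        with self._lock:
            self._store[key] = (time.time() + self.ttl, value)
```

Repeated identical queries inside the configured window are served from the cache instead of re-running retrieval.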
## Embedding models

| Key | Tier | Dim | Highlight |
|---|---|---|---|
| `minilm-l6` | fast | 384 | Smallest, lowest RAM |
| `minilm-l12` | fast | 384 | Better than L6, same size |
| `mpnet` | balanced | 768 | Strong general-purpose |
| `gte-base` | balanced | 768 | Top BEIR at this size |
| `bge-base` | balanced | 768 | Default; top MTEB |
| `bge-large` | powerful | 1024 | Best English BGE |
| `gte-large` | powerful | 1024 | Great on long docs |
| `e5-large` | powerful | 1024 | Instruction-aware, top BEIR |
| `nomic-v1.5` | powerful | 768 | 8192-token context |
| `jina-v3` | powerful | 1024 | Multilingual; code + text |
| `stella-en-1.5b` | powerful | 1024 | MTEB SOTA; ~6 GB RAM |
| `e5-mistral` | powerful | 4096 | Highest quality; needs GPU |
Switching models is safe: each model gets its own ChromaDB collection automatically. Delete `.chroma_db/` to force a full re-index.
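Below is a hedged sketch of how that per-model isolation can work with ChromaDB. The key-to-model mapping and the collection naming scheme are assumptions for illustration, not the actual contents of `config.py`.

```python
import chromadb

# Assumed key -> Hugging Face model mapping; the real registry lives in config.py.
EMBEDDING_MODELS = {
    "bge-base": "BAAI/bge-base-en-v1.5",
    "bge-large": "BAAI/bge-large-en-v1.5",
}

def get_collection(client, prefix, model_key):
    # One collection per embedding model, so vectors with different
    # dimensions or embedding spaces never mix.
    return client.get_or_create_collection(f"{prefix}_{model_key}")

client = chromadb.PersistentClient(path=".chroma_db")
collection = get_collection(client, "docs", "bge-base")
```

Because the collection name includes the model key, changing `--embedding-model` never mixes vectors of different dimensions.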
## Chunking strategies

| Strategy | Best for | Notes |
|---|---|---|
| `paragraph` | General documents | Default; fast, structure-aware |
| `sentence` | Dense prose, QA | Requires `nltk` |
| `semantic` | Mixed-topic documents | Slowest ingest, highest quality |
| `fixed` | Last resort | No structure awareness |
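For a sense of what the default strategy does, here is a minimal word-budget paragraph chunker. The function and its details are illustrative; the server's `chunker.py` may handle edge cases differently.

```python
def paragraph_chunks(text, chunk_size=512, overlap=64):
    # Group paragraphs into chunks of roughly chunk_size words, carrying an
    # overlap of trailing words into the next chunk. Oversized single
    # paragraphs are not split further in this simplified version.
    chunks, current = [], []
    for para in text.split("\n\n"):
        words = para.split()
        if current and len(current) + len(words) > chunk_size:
            chunks.append(" ".join(current))
            current = current[-overlap:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```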
## Rerankers

| Key | Speed | Quality |
|---|---|---|
| `ms-marco-MiniLM-L-6-v2` | Fastest | Good |
| `ms-marco-MiniLM-L-12-v2` | Balanced | Better |
| `ms-marco-electra-base` | Slowest | Best |
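Reranking rescores the fused candidate list with a cross-encoder that sees the query and the chunk together. A minimal sketch using `sentence-transformers`; the Hugging Face ID shown is the usual home of the L-6 model, and the server's `reranker.py` wiring is not shown here.

```python
from sentence_transformers import CrossEncoder

def rerank(query, chunks, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2", top_k=5):
    model = CrossEncoder(model_name)
    # One relevance score per (query, chunk) pair.
    scores = model.predict([(query, c) for c in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```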
## MCP tools

| Tool | Purpose |
|---|---|
| `rag_search` | Search the local document knowledge base |
| `web_search` | Search the live web (requires `--web-search`) |
| `reindex_documents` | Re-scan and index new documents without restarting |
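For orientation, this is roughly how such a tool is exposed over MCP, assuming the official Python SDK's `FastMCP` helper; the actual modules under `src/tools/` may be structured differently, and the function body below is a placeholder.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag")

@mcp.tool()
def rag_search(query: str, top_k: int = 5, min_score: float = 0.0) -> str:
    """Search the local document knowledge base."""
    # Placeholder: the real tool calls into the retrieval engine and
    # formats chunks, sources, and scores for the client.
    return f"No results for {query!r} in this sketch."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```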
## Troubleshooting

| Problem | Fix |
|---|---|
| `ModuleNotFoundError: frontmatter` | Wrong package. Install `python-frontmatter`, not `frontmatter`. |
| `ValueError: Unknown embedding model` | Run `--list-models` to see valid keys. |
| HyDE does nothing | LM Studio not running at port 1234. Check `--hyde-url` or disable `--hyde`. |
| Reranker is slow | Expected on CPU with large candidate pools. Use a smaller `--reranker-model`. |
| No results returned | Lower `--min-score` to 0.0 or check that ingest completed at startup. |
| EPUB / DOCX not extracted | Install optional deps: `pip install ebooklib python-docx` |
Embeddings persist in `.chroma_db/`. HyDE calls LM Studio at `http://localhost:1234` (default) and silently falls back to the original query if the LLM is unreachable. Pick the embedding model with `--embedding-model`.
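A hedged sketch of that HyDE step, assuming LM Studio's OpenAI-compatible chat completions endpoint; the prompt and payload used by the server's `hyde.py` may differ.

```python
import requests

def hyde_passage(query, url="http://localhost:1234/v1/chat/completions"):
    # Some LM Studio versions also expect a "model" field; add one if yours does.
    payload = {
        "messages": [
            {"role": "user",
             "content": f"Write a short passage that would answer: {query}"}
        ],
        "max_tokens": 200,
    }
    try:
        resp = requests.post(url, json=payload, timeout=30)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    except Exception:
        return query  # LLM unreachable: silently fall back to the original query
```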
## Project structure

```text
mcp-rag-server/
├── server.py            # Entry point; run this
├── config.py            # All constants + embedding model registry
├── requirements.txt
├── README.md
└── src/
    ├── ingestion/
    │   ├── extractors.py   # PDF, MD, HTML, DOCX, EPUB, CSV, JSON
    │   └── chunker.py      # paragraph / sentence / semantic / fixed
    ├── retrieval/
    │   ├── engine.py       # Unified RAG pipeline orchestrator
    │   ├── bm25.py         # BM25+ sparse index (zero dependencies)
    │   ├── fusion.py       # RRF + CombSUM result fusion
    │   ├── reranker.py     # Cross-encoder reranking
    │   └── hyde.py         # Hypothetical Document Embedding
    ├── search/
    │   └── providers.py    # DuckDuckGo / Brave / Tavily / Serper
    ├── tools/
    │   ├── rag_tool.py     # rag_search MCP tool
    │   ├── web_tool.py     # web_search MCP tool
    │   └── ingest_tool.py  # reindex_documents MCP tool
    └── utils/
        ├── cache.py        # Thread-safe TTL cache
        └── logging.py      # Structured stderr logging
```
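The `bm25.py` entry above keeps the sparse index dependency-free. As a rough illustration, here is a minimal BM25+ scorer (the delta-lower-bounded variant of BM25); parameter values and tokenization are illustrative, not necessarily what the server uses.

```python
import math
from collections import Counter

def bm25_plus_scores(query_terms, docs, k1=1.5, b=0.75, delta=1.0):
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    df = Counter(term for d in tokenized for term in set(d))  # document frequency
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            idf = math.log((n_docs + 1) / (df.get(term, 0) + 0.5))  # smoothed IDF
            freq = tf.get(term, 0)
            norm = k1 * (1 - b + b * len(d) / avgdl) + freq
            # The "+ delta" lower bound is what makes this BM25+ rather than plain BM25.
            score += idf * (freq * (k1 + 1) / norm + delta)
        scores.append(score)
    return scores
```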
## Quick start

```bash
pip install -r requirements.txt
python server.py --docs /path/to/your/documents
```

Everything enabled:

```bash
python server.py \
    --docs /path/to/your/documents \
    --embedding-model bge-large \
    --chunk-strategy sentence \
    --reranker \
    --hyde \
    --web-search \
    --search-provider tavily \
    --tavily-api-key YOUR_KEY
```

List the available embedding models and rerankers:

```bash
python server.py --list-models
python server.py --list-rerankers
```
## Retrieval pipeline

```text
User Query
     │
     ▼
┌─[HyDE]─────────────────────────────────────────────────────────────┐
│ LLM generates a hypothetical passage that would answer the query.  │
│ We embed that passage instead of the raw query.                    │
│ (Greatly improves recall for abstract / jargon-heavy queries)      │
└─────────────────────────────────────────────────────────────────────┘
     │
     ├────────────────────────────────────┐
     ▼                                    ▼
┌─────────────────────┐     ┌────────────────────────────┐
│ Dense Retrieval     │     │ BM25+ Sparse Retrieval     │
│ ChromaDB cosine sim │     │ Keyword index (in-memory)  │
│ top K×4 candidates  │     │ top K×4 candidates         │
└──────────┬──────────┘     └──────────────┬─────────────┘
           │                               │
           └───────────────┬───────────────┘
                           ▼
               ┌───────────────────────┐
               │ Reciprocal Rank       │
               │ Fusion (RRF)          │
               │ Combines both lists   │
               └───────────┬───────────┘
                           ▼
               ┌───────────────────────┐
               │ Cross-Encoder         │
               │ Reranking (optional)  │
               │ (query, chunk) pairs  │
               └───────────┬───────────┘
                           ▼
               ┌───────────────────────┐
               │ Top-K Results         │
               │ with source + scores  │
               └───────────────────────┘
```
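Reciprocal Rank Fusion, the merge step in the middle of the diagram, is only a few lines. Here is a minimal sketch using the commonly used constant k = 60; `fusion.py` may parameterize this differently and also offers CombSUM.

```python
# Merge dense and sparse result lists by summing 1 / (k + rank) per document.
def rrf(dense_ids, sparse_ids, k=60):
    scores = {}
    for ranked in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf(["d3", "d1", "d7"], ["d1", "d9", "d3"])
print(fused)  # ['d1', 'd3', 'd9', 'd7'] -- d1 and d3 appear in both lists, so they rank first
```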
## Tool call examples

`rag_search`:

```json
{
  "query": "What is the return policy?",
  "top_k": 5,
  "min_score": 0.3
}
```

`web_search`:

```json
{
  "query": "latest Python 3.13 features",
  "max_results": 5
}
```

`reindex_documents` takes no arguments:

```json
{}
```
## LM Studio configuration

Register the server in LM Studio's MCP configuration. Minimal setup:

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-rag-server/server.py",
        "--docs", "/absolute/path/to/your/documents"
      ]
    }
  }
}
```

All features enabled:

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-rag-server/server.py",
        "--docs", "/absolute/path/to/your/documents",
        "--embedding-model", "bge-large",
        "--chunk-strategy", "sentence",
        "--reranker",
        "--hyde",
        "--web-search",
        "--search-provider", "duckduckgo"
      ]
    }
  }
}
```

Long-context documents (nomic-v1.5 with larger chunks):

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-rag-server/server.py",
        "--docs", "/absolute/path/to/your/documents",
        "--embedding-model", "nomic-v1.5",
        "--chunk-size", "1024",
        "--chunk-overlap", "128",
        "--reranker"
      ]
    }
  }
}
```
## Command-line reference

```text
Documents:
  --docs PATH              Folder to index (required)
  --collection NAME        ChromaDB collection prefix

Chunking:
  --chunk-strategy         paragraph | sentence | semantic | fixed
  --chunk-size INT         Target words per chunk (default: 512)
  --chunk-overlap INT      Overlap between chunks (default: 64)

Embedding:
  --embedding-model KEY    Model key (default: bge-base)
  --list-models            Print all models and exit

Retrieval:
  --top-k INT              Results per query (default: 5)
  --min-score FLOAT        Similarity threshold (default: 0.0)
  --no-hybrid              Dense-only; disable BM25+

Reranking:
  --reranker               Enable cross-encoder reranking
  --reranker-model KEY     Reranker key (default: ms-marco-MiniLM-L-6-v2)
  --list-rerankers         Print all rerankers and exit

HyDE:
  --hyde                   Enable Hypothetical Document Embedding
  --hyde-url URL           Override LM Studio completions URL

Web search:
  --web-search             Enable the web_search tool
  --search-provider        duckduckgo | brave | tavily | serper
  --brave-api-key KEY
  --tavily-api-key KEY
  --serper-api-key KEY
  --web-max-results INT    Default results per web query (default: 5)
```
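Each provider behind `--search-provider` only needs a thin wrapper. A hedged sketch of a DuckDuckGo backend, assuming the `duckduckgo_search` package; `src/search/providers.py` may use a different client and result shape.

```python
from duckduckgo_search import DDGS

def duckduckgo_results(query, max_results=5):
    with DDGS() as ddgs:
        hits = ddgs.text(query, max_results=max_results)
    # Normalize to the fields a web_search tool would return.
    return [{"title": h["title"], "url": h["href"], "snippet": h["body"]} for h in hits]
```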