True local Retrieval-Augmented Generation for LM Studio. Ingest documents into a semantic vector index, query by meaning (not just keyword), and have relevant chunks automatically injected into every new chat session, all running locally with no external services.
Keywords: lm studio plugin, local rag, retrieval augmented generation, private document search, offline semantic search, local embeddings, no cloud, no api key
This is what the knowledge-plugin and document-parser-plugin lack: actual semantic search with embeddings.
Requires an embedding model loaded in LM Studio (e.g. nomic-embed-text, mxbai-embed-large, all-minilm).
To download an embedding model in LM Studio, open the model browser and filter by "Embedding" under model type.
Load the built plugin folder in LM Studio.
| Field | Default | Description |
|---|---|---|
| Embedding Model Identifier | (blank) | Identifier of the loaded LM Studio embedding model. Blank = use any loaded embedding model |
| Chunk Size (chars) | 800 | Max characters per chunk. Smaller = precise, larger = more context |
| Chunk Overlap (chars) | 150 | Characters shared between adjacent chunks. Preserves context at boundaries |
| Top K Results | 5 | Chunks to retrieve per query |
| Data Path | ~/rag-data/ | Where the vector index is stored |
| Auto-Inject Context | true | Retrieve relevant chunks and prepend to every new session |
| Workspace Path | (blank) | Root for relative file paths. Blank = absolute paths only |
| Max File Size (MB) | 50 | Files over this limit are skipped |
rag_ingest: Parse, chunk, embed, and index a document or directory.
rag_query: Semantic search over the index. Returns ranked chunks, each with its chunk text, source file, chunk index, and similarity score (0–1).
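This README does not say how similarity scores are computed. A common choice for embedding search is cosine similarity, so the following is a minimal sketch under that assumption. The `IndexedChunk` shape and the `topK` helper are hypothetical names used for illustration, not the plugin's actual code.

```typescript
// Hypothetical shape of an indexed chunk; the plugin's real schema may differ.
interface IndexedChunk {
  text: string;
  source: string;
  chunkIndex: number;
  vector: number[];
}

interface ScoredChunk extends IndexedChunk {
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query vector and keep the k highest-scoring ones.
function topK(query: number[], chunks: IndexedChunk[], k: number): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In this sketch, `k` plays the role of the Top K Results setting. A brute-force scan like this is usually adequate for a personal document index; larger indexes would typically need an approximate nearest-neighbor structure.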
rag_list: List all documents in a collection with chunk counts and ingest times.
rag_delete: Remove all chunks for a specific document.
The source_path must exactly match the path used when ingesting.
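To illustrate why the path must match exactly, here is a minimal sketch that assumes deletion is a plain string comparison on the stored path. The `ChunkRecord` type and `deleteBySource` function are hypothetical; the plugin's internals may differ.

```typescript
// Hypothetical in-memory record; the plugin's real storage format may differ.
interface ChunkRecord {
  sourcePath: string;
  text: string;
}

// Remove every chunk whose stored path is string-identical to sourcePath.
// Returns the surviving records and the number of chunks removed.
function deleteBySource(
  records: ChunkRecord[],
  sourcePath: string
): { kept: ChunkRecord[]; removed: number } {
  const kept = records.filter((r) => r.sourcePath !== sourcePath);
  return { kept, removed: records.length - kept.length };
}

const records: ChunkRecord[] = [
  { sourcePath: "/Users/me/docs/contract-v1.pdf", text: "chunk 0" },
  { sourcePath: "/Users/me/docs/contract-v1.pdf", text: "chunk 1" },
  { sourcePath: "/Users/me/docs/other.pdf", text: "chunk 0" },
];

// A bare file name does not match the absolute path used at ingest time.
const miss = deleteBySource(records, "contract-v1.pdf");
// The exact ingest path removes both chunks of that document.
const hit = deleteBySource(records, "/Users/me/docs/contract-v1.pdf");
```

Under this assumption, `miss.removed` is 0 and `hit.removed` is 2. If a delete appears to do nothing, rag_list shows the stored paths to copy from.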
rag_stats: Show total documents, chunks, and per-collection breakdown.
When Auto-Inject Context is enabled, the first message of every session is automatically embedded and matched against the default collection. Relevant chunks are prepended:
The LLM sees both the retrieved context and your question, and can answer based on your actual documents rather than its training data. Disable in plugin settings for manual-only retrieval via rag_query.
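The prepending step can be sketched as follows. This is an illustration only: the bracketed layout follows the injected-context example in this README, but the exact whitespace, the `RetrievedChunk` type, and the `buildInjectedMessage` function are assumptions rather than the plugin's actual code.

```typescript
// Hypothetical shape of a retrieved chunk.
interface RetrievedChunk {
  source: string; // file name shown in parentheses
  text: string;
}

// Prepend numbered chunks, separated by "---", to the user's first message.
function buildInjectedMessage(chunks: RetrievedChunk[], question: string): string {
  // Nothing retrieved: leave the message untouched.
  if (chunks.length === 0) return question;
  const body = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join("\n---\n");
  return `[Retrieved context from your document index:\n${body}\n]\n${question}`;
}
```

Numbering the chunks and naming their source file lets the model refer back to a specific passage in its answer.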
Text is split at paragraph boundaries (\n\n) first to preserve semantic units, then accumulated into chunks up to Chunk Size. Adjacent chunks share Chunk Overlap characters to prevent context loss at boundaries. Oversized paragraphs are hard-split as a fallback.
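The chunking strategy described above can be sketched roughly as follows. This is a simplified illustration, not the plugin's implementation: the `chunkText` function is hypothetical, and it makes one assumption the text does not settle, namely that the overlap is added as a prefix, so a chunk may be up to Chunk Size plus Chunk Overlap characters long.

```typescript
// Paragraph-first chunking with overlap. Defaults mirror the settings table.
function chunkText(text: string, chunkSize: number = 800, overlap: number = 150): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }

  // 1. Split at paragraph boundaries; hard-split paragraphs longer than chunkSize.
  const pieces: string[] = [];
  for (const paragraph of text.split("\n\n")) {
    const trimmed = paragraph.trim();
    for (let i = 0; i < trimmed.length; i += chunkSize) {
      pieces.push(trimmed.slice(i, i + chunkSize));
    }
  }

  // 2. Accumulate pieces into chunk bodies of at most chunkSize characters.
  const bodies: string[] = [];
  let current = "";
  for (const piece of pieces) {
    const candidate = current === "" ? piece : current + "\n\n" + piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
    } else {
      bodies.push(current);
      current = piece;
    }
  }
  if (current !== "") bodies.push(current);

  // 3. Prefix each chunk after the first with the tail of the previous body.
  return bodies.map((body, i) => {
    if (i === 0 || overlap === 0) return body;
    const prev = bodies[i - 1];
    return prev.slice(Math.max(0, prev.length - overlap)) + body;
  });
}
```

For example, `chunkText("aaaa\n\nbbbb\n\ncccc", 10, 2)` keeps the first two paragraphs together and starts the second chunk with the last two characters of the first.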
Tuning guide:
Ingest a project wiki:
"Index all my project documentation" →
rag_ingest(path="/Users/me/wiki/", recursive=true, collection="wiki")
Ask questions about ingested docs:
"What does the SLA say about uptime guarantees?" →
rag_query(query="SLA uptime guarantee", collection="wiki")
→ Returns relevant chunks → LLM answers based on actual document content
Check what's indexed:
"What documents have I ingested?" →
rag_stats() + rag_list()
Remove a stale document:
"Remove the old contract from the index" →
rag_delete(source_path="/Users/me/docs/contract-v1.pdf")
| Model | Size | Quality | Notes |
|---|---|---|---|
| nomic-embed-text | 137 MB | Good | Fast, well-rounded |
| mxbai-embed-large | 670 MB | Better | Higher accuracy |
| all-minilm-l6-v2 | 90 MB | Decent | Smallest, fastest |
| bge-large-en-v1.5 | 670 MB | Better | Strong on English text |
Download any of these in LM Studio's model browser (filter: Embedding).
All vectors and metadata are stored as JSON files locally. Nothing is sent externally.
Supported file types: .pdf, .docx, .doc, .xlsx, .xls, .ods, .csv, .pptx, .ppt, .epub, .html, .htm, .json, .jsonl, .txt, .md, and any plain text/source file.

Build the plugin:

```bash
cd rag-plugin
npm install
npm run build
```
Tool call examples:

```
rag_ingest(path="/Users/me/docs/contract.pdf")
rag_ingest(path="/Users/me/project-docs/", recursive=true, collection="project")
rag_ingest(path="notes.txt", collection="personal", chunk_size=500)

rag_query(query="What are the termination clauses?")
rag_query(query="budget for Q3", collection="finance", top_k=10)

rag_list()
rag_list(collection="project")

rag_delete(source_path="/Users/me/docs/old-contract.pdf")
rag_delete(source_path="/Users/me/notes.txt", collection="personal")

rag_stats()
```
Example of a first message after auto-injection:

```
[Retrieved context from your document index:
[1] (contract.pdf)
The agreement may be terminated by either party with 30 days written notice...
---
[2] (contract.pdf)
In the event of material breach, the non-breaching party may terminate immediately...
]
What are the termination clauses?
```
Data directory layout:

```
~/rag-data/
  default/    ← default collection vector index
  project/    ← named collection
  personal/   ← another named collection
```