RAG Plugin for LM Studio — Local Semantic Search & Document Q&A

True local Retrieval-Augmented Generation for LM Studio. Ingest documents into a semantic vector index, query by meaning (not just keyword), and have relevant chunks automatically injected into every new chat session — all running locally with no external services.

Keywords: lm studio plugin, local rag, retrieval augmented generation, private document search, offline semantic search, local embeddings, no cloud, no api key

This is what the knowledge-plugin and document-parser-plugin lack: actual semantic search with embeddings.

Who This Is For

Developers and researchers who want to ask questions about their own documents without uploading to any cloud service
Privacy-conscious users who need offline document Q&A with a local LLM
Anyone building a private knowledge base that works with LM Studio's local models
Teams that want semantic search over internal documentation without a SaaS subscription

What It Does

Semantic search — finds chunks by meaning, not keyword matching. "What are the payment terms?" finds relevant text even if it doesn't use those exact words.
True RAG pipeline — chunk → embed → store → retrieve → inject. The complete loop.
Local embeddings — uses a local LM Studio embedding model. Nothing leaves your machine.

Requirements

LM Studio with an embedding model loaded (e.g. nomic-embed-text, mxbai-embed-large, all-minilm)
Node.js 18+

To download an embedding model in LM Studio: go to the model browser, filter by "Embedding" under model type.

Installation

Load the built plugin folder in LM Studio.

Configuration

Field	Default	Description
Embedding Model Identifier	(blank)	Identifier of the loaded LM Studio embedding model. Blank = use any loaded embedding model
Chunk Size (chars)	`800`	Max characters per chunk. Smaller = more precise matches, larger = more context per chunk
Chunk Overlap (chars)	`150`	Characters shared between adjacent chunks. Preserves context at boundaries
Top K Results	`5`	Chunks to retrieve per query
Data Path	`~/rag-data/`	Where the vector index is stored
Auto-Inject Context	`true`	Retrieve relevant chunks and prepend them to the first message of every new session
Workspace Path	(blank)	Root for relative file paths. Blank = absolute paths only
Max File Size (MB)	`50`	Files over this limit are skipped
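
To make the defaults and their interactions concrete, here is a minimal sketch of the settings table as a Python dataclass. The field names, the `validate` rules, and the `resolve` and `too_large` helpers are all hypothetical; they illustrate one reading of the table, not the plugin's actual code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RagConfig:
    """Hypothetical mirror of the settings table; names are illustrative only."""
    embedding_model: str = ""      # blank = use any loaded embedding model
    chunk_size: int = 800          # max characters per chunk
    chunk_overlap: int = 150       # characters shared between adjacent chunks
    top_k: int = 5                 # chunks to retrieve per query
    data_path: str = "~/rag-data/"
    auto_inject: bool = True
    workspace_path: str = ""       # blank = absolute paths only
    max_file_size_mb: int = 50

    def validate(self) -> None:
        # Assumed sanity checks: the overlap has to be smaller than the chunk.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")

    def resolve(self, path: str) -> Path:
        """Resolve a user-supplied path against the workspace root."""
        p = Path(path).expanduser()
        if p.is_absolute():
            return p
        if not self.workspace_path:
            raise ValueError("relative path given but Workspace Path is blank")
        return Path(self.workspace_path).expanduser() / p

    def too_large(self, size_bytes: int) -> bool:
        """True if a file of this size would be skipped."""
        return size_bytes > self.max_file_size_mb * 1024 * 1024
```

With the defaults, `RagConfig().resolve("notes/a.md")` raises because no workspace root is set, while an absolute path is returned unchanged.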

Tools

`rag_ingest`

Parse, chunk, embed, and index a document or directory.

`rag_query`

Semantic search over the index. Returns ranked chunks with similarity scores.

Returns: chunk text, source file, chunk index, and similarity score (0–1).
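
As an illustration of how ranked retrieval can work, the sketch below scores stored chunks against a query vector and keeps the best k. It assumes cosine similarity; this README does not say which metric the plugin uses, and plain cosine can fall below 0 for some vector pairs. The record fields (`text`, `source`, `chunk_index`, `vector`) and the tiny 2-D vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: List[Dict], k: int = 5) -> List[Dict]:
    """Return the k most similar chunks, best first, with their scores."""
    scored = [(cosine_similarity(query_vec, rec["vector"]), rec) for rec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {
            "text": rec["text"],
            "source": rec["source"],
            "chunk_index": rec["chunk_index"],
            "score": score,
        }
        for score, rec in scored[:k]
    ]


# Toy index: real embeddings have hundreds of dimensions, not two.
index = [
    {"text": "The office is closed on Fridays.", "source": "handbook.md",
     "chunk_index": 1, "vector": [0.1, 0.9]},
    {"text": "Invoices are due within 30 days.", "source": "contract.pdf",
     "chunk_index": 4, "vector": [0.9, 0.1]},
]
results = top_k([1.0, 0.0], index, k=5)
```

Here the query vector points almost the same way as the `contract.pdf` chunk, so that chunk ranks first even though it was stored second.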

`rag_list`

List all documents in a collection with chunk counts and ingest times.

`rag_delete`

Remove all chunks for a specific document.

The source_path must exactly match the path that was used at ingest time.

`rag_stats`

Show total documents, chunks, and per-collection breakdown.

How Auto-Injection Works

When Auto-Inject Context is enabled, the first message of every session is automatically embedded and matched against the default collection. Relevant chunks are prepended to that message.

The LLM sees both the retrieved context and your question, and can answer based on your actual documents rather than its training data. To retrieve manually via rag_query only, disable Auto-Inject Context in the plugin settings.
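
A minimal sketch of the injection step, assuming the retrieved chunks arrive as dictionaries with `text`, `source`, and `score` keys. The function name and the exact layout of the prepended block are hypothetical; the plugin's real prompt format is not shown in this README.

```python
from typing import Dict, List


def build_injected_prompt(user_message: str, chunks: List[Dict]) -> str:
    """Prepend retrieved chunks to the first user message of a session."""
    if not chunks:
        # Nothing relevant was found: pass the message through unchanged.
        return user_message
    lines = ["Relevant context from your documents:"]
    for i, chunk in enumerate(chunks, start=1):
        lines.append(f"[{i}] ({chunk['source']}, score {chunk['score']:.2f})")
        lines.append(chunk["text"])
    lines.append("")  # blank line between context and question
    lines.append(f"Question: {user_message}")
    return "\n".join(lines)


prompt = build_injected_prompt(
    "What are the payment terms?",
    [{"text": "Invoices are due within 30 days.",
      "source": "contract.pdf", "score": 0.91}],
)
```

Keeping the question last means the model reads the context first and the user's wording stays intact.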

Chunking Strategy

Text is split at paragraph boundaries (\n\n) first to preserve semantic units, then accumulated into chunks up to Chunk Size. Adjacent chunks share Chunk Overlap characters to prevent context loss at boundaries. Oversized paragraphs are hard-split as a fallback.
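
The strategy described above can be sketched roughly as follows. This is an illustrative reimplementation, not the plugin's code: `chunk_text` is a hypothetical name, and it assumes the overlap counts toward Chunk Size, so no chunk ever exceeds the limit.

```python
from typing import List, Tuple

PARA_SEP = "\n\n"


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> List[str]:
    """Split text at paragraph boundaries into overlapping chunks."""
    # Largest piece that still fits next to an overlap prefix and a separator.
    max_piece = chunk_size - overlap - len(PARA_SEP)
    if overlap < 0 or max_piece < 1:
        raise ValueError("overlap must be >= 0 and well below chunk_size")

    # 1. Paragraphs first; oversized paragraphs are hard-split as a fallback.
    #    Each piece remembers the separator that joins it to the previous one.
    pieces: List[Tuple[str, str]] = []
    for para in text.split(PARA_SEP):
        para = para.strip()
        if not para:
            continue
        for start in range(0, len(para), max_piece):
            sep = PARA_SEP if start == 0 else ""  # "" = same paragraph continues
            pieces.append((sep, para[start:start + max_piece]))

    # 2. Accumulate pieces until the next one would overflow the chunk.
    chunks: List[str] = []
    current = ""
    for sep, piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        chunks.append(current)
        # 3. Seed the next chunk with the tail of the one just emitted.
        tail = current[-overlap:] if overlap else ""
        current = tail + sep + piece
    if current:
        chunks.append(current)
    return chunks
```

With the defaults, three 300-character paragraphs yield two chunks: the first holds two paragraphs, and the second starts with the last 150 characters of the first, followed by the third paragraph.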

Tuning guide:

Dense technical docs → smaller chunks (400–600), less overlap
Long-form prose or contracts → larger chunks (800–1200), more overlap
Short notes or FAQs → default settings (800/150) work well

Example Workflows

Ingest a project wiki:

"Index all my project documentation" → rag_ingest(path="/Users/me/wiki/", recursive=true, collection="wiki")

Ask questions about ingested docs:

"What does the SLA say about uptime guarantees?" → rag_query(query="SLA uptime guarantee", collection="wiki") → Returns relevant chunks → LLM answers based on actual document content

Check what's indexed:

"What documents have I ingested?" → rag_stats() + rag_list()

Remove a stale document:

"Remove the old contract from the index" → rag_delete(source_path="/Users/me/docs/contract-v1.pdf")

Recommended Embedding Models

Model	Size	Quality	Notes
`nomic-embed-text`	137 MB	Good	Fast, well-rounded
`mxbai-embed-large`	670 MB	Better	Higher accuracy
`all-minilm-l6-v2`	90 MB	Decent	Smallest, fastest
`bge-large-en-v1.5`	670 MB	Better	Strong on English text

Download any of these in LM Studio's model browser (filter: Embedding).

Data Location

All vectors and metadata are stored locally as JSON files under the configured Data Path (default `~/rag-data/`). Nothing is sent externally.
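
For a sense of what JSON-backed storage can look like, here is a small sketch that writes one file per collection and reads it back. The one-file-per-collection layout, the file naming, and the record fields are assumptions made for illustration; the plugin's actual on-disk format is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def save_collection(data_dir: Path, collection: str, records: List[Dict]) -> Path:
    """Write all records of a collection to <data_dir>/<collection>.json."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"{collection}.json"
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path


def load_collection(data_dir: Path, collection: str) -> List[Dict]:
    """Read a collection back; a missing file means an empty collection."""
    path = data_dir / f"{collection}.json"
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip in a temporary directory instead of the real data path.
with tempfile.TemporaryDirectory() as tmp:
    record = {
        "source": "/Users/me/docs/contract-v1.pdf",
        "chunk_index": 0,
        "text": "Invoices are due within 30 days.",
        "vector": [0.12, -0.03, 0.88],
    }
    saved_path = save_collection(Path(tmp), "default", [record])
    loaded = load_collection(Path(tmp), "default")
```

Plain JSON keeps the index easy to inspect and back up with ordinary tools, at the cost of loading whole files into memory.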