Big RAG is an LM Studio plugin for local retrieval-augmented generation over large document collections. It scans a configured directory, parses supported files, chunks text, embeds the chunks with an LM Studio embedding model, stores vectors in a sharded Vectra index, and injects relevant passages into chat prompts.
Features
Recursive directory scanning with optional filename exclude patterns.
Sharded Vectra vector storage for large local indexes.
Incremental indexing based on file hashes.
Batched parallel chunk embedding for faster indexing.
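As a rough illustration of hash-based incremental indexing, the sketch below compares the file hashes recorded on a previous run with the hashes from the current scan and decides which files need work. The `HashIndex` shape, the `IndexPlan` type, and `planIncrementalIndex` are hypothetical names used only for this example; they are not taken from the plugin's source.

```typescript
// Hypothetical shape: file path -> content hash (for example a SHA-256 hex digest).
type HashIndex = { [path: string]: string };

interface IndexPlan {
  toIndex: string[];   // new or changed files that need (re)embedding
  toRemove: string[];  // files that disappeared since the last run
  unchanged: string[]; // files whose hash still matches and can be skipped
}

function planIncrementalIndex(previous: HashIndex, current: HashIndex): IndexPlan {
  const plan: IndexPlan = { toIndex: [], toRemove: [], unchanged: [] };

  // A file is skipped only when its path is known and its hash is identical.
  for (const path of Object.keys(current)) {
    if (previous[path] === current[path]) {
      plan.unchanged.push(path);
    } else {
      plan.toIndex.push(path);
    }
  }

  // Anything recorded earlier but missing from the current scan is stale.
  for (const path of Object.keys(previous)) {
    if (!Object.prototype.hasOwnProperty.call(current, path)) {
      plan.toRemove.push(path);
    }
  }

  return plan;
}
```

Under this scheme an unchanged file costs one hash comparison rather than a full parse, chunk, and embed cycle, which is where most of the saving on large collections would come from.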
npm run build expects LM Studio's generated .lmstudio/entry.ts to exist. If you are working from a raw source checkout without that file, npx tsc still validates TypeScript compilation.
Configuration
Required Settings
Documents Directory: root directory containing files to index.
Vector Store Directory: directory where the sharded vector store and metadata are written.
Embedding
Embedding Model: LM Studio embedding model id. Use one spelling consistently for indexing and retrieval. If you change this value, run a full reindex because vectors from different models are not comparable.
Retrieval Settings
Retrieval Limit: maximum number of passages returned for each query. Default: 5.
Run npm run test only when the LM Studio-generated .lmstudio/entry.ts file is present, because that script runs the full build first.
Troubleshooting
No results found: confirm indexing completed, lower the affinity threshold, and check that the file type is supported.
License
ISC
Chunker fixed: chunkSize and chunkOverlap are now correctly interpreted as tokens, not words. Chunks match the configured size.
Sentence-boundary chunking: new "Chunking Strategy" setting for higher-quality semantically coherent chunks (slower, better retrieval).
Parallel chunk embedding: chunks are embedded in batches of 8, significantly reducing indexing time for large documents.
VectorStore stats cache: getStats() and hasFile() now use cached or in-memory lookups after load, reducing redundant disk reads per chat turn.
Min chunk length filter: short/noise chunks are discarded before embedding (configurable, default: 20 tokens).
New file formats: .docx, .odt, .csv, .tsv, .json, .jsonl, .yaml, .yml, .rst, and .log are now indexed.
Config change detection: changing documentsDirectory, vectorStoreDirectory, or embeddingModel in settings resets the in-memory store and sanity-check cache.
Max query length: very long user messages are truncated before embedding to match embedding model input limits (configurable, default: 512 tokens).
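The parallel chunk embedding entry above can be pictured roughly as follows. This is a minimal sketch, not the plugin's code: `EmbedFn` is a hypothetical stand-in for whatever call reaches the LM Studio embedding model, and `embedInBatches` is an illustrative name. Only the batch size of 8 comes from this README.

```typescript
// Hypothetical embedding callback; a real implementation would call the
// configured LM Studio embedding model here.
type EmbedFn = (text: string) => Promise<number[]>;

async function embedInBatches(
  chunks: string[],
  embed: EmbedFn,
  batchSize = 8,
): Promise<number[][]> {
  const vectors: number[][] = [];

  for (let start = 0; start < chunks.length; start += batchSize) {
    const batch = chunks.slice(start, start + batchSize);
    // All chunks in one batch are embedded concurrently. Promise.all keeps
    // the results in input order, so vectors[i] still belongs to chunks[i].
    const batchVectors = await Promise.all(batch.map((chunk) => embed(chunk)));
    vectors.push(...batchVectors);
  }

  return vectors;
}
```

Waiting for each batch to finish before starting the next caps the number of in-flight requests at the batch size, which trades a little throughput for a predictable load on the embedding model.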
Min Chunk Length: chunks shorter than this estimated-token count are discarded before embedding. Set to 0 to keep all chunks. Default: 20.
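A minimal sketch of how such a filter might behave is shown below. The token estimator is an assumption: it uses the common rule of thumb of about four characters per token, which may differ from the estimate the plugin actually uses. `estimateTokens` and `filterShortChunks` are illustrative names.

```typescript
// ASSUMPTION: roughly 4 characters per token. This is a generic heuristic,
// not necessarily the estimator used by the plugin.
function estimateTokens(text: string): number {
  return Math.ceil(text.trim().length / 4);
}

function filterShortChunks(chunks: string[], minChunkLength = 20): string[] {
  // A value of 0 disables the filter and keeps every chunk.
  if (minChunkLength <= 0) {
    return chunks.slice();
  }
  // Keep only chunks whose estimated token count reaches the minimum.
  return chunks.filter((chunk) => estimateTokens(chunk) >= minChunkLength);
}
```

With the default of 20, a stray fragment such as a page header would be dropped under this estimate, while a paragraph of a few sentences would pass through.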
BIG_RAG_EMBEDDING_MODEL
BIG_RAG_CHUNK_SIZE
BIG_RAG_CHUNK_OVERLAP
BIG_RAG_CHUNKING_STRATEGY (character or sentence)
BIG_RAG_MIN_CHUNK_LENGTH
BIG_RAG_MAX_CONCURRENT
BIG_RAG_ENABLE_OCR
BIG_RAG_PARSE_DELAY_MS
BIG_RAG_EXCLUDE_PATTERNS (semicolon-separated)
BIG_RAG_FAILURE_REPORT_PATH
BIG_RAG_FORCE_REINDEX
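To show how a few of these variables could be read, here is a hedged sketch. The variable names, the two chunking strategy values, and the semicolon separator come from the list above. Everything else is an assumption for illustration: the `EnvOverrides` shape, the `readEnvOverrides` helper, and the choice to treat invalid or missing values as unset rather than raising an error.

```typescript
type ChunkingStrategy = "character" | "sentence";

// Hypothetical result shape; unset or invalid values stay undefined.
interface EnvOverrides {
  embeddingModel?: string;
  chunkSize?: number;
  chunkOverlap?: number;
  chunkingStrategy?: ChunkingStrategy;
  minChunkLength?: number;
  excludePatterns: string[];
}

type Env = { [name: string]: string | undefined };

// Accepts non-negative integers only; anything else counts as "not set".
function parseCount(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return isFinite(value) && Math.floor(value) === value && value >= 0 ? value : undefined;
}

function readEnvOverrides(env: Env): EnvOverrides {
  const strategy = env["BIG_RAG_CHUNKING_STRATEGY"];
  // Patterns are semicolon-separated; blank entries are ignored.
  const excludePatterns = (env["BIG_RAG_EXCLUDE_PATTERNS"] || "")
    .split(";")
    .map((pattern) => pattern.trim())
    .filter((pattern) => pattern.length > 0);

  return {
    embeddingModel: env["BIG_RAG_EMBEDDING_MODEL"] || undefined,
    chunkSize: parseCount(env["BIG_RAG_CHUNK_SIZE"]),
    chunkOverlap: parseCount(env["BIG_RAG_CHUNK_OVERLAP"]),
    chunkingStrategy:
      strategy === "character" || strategy === "sentence" ? strategy : undefined,
    minChunkLength: parseCount(env["BIG_RAG_MIN_CHUNK_LENGTH"]),
    excludePatterns,
  };
}
```

In a Node.js script this could be called as `readEnvOverrides(process.env)`. Taking the environment as a parameter keeps the function easy to test with a plain object.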
Embedding model mismatch: use the model recorded in .big-rag-embedding.json or run a full reindex.
Slow indexing: reduce OCR usage, keep the vector store on an SSD, and tune maxConcurrentFiles.
Too much retrieval noise: raise retrievalAffinityThreshold, use sentence chunking, or increase minChunkLength.
Context overflow: lower the retrieval limit, reduce the chunk size, or use a model with a larger context window.
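The interaction between the affinity threshold and the retrieval limit can be sketched as below. This is an illustration under stated assumptions rather than the plugin's retrieval code: `ScoredPassage` and `selectPassages` are hypothetical names, and scores are assumed to be similarities where higher means a closer match. The default limit of 5 is the one documented under Retrieval Settings.

```typescript
// Hypothetical passage shape; score is assumed to be a similarity value
// where a higher number means a closer match to the query.
interface ScoredPassage {
  text: string;
  score: number;
}

function selectPassages(
  candidates: ScoredPassage[],
  affinityThreshold: number,
  retrievalLimit = 5,
): ScoredPassage[] {
  return candidates
    // Raising the threshold removes weak matches (less retrieval noise).
    .filter((passage) => passage.score >= affinityThreshold)
    // Best matches first; filter() already returned a copy, so the
    // caller's array is not reordered.
    .sort((a, b) => b.score - a.score)
    // Lowering the limit reduces how much text is injected into the prompt.
    .slice(0, retrievalLimit);
}
```

Seen this way, the threshold controls which passages qualify and the limit controls how many of them reach the prompt, so noise and context overflow are tuned with different settings.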
cd big-rag-plugin
npm install
npm run build
npm run dev