Big RAG Plugin for LM Studio

Big RAG is an LM Studio plugin for local retrieval-augmented generation over large document collections. It scans a configured directory, parses supported files, chunks text, embeds the chunks with an LM Studio embedding model, stores vectors in a sharded Vectra index, and injects relevant passages into chat prompts.

Features

Recursive directory scanning with optional filename exclude patterns.
Sharded Vectra vector storage for large local indexes.
Incremental indexing based on file hashes.
Batched parallel chunk embedding for faster indexing.
Character or sentence-boundary chunking.
Configurable retrieval limit, affinity threshold, chunk size, overlap, and minimum chunk length.
Prompt preprocessor integration with custom prompt templates and citations.
Embedding model manifest checks to prevent retrieval from incompatible indexes.
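
The incremental indexing feature can be pictured as a comparison between the hashes of the files currently on disk and the hashes recorded at the last indexing run. The sketch below is a minimal illustration under stated assumptions, not the plugin's actual code: hashContent, planIncrementalIndex, and the HashIndex shape are hypothetical names, and SHA-256 is an assumed digest.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: relative file path -> hex digest recorded at last index time.
type HashIndex = Record<string, string>;

// Hash file content. SHA-256 is an assumption; the plugin may use another digest.
function hashContent(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

interface IndexPlan {
  toIndex: string[];   // new or changed files
  unchanged: string[]; // files whose hash matches the stored one
  toRemove: string[];  // files in the index that no longer exist on disk
}

// Compare hashes of files currently on disk with the stored hashes.
function planIncrementalIndex(current: HashIndex, stored: HashIndex): IndexPlan {
  const plan: IndexPlan = { toIndex: [], unchanged: [], toRemove: [] };
  for (const path of Object.keys(current)) {
    if (stored[path] === current[path]) {
      plan.unchanged.push(path);
    } else {
      plan.toIndex.push(path); // missing from the store, or content changed
    }
  }
  for (const path of Object.keys(stored)) {
    if (!(path in current)) {
      plan.toRemove.push(path);
    }
  }
  return plan;
}
```

Only the files listed in toIndex would need to be parsed, chunked, and embedded again, which is where the time savings on large collections would come from.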

Improvements (v1.1.0)

Supported File Types

Documents: PDF, EPUB, DOCX, ODT, TXT, TEXT, RST, LOG
Structured text: CSV, TSV, JSON, JSONL, YAML, YML
Markdown: MD, Markdown, MDown, MDX, MKD, MKDN
Web content: HTM, HTML, XHTML
Images with OCR: BMP, JPEG, JPG, PNG
Archives: RAR is detected but not implemented yet

Installation

npm run build expects LM Studio's generated .lmstudio/entry.ts to exist. If you are working from a raw source checkout without that file, npx tsc still validates TypeScript compilation.

Configuration

Required Settings

Documents Directory: root directory containing files to index.
Vector Store Directory: directory where the sharded vector store and metadata are written.

Embedding

Embedding Model: LM Studio embedding model id. Use exactly the same id string for indexing and retrieval. If you change this value, run a full reindex, because vectors produced by different models are not comparable.
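
To illustrate why the model id has to match, here is a hedged sketch of a manifest check. The IndexManifest shape, the checkManifest function, and the dimensions field are all assumptions made for this example; the plugin's real manifest format may differ.

```typescript
// Hypothetical manifest stored next to the vector index.
interface IndexManifest {
  embeddingModel: string; // model id used when the index was built
  dimensions: number;     // vector length produced by that model (assumed field)
}

type ManifestCheck = { ok: true } | { ok: false; reason: string };

// Refuse retrieval when the configured model does not match the indexed one.
function checkManifest(
  manifest: IndexManifest | null,
  configuredModel: string,
  queryDimensions: number
): ManifestCheck {
  if (manifest === null) {
    return { ok: true }; // nothing indexed yet, so nothing to conflict with
  }
  // Strict string comparison: a different spelling counts as a different model.
  if (manifest.embeddingModel !== configuredModel) {
    return {
      ok: false,
      reason:
        `Index was built with "${manifest.embeddingModel}" but "${configuredModel}" ` +
        `is configured; run a full reindex.`,
    };
  }
  if (manifest.dimensions !== queryDimensions) {
    return {
      ok: false,
      reason:
        `Index vectors have ${manifest.dimensions} dimensions but the query has ` +
        `${queryDimensions}; run a full reindex.`,
    };
  }
  return { ok: true };
}
```

A strict comparison like this is why two spellings of the same model would be treated as incompatible.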

Retrieval Settings

Retrieval Limit: maximum number of passages returned for each query. Default: 5.
Retrieval Affinity Threshold: minimum similarity score a passage must reach to be returned. Default: 0.5.
Max Query Length: estimated-token limit for the user query before embedding. Longer queries are truncated. Default: 512.
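
The interaction between the retrieval limit and the affinity threshold can be sketched as follows. This is an illustrative example, not the plugin's code: ScoredChunk and selectPassages are hypothetical names, cosine similarity is assumed as the scoring function, and treating the threshold as inclusive is also an assumption.

```typescript
// Hypothetical search result: a chunk of text and its similarity to the query.
interface ScoredChunk {
  text: string;
  score: number;
}

// Cosine similarity between two vectors of equal length (assumed scoring function).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Drop passages below the threshold, then keep the best `limit` of the rest.
function selectPassages(
  candidates: ScoredChunk[],
  limit: number = 5,
  threshold: number = 0.5
): ScoredChunk[] {
  return candidates
    .filter((c) => c.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Under this model, lowering the threshold admits more weakly related passages, while the limit caps how many are injected into the prompt regardless of how many pass.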

Chunking Settings

Chunk Size: target chunk size in estimated tokens. Default: 512.
Chunk Overlap: overlap between consecutive chunks in estimated tokens. Default: 100.
Chunking Strategy: "character" uses fixed-size character windows and is the fastest option; "sentence" groups complete sentences for better semantic coherence. Default: .
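
The relationship between chunk size and overlap can be shown with a small sketch of character-window chunking. The function name chunkByCharacters and the ratio of roughly four characters per estimated token are assumptions for illustration; the plugin's own token estimate may differ.

```typescript
// Split text into fixed character windows that overlap by a set amount.
// Sizes are given in estimated tokens and converted with an assumed
// characters-per-token ratio (4 is a common rough heuristic, not the plugin's value).
function chunkByCharacters(
  text: string,
  chunkSize: number = 512,
  overlap: number = 100,
  charsPerToken: number = 4
): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunk size");
  }
  const windowChars = chunkSize * charsPerToken;
  const stepChars = (chunkSize - overlap) * charsPerToken;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stepChars) {
    chunks.push(text.slice(start, start + windowChars));
    if (start + windowChars >= text.length) {
      break; // this window already reached the end of the text
    }
  }
  return chunks;
}
```

For example, chunkByCharacters("abcdefghij", 4, 2, 1) yields "abcd", "cdef", "efgh", "ghij": each chunk starts two characters after the previous one and repeats its last two characters.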

Indexing Settings

Max Concurrent Files: number of files processed at once. Default: 1.
Parser Delay (ms): delay before parsing each file to reduce local service pressure. Default: 500.
Enable OCR: enables OCR for images and image-based PDFs. Default: true.
Exclude Filename Patterns: one glob per line, matched against paths relative to Documents Directory.
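
As an illustration of how one-glob-per-line patterns might be applied to relative paths, here is a deliberately simplified matcher. It is a sketch, not the plugin's implementation: it supports only *, **, and ?, and the names globToRegExp, parseExcludePatterns, and isExcluded are hypothetical. A real glob library may handle more syntax and edge cases differently.

```typescript
// Convert a simple glob to a regular expression.
// Supported: "*" (within one path segment), "**" (across segments), "?" (one character).
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        i++;
        if (glob.charAt(i + 1) === "/") {
          i++;
          re += "(?:.*/)?"; // "**/" matches zero or more leading directories
        } else {
          re += ".*";
        }
      } else {
        re += "[^/]*";
      }
    } else if (ch === "?") {
      re += "[^/]";
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp("^" + re + "$");
}

// Parse the multi-line setting: one glob per line, blank lines ignored.
function parseExcludePatterns(setting: string): RegExp[] {
  return setting
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map(globToRegExp);
}

// Match a path relative to the documents directory against the patterns.
function isExcluded(relativePath: string, patterns: RegExp[]): boolean {
  const normalized = relativePath.replace(/\\/g, "/"); // tolerate Windows separators
  return patterns.some((pattern) => pattern.test(normalized));
}
```

In this simplified model, a pattern such as *.log excludes only log files at the top level of the documents directory, while **/*.log would exclude them at any depth.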

Reindexing Controls

Manual Reindex Trigger: when turned on, the next chat message runs indexing.
Skip Previously Indexed Files: applies while the manual reindex trigger is on. If enabled, unchanged files are skipped; if disabled, all files are rebuilt.
Automatic First Run: if the vector store is empty, the first chat message triggers indexing automatically.
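
The three controls above combine into a small decision, sketched here under assumptions. The names ReindexState and decideIndexingAction are hypothetical, and checking the manual trigger before the empty-store case is an assumed precedence, not something the plugin documents.

```typescript
type IndexingAction = "none" | "incremental" | "full";

// Hypothetical snapshot of the relevant settings and store state.
interface ReindexState {
  storeIsEmpty: boolean;          // no vectors have been written yet
  manualTrigger: boolean;         // Manual Reindex Trigger
  skipPreviouslyIndexed: boolean; // Skip Previously Indexed Files
}

// Decide what the next chat message should do before retrieval runs.
function decideIndexingAction(state: ReindexState): IndexingAction {
  if (state.manualTrigger) {
    // Skip unchanged files when asked to; otherwise rebuild everything.
    return state.skipPreviouslyIndexed ? "incremental" : "full";
  }
  if (state.storeIsEmpty) {
    return "full"; // automatic first run
  }
  return "none";
}
```

With the trigger off and a populated store, the message goes straight to retrieval and no indexing is performed.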

CLI Indexing

After compiling with npx tsc, run:

Useful environment variables:

Architecture

src/promptPreprocessor.ts: reads config, initializes the vector store, runs retrieval, and injects RAG context.
src/ingestion/indexManager.ts: scans, parses, chunks, embeds, and writes document chunks.
src/parsers/: file-type parsers for HTML, PDF, EPUB, text, images, and Office documents.
src/utils/textChunker.ts: character and sentence chunking plus token estimation.
src/vectorstore/vectorStore.ts: sharded Vectra storage, search, stats cache, and file hash index.

Testing

Run npm run test only when the LM Studio-generated .lmstudio/entry.ts file is present, because that script runs the full build first.

Troubleshooting

No results found: confirm indexing completed, lower the affinity threshold, and check that the file type is supported.

License

ISC
