Big RAG Plugin for LM Studio
A powerful RAG (Retrieval-Augmented Generation) plugin for LM Studio that can index and search through gigabytes or even terabytes (not tested) of document data. Hosted here: github.com/ari99/lm_studio_big_rag_plugin
Features
OCR Support: Optional OCR for image files using Tesseract
Vector Search: Uses Vectra with sharded indexes for efficient vector storage and retrieval (avoids single-file size limits)
Incremental Indexing: Automatically detects and skips already-indexed files
Concurrent Processing: Configurable concurrency for optimal performance
Persistent Storage: Vector embeddings are stored locally and persist across sessions
Supported File Types
Documents: PDF, EPUB, TXT, TEXT
Markdown: MD, MDX, Markdown, MDown, MKD, MKDN
Web Content: HTM, HTML, XHTML
Images (with OCR): BMP, JPEG, JPG, PNG
Archives: RAR (planned - currently not implemented)
Installation
Navigate to the plugin directory:
cd big-rag-plugin
Install dependencies:
npm install
Build the plugin:
npm run build
Run in development mode:
npm run dev
Configuration
The plugin provides the following configuration options in LM Studio:
Required Settings
Documents Directory: Root directory containing your documents (read access required)
Vector Store Directory: Where the vector database will be stored (read/write access required)
Embedding Model
Embedding Model (plugin setting): String passed to LM Studio's embedding load API. Both common forms can work for the same weights; for example, mixedbread-ai/mxbai-embed-large-v1 (Hub / lms get) and text-embedding-mxbai-embed-large-v1 (as shown in lms ls). Use one spelling consistently for indexing and retrieval so it matches .big-rag-embedding.json; switching spelling without reindexing can trigger a mismatch warning. Default: nomic-ai/nomic-embed-text-v1.5-GGUF.
After changing the embedding model, run a full reindex (enable Manual Reindex Trigger with Skip Previously Indexed Files turned off, or clear the vector store and let first-run indexing rebuild it). Vectors from different models are not comparable within the same index.
.big-rag-embedding.json: Written under the vector store directory when the index has at least one chunk; records the model id and vector length used to build the index. If the configured model no longer matches this file, retrieval is blocked until you reindex or revert the setting. If the index has zero chunks, this file is removed so metadata cannot drift (including after manual shard deletion).
Indexes built with older plugin versions may have chunks but no manifest; retrieval still works, and a full reindex will create the manifest.
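A minimal sketch of the manifest check described above; the field names inside .big-rag-embedding.json are assumptions, not the plugin's actual schema:

import * as fs from "fs/promises";
import * as path from "path";

// Assumed manifest shape; the real plugin's field names may differ.
interface EmbeddingManifest {
  modelId: string;      // embedding model id used to build the index
  vectorLength: number; // dimensionality of the stored vectors
}

// Returns true when the configured model matches the recorded one, so
// retrieval is safe; a missing manifest (pre-manifest index) is allowed.
async function embeddingMatches(
  vectorStoreDir: string,
  configuredModelId: string
): Promise<boolean> {
  const manifestPath = path.join(vectorStoreDir, ".big-rag-embedding.json");
  let raw: string;
  try {
    raw = await fs.readFile(manifestPath, "utf8");
  } catch {
    return true; // no manifest: older index, retrieval still works
  }
  const manifest = JSON.parse(raw) as EmbeddingManifest;
  return manifest.modelId === configuredModelId;
}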
Retrieval Settings
Retrieval Limit (1-20, default: 5): Maximum number of chunks to return
Chunk Size (128-2048 tokens, default: 512): Size of text chunks for embedding
Chunk Overlap (0-512 tokens, default: 100): Overlap between consecutive chunks (see the chunking sketch below)
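For illustration, a sketch of fixed-size chunking with overlap; it splits on whitespace tokens rather than the embedding model's real tokenizer, so the counts are approximate:

// Illustrative only: whitespace "tokens" stand in for real tokenizer output.
function chunkText(text: string, chunkSize = 512, overlap = 100): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  // Each chunk starts `step` tokens after the previous one; the guard
  // keeps the loop advancing even if overlap >= chunkSize.
  const step = Math.max(1, chunkSize - overlap);
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // final partial chunk emitted
  }
  return chunks;
}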
Performance Settings
Max Concurrent Files (1-10, default: 1): Number of files to process simultaneously
Enable OCR (default: true): Run OCR on image files (via Tesseract) and on image-based PDFs (via LM Studio's built-in document parser)
Reindexing Controls
Manual Reindex Trigger (toggle): Turn this on and send any chat message to force indexing to run in every chat session where the plugin is enabled. Turn it off once you're done to stop the reindex-on-every-chat loop.
Skip Previously Indexed Files (default: true): With Manual Reindex Trigger enabled, each manual run processes only documents that are new or changed since the last index; with this disabled, every chat rebuilds the entire index from scratch. Combine the two toggles to choose between incremental updates and repeated full refreshes (see the sketch after this list).
Automatic First-Run: If the vector store is empty, the plugin automatically indexes the configured documents the first time any chat message is processed; no manual input is required.
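A hypothetical sketch of how these three controls combine into an indexing decision; the names are illustrative, not the plugin's actual code:

type ReindexPlan = "none" | "incremental" | "full";

// Hypothetical decision logic combining the three controls above.
function planReindex(opts: {
  vectorStoreEmpty: boolean;      // triggers automatic first-run indexing
  manualReindexTrigger: boolean;  // "Manual Reindex Trigger" toggle
  skipPreviouslyIndexed: boolean; // "Skip Previously Indexed Files" toggle
}): ReindexPlan {
  if (opts.vectorStoreEmpty) return "full";      // first run: index everything
  if (!opts.manualReindexTrigger) return "none"; // normal chats: just retrieve
  return opts.skipPreviouslyIndexed ? "incremental" : "full";
}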
Usage
Configure the Plugin:
Open LM Studio settings
Navigate to the Big RAG plugin configuration
Set your documents directory (e.g., /Users/user/Documents/MyLibrary)
Set your vector store directory (e.g., /Users/user/.lmstudio/big-rag-db)
Initial Indexing:
The first time you send a message, the plugin will automatically scan and index your documents
This process may take a while depending on the size of your document collection
Progress will be shown in the LM Studio interface
Query Your Documents:
Simply chat with your LM Studio model as usual
The plugin will automatically search your indexed documents for relevant content
Retrieved passages will be injected into the context for the model to use
Architecture
Components
File Scanner (src/ingestion/fileScanner.ts):
Recursively scans directories
Filters for supported file types
Collects file metadata
Document Parsers (src/parsers/):
htmlParser.ts: Extracts text from HTML/HTM files
pdfParser.ts: Extracts text from PDF files
epubParser.ts: Extracts text from EPUB files
textParser.ts: Reads plain text & Markdown files with optional Markdown stripping
imageParser.ts: OCR for image files
documentParser.ts: Routes to appropriate parser
Vector Store (src/vectorstore/vectorStore.ts):
Uses Vectra with sharded indexes, loading one shard into memory at a time to avoid V8 string size limits (see the search sketch after this list)
Supports incremental updates
Efficient similarity search
Index Manager (src/ingestion/indexManager.ts):
Orchestrates the indexing pipeline
Manages concurrent processing
Handles progress reporting
Prompt Preprocessor (src/promptPreprocessor.ts):
Intercepts user queries
Performs vector search
Injects relevant context
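A sketch of the shard-at-a-time search idea from the Vector Store component above; the Shard interface is hypothetical and does not mirror Vectra's actual API:

interface ScoredChunk {
  score: number; // similarity to the query vector
  text: string;
}

// Hypothetical shard handle; the real store wraps Vectra indexes.
interface Shard {
  load(): Promise<void>; // bring this shard into memory
  unload(): void;        // release it before loading the next one
  query(vector: number[], topK: number): Promise<ScoredChunk[]>;
}

// Searches shards one at a time (bounding memory) and merges the
// per-shard top-k results into a single global top-k.
async function searchShards(
  shards: Shard[],
  queryVector: number[],
  topK: number
): Promise<ScoredChunk[]> {
  const merged: ScoredChunk[] = [];
  for (const shard of shards) {
    await shard.load();
    merged.push(...(await shard.query(queryVector, topK)));
    shard.unload();
  }
  return merged.sort((a, b) => b.score - a.score).slice(0, topK);
}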
Performance Considerations
Large Datasets
Disk Space: The vector store requires additional disk space (typically 10-20% of original document size)
Initial Indexing: Can take several hours for TB-scale collections
Memory Usage: Scales with concurrent processing (reduce maxConcurrentFiles if needed)
Optimization Tips
Start Small: Test with a subset of documents first
Disable OCR: Unless you have many image-based documents, keep OCR disabled
Adjust Concurrency: Lower maxConcurrentFiles on systems with limited resources
Chunk Size: Larger chunks (1024-2048) work better for technical documents
Threshold Tuning: Adjust retrievalAffinityThreshold based on result quality
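For example, a tuned setup for a large technical-document collection on a memory-constrained machine might look like this; the keys mirror the settings above and are assumptions, not the plugin's actual config schema:

// Keys mirror the settings described above; actual config keys may differ.
const tunedConfig = {
  retrievalLimit: 5,
  chunkSize: 1024,                 // larger chunks for technical documents
  chunkOverlap: 100,
  maxConcurrentFiles: 2,           // lowered for limited memory
  enableOcr: false,                // few image-based documents here
  retrievalAffinityThreshold: 0.5, // raise or lower based on result quality
};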
Troubleshooting
No Results Found
Check that documents directory is correctly configured
Verify that indexing completed successfully
Try lowering the retrieval affinity threshold
Check LM Studio logs for errors
Embedding model mismatch
If you see a message that the index was built with a different embedding model than the one in settings, either change Embedding Model back to the value recorded in .big-rag-embedding.json or run a full reindex after changing the model.
Dimension mismatch means the model's output size changed; reindex after switching models or quantizations.
Slow Indexing
Reduce maxConcurrentFiles
Disable OCR if not needed
Ensure vector store directory is on a fast drive (SSD recommended)
Out of Memory
Reduce maxConcurrentFiles to 1 or 2
Process documents in batches by organizing them into subdirectories
Increase system swap space
OCR Not Working
Tesseract.js downloads language data on first use
Ensure internet connectivity during first OCR operation
Check that image files are valid and readable
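For reference, a minimal sketch of an OCR call via Tesseract.js; the plugin's actual imageParser.ts may differ:

import Tesseract from "tesseract.js";

// Minimal OCR sketch: "eng" matches the plugin's English-only default.
// Language data is downloaded on first use, hence the connectivity note above.
async function ocrImage(imagePath: string): Promise<string> {
  const { data } = await Tesseract.recognize(imagePath, "eng");
  return data.text;
}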
Failure Reason Reporting
The CLI logs cumulative success/failed counts after each processed document.
Set BIG_RAG_FAILURE_REPORT_PATH=/absolute/path/report.json when running npm run index (or via LM Studio env settings) to emit a JSON report containing all failure reasons and counts after indexing completes. This is useful when triaging stubborn PDFs such as blueprints or large scanned books.
BIG_RAG_EMBEDDING_MODEL: Optional. When set for headless indexing (npm run index:cli / dist/cliIndex.js), overrides the default embedding model id (same default as the plugin's Embedding Model setting). Empty/unset uses the built-in default from config.ts.
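The report's exact schema is not documented here; a reader sketch assuming a simple reason-to-count map (hypothetical shape):

import * as fs from "fs/promises";

// Assumed report shape: a map from failure reason to count. The real
// report written via BIG_RAG_FAILURE_REPORT_PATH may differ.
type FailureReport = Record<string, number>;

async function printFailureSummary(reportPath: string): Promise<void> {
  const raw = await fs.readFile(reportPath, "utf8");
  const report = JSON.parse(raw) as FailureReport;
  for (const [reason, count] of Object.entries(report).sort((a, b) => b[1] - a[1])) {
    console.log(`${count}\t${reason}`); // most frequent failure reasons first
  }
}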
Limitations
RAR Archives: Not yet implemented (files are skipped)
Password-Protected Files: Not supported
Very Large Files: Individual files >100MB may cause memory issues
Non-English OCR: Currently only English OCR is configured