Version: 1.2.0 | License: MIT
A flexible RAG (Retrieval-Augmented Generation) plugin for LM Studio with dynamic embedding model selection, intelligent context management, and multilingual support.
- nomic-ai/nomic-embed-text-v1.5-GGUF (built-in, fast)
- lm-kit/bge-m3-gguf (slower but more accurate)

The plugin will automatically load into LM Studio. You should see "Register with LM Studio" in the terminal output.
Access plugin settings in LM Studio → Plugins → RAG-Flex
| Parameter | Default | Range | Description |
|---|---|---|---|
| Message Language | Auto-detected | EN/ZH-TW/JA | Language for runtime messages |
| Embedding Model | nomic-ai/nomic-embed-text-v1.5 | 4 presets | Select from preset embedding models |
| Custom Embedding Model | (empty) | Text input | Override selection above with model key (e.g. text-embedding-bge-m3), identifier (e.g. lm-kit/bge-m3-gguf), or full path |
| Context Usage Threshold | 0.7 | 0.1 - 1.0 | Trigger point for RAG retrieval (lower = more precise) |
| Retrieval Limit | 5 | 1 - 15 | Number of chunks to retrieve |
| Retrieval Affinity Threshold | 0.4 | 0.0 - 1.0 | Similarity threshold (BGE-M3: 0.4-0.6 recommended) |
| Enable Debug Logging | Off | On/Off | Enable debug logs for developers |
| Debug Log Path | ./logs/lmstudio-debug.log | Custom path | Path to debug log file |
| Model | Size | Speed | Best For | Language Support |
|---|---|---|---|---|
| nomic-ai/nomic-embed-text-v1.5-GGUF | 84 MB | ⚡⚡⚡ Fast | English, general use | English |
| NathanMad/sentence-transformers_all-MiniLM-L12-v2-gguf | 133 MB | ⚡⚡⚡ Fast | Lightweight tasks | English |
| groonga/gte-large-Q4_K_M-GGUF | 216 MB | ⚡⚡ Medium | Balanced performance | Multilingual |
| lm-kit/bge-m3-gguf | 1.16 GB | ⚡ Slow (F16) / ⚡⚡ Medium (Q4) | Chinese, multilingual, high precision | 100+ languages |
Note: Due to SDK limitations, the dropdown only shows preset models. Use the Custom Embedding Model field to specify any downloaded model by entering its model key (e.g. text-embedding-qwen3-embedding-8b), identifier, or full path.
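The three accepted input forms can be told apart mechanically. The sketch below is an illustrative heuristic (the function name and exact rules are assumptions, not the plugin's actual resolution logic):

```typescript
// Classify a user-supplied embedding model string (illustrative heuristic).
type ModelRef = "path" | "identifier" | "key";

function classifyModelInput(input: string): ModelRef {
  // Absolute paths: POSIX ("/models/...") or Windows ("C:\..." / "C:/...")
  if (input.startsWith("/") || /^[A-Za-z]:[\\/]/.test(input)) return "path";
  // Hugging Face-style identifiers carry a publisher prefix, e.g. "lm-kit/bge-m3-gguf"
  if (input.includes("/")) return "identifier";
  // Anything else is treated as a model key, e.g. "text-embedding-bge-m3"
  return "key";
}
```

For example, `classifyModelInput("lm-kit/bge-m3-gguf")` yields `"identifier"`, while `classifyModelInput("text-embedding-bge-m3")` yields `"key"`.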
The threshold determines when to switch from full-text injection to RAG retrieval:
When to adjust:
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.3-0.5 | Forces RAG more often | Large documents, memory constraints |
| 0.6-0.7 | Balanced (default) | General use |
| 0.8-0.9 | Allows more full injection | Small documents, need full context |
Different content types require different similarity thresholds:
| Content Type | Recommended Threshold | Reason |
|---|---|---|
| Natural language text | 0.5-0.7 | Clear semantic matching |
| Technical documentation | 0.4-0.6 | Technical terms vary |
| Code/SQL | 0.3-0.4 | Syntax-heavy, lower semantic similarity |
| Mixed language | 0.4-0.5 | Account for language switching |
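The affinity threshold is applied to similarity scores between the query embedding and each chunk embedding: chunks scoring below the threshold are dropped, and at most the retrieval limit of the best matches is kept. A minimal sketch of that mechanism, assuming cosine similarity (the plugin's internal scoring may differ):

```typescript
// Sketch of affinity-threshold filtering (illustrative, not the plugin's code).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  threshold = 0.4, // Retrieval Affinity Threshold
  limit = 5,       // Retrieval Limit
): { text: string; score: number }[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosine(query, c.embedding) }))
    .filter((c) => c.score >= threshold) // drop weak matches
    .sort((x, y) => y.score - x.score)   // best first
    .slice(0, limit);                    // cap at the retrieval limit
}
```

This is why code/SQL needs a lower threshold: syntactically dense chunks score lower against natural-language queries, so a 0.5 cutoff can filter out every relevant chunk.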
The plugin automatically detects your system language and sets the UI accordingly:
- Checks the LANG, LANGUAGE, and LC_ALL environment variables

Supported Languages: English, 繁體中文, 日本語
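Detection amounts to normalizing a locale string such as `zh_TW.UTF-8` into a supported language tag. A sketch of that mapping, assuming LC_ALL takes precedence (the plugin's actual precedence and fallback rules may differ):

```typescript
// Sketch of system-language detection from environment variables (illustrative).
const SUPPORTED = ["en", "zh-TW", "ja"] as const;
type Lang = (typeof SUPPORTED)[number];

function detectLanguage(env: Record<string, string | undefined>): Lang {
  // e.g. "zh_TW.UTF-8" -> "zh-TW", "ja_JP.UTF-8" -> "ja"
  const raw = env.LC_ALL ?? env.LANGUAGE ?? env.LANG ?? "";
  const tag = raw.split(".")[0].replace("_", "-");
  if (tag.startsWith("zh-TW") || tag.startsWith("zh-Hant")) return "zh-TW";
  if (tag.startsWith("ja")) return "ja";
  return "en"; // fallback when the locale is unset or unsupported
}
```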
📖 For developers: See I18N.md for technical details on the internationalization system, adding new languages, and translation guidelines. Also available in 繁體中文 and 日本語.
Enable debug logging for troubleshooting or development:
Default log location: ./logs/lmstudio-debug.log
Cause: Selected model not downloaded in LM Studio
Solution:
- Download the selected model in LM Studio (e.g. bge-m3)

Alternative: Select a different model in plugin settings
Cause: Retrieval affinity threshold too high for your content
Solutions:
How to adjust: LM Studio → Plugins → RAG-Flex → Retrieval Affinity Threshold
Cause: Large file with high-precision embedding model
Solutions:
- Use a faster model: nomic-embed-text-v1.5 instead of bge-m3

Cause: System locale auto-detection doesn't match your preference
Solution:
Note: This only changes plugin runtime messages (errors, status updates). LM Studio's UI language is controlled by LM Studio itself.
Possible causes:
Solutions:
- Check ./logs/lmstudio-debug.log (use \\ or / as path separators)

💡 Pro Tip: All error messages are AI-friendly - paste them directly into your LLM chat for automated troubleshooting!
| Format | Extension | Processing Method | Notes |
|---|---|---|---|
| PDF | .pdf | Text extraction | Supports text-based PDFs (not scanned images) |
| Word Documents | .docx | Full document parsing | Preserves structure and formatting |
| Plain Text | .txt | Direct read | UTF-8 encoding recommended |
| Markdown | .md | Markdown parsing | Maintains heading structure |
Not supported: Images, audio, video, Excel spreadsheets, scanned PDFs without OCR
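Support boils down to routing each upload by extension to a processing method. A sketch of that dispatch, derived from the table above (the processor names are illustrative assumptions):

```typescript
// Route uploads by file extension (illustrative mapping from the table above).
type Processor = "pdf-text-extraction" | "docx-parsing" | "plain-read" | "markdown-parsing";

const PROCESSORS: Record<string, Processor> = {
  ".pdf": "pdf-text-extraction",
  ".docx": "docx-parsing",
  ".txt": "plain-read",
  ".md": "markdown-parsing",
};

function pickProcessor(filename: string): Processor | null {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
  return PROCESSORS[ext] ?? null; // null => unsupported (images, .xlsx, ...)
}
```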
| Feature | RAG-v1 | RAG-Flex (v1.2.0) |
|---|---|---|
| Embedding Models | ❌ Hardcoded (nomic only) | ✅ 4 selectable + auto-detection |
| Multilingual Support | ❌ English only | ✅ English, 繁體中文, 日本語 |
| Error Messages | ❌ Technical English | ✅ User-friendly, localized |
| Context Management | ⚙️ Basic threshold | ✅ Smart threshold-based strategy |
| Affinity Threshold | ❌ Fixed at 0.5 | ✅ Configurable (0.0-1.0) |
| No-result Handling | ❌ Exposes system prompt | ✅ Graceful degradation |
| Model Detection | ❌ Manual configuration | ✅ Auto-detects local models |
| Debug Tools | ❌ None | ✅ Optional debug logging |
| Configuration UI | ⚙️ English only | ✅ Multilingual (system language) |
Contributions are welcome! Here's how you can help:
1. Create a feature branch (git checkout -b feature/amazing-feature)
2. Commit your changes (git commit -m 'Add amazing feature')
3. Push to the branch (git push origin feature/amazing-feature)

To add a new language:
- src/locales/types.ts
- src/locales/[lang].ts
- src/locales/index.ts
- src/config.ts language options
- README.[lang].md

MIT License - see LICENSE file for details.
This means you can:
Requirements:
Author: Henry Chen
GitHub: @henrychen95
Repository: rag-flex
LM Studio Plugin Page: lmstudio.ai/yongwei/rag-flex
⭐ If RAG-Flex helps your workflow, please star the repository!
Made with ❤️ for the LM Studio community
```shell
git clone https://github.com/henrychen95/rag-flex.git
cd rag-flex
lms dev
```
📎 Upload: meeting-notes.txt (5 KB)
💬 You: "What were the action items from the meeting?"
🤖 AI: [Reviews entire document] "The action items were:
1. John to prepare Q4 report by Friday
2. Sarah to schedule follow-up meeting..."
📎 Upload: technical-manual.pdf (2 MB)
💬 You: "How do I configure SSL certificates?"
🤖 AI: [Retrieves relevant sections]
"Based on Citation 1 and Citation 3:
To configure SSL certificates, you need to..."
Citation 1: (Page 45) "SSL Configuration involves..."
Citation 3: (Page 89) "Certificate installation steps..."
Scenario: Software developer needs API documentation
Upload: FastAPI-documentation.pdf (3.2 MB)
Ask: "What authentication methods does FastAPI support?"
Result: RAG retrieval mode activated
✓ Retrieved 5 relevant citations
✓ Found JWT, OAuth2, API Key sections
✓ Provided code examples from documentation
Configuration Tips:
- Context Threshold: 0.7 (default)
- Retrieval Limit: 5-7 (for comprehensive coverage)
- Affinity Threshold: 0.5 (technical content)
Scenario: Lawyer reviewing contract terms
Upload: commercial-lease-agreement.docx (250 KB)
Ask: "What are the tenant's responsibilities for maintenance?"
Result: Full-text injection mode (file within threshold)
✓ Entire document injected as context
✓ AI can cross-reference multiple clauses
✓ Comprehensive answer with exact clause numbers
Configuration Tips:
- Context Threshold: 0.8 (allow full injection)
- Language: 繁體中文 (for Traditional Chinese contracts)
Scenario: Understanding database schema
Upload: database-schema.sql (450 KB)
Ask: "Explain the relationship between users and orders tables"
Result: RAG retrieval with lowered threshold
✓ Retrieved relevant CREATE TABLE statements
✓ Found foreign key constraints
✓ Identified junction tables
Configuration Tips:
- Affinity Threshold: 0.3-0.4 (lower for code/SQL)
- Retrieval Limit: 8-10 (capture related tables)
- Model: bge-m3 (better for code with comments in Chinese)
Scenario: Public servant processing applications
Upload: subsidy-application-guidelines-2024.pdf (1.8 MB)
Ask: "申請資格有哪些限制條件?"
Result: Multilingual RAG retrieval
✓ Language auto-detected as Traditional Chinese
✓ Retrieved eligibility criteria sections
✓ Citations include page numbers and article references
Configuration Tips:
- Language: 繁體中文
- Model: bge-m3 (best for Traditional Chinese)
- Affinity Threshold: 0.5-0.6
Scenario: Graduate student literature review
Upload: machine-learning-survey-2024.pdf (4.5 MB)
Ask: "What are the current challenges in transformer architectures?"
Result: Precision RAG retrieval
✓ Retrieved sections from "Challenges" and "Future Work"
✓ Cross-referenced with methodology sections
✓ Provided citations with page numbers
Configuration Tips:
- Context Threshold: 0.6 (force RAG for large papers)
- Retrieval Limit: 10-15 (capture diverse viewpoints)
- Model: gte-large (good balance for academic content)
```
Available Context = Remaining Context × Threshold

If (File Tokens + Prompt Tokens) > Available Context:
  → Use RAG Retrieval (precise mode)
Else:
  → Use Full-Text Injection (comprehensive mode)
```
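The decision logic above translates directly into code. A minimal sketch, with parameter names assumed for illustration:

```typescript
// Direct translation of the mode-selection logic above (sketch).
function chooseMode(
  remainingContextTokens: number,
  fileTokens: number,
  promptTokens: number,
  threshold = 0.7, // Context Usage Threshold
): "rag-retrieval" | "full-text-injection" {
  const availableContext = remainingContextTokens * threshold;
  return fileTokens + promptTokens > availableContext
    ? "rag-retrieval"        // precise mode
    : "full-text-injection"; // comprehensive mode
}
```

For example, with an 8192-token context and the default 0.7 threshold, available context is about 5734 tokens, so a 6000-token document triggers RAG retrieval while a 3000-token one is injected whole.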