Usage Examples
This document provides practical examples of using the Big RAG Plugin with different types of document collections.
Example 1: Technical Documentation Library
Scenario
You have a large collection of technical documentation, API references, and tutorials that you want to query using natural language.
Setup
# Directory structure
~/Documents/tech-library/
├── python/
│ ├── official-docs/
│ ├── tutorials/
│ └── api-reference/
├── javascript/
│ ├── mdn-docs/
│ └── frameworks/
└── databases/
  ├── postgresql/
  └── mongodb/
Configuration
- Documents Directory: ~/Documents/tech-library
- Vector Store Directory: ~/.lmstudio/tech-library-db
- Chunk Size: 1024 (larger chunks for technical content)
- Chunk Overlap: 200
- Retrieval Limit: 7
- Affinity Threshold: 0.6
- Max Concurrent Files: 5
- OCR: Disabled
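To make the interaction between Chunk Size and Chunk Overlap concrete, here is a minimal sketch of a sliding-window splitter. It assumes both settings are measured in characters; the plugin may count tokens or words instead, and `chunk_text` is a hypothetical helper, not part of the plugin.

```python
def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size that share `overlap` characters.

    Assumption: sizes are in characters; the real plugin may use other units.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # 824 with the settings above
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    pieces = chunk_text("x" * 3000)
    print([len(p) for p in pieces])  # [1024, 1024, 1024, 528]
```

With these values each new chunk starts 824 characters after the previous one, so neighboring chunks repeat 200 characters of context.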
Example Queries
Query 1: "How do I connect to PostgreSQL using Python?"
Expected Behavior:
- Retrieves relevant passages from both Python and PostgreSQL documentation
- Combines information from multiple sources
- Provides code examples if available in the docs
Query 2: "What are the differences between var, let, and const in JavaScript?"
Expected Behavior:
- Finds relevant sections from JavaScript documentation
- Returns explanations with examples
- May include best practices if documented
Query 3: "Show me MongoDB aggregation pipeline examples"
Expected Behavior:
- Retrieves MongoDB-specific documentation
- Includes practical examples
- May reference multiple documents showing different use cases
Example 2: Research Paper Collection
Scenario
You're a researcher with hundreds of PDF papers that you want to search and reference.
Setup
# Directory structure
~/Research/papers/
├── machine-learning/
│ ├── deep-learning/
│ ├── reinforcement-learning/
│ └── nlp/
├── computer-vision/
└── robotics/
Configuration
- Documents Directory: ~/Research/papers
- Vector Store Directory: ~/.lmstudio/research-db
- Chunk Size: 768 (balanced for academic writing)
- Chunk Overlap: 150
- Retrieval Limit: 10 (more results for research)
- Affinity Threshold: 0.55
- Max Concurrent Files: 3 (PDFs are slower to parse)
- OCR: Disabled (assuming text-based PDFs)
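The following sketch shows one plausible way Retrieval Limit and Affinity Threshold could work together. It assumes the threshold is an inclusive minimum similarity score and that the limit caps the number of returned chunks; `ScoredChunk` and `select_results` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class ScoredChunk(NamedTuple):
    source: str   # file the chunk came from
    score: float  # similarity to the query; higher means a closer match


def select_results(candidates, threshold=0.55, limit=10):
    """Keep chunks scoring at least `threshold`, best first, at most `limit`."""
    kept = [c for c in candidates if c.score >= threshold]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    # Illustrative scores only, not real plugin output.
    hits = [
        ScoredChunk("attention.pdf", 0.81),
        ScoredChunk("survey.pdf", 0.57),
        ScoredChunk("robotics.pdf", 0.40),
    ]
    for chunk in select_results(hits):
        print(chunk.source, chunk.score)
```

Under these assumptions, a lower threshold widens the pool of candidates, while the limit bounds how much text reaches the model.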
Example Queries
Query 1: "What are the latest approaches to attention mechanisms in transformers?"
Expected Behavior:
- Searches across all NLP papers
- Retrieves relevant sections discussing attention
- May include citations from multiple papers
Query 2: "Compare different reinforcement learning algorithms for robotics"
Expected Behavior:
- Finds content from both RL and robotics papers
- Retrieves comparative information
- Provides context from multiple sources
Query 3: "What datasets are commonly used for object detection?"
Expected Behavior:
- Searches computer vision papers
- Lists datasets mentioned in papers
- May include performance benchmarks
Example 3: Legal Document Archive
Scenario
A law firm has thousands of legal documents, contracts, and case files that it needs to search.
Setup
# Directory structure
~/Legal/documents/
├── contracts/
│ ├── 2020/
│ ├── 2021/
│ ├── 2022/
│ ├── 2023/
│ └── 2024/
├── case-files/
└── regulations/
Configuration
- Documents Directory: ~/Legal/documents
- Vector Store Directory: ~/.lmstudio/legal-db
- Chunk Size: 512 (standard for legal text)
- Chunk Overlap: 100
- Retrieval Limit: 5
- Affinity Threshold: 0.7 (higher precision for legal)
- Max Concurrent Files: 2 (careful processing)
- OCR: Enabled (for scanned documents)
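Before starting a long indexing run, it can help to write the settings down and sanity-check them. The sketch below is an assumption-laden illustration: `RagSettings` and its field names are hypothetical, and the valid ranges (overlap smaller than chunk size, threshold between 0 and 1) are reasonable guesses rather than documented plugin rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagSettings:
    """Hypothetical record of the settings listed above."""
    documents_dir: str
    vector_store_dir: str
    chunk_size: int
    chunk_overlap: int
    retrieval_limit: int
    affinity_threshold: float
    max_concurrent_files: int
    ocr_enabled: bool

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means no issue found."""
        issues = []
        if self.chunk_size <= 0:
            issues.append("chunk_size must be positive")
        elif not 0 <= self.chunk_overlap < self.chunk_size:
            issues.append("chunk_overlap must be >= 0 and below chunk_size")
        if self.retrieval_limit < 1:
            issues.append("retrieval_limit must be at least 1")
        if not 0.0 <= self.affinity_threshold <= 1.0:
            issues.append("affinity_threshold is assumed to lie in [0, 1]")
        if self.max_concurrent_files < 1:
            issues.append("max_concurrent_files must be at least 1")
        return issues


# The legal archive settings from this example.
legal = RagSettings(
    documents_dir="~/Legal/documents",
    vector_store_dir="~/.lmstudio/legal-db",
    chunk_size=512,
    chunk_overlap=100,
    retrieval_limit=5,
    affinity_threshold=0.7,
    max_concurrent_files=2,
    ocr_enabled=True,
)

if __name__ == "__main__":
    print(legal.problems() or "settings look consistent")
```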
Example Queries
Query 1: "Find all contracts with non-compete clauses"
Expected Behavior:
- Searches across all contract documents
- Retrieves sections containing non-compete language
- Provides file references for review
Query 2: "What are the standard terms for intellectual property rights?"
Expected Behavior:
- Finds IP-related clauses across documents
- Shows variations in language
- Helps identify standard vs. custom terms
Query 3: "Show precedents for breach of contract cases"
Expected Behavior:
- Searches case files
- Retrieves relevant case information
- Provides context for similar situations
Example 4: Personal Knowledge Base
Scenario
You maintain a personal collection of notes, articles, ebooks, and saved web pages.
Setup
# Directory structure
~/Knowledge/
├── books/
│ ├── fiction/
│ └── non-fiction/
├── articles/
│ ├── saved-webpages/
│ └── pdfs/
├── notes/
│ ├── work/
│ └── personal/
└── recipes/
Configuration
- Documents Directory: ~/Knowledge
- Vector Store Directory: ~/.lmstudio/knowledge-db
- Chunk Size: 512
- Chunk Overlap: 100
- Retrieval Limit: 5
- Affinity Threshold: 0.5
- Max Concurrent Files: 4
- OCR: Enabled (for recipe images, etc.)
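Max Concurrent Files can be pictured as a cap on how many files are parsed at the same time. The sketch below illustrates that idea with a thread pool; it is an assumption about the setting's meaning, and `parse_file` and `index_files` are hypothetical stand-ins rather than plugin functions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_file(path: Path) -> int:
    """Placeholder parser: returns the file size in bytes."""
    return Path(path).stat().st_size


def index_files(paths, max_concurrent_files=4, parse=parse_file):
    """Run `parse` over every path with at most max_concurrent_files in flight."""
    paths = list(paths)  # materialize so the paths can be paired with results
    with ThreadPoolExecutor(max_workers=max_concurrent_files) as pool:
        results = list(pool.map(parse, paths))  # map preserves input order
    return dict(zip(paths, results))
```

A call such as `index_files(Path("~/Knowledge").expanduser().rglob("*.pdf"))` would then never touch more than four files at once.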
Example Queries
Query 1: "What did I save about productivity techniques?"
Expected Behavior:
- Searches across articles and notes
- Finds productivity-related content
- Combines information from multiple sources
Query 2: "Find that recipe for chocolate cake"
Expected Behavior:
- Searches recipe directory
- May use OCR if recipe is an image
- Returns recipe details
Query 3: "What books have I read about history?"
Expected Behavior:
- Searches book collection
- Identifies history-related books
- May provide summaries or key points
Example 5: Software Development Project
Scenario
You have a large codebase with documentation, README files, and code comments.
Setup
# Directory structure
~/Projects/myapp/
├── docs/
│ ├── api/
│ ├── guides/
│ └── tutorials/
├── README.md
├── CONTRIBUTING.md
└── src/
  └── (various .md files for documentation)
Configuration
- Documents Directory: ~/Projects/myapp
- Vector Store Directory: ~/.lmstudio/myapp-docs-db
- Chunk Size: 768
- Chunk Overlap: 150
- Retrieval Limit: 6
- Affinity Threshold: 0.55
- Max Concurrent Files: 5
- OCR: Disabled
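Because this example points the plugin at an entire project folder, it can be useful to preview which documentation files live under it before indexing. The sketch below lists Markdown files with `pathlib`; the extension set and the skipped folder names are assumptions chosen for illustration, not a statement of what the plugin supports or ignores.

```python
from pathlib import Path

DOC_EXTENSIONS = {".md"}              # assumption: only Markdown matters here
SKIP_DIRS = {".git", "node_modules"}  # assumption: folders that hold no docs


def find_doc_files(root: Path) -> list[Path]:
    """Return documentation files under root, ignoring the skipped folders."""
    found = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        if any(part in SKIP_DIRS for part in path.relative_to(root).parts):
            continue
        if path.suffix.lower() in DOC_EXTENSIONS:
            found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for doc in find_doc_files(Path("~/Projects/myapp").expanduser()):
        print(doc)
```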
Example Queries
Query 1: "How do I set up the development environment?"
Expected Behavior:
- Finds setup instructions from README or guides
- Provides step-by-step information
- May reference multiple documentation files
Query 2: "What's the API for user authentication?"
Expected Behavior:
- Searches API documentation
- Retrieves authentication-related endpoints
- Shows usage examples
Query 3: "How do I contribute to this project?"
Expected Behavior:
- Finds CONTRIBUTING.md content
- Provides guidelines and workflow
- May include code style requirements
Example 6: Medical/Healthcare Records (Anonymized)
Scenario
A healthcare provider has anonymized patient records, research notes, and medical literature.
Setup
# Directory structure
~/Medical/data/
├── research/
├── literature/
└── case-studies/
Configuration
- Documents Directory: ~/Medical/data
- Vector Store Directory: ~/.lmstudio/medical-db
- Chunk Size: 512
- Chunk Overlap: 100
- Retrieval Limit: 8
- Affinity Threshold: 0.65 (higher precision for medical)
- Max Concurrent Files: 3
- OCR: Enabled (for scanned records)
Example Queries
Query 1: "What are common treatments for condition X?"
Expected Behavior:
- Searches medical literature and case studies
- Retrieves treatment protocols
- Provides evidence-based information
Query 2: "Find cases with similar symptoms"
Expected Behavior:
- Searches case studies
- Identifies similar presentations
- Helps with differential diagnosis
Large Text Collections (>10GB)
- Use higher concurrency (5-8)
- Disable OCR unless needed
- Consider processing in batches
- Use SSD for vector store
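Processing in batches can be sketched as splitting the file list into fixed-size groups and handling one group at a time. The helper below is a generic illustration; `batched` and the batch size are hypothetical and not a plugin feature.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    for group in batched(range(7), 3):
        print(group)  # [0, 1, 2] then [3, 4, 5] then [6]
```

One way to apply this is to copy or link each batch into the documents directory, let indexing finish, and then continue with the next batch.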
PDF-Heavy Collections
- Lower concurrency (2-3)
- Increase chunk size (1024+)
- Allow more time for initial indexing
- Monitor memory usage
Mixed Collections (Text and Scanned Files)
- Enable OCR selectively
- Use moderate concurrency (3-4)
- Adjust threshold based on quality
- Test with small subset first
Frequently Updated Collections
- Enable auto-reindex
- Use file watching once it is available (planned as a future feature)
- Keep vector store on fast storage
- Perform regular maintenance, such as periodic index rebuilds
Troubleshooting Examples
Example: No Results for Known Content
Problem: Querying for content you know exists returns no results.
Solutions:
- Lower affinity threshold (try 0.3-0.4)
- Rephrase query to match document language
- Check that file was actually indexed
- Verify file format is supported
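When deciding how far to lower the threshold, it helps to see how many candidate chunks would survive at each setting. The sketch below assumes you can obtain raw similarity scores for a query, which the plugin may or may not expose; `survivors_by_threshold` is a hypothetical helper and the scores are made up for illustration.

```python
def survivors_by_threshold(scores, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Count how many scores pass each candidate threshold (inclusive)."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


if __name__ == "__main__":
    # Illustrative scores only, not real plugin output.
    example_scores = [0.62, 0.48, 0.41, 0.35, 0.22]
    print(survivors_by_threshold(example_scores))
    # {0.3: 4, 0.4: 3, 0.5: 1, 0.6: 1, 0.7: 0}
```

In this made-up case, moving from 0.5 to 0.4 triples the number of returned chunks, which is the kind of jump to look for when known content is missing.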
Example: Too Many Irrelevant Results
Problem: Getting too many low-quality matches.
Solutions:
- Increase affinity threshold (try 0.7-0.8)
- Reduce retrieval limit
- Use more specific queries
- Adjust chunk size for content type
Example: Slow Indexing
Problem: Initial indexing taking too long.
Solutions:
- Reduce max concurrent files if memory or disk I/O is the bottleneck
- Disable OCR if not needed
- Process subdirectories separately
- Check disk I/O performance
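To plan which subdirectories to process separately, a quick survey of file counts and sizes can show where the bulk of the work sits. This is a generic sketch using only the standard library; `survey_subdirectories` is a hypothetical helper, not part of the plugin.

```python
from pathlib import Path


def survey_subdirectories(root: Path) -> list[tuple[str, int, int]]:
    """Return (name, file_count, total_bytes) per immediate subdirectory,
    largest total size first."""
    rows = []
    for child in sorted(root.iterdir()):
        if not child.is_dir():
            continue  # loose files at the top level are ignored
        files = [p for p in child.rglob("*") if p.is_file()]
        total = sum(p.stat().st_size for p in files)
        rows.append((child.name, len(files), total))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    for name, count, size in survey_subdirectories(Path.cwd()):
        print(f"{name}: {count} files, {size / 1_000_000:.1f} MB")
```

Indexing the smallest subdirectory first gives an early signal about speed before committing to the largest one.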
Best Practices
- Start Small: Test with a subset before indexing everything
- Tune Settings: Adjust based on your specific content
- Monitor Performance: Watch memory and disk usage
- Regular Maintenance: Periodically rebuild index for optimization
- Backup: Keep backups of your vector store
- Document: Note what settings work best for your use case
- Iterate: Refine queries and settings based on results
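The backup practice can be as simple as copying the vector store directory to a timestamped folder. The sketch below assumes the store is a plain directory on disk that is safe to copy while no indexing is running; `backup_vector_store` and the backup location are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_vector_store(store_dir: Path, backup_root: Path) -> Path:
    """Copy store_dir into backup_root under a timestamped name."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{store_dir.name}-{stamp}"
    # copytree creates missing parent folders and fails if target exists.
    shutil.copytree(store_dir, target)
    return target


if __name__ == "__main__":
    store = Path("~/.lmstudio/knowledge-db").expanduser()
    backups = Path("~/Backups/rag").expanduser()  # hypothetical location
    if store.is_dir():
        print("Backup written to", backup_vector_store(store, backups))
```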