Forked from mindstudio/big-rag
TESTING.md
This guide covers local validation for Big RAG v1.1.0.
Install dependencies with npm install.
Run TypeScript compilation:
npx tsc
Run the current automated smoke tests:
node --test dist/tests/parseDocument.test.js dist/tests/fileExcludePatterns.test.js
Use npm run test when .lmstudio/entry.ts exists, because that script runs the full build before executing tests.
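The choice between the two test paths described above can be scripted. The following is a minimal sketch, assuming it is run from the repository root; `choose_test_command` is a hypothetical helper name used only for illustration and is not part of Big RAG.

```shell
# Print the validation command that matches the current checkout.
# Assumption: the first argument (default ".") is the repository root.
# choose_test_command is a hypothetical helper, not a Big RAG script.
choose_test_command() {
  root="${1:-.}"
  if [ -f "$root/.lmstudio/entry.ts" ]; then
    # Entry file present: the npm script runs the full build before the tests.
    echo "npm run test"
  else
    # No entry file: compile with tsc and run the smoke tests directly.
    echo "npx tsc && node --test dist/tests/parseDocument.test.js dist/tests/fileExcludePatterns.test.js"
  fi
}
```

To execute the selected command rather than print it, something like `sh -c "$(choose_test_command)"` should work from the repository root.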
Create a small mixed-format dataset:
Optional files to add:
~/test-documents
~/.lmstudio/big-rag-test-db
Expected result:
Set:
Expected:
Set:
Expected:
Set:
Expected:
Verify that these file types are discovered and parsed:
Expected:
.docx files are parsed with mammoth.
.odt files are parsed from content.xml when adm-zip is available.
Index a large text or PDF file that produces many chunks.
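If no large document is at hand, a synthetic text file can stand in for one. This is only a sketch: `make_large_text`, the output file name `large.txt`, and the default line count are all hypothetical choices, and whether the result produces "many chunks" depends on the chunk size and overlap in effect.

```shell
# Generate a plain-text file with a given number of distinct sentences.
# make_large_text, large.txt, and the default of 20000 lines are
# illustrative assumptions, not part of Big RAG.
make_large_text() {
  out="${1:-$HOME/test-documents/large.txt}"
  lines="${2:-20000}"
  # awk writes one numbered sentence per line so chunks are distinguishable.
  awk -v n="$lines" 'BEGIN {
    for (i = 1; i <= n; i++)
      printf "Sentence %d describes retrieval and indexing in plain text.\n", i
  }' > "$out"
}
```

Running `make_large_text` with no arguments writes the file into the test dataset directory; pass a path and a line count to change either.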
Expected:
After indexing, send several chat messages without changing settings.
Expected:
Change one of these settings:
Expected:
Paste a long multi-paragraph query and set:
Expected:
Compile and run:
Optional CLI environment variables:
Expected:
minChunkLength or inspect parser output.
minChunkLength or .
Include:
.txt, .text
.md, .markdown, .mdown, .mdx, .mkd, .mkdn
.csv, .tsv
.json, .jsonl
.yaml, .yml
.rst, .log
.html, .htm, .xhtml
.pdf, .epub
.docx, .odt
.bmp, .jpg, .jpeg, .png when OCR is enabled
retrievalAffinityThreshold
Keep maxConcurrentFiles low for PDFs.
If npm run build fails on .lmstudio/entry.ts, validate with npx tsc or run through LM Studio tooling to generate the entry file.
mkdir -p ~/test-documents/subfolder
echo "This is a test document about artificial intelligence and machine learning." > ~/test-documents/test1.txt
echo '{"topic":"retrieval","summary":"RAG combines search with generation."}' > ~/test-documents/test2.json
echo "service: search
environment: local" > ~/test-documents/config.yaml
echo "# Deep Learning
Deep learning uses neural networks with multiple layers." > ~/test-documents/subfolder/test3.md
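Before indexing, it can help to confirm that the dataset commands above actually produced the four files. The check below is a small sketch; `check_test_dataset` is a hypothetical helper name, and the file list simply mirrors the commands above.

```shell
# Verify that the mixed-format test dataset exists and no file is empty.
# check_test_dataset is a hypothetical helper, not part of Big RAG.
check_test_dataset() {
  dir="${1:-$HOME/test-documents}"
  missing=0
  for f in test1.txt test2.json config.yaml subfolder/test3.md; do
    # -s is true only when the file exists and has a size greater than zero.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The function returns 0 when every file is present, so it can gate a later step, for example `check_test_dataset && echo "dataset ready"`.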
What is machine learning?
Chunk Size: 512
Chunk Overlap: 100
Chunking Strategy: character
Chunking Strategy: sentence
Min Chunk Length: 50
Max Query Length: 512
npx tsc
node dist/cliIndex.js ~/test-documents ~/.lmstudio/big-rag-cli-test-db
BIG_RAG_CHUNKING_STRATEGY=sentence
BIG_RAG_MIN_CHUNK_LENGTH=20
BIG_RAG_CHUNK_SIZE=512
BIG_RAG_CHUNK_OVERLAP=100
BIG_RAG_MAX_CONCURRENT=1
BIG_RAG_ENABLE_OCR=false
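The optional variables can be exported in the shell before the CLI indexer runs. The sketch below reuses the values and paths listed in this guide; `run_cli_index` is a hypothetical wrapper name, and the example assumes the CLI reads these variables from its environment, as the "Optional CLI environment variables" step suggests.

```shell
# Optional overrides, using the example values from this guide.
export BIG_RAG_CHUNKING_STRATEGY=sentence
export BIG_RAG_MIN_CHUNK_LENGTH=20
export BIG_RAG_CHUNK_SIZE=512
export BIG_RAG_CHUNK_OVERLAP=100
export BIG_RAG_MAX_CONCURRENT=1
export BIG_RAG_ENABLE_OCR=false

# Compile, then index a documents directory into a vector store directory.
# run_cli_index is a hypothetical wrapper; the defaults are the paths
# used in the CLI example in this guide.
run_cli_index() {
  docs_dir="${1:-$HOME/test-documents}"
  db_dir="${2:-$HOME/.lmstudio/big-rag-cli-test-db}"
  npx tsc && node dist/cliIndex.js "$docs_dir" "$db_dir"
}
```

Calling `run_cli_index` from the repository root runs the compile and index steps with the exported settings; unset a variable to fall back to whatever default the CLI uses.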