Local document ingestion for LM Studio. Extract text from PDFs, Word documents, spreadsheets, and plain text files, then search inside them by keyword or regex. Everything runs offline: no cloud upload, no OCR service, and no other external services, so your files never leave your machine.
Keywords: lm studio plugin, document parser, local pdf reader, extract text from pdf, word document parser, offline document processing, no cloud, private file reading
Dependencies: pdf-parse, mammoth, xlsx

```bash
cd document-parser-plugin
npm install
npm run build
```
Load the built plugin folder in LM Studio.
| Field | Default | Description |
|---|---|---|
| Workspace Path | (blank) | Root directory for relative path lookups. Blank = absolute paths only |
| Max File Size (MB) | 50 | Files larger than this are rejected before parsing |
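A minimal sketch of how these two settings could be applied, assuming a Node.js/TypeScript implementation. `resolveDocumentPath` and `exceedsSizeLimit` are hypothetical helpers, not the plugin's actual code. Rejecting relative paths that climb out of the workspace with `..` is an assumption based on "relative (within workspace)", and 1 MB is assumed to mean 1024 × 1024 bytes.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve a tool's `path` argument using the
// "Workspace Path" setting (blank = absolute paths only).
function resolveDocumentPath(input: string, workspace: string): string {
  if (path.isAbsolute(input)) {
    return path.normalize(input);
  }
  if (workspace.trim() === "") {
    throw new Error(`Relative path "${input}" requires a Workspace Path`);
  }
  const root = path.resolve(workspace);
  const resolved = path.resolve(root, input);
  // Assumption: relative paths must stay inside the workspace, so
  // "../secret.txt" is rejected rather than silently resolved.
  const rel = path.relative(root, resolved);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path "${input}" resolves outside the workspace`);
  }
  return resolved;
}

// Hypothetical check run before parsing; treats 1 MB as 1024 * 1024 bytes.
function exceedsSizeLimit(sizeBytes: number, maxFileSizeMb = 50): boolean {
  return sizeBytes > maxFileSizeMb * 1024 * 1024;
}
```

In practice the byte count would come from something like `fs.statSync(resolved).size`, checked before any parser library reads the file.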
`parse_document` extracts the full text of a document.
```
parse_document(path="/Users/me/docs/report.pdf")
parse_document(path="budget.xlsx", sheet_name="Q1 2025")
parse_document(path="contract.docx", max_chars=20000)
```
Parameters:
- `path`: relative (within the workspace) or absolute file path
- `format`: `auto` (default), `pdf`, `docx`, `spreadsheet`, `txt`
- `sheet_name`: target a single sheet (spreadsheets only; blank = all sheets)
- `max_chars`: truncate output at this many characters (default 80000, about 60 pages)

Returns: plain text. Spreadsheets return one Markdown table per sheet.
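For spreadsheets, "one Markdown table per sheet" could be produced along the lines of this sketch. It assumes each sheet has already been read into a 2-D array of cell values (for example with xlsx's `XLSX.utils.sheet_to_json(sheet, { header: 1 })`) and that the first row is the header. `sheetToMarkdown`, `truncate`, the `## <sheet name>` heading, and the `[truncated]` marker are hypothetical, not the plugin's confirmed output format.

```typescript
// Hypothetical sketch: one Markdown table per sheet, first row as header.
type Cell = string | number | boolean | Date | null | undefined;

function escapeCell(value: Cell): string {
  // A literal "|" would split the cell; a newline would split the row.
  return String(value ?? "").replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function sheetToMarkdown(sheetName: string, rows: Cell[][]): string {
  if (rows.length === 0) return `## ${sheetName}\n\n(empty sheet)`;
  // Pad ragged rows to the widest row so every line has the same column count.
  const width = rows.reduce((w, r) => Math.max(w, r.length), 0);
  const toLine = (r: Cell[]) =>
    "| " + Array.from({ length: width }, (_, i) => escapeCell(r[i])).join(" | ") + " |";
  const divider = "|" + " --- |".repeat(width);
  return [`## ${sheetName}`, "", toLine(rows[0]), divider, ...rows.slice(1).map(toLine)].join("\n");
}

// max_chars truncation (default 80000); the "[truncated]" marker is an assumption.
function truncate(text: string, maxChars = 80000): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars) + "\n[truncated]";
}
```

With `sheet_name` blank, the per-sheet tables would simply be joined in workbook order before `truncate` is applied to the combined text.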
`search_document` searches a document for a keyword or regex pattern.
```
search_document(path="report.pdf", pattern="revenue")
search_document(path="data.xlsx", pattern="/\d{4}-\d{2}-\d{2}/")
search_document(path="notes.txt", pattern="TODO", context_lines=3)
```
Parameters:
- `path`: file path (relative or absolute)
- `pattern`: literal string or JavaScript regex (`/pattern/flags`)
- `format`: same as for `parse_document`
- `context_lines`: lines before/after each match (default 2, max 10)
- `max_matches`: stop after this many matches (default 50, max 500)

Returns: matched lines with surrounding context and line numbers.
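The pattern and context rules above can be sketched as follows, assuming the document has already been converted to plain text. `parsePattern` and `searchLines` are hypothetical names. Treating literal strings as case-sensitive, dropping the `g`/`y` flags, and the `N: text` context format are assumptions, not documented behaviour.

```typescript
// Hypothetical sketch of search_document's matching rules.
interface SearchMatch {
  line: number;      // 1-based line number of the matching line
  context: string[]; // surrounding lines, each prefixed with its line number
}

// "/body/flags" becomes a RegExp; anything else is a literal substring.
function parsePattern(pattern: string): RegExp {
  const m = /^\/(.+)\/([dgimsuy]*)$/.exec(pattern);
  if (m) {
    // Drop "g"/"y": they make RegExp.test() stateful across lines.
    return new RegExp(m[1], m[2].replace(/[gy]/g, ""));
  }
  // Escape regex metacharacters so the literal matches as typed (case-sensitive).
  return new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
}

function searchLines(text: string, pattern: string, contextLines = 2, maxMatches = 50): SearchMatch[] {
  const ctx = Math.min(Math.max(contextLines, 0), 10);  // documented max: 10
  const limit = Math.min(Math.max(maxMatches, 1), 500); // documented max: 500
  const re = parsePattern(pattern);
  const lines = text.split(/\r?\n/);
  const matches: SearchMatch[] = [];
  for (let i = 0; i < lines.length && matches.length < limit; i++) {
    if (!re.test(lines[i])) continue;
    const start = Math.max(0, i - ctx);
    const end = Math.min(lines.length, i + ctx + 1);
    matches.push({
      line: i + 1,
      context: lines.slice(start, end).map((l, j) => `${start + j + 1}: ${l}`),
    });
  }
  return matches;
}
```

Restricting the flag group to valid flag letters means a literal such as `/usr/bin` is treated as plain text instead of failing as a regex with flags `bin`.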
| Extension | Format | Library |
|---|---|---|
| .pdf | PDF | pdf-parse |
| .docx | Word | mammoth |
| .xlsx, .xls, .ods, .csv | Spreadsheet | xlsx |
| .txt, .md, .log, .json, .csv (text) | Plain text | Node.js fs |
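In `auto` mode, the table above amounts to an extension lookup. The sketch below uses `detectFormat`, a hypothetical helper. Because `.csv` is listed under both Spreadsheet and Plain text, it assumes that `auto` sends `.csv` to the spreadsheet parser and that an explicit `format="txt"` reads it as plain text.

```typescript
import * as path from "node:path";

type DocFormat = "pdf" | "docx" | "spreadsheet" | "txt";

// Extension table from the README; ".csv" defaults to the spreadsheet parser here (assumption).
const EXTENSION_FORMATS: Record<string, DocFormat> = {
  ".pdf": "pdf",
  ".docx": "docx",
  ".xlsx": "spreadsheet",
  ".xls": "spreadsheet",
  ".ods": "spreadsheet",
  ".csv": "spreadsheet",
  ".txt": "txt",
  ".md": "txt",
  ".log": "txt",
  ".json": "txt",
};

function detectFormat(filePath: string, format: "auto" | DocFormat = "auto"): DocFormat {
  if (format !== "auto") return format; // an explicit format overrides the extension
  const ext = path.extname(filePath).toLowerCase(); // "Budget.XLSX" -> ".xlsx"
  const detected = EXTENSION_FORMATS[ext];
  if (!detected) throw new Error(`Unsupported file type: ${ext || "(none)"}`);
  return detected;
}
```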
Summarise a report:
"Summarise this PDF" (with file path) →
```
parse_document(path="/path/to/report.pdf")
```
Analyse a spreadsheet:
"What does the Q1 sheet look like in this Excel file?" →
```
parse_document(path="budget.xlsx", sheet_name="Q1 2025")
```
Find specific clauses in a contract:
"Find all mentions of 'termination' in this contract" →
```
search_document(path="contract.docx", pattern="termination", context_lines=3)
```
Extract dates from a document:
"Find all dates in this report" →
```
search_document(path="report.pdf", pattern="/\d{1,2}[\/\-]\d{1,2}[\/\-]\d{2,4}/")
```
Read a log file:
"Show me all ERROR lines in this log" →
```
search_document(path="app.log", pattern="/^ERROR/", max_matches=100)
```
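The regex examples are ordinary JavaScript regular expressions, so they can be tried in Node.js before being passed as `pattern`. This sketch assumes, as the `^ERROR` example implies, that each line is tested separately; the sample lines are made up for illustration.

```typescript
// Example patterns from the README, written as JavaScript regex literals.
const datePattern = /\d{1,2}[\/\-]\d{1,2}[\/\-]\d{2,4}/;
const errorPattern = /^ERROR/;

// Illustrative sample lines, one test per line (no "g" flag, so test() is stateless).
const dateSamples = ["Signed 12/31/2024", "Due 1-5-25", "Version 1.2.3", "Total 2024"];
const logSamples = ["ERROR db timeout", "2025-01-31 ERROR db timeout", "INFO started"];

for (const line of dateSamples) {
  console.log(`date  ${datePattern.test(line)}  ${line}`);
}
for (const line of logSamples) {
  console.log(`error ${errorPattern.test(line)}  ${line}`);
}
```

The date pattern checks shape only: it also accepts impossible dates such as `99/99/99`, and inside an ISO date like `2025-01-31` it matches the substring `25-01-31`. Log lines that begin with a timestamp will not match `^ERROR`; for those, drop the anchor or use a pattern such as `/\bERROR\b/`.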