docs / promptPreprocessor.md
The prompt preprocessor handles user queries with attached files for a RAG (Retrieval-Augmented Generation) system. It automatically selects the optimal context injection strategy based on file sizes and the available context window of the language model.
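preprocess

Parameters:

- ctl: PromptPreprocessorController, the preprocessor controller
- userMessage: ChatMessage, the user message

Workflow:

- Load the chat history and filter the attached files (images are excluded)
- Choose a strategy via chooseContextInjectionStrategy()
- Run the chosen strategy: inject-full-content or retrieval

inject-full-content

When used: the files and the user prompt together fit within availableContextTokens.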
Process:
1. Parse each file via ctl.client.files.parseDocument()
2. Extract full content
3. Format with headers: ** filename full content **
4. Inject into prompt with instructions
Output format:

    This is a Enriched Context Generation scenario.
    The following content was found in the files provided by the user.
    ** document.pdf full content **
    [full file content]
    ** end of document.pdf **
    Based on the content above, please provide a response to the user query.
    User query: [user query]
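The formatting steps above can be sketched as a pure function. This is a minimal sketch, not the preprocessor's actual code: `ParsedFile` and `buildFullContentPrompt` are hypothetical names, and the blank lines placed between sections are an assumption, since the documented format does not show its exact line breaks.

```typescript
// Hypothetical shape of a parsed file; the real parse result may differ.
interface ParsedFile {
  name: string;
  content: string;
}

// Builds the enriched prompt in the documented output format.
// Assumption: sections are separated by blank lines.
function buildFullContentPrompt(files: ParsedFile[], userQuery: string): string {
  const sections = files.map(
    (file) =>
      `** ${file.name} full content **\n\n${file.content}\n\n** end of ${file.name} **`,
  );
  return [
    "This is a Enriched Context Generation scenario.",
    "The following content was found in the files provided by the user.",
    ...sections,
    "Based on the content above, please provide a response to the user query.",
    `User query: ${userQuery}`,
  ].join("\n\n");
}
```

Each file gets its own header and footer, so the model can tell where one document ends and the next begins.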
retrieval

When used: the files and the user prompt together exceed availableContextTokens.
Process:
1. Load embedding model (nomic-embed-text-v1.5-GGUF)
2. Perform semantic search via ctl.client.files.retrieve()
3. Filter results by retrievalAffinityThreshold
4. Add found citations to the prompt
5. Attach citations via ctl.addCitations()
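The filtering step can be illustrated with a small standalone function. This is a sketch under assumptions: `RetrievalEntry` and `filterByAffinity` are hypothetical names, the comparison is assumed to be strictly greater than the threshold, and the limit is applied here after filtering, which may not match where the preprocessor applies retrievalLimit.

```typescript
// Hypothetical shape of one retrieval result; the SDK's entry type may differ.
interface RetrievalEntry {
  content: string;
  score: number; // relevance score, assumed to lie between 0.0 and 1.0
}

// Keeps entries scoring strictly above the threshold, best first,
// and returns at most `limit` of them.
function filterByAffinity(
  entries: RetrievalEntry[],
  threshold: number,
  limit: number,
): RetrievalEntry[] {
  return entries
    .filter((entry) => entry.score > threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

A higher threshold yields fewer but more relevant citations. If nothing passes the filter, the no-results prompt applies.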
Output format (with results):

    The following citations were found in the files provided by the user:
    Citation 1: "[citation text]"
    Citation 2: "[citation text]"
    Use the citations above to respond to the user query, only if they are relevant. Otherwise, respond to the best of your ability without them.
    User Query: [user query]
Output format (no results):

    Important: No citations were found in the user files for the user query. In less than one sentence, inform the user of this. Then respond to the query to the best of your ability.
    User Query: [user query]
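Both retrieval output formats can be produced by one function that branches on whether any citations survived filtering. This is an illustrative sketch: `buildRetrievalPrompt` is a hypothetical name, and the blank lines between sections are assumed rather than taken from the source.

```typescript
// Builds the retrieval prompt in either documented format.
// Assumption: sections are separated by blank lines.
function buildRetrievalPrompt(citations: string[], userQuery: string): string {
  if (citations.length === 0) {
    return (
      "Important: No citations were found in the user files for the user query. " +
      "In less than one sentence, inform the user of this. " +
      "Then respond to the query to the best of your ability.\n\n" +
      `User Query: ${userQuery}`
    );
  }
  // Citations are numbered from 1, matching the documented format.
  const citationLines = citations.map((text, i) => `Citation ${i + 1}: "${text}"`);
  return [
    "The following citations were found in the files provided by the user:",
    ...citationLines,
    "Use the citations above to respond to the user query, only if they are relevant. " +
      "Otherwise, respond to the best of your ability without them.",
    `User Query: ${userQuery}`,
  ].join("\n\n");
}
```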
When used:
The chooseContextInjectionStrategy() function makes decisions based on token calculations:
| Step | Description |
|---|---|
| 1 | Load LLM model via ctl.client.llm.model() |
| 2 | Measure current context usage via measureContextWindow() |
| 3 | Parse files and count total tokens |
| 4 | Calculate available tokens with 70% target utilization |
| 5 | Compare: totalFilePlusPromptTokenCount > availableContextTokens |

    const contextOccupiedFraction = contextOccupiedPercent / 100;
    const targetContextUsePercent = 0.7;
    const targetContextUsage = targetContextUsePercent * (1 - contextOccupiedFraction);
    const availableContextTokens = Math.floor(modelRemainingContextLength * targetContextUsage);
- If totalFileTokenCount + userPromptTokenCount > availableContextTokens → retrieval
- Else → inject-full-content
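To make the decision rule concrete, the calculation can be wrapped in a pure function and run on sample numbers. This is a sketch: `ContextMeasurement`, `computeAvailableContextTokens` and `chooseStrategy` are hypothetical names, and the token counts in the usage note are made up for illustration.

```typescript
type Strategy = "inject-full-content" | "retrieval";

// Hypothetical subset of the context measurement needed for the decision.
interface ContextMeasurement {
  modelRemainingContextLength: number;
  contextOccupiedPercent: number;
}

// Applies the 70% target utilization, scaled by the free fraction of the context.
function computeAvailableContextTokens(m: ContextMeasurement): number {
  const contextOccupiedFraction = m.contextOccupiedPercent / 100;
  const targetContextUsePercent = 0.7;
  const targetContextUsage = targetContextUsePercent * (1 - contextOccupiedFraction);
  return Math.floor(m.modelRemainingContextLength * targetContextUsage);
}

// Picks retrieval when the files plus the prompt exceed the available budget.
function chooseStrategy(
  m: ContextMeasurement,
  totalFileTokenCount: number,
  userPromptTokenCount: number,
): Strategy {
  const availableContextTokens = computeAvailableContextTokens(m);
  return totalFileTokenCount + userPromptTokenCount > availableContextTokens
    ? "retrieval"
    : "inject-full-content";
}
```

For example, with 6144 tokens remaining and 25% of the context occupied, the budget is floor(6144 × 0.7 × 0.75) = 3225 tokens. Files of 3000 tokens plus a 100-token prompt fit, so the full content is injected. Files of 4000 tokens do not fit, so retrieval is used.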
Measures context window utilization:
Returns:

    {
      totalTokensInContext: number,        // total tokens in context
      modelContextLength: number,          // model context size
      modelRemainingContextLength: number, // remaining tokens available
      contextOccupiedPercent: number       // percentage filled
    }
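The derived fields of this result follow from the two measured values. The sketch below assumes the token count and context length are already known and passed in as plain numbers. The preprocessor obtains them from the model, which is not shown here. `summarizeContextWindow` and `ContextWindowMeasurement` are hypothetical names.

```typescript
interface ContextWindowMeasurement {
  totalTokensInContext: number;
  modelContextLength: number;
  modelRemainingContextLength: number;
  contextOccupiedPercent: number;
}

// Derives the remaining length and the occupied percentage from the two inputs.
function summarizeContextWindow(
  totalTokensInContext: number,
  modelContextLength: number,
): ContextWindowMeasurement {
  return {
    totalTokensInContext,
    modelContextLength,
    modelRemainingContextLength: modelContextLength - totalTokensInContext,
    contextOccupiedPercent: (totalTokensInContext / modelContextLength) * 100,
  };
}
```

With 2048 tokens used in an 8192-token context, this gives 6144 remaining tokens and 25% occupied.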
Applies the model's prompt template:
- model.applyPromptTemplate(ctx)
- "?" and retries

Handles the retrieval strategy:

- retrievalAffinityThreshold

Handles the full-content injection strategy:

- input.replaceText()

Parameters from configSchematics:
| Parameter | Type | Description |
|---|---|---|
| retrievalLimit | number | Maximum number of citations to retrieve |
| retrievalAffinityThreshold | number | Relevance threshold for filtering citations (0.0 to 1.0) |
The preprocessor displays progress via PredictionProcessStatusController:
| Status | Message |
|---|---|
| Deciding | Deciding how to handle the document(s)... |
| Loading parser | Loading parser for {filename}... |
| Parser loaded | {library} loaded for {filename}... |
| Processing | Parsing file {filename}... ({progress}%) |
| Retrieval | Retrieving relevant citations for user query... |
| Done | Retrieved {N} relevant citations for user query |
The preprocessor outputs debug information via ctl.debug().

Imports from @lmstudio/sdk:

    import {
      text,
      type Chat,
      type ChatMessage,
      type FileHandle,
      type LLMDynamicHandle,
      type PredictionProcessStatusController,
      type PromptPreprocessorController,
    } from "@lmstudio/sdk";
Embedding Model: nomic-ai/nomic-embed-text-v1.5-GGUF

    ┌─────────────────────────────────────────────────────────────┐
    │                    User Message + Files                     │
    └─────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                        preprocess()                         │
    │  - Load history                                             │
    │  - Filter files (no images)                                 │
    │  - Choose strategy                                          │
    └─────────────────────────────────────────────────────────────┘
                                   │
               ┌───────────────────┴───────────────────┐
               │                                       │
               ▼                                       ▼
    ┌─────────────────────┐                 ┌─────────────────────┐
    │ inject-full-content │                 │      retrieval      │
    │                     │                 │                     │
    │ - Parse all files   │                 │ - Load embeddings   │
    │ - Format content    │                 │ - Semantic search   │
    │ - Build prompt      │                 │ - Filter by score   │
    └─────────────────────┘                 │ - Build prompt      │
                                            └─────────────────────┘
                                                       │
                                       ┌───────────────┴───────────────┐
                                       │                               │
                                       ▼                               ▼
                              ┌─────────────────┐             ┌─────────────────┐
                              │ Results found   │             │ No results      │
                              │ - Add citations │             │ - Inform user   │
                              │ - Continue      │             │ - Continue      │
                              └─────────────────┘             └─────────────────┘