Speculative Decoding
Speculative decoding is a technique that can substantially increase the generation speed of large language models (LLMs) without reducing response quality. See Speculative Decoding for more info.
To use speculative decoding in lmstudio-js, simply provide a draftModel parameter when performing the prediction. You do not need to load the draft model separately.
import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();

// The draft model should be a smaller, faster model from the same family as the main model.
const mainModelKey = "qwen2.5-7b-instruct";
const draftModelKey = "qwen2.5-0.5b-instruct";

const model = await client.llm.model(mainModelKey);

// Pass the draft model key with the prediction; it does not need to be loaded beforehand.
const result = await model.respond("What are the prime numbers between 0 and 100?", {
  draftModel: draftModelKey,
});

const { content, stats } = result;
console.info(content);
console.info(`Accepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);
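
The draftModel option can also be combined with streaming. Below is a minimal sketch, assuming the same model keys as above and the standard streaming pattern from Predicting with LLMs; the acceptance statistics remain available on the final result.

import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();
const model = await client.llm.model("qwen2.5-7b-instruct");

// Stream fragments as they are generated; the draftModel option is passed the same way.
const prediction = model.respond("What are the prime numbers between 0 and 100?", {
  draftModel: "qwen2.5-0.5b-instruct",
});
for await (const { content } of prediction) {
  process.stdout.write(content);
}

// Once streaming finishes, the final result carries the same stats fields shown above.
const { stats } = await prediction.result();
console.info(`\nAccepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);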