# Speculative Decoding
Speculative decoding is a technique that can substantially increase the generation speed of large language models (LLMs) without reducing response quality. See Speculative Decoding for more info.
To use speculative decoding in `lmstudio-js`, provide a `draftModel` parameter when performing the prediction. You do not need to load the draft model separately.
```typescript
import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();

const mainModelKey = "qwen2.5-7b-instruct";
const draftModelKey = "qwen2.5-0.5b-instruct";

// Load the main model. The draft model does not need to be loaded explicitly;
// it is loaded automatically when referenced in the prediction below.
const model = await client.llm.model(mainModelKey);
const result = await model.respond("What are the prime numbers between 0 and 100?", {
  draftModel: draftModelKey,
});

const { content, stats } = result;
console.info(content);
console.info(`Accepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);
```
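
Because `draftModel` is an ordinary prediction option, it can also be combined with streaming. Below is a minimal sketch, assuming the same model keys as above and an illustrative prompt; it relies on the ongoing prediction being async-iterable and exposing a `result()` method, as in other `lmstudio-js` streaming examples:

```typescript
import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();
const model = await client.llm.model("qwen2.5-7b-instruct");

// Start a streaming prediction; speculative decoding runs with the draft model.
const prediction = model.respond("Explain speculative decoding in one paragraph.", {
  draftModel: "qwen2.5-0.5b-instruct",
});

// Print each fragment as it arrives.
for await (const { content } of prediction) {
  process.stdout.write(content);
}

// Once streaming ends, the final result carries the draft token statistics.
const { stats } = await prediction.result();
console.info(`\nAccepted ${stats.acceptedDraftTokensCount}/${stats.predictedTokensCount} tokens`);
```

The acceptance ratio printed at the end is a quick way to judge whether a given draft model is a good match for the main model: a higher ratio generally means a larger speedup.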