Text Completions

Use llm.complete(...) to generate text completions from a loaded language model. Text completion means sending a raw, unformatted string to the model with the expectation that the model will continue the text.

This is different from multi-turn chat conversations. For more information on chat completions, see Chat Completions.
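
As a rough sketch of the difference (the respond method is covered in Chat Completions):

// Text completion: the string is sent to the model as-is, and the model continues it.
const completion = model.complete("The color of the sky is");

// Chat: the string is wrapped as a user message and formatted with the model's
// chat template before being sent (see Chat Completions).
const response = model.respond("What color is the sky?");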

1. Instantiate a Model

First, you need to load a model to generate completions from. This can be done using the model method on the llm handle.

import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();

// Get a handle to the model, loading it first if it is not already loaded.
const model = await client.llm.model("qwen2.5-7b-instruct");
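
If a model is already loaded, you can also omit the identifier to get a handle to any currently loaded model, as the terminal example at the end of this page does:

const anyModel = await client.llm.model();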

2. Generate a Completion

Once you have a loaded model, you can generate completions by passing a string to the complete method on the model handle.

const completion = model.complete("My name is", {
  maxTokens: 100,
});

// The prediction streams: each fragment carries newly generated text.
for await (const { content } of completion) {
  process.stdout.write(content);
}

console.info(); // Write a new line for cosmetic purposes
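
If you don't need to stream tokens as they arrive, you can instead await the prediction's result() (also used in the examples below) to get the full text in one step:

const nonStreaming = await model.complete("My name is", { maxTokens: 100 }).result();
console.info(nonStreaming.content);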

3. Print Prediction Stats

You can also print prediction metadata, such as the model used for generation, the number of generated tokens, the time to first token, and the stop reason. This metadata is part of the final result, which you can get by awaiting the prediction's result().

const result = await completion.result();
console.info("Model used:", result.modelInfo.displayName);
console.info("Predicted tokens:", result.stats.predictedTokensCount);
console.info("Time to first token (seconds):", result.stats.timeToFirstTokenSec);
console.info("Stop reason:", result.stats.stopReason);

If the prediction was cut off by maxTokens, the stop reason will say so; otherwise it typically indicates that the model emitted an end-of-sequence token or matched a stop string.

Example: Get an LLM to Simulate a Terminal

Here's an example of how you might use the complete method to simulate a terminal.

import { LMStudioClient } from "@lmstudio/sdk";
import { createInterface } from "node:readline/promises";

const rl = createInterface({ input: process.stdin, output: process.stdout });
const client = new LMStudioClient();
const model = await client.llm.model(); // Use any currently loaded model
let history = "";

while (true) {
  const command = await rl.question("$ ");
  history += "$ " + command + "\n";

  // Stop generating as soon as the model tries to print the next shell prompt.
  const prediction = model.complete(history, { stopStrings: ["$"] });
  for await (const { content } of prediction) {
    process.stdout.write(content);
  }
  process.stdout.write("\n");

  // Append the model's output so later turns see the whole session.
  const { content } = await prediction.result();
  history += content;
}
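
Two details make this work: the stopStrings option halts generation the moment the model tries to print the next "$" prompt, and re-sending the accumulated history gives the model the full session as context on every turn. Note that history grows without bound here; a longer-running tool would want to trim it to fit the model's context window.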