Manage Models in Memory
AI models are huge. It can take a while to load them into memory. LM Studio's SDK allows you to precisely control this process.
Most commonly:

- Use .model() to get any currently loaded model
- Use .model("model-key") to use a specific model

Advanced (manual model management):

- Use .load("model-key") to load a new instance of a model
- Use model.unload() to unload a model from memory

.model()
If you already have a model loaded in LM Studio (either via the GUI or lms load), you can use it by calling .model() without any arguments.
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
const model = await client.llm.model();
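Once you have the handle, you can use it right away. Here is a minimal sketch using respond(), which is covered under Predicting with LLMs:
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
// Grab whatever model is currently loaded and ask it a question.
const model = await client.llm.model();
const result = await model.respond("What is the capital of France?");
console.info(result.content);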
.model("model-key")
If you want to use a specific model, you can provide the model key as an argument to .model().
Calling .model("model-key") will load the model if it's not already loaded, or return the existing instance if it is.
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
const model = await client.llm.model("llama-3.2-1b-instruct");
.load()
Use load() to load a new instance of a model, even if one already exists. This allows you to have multiple instances of the same or different models loaded at the same time.
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
const llama = await client.llm.load("llama-3.2-1b-instruct");
const another_llama = await client.llm.load("llama-3.2-1b-instruct", {
identifier: "second-llama"
});
If you provide an instance identifier that already exists, the server will throw an error. If you don't need a specific identifier, it's safer to omit it and let the server generate one for you. You can always check the loaded instances in the server tab in LM Studio, too.
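To check from code instead of the GUI, the client also exposes a listing call. This is a minimal sketch assuming client.llm.listLoaded() and that each returned handle carries its identifier:
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
// List every LLM instance currently loaded and print its identifier.
const loadedModels = await client.llm.listLoaded();
for (const model of loadedModels) {
  console.info(model.identifier);
}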
.unload()
Once you no longer need a model, you can unload it by simply calling unload() on its handle.
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
const model = await client.llm.model();
await model.unload();
When loading a model, you can also specify load-time configuration options such as context length and GPU offload.
See load-time configuration for more.
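For example, a load call with an explicit context length might look like the sketch below. The exact config field names (such as contextLength) are assumptions here, so consult the load-time configuration page for the authoritative options:
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
// Load with explicit load-time configuration.
// Field names below are illustrative; see load-time configuration for the real options.
const model = await client.llm.load("llama-3.2-1b-instruct", {
  config: {
    contextLength: 8192, // assumed field name for the context length, in tokens
  },
});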
You can specify a time to live (TTL) for a model you load: the idle time (in seconds) after the last request before the model is unloaded. See Idle TTL for more on this.
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
const model = await client.llm.load("llama-3.2-1b-instruct", {
ttl: 300, // 300 seconds
});