Manage Models in Memory
AI models are huge. It can take a while to load them into memory. LM Studio's SDK allows you to precisely control this process.
Model namespaces:

- the client.llm namespace (for LLMs)
- the client.embedding namespace (for embedding models)
- lmstudio.llm is equivalent to client.llm.model on the default client
- lmstudio.embedding_model is equivalent to client.embedding.model on the default client

Most commonly:

- .model() to get any currently loaded model
- .model("model-key") to use a specific model

Advanced (manual model management):

- .load_new_instance("model-key") to load a new instance of a model
- .unload("model-key") or model_handle.unload() to unload a model from memory

.model()
If you already have a model loaded in LM Studio (either via the GUI or lms load), you can use it by calling .model() without any arguments.
import lmstudio as lms
model = lms.llm()
.model("model-key")
If you want to use a specific model, you can provide the model key as an argument to .model(). Calling .model("model-key") will load the model if it's not already loaded, or return the existing instance if it is.
import lmstudio as lms
model = lms.llm("llama-3.2-1b-instruct")
.load_new_instance()
Use load_new_instance() to load a new instance of a model, even if one already exists. This allows you to have multiple instances of the same or different models loaded at the same time.
import lmstudio as lms
client = lms.get_default_client()
llama = client.llm.load_new_instance("llama-3.2-1b-instruct")
# a second, independent instance of the same model, with an explicit instance identifier
another_llama = client.llm.load_new_instance("llama-3.2-1b-instruct", "second-llama")
If you provide an instance identifier that already exists, the server will throw an error. If you don't need a specific identifier, it's safer to omit it and let the server generate one for you. You can always check what is loaded in the server tab in LM Studio, too.
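If you want to handle that failure in code, you can wrap the call; a minimal sketch, assuming the duplicate-identifier error surfaces as an ordinary Python exception (the exact exception class is an assumption, so this catches broadly):

import lmstudio as lms
client = lms.get_default_client()
try:
    # "second-llama" is already taken by the instance loaded above
    duplicate = client.llm.load_new_instance("llama-3.2-1b-instruct", "second-llama")
except Exception as exc:  # assumption: server errors are raised as Python exceptions
    print(f"Load failed; the identifier is likely already in use: {exc}")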
.unload()
Once you no longer need a model, you can unload it by simply calling unload() on its handle.
import lmstudio as lms
model = lms.llm()
# release the model's memory once you're done with it
model.unload()
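The namespace-level form from the list at the top works too; a minimal sketch, assuming .unload() on client.llm accepts the instance identifier (here reusing "second-llama" from the earlier example):

import lmstudio as lms
client = lms.get_default_client()
# assumption: the llm namespace exposes unload() taking the instance identifier
client.llm.unload("second-llama")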
You can also specify load-time configuration options, such as Context Length and GPU offload, when loading a model. See load-time configuration for more.
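For illustration, a hedged sketch of what passing load-time configuration might look like; the config keyword and the "contextLength" / "gpu" field names are assumptions here, not confirmed by this page, so check the load-time configuration docs for the exact fields:

import lmstudio as lms
# assumed field names for context length and GPU offload ratio
model = lms.llm("llama-3.2-1b-instruct", config={
    "contextLength": 8192,
    "gpu": {"ratio": 0.8},
})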
You can specify a time-to-live (TTL) for a model you load: the idle time, in seconds, after the last request before the model is unloaded. See Idle TTL for more on this.
If you pass a TTL to model(), it only applies when model() loads a new instance; it will not retroactively change the TTL of an already-loaded instance.
import lmstudio as lms
# unload automatically after one hour (3600 seconds) of inactivity
llama = lms.llm("llama-3.2-1b-instruct", ttl=3600)
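If you need the TTL to apply regardless of what is already loaded, load a fresh instance explicitly; a minimal sketch, assuming load_new_instance() accepts the same ttl keyword argument (an assumption, not stated on this page):

import lmstudio as lms
client = lms.get_default_client()
# assumption: load_new_instance() also takes ttl; a new instance always gets the TTL
llama = client.llm.load_new_instance("llama-3.2-1b-instruct", ttl=3600)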