lms load

Load or unload models, set context length, GPU offload, TTL, or estimate memory usage without loading.

The lms load command loads a model into memory. You can optionally set parameters such as context length, GPU offload, and TTL. This guide also covers unloading models with lms unload.

Flags

[path] (optional) : string

The path of the model to load. If not provided, you will be prompted to select one

--ttl (optional) : number

If provided, when the model is not used for this number of seconds, it will be unloaded

--gpu (optional) : string

How much to offload to the GPU. Values: 0-1, off, max

--context-length (optional) : number

The number of tokens to consider as context when generating text

--identifier (optional) : string

The identifier to assign to the loaded model for API reference

--estimate-only (optional) : boolean

Print a resource (memory) estimate and exit without loading the model

Load a model

Load a model into memory by running the following command:

lms load <model_key>

You can find the model_key by first running lms ls to list your locally downloaded models.

Set a custom identifier

Optionally, you can assign a custom identifier to the loaded model for API reference:

lms load <model_key> --identifier "my-custom-identifier"

You will then be able to refer to this model by the identifier my_model in subsequent commands and API calls (model parameter).

Set context length

You can set the context length when loading a model using the --context-length flag:

lms load <model_key> --context-length 4096

This determines how many tokens the model will consider as context when generating text.

Set GPU offload

Control GPU memory usage with the --gpu flag:

lms load <model_key> --gpu 0.5    # Offload 50% of layers to GPU
lms load <model_key> --gpu max    # Offload all layers to GPU
lms load <model_key> --gpu off    # Disable GPU offloading

If not specified, LM Studio will automatically determine optimal GPU usage.

Set TTL

Set an auto-unload timer with the --ttl flag (in seconds):

lms load <model_key> --ttl 3600   # Unload after 1 hour of inactivity

Estimate resources without loading

Preview memory requirements before loading a model using --estimate-only:

lms load --estimate-only <model_key>

Optional flags such as --context-length and --gpu are honored and reflected in the estimate. The estimator accounts for factors like context length, flash attention, and whether the model is vision‑enabled.

Example:

$ lms load --estimate-only gpt-oss-120b
Model: openai/gpt-oss-120b
Estimated GPU Memory:   65.68 GB
Estimated Total Memory: 65.68 GB

Estimate: This model may be loaded based on your resource guardrails settings.

Unload models

Use lms unload to remove models from memory.

Flags

[model_key] (optional) : string

The key of the model to unload. If not provided, you will be prompted to select one

--all (optional) : flag

Unload all currently loaded models

--host (optional) : string

The host address of a remote LM Studio instance to connect to

Unload a specific model

lms unload <model_key>

If no model key is provided, you will be prompted to select from currently loaded models.

Unload all models

lms unload --all

Unload from a remote LM Studio instance

lms unload <model_key> --host <host>

Operate on a remote LM Studio instance

lms load supports the --host flag to connect to a remote LM Studio instance.

lms load <model_key> --host <host>

For this to work, the remote LM Studio instance must be running and accessible from your local machine, e.g. be accessible on the same subnet.

lms load

Flags

Load a model

Set a custom identifier

Set context length

Set GPU offload

Set TTL

Estimate resources without loading

Unload models

Flags

Unload a specific model

Unload all models

Unload from a remote LM Studio instance

Operate on a remote LM Studio instance

On this page