# `lms load`

Load a model into memory, set context length, GPU offload, TTL, or estimate memory usage without loading.
The `lms load` command loads a model into memory. You can optionally set parameters such as context length, GPU offload, and TTL.
## Parameters

- `[path]` (optional) : string
  The path of the model to load. If not provided, you will be prompted to select one.
- `--ttl` (optional) : number
  If provided, when the model is not used for this number of seconds, it will be unloaded.
- `--gpu` (optional) : string
  How much to offload to the GPU. Values: `0-1`, `off`, `max`.
- `--context-length` (optional) : number
  The number of tokens to consider as context when generating text.
- `--identifier` (optional) : string
  The identifier to assign to the loaded model for API reference.
- `--estimate-only` (optional) : boolean
  Print a resource (memory) estimate and exit without loading the model.
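These flags can be combined in a single invocation. As an illustrative sketch, a command that loads a model with an 8192-token context, full GPU offload, a one-hour TTL, and a custom identifier might look like this:

```bash
# Illustrative only: combining several optional flags in one call
lms load <model_key> --context-length 8192 --gpu max --ttl 3600 --identifier "my-model"
```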
## Load a model

Load a model into memory by running the following command:

```bash
lms load <model_key>
```

You can find the `model_key` by first running `lms ls` to list your locally downloaded models.
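As a sketch of the workflow (the model key below is a made-up example; substitute one from your own `lms ls` output):

```bash
# List locally downloaded models and note the key of the one you want
lms ls

# Load it by key (example key; use your own)
lms load llama-3.2-1b-instruct
```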
## Set a custom identifier

Optionally, you can assign a custom identifier to the loaded model for API reference:

```bash
lms load <model_key> --identifier "my-custom-identifier"
```

You will then be able to refer to this model as `my-custom-identifier` in subsequent commands and API calls (the `model` parameter).
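For example, a chat completion request against LM Studio's OpenAI-compatible API can then target the model by its identifier. This sketch assumes the local server is running on the default port `1234`:

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-custom-identifier",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
```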
## Set context length

You can set the context length when loading a model using the `--context-length` flag:

```bash
lms load <model_key> --context-length 4096
```

This determines how many tokens the model will consider as context when generating text.
## Set GPU offload

Control GPU memory usage with the `--gpu` flag:

```bash
lms load <model_key> --gpu 0.5   # Offload 50% of layers to GPU
lms load <model_key> --gpu max   # Offload all layers to GPU
lms load <model_key> --gpu off   # Disable GPU offloading
```

If not specified, LM Studio will automatically determine optimal GPU usage.
## Set TTL

Set an auto-unload timer with the `--ttl` flag (in seconds):

```bash
lms load <model_key> --ttl 3600   # Unload after 1 hour of inactivity
```
## Estimate resources without loading

Preview memory requirements before loading a model using `--estimate-only`:

```bash
lms load --estimate-only <model_key>
```

Optional flags such as `--context-length` and `--gpu` are honored and reflected in the estimate. The estimator accounts for factors like context length, flash attention, and whether the model is vision-enabled.

Example:

```
$ lms load --estimate-only gpt-oss-120b
Model: openai/gpt-oss-120b
Estimated GPU Memory: 65.68 GB
Estimated Total Memory: 65.68 GB
Estimate: This model may be loaded based on your resource guardrails settings.
```
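For instance, to see how a larger context window changes the estimate, you could pass `--context-length` alongside `--estimate-only` (illustrative; the reported numbers depend on your hardware and settings):

```bash
lms load --estimate-only <model_key> --context-length 8192
```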
## Operate on a remote LM Studio instance

`lms load` supports the `--host` flag to connect to a remote LM Studio instance:

```bash
lms load <model_key> --host <host>
```

For this to work, the remote LM Studio instance must be running and reachable from your local machine, e.g. on the same subnet.
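For example, with a machine on your local network (the address below is hypothetical):

```bash
# Hypothetical host address; replace with your remote machine's address
lms load <model_key> --host 192.168.1.42
```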