lms load
Load a model into memory, set context length, GPU offload, TTL, or estimate memory usage without loading.
The lms load command loads a model into memory. You can optionally set parameters such as context length, GPU offload, and TTL.
[path] (optional) : string
The path of the model to load. If not provided, you will be prompted to select one
--ttl (optional) : number
If provided, the model will be unloaded after it has been idle for this number of seconds
--gpu (optional) : string
How much to offload to the GPU. Values: 0-1, off, max
--context-length (optional) : number
The number of tokens to consider as context when generating text
--identifier (optional) : string
The identifier to assign to the loaded model for API reference
--estimate-only (optional) : boolean
Print a resource (memory) estimate and exit without loading the model
Load a model into memory by running the following command:
lms load <model_key>
You can find the model_key by first running lms ls to list your locally downloaded models.
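A typical workflow is to list your downloaded models and then load one by its key. The model key below is hypothetical; substitute one from your own lms ls output:
lms ls
lms load qwen2.5-7b-instruct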
Optionally, you can assign a custom identifier to the loaded model for API reference:
lms load <model_key> --identifier "my-custom-identifier"
You will then be able to refer to this model as my-custom-identifier in subsequent commands and API calls (the model parameter).
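For example, the identifier can be passed as the model parameter in a chat completion request. The sketch below assumes the local LM Studio server is running at its default address of http://localhost:1234:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-custom-identifier",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'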
You can set the context length when loading a model using the --context-length flag:
lms load <model_key> --context-length 4096
This determines how many tokens the model will consider as context when generating text.
Control GPU memory usage with the --gpu flag:
lms load <model_key> --gpu 0.5   # Offload 50% of layers to GPU
lms load <model_key> --gpu max   # Offload all layers to GPU
lms load <model_key> --gpu off   # Disable GPU offloading
If not specified, LM Studio will automatically determine optimal GPU usage.
Set an auto-unload timer with the --ttl flag (in seconds):
lms load <model_key> --ttl 3600 # Unload after 1 hour of inactivity
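Flags can be combined in a single invocation, for example to set both a context length and an idle timeout when loading:
lms load <model_key> --context-length 8192 --ttl 1800   # 8K context, unload after 30 minutes idle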
Preview memory requirements before loading a model using --estimate-only:
lms load --estimate-only <model_key>
Optional flags such as --context-length and --gpu are honored and reflected in the estimate. The estimator accounts for factors like context length, flash attention, and whether the model is vision‑enabled.
Example:
$ lms load --estimate-only gpt-oss-120b
Model: openai/gpt-oss-120b
Estimated GPU Memory: 65.68 GB
Estimated Total Memory: 65.68 GB
Estimate: This model may be loaded based on your resource guardrails settings.
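Because optional flags are reflected in the estimate, you can preview the footprint of a specific configuration, for example a larger context window with full GPU offload:
lms load --estimate-only <model_key> --context-length 8192 --gpu max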
lms load supports the --host flag to connect to a remote LM Studio instance.
lms load <model_key> --host <host>
For this to work, the remote LM Studio instance must be running and reachable from your local machine, for example on the same subnet.
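For example, to load a model on another machine on your network (the address below is hypothetical):
lms load <model_key> --host 192.168.1.42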