Configuring the Model
You can customize both inference-time and load-time parameters for your model. Inference parameters can be set on a per-request basis, while load parameters are set when loading the model.
Inference Parameters
Set inference-time parameters such as temperature, maxTokens, topP, and more.
import lmstudio as lms

model = lms.llm("qwen2.5-7b-instruct")  # any loaded model handle works here
chat = lms.Chat("You are a helpful assistant.")
result = model.respond(chat, config={
    "temperature": 0.6,
    "maxTokens": 50,
})
See LLMPredictionConfigInput in the TypeScript SDK documentation for all configurable fields.
Note that while structured can be set to a JSON schema definition as an inference-time configuration parameter (Zod schemas are not supported in the Python SDK), the preferred approach is to set the dedicated response_format parameter instead, which lets you enforce the structure of the output more rigorously using a JSON or class-based schema definition.
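For example, the preferred response_format approach looks like the following. This is a minimal sketch: the Pydantic model BookSchema, its fields, and the prompt are illustrative, not part of the SDK.

import lmstudio as lms
from pydantic import BaseModel

# Illustrative class-based schema; a plain JSON schema dict also works.
class BookSchema(BaseModel):
    title: str
    author: str
    year: int

model = lms.llm("qwen2.5-7b-instruct")
result = model.respond("Tell me about The Hobbit.", response_format=BookSchema)
print(result.parsed)  # parsed output conforming to the schema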
Load Parameters
Set load-time parameters such as the context length, GPU offload ratio, and more.
Set Load Parameters with .model()
The .model() method retrieves a handle to a model that has already been loaded, or loads a new one on demand (JIT loading).
Note: if the model is already loaded, the given configuration will be ignored.
import lmstudio as lms
model = lms.llm("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    "gpu": {
        "ratio": 0.5,
    },
})
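The returned handle can then be used for predictions as usual, and the load-time configuration applies to every request made through it. A minimal follow-up sketch (the prompt is illustrative):

result = model.respond("What is the capital of France?")
print(result)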
See LLMLoadModelConfig in the TypeScript SDK documentation for all configurable fields.
Set Load Parameters with .load_new_instance()
The .load_new_instance() method creates a new model instance and loads it with the specified configuration.
import lmstudio as lms
client = lms.get_default_client()
model = client.llm.load_new_instance("qwen2.5-7b-instruct", config={
"contextLength": 8192,
"gpu": {
"ratio": 0.5,
}
})
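Unlike .model(), which may reuse an already-loaded instance and ignore your configuration, .load_new_instance() always loads a fresh instance, so the configuration above is guaranteed to take effect. A brief follow-up sketch; the prompt is illustrative, and unload() is assumed here as the way to release the instance once you are done with it:

result = model.respond("Summarize the plot of Hamlet in one sentence.")
print(result)

model.unload()  # release the dedicated instance when no longer needed (assumed API)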
See LLMLoadModelConfig in the TypeScript SDK documentation for all configurable fields.