Configuring the Model
You can customize both inference-time and load-time parameters for your model. Inference parameters can be set on a per-request basis, while load parameters are set when loading the model.
Set inference-time parameters such as temperature, maxTokens, topP, and more.
# "model" is an LLM handle and "chat" a Chat object or prompt string (see below)
result = model.respond(chat, config={
    "temperature": 0.6,
    "maxTokens": 50,
})
Note that while structured can be set to a JSON schema definition as an inference-time configuration parameter, the preferred approach is to set the dedicated response_format parameter instead, which allows you to enforce the structure of the output more rigorously using a JSON- or class-based schema definition.
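For example, a class-based schema can be passed via response_format. The following is a minimal sketch that assumes a Pydantic model is accepted as the schema and that the structured output is exposed on the result's parsed attribute; the BookSchema class and the prompt are illustrative.
from pydantic import BaseModel
import lmstudio as lms

class BookSchema(BaseModel):
    title: str
    author: str
    year: int

model = lms.llm("qwen2.5-7b-instruct")
result = model.respond("Tell me about The Hobbit.", response_format=BookSchema)
book = result.parsed  # structured data conforming to BookSchema
print(book)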
Set load-time parameters such as contextLength, gpuOffload, and more.
.model()
The .model() method retrieves a handle to a model that has already been loaded, or loads a new one on demand (JIT loading). The top-level lms.llm() convenience function used below does the same through the default client.
Note: if the model is already loaded, the configuration will be ignored.
import lmstudio as lms

model = lms.llm("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    "gpuOffload": 0.5,
})
.load_new_instance()
The .load_new_instance()
method creates a new model instance and loads it with the specified configuration.
import lmstudio as lms

client = lms.get_default_client()
model = client.llm.load_new_instance("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    "gpuOffload": 0.5,
})