LM Studio 0.3.23
📣 ICYMI: LM Studio is now free for work! Read more https://lmstudio.ai/blog/free-for-work
LM Studio 0.3.23 is now available as a stable release. This build focuses on reliability improvements, performance for underpowered devices, and a handful of bug fixes.
openai/gpt-oss: in-chat tool calling reliability

Tool names are now consistently formatted before being sent to the model. Previously, tools with spaces in their names would confuse gpt-oss and lead to tool call failures. Tool names are now converted to snake_case.
Additionally, we squashed a few parsing bugs that previously led to parsing errors in the chat. You should notice significant improvements in tool calling reliability.
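The normalization described above can be sketched like this (a hypothetical helper for illustration only, not LM Studio's actual implementation):

```python
import re

def to_snake_case(name: str) -> str:
    """Normalize a tool name so the model sees a consistent identifier.

    Illustrative sketch: collapse spaces/hyphens to underscores, split
    camelCase boundaries, and lowercase the result.
    """
    # Replace runs of whitespace or hyphens with a single underscore
    name = re.sub(r"[\s\-]+", "_", name.strip())
    # Insert an underscore at lowercase/digit -> uppercase boundaries
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.lower()

print(to_snake_case("Get Current Weather"))  # get_current_weather
```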
This is a change in behavior compared with version 0.3.22: `message.content` will no longer include reasoning content or `<think>` tags. Reasoning content is now available in `choices.message.reasoning` (non-streaming) and `choices.delta.reasoning` (streaming), matching the format used for o3-mini.
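For a non-streaming request, the split looks roughly like this (the payload below is a hand-written illustration of the response shape, not a captured server response):

```python
# Illustrative /v1/chat/completions response shape for gpt-oss:
# reasoning lives in its own field, separate from the final answer.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The answer is 4.",
                "reasoning": "The user asked for 2 + 2, which equals 4.",
            }
        }
    ]
}

message = response["choices"][0]["message"]
print(message["content"])    # final answer only, no <think> tags
print(message["reasoning"])  # reasoning content, in its own field
```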
In this version, we added an advanced model load setting to place all MoE expert weights onto the CPU or the GPU (the default).
Turn on to force MoE expert weights onto the CPU. Try this on low-VRAM machines.
This is beneficial if you don't have enough VRAM to offload the entire model into dedicated GPU memory. In that case, try turning on the "Force Model Expert Weights onto CPU" option in the advanced load settings.
If you can offload the entire model to GPU memory, you're better off keeping the expert weights on the GPU as well (the default option).
This uses the same underlying technology as llama.cpp's `--n-cpu-moe` flag.
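For reference, the equivalent setup when driving llama.cpp directly looks something like this (an illustrative invocation, not copied from LM Studio; the model filename is a placeholder):

```shell
# Keep MoE expert weights on the CPU while offloading the rest to the GPU.
# --n-gpu-layers 99 offloads all layers; --n-cpu-moe N pins the expert
# weights of the first N layers to the CPU.
llama-server -m gpt-oss-20b.gguf --n-gpu-layers 99 --n-cpu-moe 99
```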
Recall that you can set persistent per-model settings. See the docs for more info.
Build 3
Build 2
- `/v1/chat/completions`
- Fixed `Error: EPERM: operation not permitted, unlink` when auto-updating harmony

Build 1
- `v1/chat/completions`: `message.content` will not include reasoning content or special tags, matching the o3-mini format. Reasoning content is available in `choices.message.reasoning` (stream=false) and `choices.delta.reasoning` (stream=true)