LM Studio 0.3.9 • 2025-01-30
Optionally auto-unload unused API models after a certain amount of time
LM Studio 0.3.9 includes a new Idle TTL feature, support for nested folders in Hugging Face repositories, and an experimental API to receive reasoning_content in a separate field in chat completion responses.
An earlier build of 0.3.9 had a bug with streaming DeepSeek R1 chat completion responses. Please update to the latest build (5) to fix this issue.
Idle TTL and Auto-Evict
Use case: imagine you're using an app like Zed, Cline, or Continue.dev to interact with LLMs served by LM Studio. These apps leverage JIT loading to load models on-demand the first time you use them.
Problem: when you're not actively using a model, you might not want it to remain loaded in memory.
Solution: set a TTL for models loaded via API requests. A model is considered idle when it's not doing any work; the idle timer resets every time the model receives a request, so the model won't disappear while you're using it. When the idle TTL expires, the model is automatically unloaded from memory.
You can set the TTL in seconds in the request payload, or use lms load --ttl <seconds> on the command line.
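As a rough sketch of the first option (assuming the local server is running on its default port, 1234; the model key below is a placeholder for one you have downloaded), a request that sets a 5-minute Idle TTL looks like this:

```bash
# JIT-loads the model on first use, then auto-unloads it after
# 300 seconds without receiving any requests.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "ttl": 300
  }'
```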
Read more in the docs article: TTL and Auto-Evict.
reasoning_content in Chat Completion Responses
For DeepSeek R1, get reasoning content in a separate field
DeepSeek R1 models generate content within <think></think> tags. This content is the model's "reasoning" process. In chat completion responses, you can now receive this content in a separate field called reasoning_content, following the pattern in DeepSeek's API.
This works for both streaming and non-streaming completions. You can turn this on in App Settings > Developer. This feature is currently experimental.
Note: per DeepSeek's docs, you should not pass back reasoning content to the model in the next request.
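As an illustrative sketch of the shape (the message text is made up and other standard response fields are omitted), a non-streaming response with this setting enabled separates the two fields like so:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "The user wants 7 * 6. Multiplying gives 42.",
        "content": "7 * 6 = 42."
      }
    }
  ]
}
```

Following DeepSeek's pattern, streaming responses would carry the same field on the delta objects of each chunk.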
Auto-update for runtimes
LM Studio supports multiple variants of llama.cpp engines (CPU-only, CUDA, Vulkan, ROCm, Metal) as well as an Apple MLX engine. These engines receive frequent updates, especially when new models are released.
To reduce the need for manually updating multiple pieces, we've introduced auto-update for runtimes. This is enabled by default, but you can turn it off in App Settings.
After a runtime is updated you will see a notification showing the release notes. You can also manage this yourself in the runtimes tab: Ctrl + Shift + R on Windows/Linux, Cmd + Shift + R on macOS.
LM Runtimes will auto-update to the latest version. You can turn this off in settings
Support for nested folders in Hugging Face repositories
A long-requested feature: you can now download models from nested folders in Hugging Face repositories. If your favorite model publisher organizes their models in subfolders, you can now download them directly in LM Studio.
This makes it easy to download models like https://huggingface.co/unsloth/DeepSeek-R1-GGUF. Works for lms get <hugging face url> as well.
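For example, to pull the repository linked above directly:

```bash
lms get https://huggingface.co/unsloth/DeepSeek-R1-GGUF
```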
Build 6
Build 5
Bug fix: reasoning_content setting was not respected when streaming DeepSeek R1 chat completion responses
Build 4
New experimental API: receive reasoning_content in a separate field in chat completion responses (both streaming and non-streaming), for models that generate content within <think></think> tags (like DeepSeek R1)
Build 3
Build 2
Build 1
New: Idle TTL for API models; optionally auto-unload unused models after a certain amount of time (ttl field in request payload)
New: lms load --ttl <seconds>