Changelog • LM Studio 0.3.10

This version introduces 🔮 Speculative Decoding!

Speculative Decoding is a technique that can speed up token generation by 1.5x to 3x in some cases.

Supported in LM Studio for both llama.cpp and MLX, in the chat UI and server API.
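
To illustrate the general idea behind the technique, here is a minimal toy sketch of greedy speculative decoding. It is not LM Studio's, llama.cpp's, or MLX's implementation. The "models" are hypothetical stand-in functions that map a token list to the next token, and names such as `speculative_decode`, `toy_target`, and `toy_draft` are invented for this example. A small draft model proposes a few tokens, and the main model keeps only the tokens it agrees with.

```python
from typing import Callable, List

# Hypothetical model interface: given the tokens so far, return the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding sketch: draft proposes k tokens, target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens in a row.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed token. A real engine would
        #    do this in one batched forward pass, which is where the speedup
        #    comes from; here it is a simple loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # draft guessed right, keep it
            else:
                accepted.append(expected)  # first mismatch: use the target's token
                break
        else:
            # Every draft token was accepted, so the target adds one more token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


def toy_target(tokens: List[int]) -> int:
    """Stand-in for the main model (assumption: deterministic toy rule)."""
    return (tokens[-1] * 7 + 3) % 50


def toy_draft(tokens: List[int]) -> int:
    """Stand-in for the draft model: agrees with the target most of the time."""
    guess = toy_target(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(toy_target, toy_draft, prompt, n_new=20)
    slow = greedy_decode(toy_target, prompt, n_new=20)
    print(fast == slow)
```

In this greedy variant every kept token is one the target model would have chosen itself, so the output matches plain decoding with the target model. Any speedup depends on how often the draft model's guesses are accepted.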

Build 6

Fixed an issue where the first message of a tool streaming response did not include the "assistant" role
Improved error message when trying to use a draft model with a different engine.
Fixed a bug where the speculative decoding visualization did not work when continuing a message.

Build 5

Updated MLX to enable Speculative Decoding on M1/M2 Macs (in addition to M3/M4)
Fixed an issue on Linux and macOS where child processes might not be cleaned up after the app exits
[Mac][MLX] Fixed a bug where selecting a draft model during prediction would cause the model to crash

Build 4

New: Chat Appearance > "Expand chat container to window width" option
- This option allows you to expand the chat container to the full width of the window
Fixed RAG not working due to a "path must be a string" error
Bug fix: conversations would sometimes be named 'Untitled' regardless of auto naming settings

Build 3

The start and end tags of reasoning blocks are now configurable on the My Models page
- You can use this feature to enable thinking UI for models that don't use <think> and </think> tags to denote reasoning sections
Fixed a bug where structured output was not configurable on the My Models page
Optimized engine indexing for reduced start-up delay
Added an option to re-run engine compatibility checks for specific engines from the Runtimes UI
[Mac] Improved reliability of MLX runtime installation, and improved detection of broken MLX runtimes
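
As a rough illustration of the configurable reasoning tags mentioned in this build, the sketch below separates a reasoning section from the final answer using caller-supplied start and end tags. It is only a guess at how such splitting might work, not LM Studio's actual code. The function name `split_reasoning` and the `<reasoning>` tags in the usage lines are hypothetical.

```python
from typing import Tuple


def split_reasoning(text: str,
                    start_tag: str = "<think>",
                    end_tag: str = "</think>") -> Tuple[str, str]:
    """Return (reasoning, answer) for model output that uses the given tags."""
    start = text.find(start_tag)
    if start == -1:
        # No reasoning block at all: everything is the answer.
        return "", text

    body_start = start + len(start_tag)
    end = text.find(end_tag, body_start)
    if end == -1:
        # Block opened but not closed yet (for example, mid-stream):
        # treat everything after the start tag as reasoning.
        return text[body_start:].strip(), text[:start].strip()

    reasoning = text[body_start:end].strip()
    answer = (text[:start] + text[end + len(end_tag):]).strip()
    return reasoning, answer


if __name__ == "__main__":
    # Hypothetical model that uses <reasoning> tags instead of <think>.
    output = "<reasoning>2+2 is 4</reasoning>The answer is 4."
    print(split_reasoning(output, "<reasoning>", "</reasoning>"))
```

Passing the tags as parameters is what lets the same logic serve models that do not use <think> and </think>.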

Build 2

Fixed a case where the message about updating the engine to use speculative decoding was not displayed
Fixed a bug where we sometimes showed "no compatible draft models" while still identifying them
[Linux] Fixed 'exit code 133' bug (reference: https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/285)

Build 1