February 18, 2025

LM Studio 0.3.10

0.3.10 - Release Notes

What's New

This version introduces 🔮 Speculative Decoding!

Speculative Decoding is a technique that can make token generation 1.5x-3x faster in some cases.

It is supported for both the llama.cpp and MLX engines, in the chat UI as well as the server API.
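To illustrate the idea, here is a minimal Python sketch of greedy speculative decoding. It is not LM Studio's implementation. A small draft model proposes a few tokens, the main model keeps the longest prefix that matches its own choices, and then it supplies the next token itself. The toy `target_model` and `draft_model` functions, the helpers `greedy_decode` and `speculative_decode`, and the draft length `k` are all hypothetical stand-ins. Real engines score every drafted position in one batched forward pass of the main model, and that batching is where the speedup comes from. Sampling at temperature > 0 also needs a rejection-sampling acceptance rule instead of the exact matching used here.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": maps a token sequence to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(seq: List[int]) -> int:
    # Hypothetical stand-in for the large main model.
    return (seq[-1] * 3 + seq[-2]) % 7


def draft_model(seq: List[int]) -> int:
    # Hypothetical small draft model: agrees with the target except after a 0.
    guess = (seq[-1] * 3 + seq[-2]) % 7
    return guess if seq[-1] != 0 else (guess + 1) % 7


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one main-model call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (sequence, accepted draft tokens)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    accepted_total = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens, one at a time.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The main model checks each drafted position. A real engine does
        #    this in a single batched pass; we loop here for clarity.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if expected != tok:
                # First mismatch: keep the matching prefix plus the target's token.
                seq.extend(proposal[:i])
                seq.append(expected)
                accepted_total += i
                break
        else:
            # Every drafted token matched what the main model would have chosen.
            seq.extend(proposal)
            accepted_total += len(proposal)
            if len(seq) < goal:
                # Bonus token from the main model's verification pass.
                seq.append(target(seq))
    return seq, accepted_total


if __name__ == "__main__":
    prompt = [1, 2]
    baseline = greedy_decode(target_model, prompt, 20)
    fast, accepted = speculative_decode(target_model, draft_model, prompt, 20, k=4)
    assert fast == baseline  # identical output to plain greedy decoding
    print(f"accepted {accepted} draft tokens out of 20 generated")
```

Every token that is kept is one the main model would have chosen anyway, so the output matches plain decoding exactly. The accepted-token count is roughly the quantity that a feature like "Visualize accepted draft tokens" makes visible.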

Change Log

Build 6

  • Fixed an issue where the first message of a streamed tool-call response did not include the "assistant" role
  • Improved the error message shown when trying to use a draft model with a different engine than the main model.
  • Fixed a bug where the speculative decoding visualization did not work when continuing a message.

Build 5

  • Updated MLX to enable Speculative Decoding on M1/M2 Macs (in addition to M3/M4)
  • Fixed an issue on Linux and macOS where child processes may not be cleaned up after app exit
  • [Mac][MLX] Fixed a bug where selecting a draft model during prediction would cause the model to crash

Build 4

  • New: Chat Appearance > "Expand chat container to window width" option
    • This option allows you to expand the chat container to the full width of the window
  • Fixed RAG failing with a "path must be a string" error
  • Fixed a bug where conversations would sometimes be named 'Untitled' regardless of auto-naming settings

Build 3

  • The start and end tags of reasoning blocks are now configurable on the My Models page
    • You can use this feature to enable the thinking UI for models that don't use <think> and </think> tags to denote reasoning sections
  • Fixed a bug where structured output was not configurable on the My Models page
  • Optimized engine indexing to reduce start-up delay
  • Added an option to re-run engine compatibility checks for specific engines from the Runtimes UI
  • [Mac] Improved reliability of MLX runtime installation, and improved detection of broken MLX runtimes
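The following Python sketch shows what configurable reasoning tags mean in practice. The text between a start tag and an end tag is treated as reasoning, and the remaining text is treated as the answer. The `split_reasoning` helper and the `[REASON]`/`[/REASON]` tags in the usage example are hypothetical illustrations, not LM Studio's parser. The only tags taken from these notes are the default `<think>`/`</think>` pair.

```python
from typing import Optional, Tuple


def split_reasoning(
    text: str, start_tag: str = "<think>", end_tag: str = "</think>"
) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (reasoning, answer) for configurable tags."""
    start = text.find(start_tag)
    if start == -1:
        # No reasoning block: the whole text is the answer.
        return None, text
    body_start = start + len(start_tag)
    end = text.find(end_tag, body_start)
    if end == -1:
        # Unterminated block (e.g. still streaming): the rest is reasoning so far.
        return text[body_start:].strip(), text[:start].strip()
    reasoning = text[body_start:end].strip()
    answer = (text[:start] + text[end + len(end_tag):]).strip()
    return reasoning, answer


if __name__ == "__main__":
    print(split_reasoning("<think>2+2 is 4</think>The answer is 4."))
    # A model with non-standard markers, configured with custom tags:
    print(split_reasoning("[REASON]check units[/REASON]Use meters.", "[REASON]", "[/REASON]"))
```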

Build 2

  • Fixed a case where the message about updating the engine to use speculative decoding was not displayed
  • Fixed a bug where "no compatible draft models" was sometimes shown while compatible draft models were still being identified
  • [Linux] Fixed 'exit code 133' bug (reference: https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/285)

Build 1

  • New: 🔮 Speculative Decoding! (for llama.cpp and MLX models)
    • Use a smaller "draft model" to speed up generation of larger models by up to 1.5x-3x.
    • Works best when combining a very small draft model with a large main model. The speedup comes without any degradation in quality.
    • Your mileage may vary. Experiment with different draft models easily to find what works best.
    • Works in both the chat UI and the server API
    • Use the new "Visualize accepted draft tokens" feature to watch speculative decoding in action.
      • Turn it on in the chat sidebar.
  • New: Runtimes page UI (cmd/ctrl + shift + R)
  • Runtimes now auto-update only on app start-up
  • Fixed a bug where multiple images sent to the model would not be recognized