Changelog • LM Studio 0.3.14

New: Advanced GPU Controls 🎛️
- On multi-GPU setups, customize how models are offloaded onto your GPUs
  - Enable/disable individual GPUs
  - CUDA-specific features:
    - "Priority order" mode: The system will try to allocate more on GPUs listed first
    - "Limit Model Offload to Dedicated GPU memory" mode: Improved stability and optimized for long context on single GPU setups
- How to open GPU controls:
  - Windows: Ctrl+Shift+H
  - Mac: Cmd+Shift+H
- How to open GPU controls in a pop-out window:
  - Windows: Ctrl+Alt+Shift+H
  - Mac: Cmd+Option+Shift+H
  - Benefit: Manage GPU settings while models are loading
LG AI EXAONE Deep reasoning model support
Improved model loader UI in small window sizes
Improve llama model family tool call reliability through LM Studio SDK and OpenAI compatible streaming API
[SDK] Added support for GBNF grammar when using structured generation
[SDK/RESTful API] Added support for specifying presets in API calls
Improved support for Nemotron models

Build 5

Build 4

[Advanced GPU controls] Allow disabling all GPUs with any engine
[Advanced GPU controls] Fix bug where disabling a GPU would cause incorrect offloading when > 2 gpus
[Advanced GPU controls][CUDA] Improved stability of"Limit Model Offload to Dedicated GPU memory" mode
Added GPU controls logging to "Developer Logs"
Fixed a bug where sometimes editing model config inside the model loader popover does not take effect
Fixed a bug related to renaming state focus on chat cells

Build 3

[Advanced GPU controls] Fixed a bug where intermediate buffers were being allocated on disabled GPUs
Fixed "OpenSquareBracket !== CloseStatement" bug with Nemotron model
Fixed a bug where Nemotron GGUF model metadata was not being read properly
[Windows] Fixed: Make sure the LM Studio.exe executable is also signed. Should help with anti-virus false positives

Build 2

Optimized "Limit Model Offload to Dedicated GPU memory" mode in long context situations on single GPU setups
Speculative decoding draft model now respects GPU controls
[CUDA] Fixed a bug where model would crash with message "Invalid device index"
[Windows ARM] Fixed chat with document sometimes not working

Build 1

New: GPU Controls 🎛️
- On multi-GPU setups, customize how models are offloaded onto your GPUs
  - Enable/disable individual GPUs
  - CUDA-specific features:
    - "Priority order" mode: The system will try to allocate more on GPUs listed first
    - "Limit Model Offload to Dedicated GPU memory" mode: The system will limit offload of model weights to dedicated GPU memory and RAM only. Context may still use shared memory
- How to open GPU controls:
  - Windows: Ctrl+Shift+H
  - Mac: Cmd+Shift+H
- How to open GPU controls in a pop-out window:
  - Windows: Ctrl+Alt+Shift+H
  - Mac: Cmd+Option+Shift+H
  - Benefit: Manage GPU settings while models are loading
LG AI EXAONE Deep reasoning model support
Improved model loader UI in small window sizes
Improve llama model family tool call reliability through LM Studio SDK and OpenAI compatible streaming API
[SDK] Added support for GBNF grammar when using structured generation
[SDK/RESTful API] Added support for specifying presets
Fixed a bug where sometimes the last couple fragments of a prediction are lost