Improved VRAM usage estimation, especially when flash attention is enabled
Build 3
Added a setting to control whether the downloads panel opens after starting a model download (default: false)
Update CLI (lms) output colors to have better contrast in light mode
Fix a bug where copy buttons would sometimes not appear on conversation code blocks
Build 2
New: Find in Chat (Cmd/Ctrl+F) and Search All Chats (Cmd/Ctrl+Shift+F)
New: Sort chats sidebar by date updated, date created, or token count
Model resources estimation now works for vision models
Added model resources estimation to the CLI. You can now run lms load --estimate-only <model-name> to preview a model's estimated memory requirements before loading
While using lms chat, you can now press Ctrl+C to interrupt an ongoing prediction
Build 1
Additional model quantization files downloaded after the main model will now be properly nested under it
Improved the memory usage estimation used for model loading guardrails
Memory estimation now takes the selected context length into account
lms ps --json now reports model generation status and the number of queued prediction requests
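
For scripting against the lms ps --json change above, here is a minimal Python sketch. These notes do not give the JSON schema, so the sketch assumes no field names and prints each entry as returned. The helper name parse_ps_json is hypothetical, and the assumption that the output is either a JSON array or a single JSON object is ours, not something stated in these notes.

```python
import json
import shutil
import subprocess


def parse_ps_json(raw: str) -> list:
    """Parse the text printed by `lms ps --json` into a list of entries.

    Assumption: the output is a JSON array or a single JSON object.
    No specific field names are relied on, since the schema is not
    documented in these notes.
    """
    data = json.loads(raw)
    return data if isinstance(data, list) else [data]


def main() -> None:
    # Skip gracefully when the CLI is not installed.
    if shutil.which("lms") is None:
        print("lms not found on PATH")
        return
    result = subprocess.run(
        ["lms", "ps", "--json"], capture_output=True, text=True
    )
    if result.returncode != 0:
        print(result.stderr.strip())
        return
    for entry in parse_ps_json(result.stdout):
        # Print each entry unchanged so any status or queue fields are visible.
        print(json.dumps(entry, indent=2))


if __name__ == "__main__":
    main()
```

Inspect the printed entries to see which keys carry the generation status and queued request count in your build before depending on them in a script.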