LM Studio 0.3.2

By LM Studio Team

LM Studio 0.3.2 Release Notes

What's new in 0.3.2

  • New: Ability to pin models to the top is back!
    • Right-click on a model in My Models and select "Pin to top" to pin it to the top of the list.
  • The chat migration dialog now appears in the chat sidebar.
    • You can migrate your chats from pre-0.3.0 versions from there.
    • As of v0.3.1, system prompts are migrated as well.
    • Your old chats are NOT deleted.
  • The downloads button no longer shows a numbered badge when there are no downloads
  • Added a button to collapse the FAQ sidebar in the Discover tab
  • Reduced default context size from 8K to 4K tokens to alleviate Out of Memory issues
    • You can still configure any context size you want in the model load settings (see the KV-cache sketch after this list)
  • Added a warning next to the Flash Attention option in model load settings
    • Flash Attention is experimental and may not be suitable for all models
  • Updated the bundled llama.cpp engine to 3246fe84d78c8ccccd4291132809236ef477e9ea (Aug 27)
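
For context on the 8K → 4K default change: most of the memory a context window costs is the KV cache, which grows linearly with context length. Below is a minimal sketch of that calculation, using hypothetical Llama-3-8B-style dimensions (32 layers, 8 KV heads, head dim 128, FP16 cache) purely for illustration; it is not an LM Studio API, just back-of-the-envelope arithmetic.

```python
# Rough KV-cache size estimate: grows linearly with context length.
# Model dimensions below are illustrative (Llama-3-8B-like), not LM Studio defaults.

def kv_cache_bytes(ctx_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # 2x for the K and V tensors; FP16 = 2 bytes per element.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

for ctx in (4096, 8192):
    print(f"{ctx:>5} tokens -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB KV cache")
# 4096 tokens -> 0.50 GiB KV cache
# 8192 tokens -> 1.00 GiB KV cache
```

Halving the default context roughly halves this allocation, which is why the 4K default helps on memory-constrained machines; raising it back in the model load settings simply trades memory for longer prompts.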

Bug Fixes

  • Bug fix: Model size aggregation in My Models was incorrect for multi-part model files
  • Bug fix: (Linux) RAG would fail due to a missing bundled embedding model
  • Bug fix: Flash Attention - KV cache quantization is back to FP16 by default
    • In 0.3.0, the K and V were both set to Q8, which introduced large latencies in some cases
      • You might notice an increase in memory consumption when FA is ON compared with 0.3.1, but it is on par with 0.2.31 (see the sketch after this list)
  • Bug fix: On some setups the app would hang at startup (fix + mitigation)
  • Bug fix: Fixed an issue where the downloads panel could be dragged over the app top bar
  • Bug fix: Fixed typos in built-in code snippets (server tab)
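
For the KV-cache quantization fix: here is a rough sense of the memory difference between an FP16 and a Q8_0 cache. The sketch assumes llama.cpp's Q8_0 block layout (blocks of 32 values stored as 32 int8 quants plus one FP16 scale, 34 bytes per block) and the same illustrative model dimensions as the sketch above; actual numbers depend on the model you load.

```python
# Compare FP16 vs Q8_0 KV-cache footprints for one illustrative model.
# Q8_0 in llama.cpp stores blocks of 32 values as 32 int8s + one FP16 scale (34 bytes).

N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX = 32, 8, 128, 4096
elems = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CTX  # K and V combined

fp16_bytes = elems * 2
q8_0_bytes = (elems // 32) * 34

print(f"FP16 cache: {fp16_bytes / 2**30:.2f} GiB")
print(f"Q8_0 cache: {q8_0_bytes / 2**30:.2f} GiB")
# FP16 cache: 0.50 GiB
# Q8_0 cache: 0.27 GiB
```

Returning to FP16 roughly doubles the cache footprint relative to the Q8 setting used in 0.3.0 and 0.3.1, in exchange for removing the latency regression, which is why memory use with Flash Attention on is back in line with 0.2.31.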