
LM Studio 0.3.2

By LM Studio Team

LM Studio 0.3.2 Release Notes

What's new in 0.3.2

  • New: Ability to pin models to the top is back!
    • Right-click on a model in My Models and select "Pin to top" to pin it to the top of the list.
  • Chat migration dialog now appears in the chat sidebar.
    • You can migrate your chats from pre-0.3.0 versions from there.
    • As of v0.3.1, system prompts are migrated as well.
    • Your old chats are NOT deleted.
  • The downloads button no longer shows a numbered badge when there are no downloads
  • Added a button to collapse the FAQ sidebar in the Discover tab
  • Reduced default context size from 8K to 4K tokens to alleviate Out of Memory issues
    • You can still configure any context size you want in the model load settings
  • Added a warning next to the Flash Attention option in model load settings
    • Flash Attention is experimental and may not be suitable for all models
  • Updated the bundled llama.cpp engine to 3246fe84d78c8ccccd4291132809236ef477e9ea (Aug 27)

Bug Fixes

  • Bug fix: My Models model size aggregation was incorrect when you had multi-part model files
  • Bug fix: (Linux) RAG would fail due to missing bundled embedding model (fixed)
  • Bug fix: Flash Attention - KV cache quantization is back to FP16 by default
    • In 0.3.0, the K and V were both set to Q8, which introduced large latencies in some cases
      • You might notice an increase in memory consumption when FA is ON compared with 0.3.1, but on par with 0.2.31
  • Bug fix: On some setups the app would hang at startup (fix + mitigation)
  • Bug fix: Fixed an issue where the downloads panel could be dragged over the app top bar
  • Bug fix: Fixed typos in built-in code snippets (server tab)

LM Studio 0.3.1

By LM Studio Team

LM Studio 0.3.1 Release Notes

Chat migration improvement from 0.2.31 to 0.3.0+

In this version (0.3.1) we've fixed the chat migration to bring in your system prompts from 0.2.31 chats.

If you've already migrated your chats in 0.3.0, you can run the 0.3.1 migrator again. It'll create another folder and will not overwrite the previously migrated chats.

How to migrate chats

After updating to 0.3.1, head to Settings and click the "Migrate Chats" button. Your older chats will be copied into a new folder within the Chat tab. They will not be deleted.

What's new in 0.3.1

  • Chat migration improvement: Pre-0.3.0 chat migration now includes system prompts from 0.2.31
  • You can now paste (ctrl / cmd + V) images into the chat input box when a vision-enabled model is loaded.
  • Model load config: added an indication of the maximum context length the model supports, plus a button to set the context length to it
  • Patched the Gemma 2 prompt template so it doesn't error out when you provide a system prompt. Instead, the system prompt will be added as-is at the top of the context.
    • You can override this behavior by providing your own prompt template in the My Models screen.
  • More descriptive errors when a model crashes during operation
    • There may still be cases where the error doesn't have much information; please let us know if you're running into those cases.
  • Updated the bundled llama.cpp engine to 3ba780e2a8f0ffe13f571b27f0bbf2ca5a199efc (Aug 23)

Bug Fixes

  • Bug fix: Vision-enabled models would crash on operation (fixed)
  • Bug fix: Search bar in the Discover page doesn't show (fixed)
  • Bug fix: "Model Card" button text color in Classic theme is dark (fixed)
  • Bug fix: LM Studio deeplinks from Hugging Face and elsewhere don't work (fixed)

For more, join our Discord community: https://discord.gg/aPQfnNkxGC

If you want to use LM Studio at your organization, get in touch! [email protected]

LM Studio 0.3.0

By LM Studio Team

We're incredibly excited to finally share LM Studio 0.3.0 🥳.

Since its inception, LM Studio has packaged together a few elements for making the most out of local LLMs when you run them on your computer:

  1. A desktop application that runs entirely offline and has no telemetry
  2. A familiar chat interface
  3. Search & download functionality (via Hugging Face 🤗)
  4. A local server that can listen on OpenAI-like endpoints
  5. Systems for managing local models and configurations

With this update, we've improved upon, deepened, and simplified many of these aspects through what we've learned from over a year of running local LLMs.

Download LM Studio for Mac, Windows (x86 / ARM), or Linux (x86) from https://lmstudio.ai.

What's new in LM Studio 0.3.0

Chat with your documents

LM Studio 0.3.0 comes with built-in functionality to provide a set of documents to an LLM and ask questions about them. If the documents are short enough (i.e., if they fit in the model's "context"), LM Studio will add the file contents to the conversation in full. This is particularly useful for models that support long context, such as Meta's Llama 3.1 and Mistral Nemo.

If the document is very long, LM Studio will opt to use "Retrieval-Augmented Generation", frequently referred to as "RAG". RAG means attempting to fish out relevant bits of a very long document (or several documents) and providing them to the model for reference. This technique sometimes works really well, but sometimes it requires some tuning and experimentation.

Tip for successful RAG: provide as much context in your query as possible. Mention terms, ideas, and words you expect to be in the relevant source material. This will often increase the chance the system will provide useful context to the LLM. As always, experimentation is the best way to find what works best.
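Under the hood, retrieval of this kind generally works by embedding the query and the document chunks and keeping the closest matches. Below is a minimal sketch of that idea in Python, assuming the openai client pointed at LM Studio's local server (default port 1234) with an embedding model loaded; the model name is a placeholder, and this is an illustration rather than LM Studio's actual implementation.

# A minimal sketch of the retrieval step in RAG, for illustration only --
# not LM Studio's actual implementation. Assumes LM Studio's server is
# running on its default port (1234) with an embedding model loaded;
# "my-embedding-model" is a placeholder name.
import math
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def embed(texts):
    # LM Studio exposes an OpenAI-compatible /v1/embeddings endpoint.
    resp = client.embeddings.create(model="my-embedding-model", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def top_chunks(query, chunks, k=3):
    # Rank document chunks by similarity to the query; keep the best k.
    query_vec = embed([query])[0]
    chunk_vecs = embed(chunks)
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# The selected chunks are then placed into the prompt as reference material.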

OpenAI-like Structured Output API

OpenAI recently announced a JSON-schema-based API that can result in reliable JSON outputs. LM Studio 0.3.0 supports this with any local model that can run in LM Studio! We've included a code snippet for doing this right inside the app. Look for it in the Developer page, on the right-hand pane.
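For a sense of what such a request looks like, here is a hedged sketch using the openai Python client pointed at the local server; the default port (1234) and the model name are assumptions, and the snippet inside the app is the authoritative reference.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# A JSON schema describing the shape we want the output to take.
book_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "author", "year"],
}

completion = client.chat.completions.create(
    model="my-local-model",  # placeholder: any model loaded in LM Studio
    messages=[{"role": "user", "content": "Tell me about a classic sci-fi novel."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "book", "schema": book_schema},
    },
)

# The response content is a JSON string conforming to the schema above.
print(completion.choices[0].message.content)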

UI themes

LM Studio first shipped in May 2023 in a dark retro theme, complete with Comic Sans sprinkled for good measure. The OG dark theme held strong, and LM Studio 0.3.0 introduces 3 additional themes: Dark, Light, Sepia. Choose "System" to automatically switch between Dark and Light, depending on your system's dark mode settings.

Automatic load parameters, but also full customizability

Some of us are well versed in the nitty-gritty of LLM load and inference parameters. But many of us, understandably, can't be bothered. LM Studio 0.3.0 auto-configures everything based on the hardware you are running it on. If you want to pop open the hood and configure things yourself, LM Studio 0.3.0 has even more customizable options.

Pro tip: head to the My Models page and look for the gear icon next to each model. You can set per-model defaults that will be used anywhere in the app.

Serve on the network

If you head to the server page, you'll see a new toggle that says "Serve on Network". Turning this on will open up the server to requests outside of 'localhost'. This means you could use the LM Studio server from other devices on the network. Combined with the ability to load and serve multiple LLMs simultaneously, this opens up a lot of new use cases.
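As a sketch of what this enables (assuming the default server port 1234; the IP address is a placeholder for the host machine's LAN address), any OpenAI-compatible client on another device can now reach the server:

from openai import OpenAI

# Run from another device on the same network. Replace 192.168.1.10 with the
# address of the machine running LM Studio; 1234 is the default server port.
client = OpenAI(base_url="http://192.168.1.10:1234/v1", api_key="lm-studio")

reply = client.chat.completions.create(
    model="my-local-model",  # placeholder for a model loaded in LM Studio
    messages=[{"role": "user", "content": "Hello from across the network!"}],
)
print(reply.choices[0].message.content)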

Folders to organize chats

Useful if you're working on multiple projects at once. You can even nest folders inside folders!

Multiple generations for each chat

LM Studio has had a "regenerate" feature for a while. Now clicking "regenerate" keeps previous message generations, and you can easily page between them using a familiar left / right arrow interface.

How to migrate your chats from LM Studio 0.2.31 to 0.3.0

To support features like multi-version regenerations, we introduced a new data structure under the hood. You can migrate your pre-0.3.0 chats by going to Settings and clicking on "Migrate Chats". This will make a copy, and will not delete any old files.

Full list of updates

Completely Refreshed UI:

  • Includes themes, spellcheck, and corrections.
  • Built on top of lmstudio.js (TypeScript SDK).
  • New chat settings sidebar design.

Basic RAG (Retrieval-Augmented Generation):

  • Drag and drop a PDF, .txt file, or other files directly into the chat window.
  • Max file input size for RAG (PDF / .docx) increased to 30MB.
  • RAG accepts any file type, but non-.pdf/.docx files are read as plain text.

Automatic GPU Detection + Offload:

  • Distributes tasks between GPU and CPU based on your machine’s capabilities.
  • Can still be overridden manually.

Browse & Download "LM Runtimes":

  • Download the latest LLM engines (e.g., llama.cpp) without updating the whole app.
  • Available options: ROCm, AVX-only, with more to come.

Automatic Prompt Template:

  • LM Studio reads the metadata from the model file and applies prompt formatting automatically.

New Developer Mode:

  • View model load logs, configure multiple LLMs for serving, and share an LLM over the network (not just localhost).
  • Supports OpenAI-like Structured Outputs with json_schema.

Folder Organization for Chats:

  • Create folders to organize chats.

Prompt Processing Progress Indicator:

  • Displays progress % for prompt processing.

Enhanced Model Loader:

  • Easily configure load parameters (context, GPU offload) before model load.
  • Ability to set defaults for every configurable parameter for a given model file.
  • Improved model loader UI with a checkbox to control parameters.

Support for Embedding Models:

  • Load and serve embedding models.
  • Parallelization support for multiple models.

Vision-Enabled Models:

  • Image attachments in chats and the API.

Show Conversation Token Count:

  • Displays the current token count and the total context size.

Prompt Template Customization:

  • Ability to override prompt templates.
  • Edit the "Jinja" template or manually provide prefixes/suffixes.
  • Prebuilt chat templates (ChatML, Alpaca, blank, etc.).

Conversation Management:

  • Add conversation notes.
  • Clone and branch a chat at a specific message.

Customizable Chat Settings:

  • Choose chat style and font size.
  • Remember settings for each model on load.

Initial Translations:

Subtitles for Config Parameters:

  • Descriptive subtitles for every configuration parameter.

For more, join our Discord community: https://discord.gg/aPQfnNkxGC

If you want to use LM Studio at your organization, get in touch! [email protected]

Run Llama 3.1 in LM Studio

LM Studio Team

Meta's newest Llama: Llama 3.1 is here!

TL;DR: A relatively small, fast, and supremely capable open-weights model you can run on your laptop.

MetaAI's newest generation of their Llama models, Llama 3.1, is now available.

How to download and run Llama 3.1 locally in LM Studio

  • Install LM Studio 0.2.28 from https://lmstudio.ai
  • Search for Meta-Llama-3.1-8B-Instruct-GGUF using the in-app search page.
  • When the download is complete, go ahead and load the model.
  • That's it! Now you're running Llama 3.1 locally.

If you're a developer, you can also use Llama 3.1 via LM Studio's built-in OpenAI-like server. See the docs for more details.
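For example, a minimal streaming request could look like the sketch below; the port is LM Studio's default (1234) and the model identifier is an assumption, so check the server page in the app for the exact values.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Stream tokens as they are generated by the locally served Llama 3.1 model.
stream = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct-GGUF",  # assumed identifier; see the app
    messages=[{"role": "user", "content": "Summarize Llama 3.1 in one sentence."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)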

What's new with Llama 3.1?

  • New longer context window supporting up to 128k tokens.
  • Now available in 3 different sizes, including a new 405B parameter flagship model, and upgraded 70B & 8B versions.
  • Available across 8 different languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Competitive with other leading, closed-source foundational models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet.

Llama 3.1 comes in three sizes: 8B, 70B, and 405B.

  • 8B: A relatively small, fast, and supremely capable LLM you can run on your laptop.
  • 70B: A medium-large variant that enables diverse use cases that may require more complex reasoning.
  • 405B: The most extensively trained open LLM to date. Requires an extremely capable setup to run.

To use MetaAI's Llama 3.1 in LM Studio, download or update to LM Studio 0.2.28 or later.

Download LM Studio from the LM Studio website.

Introducing `lms` - LM Studio's companion CLI tool

By LM Studio Team

Today, alongside LM Studio 0.2.22, we're releasing the first version of lms, LM Studio's companion CLI tool.

With lms you can load/unload models, start/stop the API server, and inspect raw LLM input (not just output). It's developed on GitHub and we welcome issues and PRs from the community.

lms ships with LM Studio and lives in LM Studio's working directory, under ~/.cache/lm-studio/bin/. When you update LM Studio, it also updates your lms version. If you're a developer, you can also build lms from source.

Bootstrap lms on your system

You need to run LM Studio at least once before you can use lms.

Afterwards, open your terminal and run one of these commands, depending on your operating system:

# Mac / Linux:
~/.cache/lm-studio/bin/lms bootstrap

# Windows:
cmd /c %USERPROFILE%/.cache/lm-studio/bin/lms.exe bootstrap

Then, open a new terminal window and run lms.

This is the current output you will get:

$ lms
[LM Studio CLI ASCII art banner]

lms - LM Studio CLI - v0.2.22
GitHub: https://github.com/lmstudio-ai/lmstudio-cli

Usage
lms <subcommand>

where <subcommand> can be one of:

- status - Prints the status of LM Studio
- server - Commands for managing the local server
- ls - List all downloaded models
- ps - List all loaded models
- load - Load a model
- unload - Unload a model
- create - Create a new project with scaffolding
- log - Log operations. Currently only supports streaming logs from LM Studio via `lms log stream`
- version - Prints the version of the CLI
- bootstrap - Bootstrap the CLI

For more help, try running `lms <subcommand> --help`

lms is MIT licensed and developed in this repository on GitHub:

https://github.com/lmstudio-ai/lms

Use lms to automate and debug your workflows

  • Start and stop the local server

lms server start
lms server stop

  • List the local models on the machine

lms ls

This will reflect the current LM Studio models directory, which you set in the 📂 My Models tab in the app.

  • List the currently loaded models

lms ps

  • Load a model (with options)

lms load [--gpu=max|auto|0.0-1.0] [--context-length=1-N]

--gpu=1.0 means 'attempt to offload 100% of the computation to the GPU'.

  • Optionally, assign an identifier to your local LLM:

lms load TheBloke/phi-2-GGUF --identifier="gpt-4-turbo"

This is useful if you want to keep the model identifier consistent.
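For instance, a client written against the alias keeps working no matter which model file actually backs it. A minimal sketch, assuming LM Studio's default server port:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Address the model by the identifier given at load time ("gpt-4-turbo"),
# not by its file path; swapping the underlying model requires no client change.
reply = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)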

  • Unload a model

lms unload [--all]

Debug your prompting with lms log stream

lms log stream allows you to inspect the exact input string that goes to the model.

This is particularly useful for debugging prompt template issues and other unexpected LLM behaviors.

$ lms log stream
I Streaming logs from LM Studio

timestamp: 5/2/2024, 9:49:47 PM
type: llm.prediction.input
modelIdentifier: TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF/tinyllama-1.1b-1t-openorca.Q2_K.gguf
modelPath: TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF/tinyllama-1.1b-1t-openorca.Q2_K.gguf
input: "Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Hello, what's your name?
### Response:
"

lmstudio.js

lms uses lmstudio.js to interact with LM Studio.

You can build your own programs that can do what lms does and much more.

lmstudio.js is in pre-release public alpha. Follow along on GitHub: https://github.com/lmstudio-ai/lmstudio.js.


Discuss all things lms and lmstudio.js in the new #dev-chat channel on the LM Studio Discord Server.

Download LM Studio for Mac / Windows / Linux from https://lmstudio.ai.

LM Studio 0.2.22 AMD ROCm - Technology Preview is available at https://lmstudio.ai/rocm

LM Studio on Twitter: https://twitter.com/LMStudioAI


Use Llama 3 in LM Studio

LM Studio Team

Llama 3 by MetaAI

MetaAI released the next generation of their Llama models, Llama 3.

You can run Llama 3 in LM Studio, either using a chat interface or via a local LLM API server.

Llama 3 comes in two sizes, 8B and 70B, and in two variants: base and instruct fine-tuned.

  • Meta-Llama-3-8B (Base): Switch to the Blank Preset in LM Studio and use prompt engineering techniques such as 'few-shot prompting' and 'in-context learning'.
  • Meta-Llama-3-8B-Instruct (Instruct): Use the Llama 3 Preset. This variant is expected to be able to follow instructions and be conversational.

Download and run Llama 3

  • Search for lmstudio-community/llama-3 using the in-app search page.
  • Choose a download option from the search results.

To use MetaAI's Llama 3 in LM Studio, download or update to LM Studio 0.2.20 or later.

Download LM Studio from the LM Studio website.