# Introduction to model.yaml

Draft

`model.yaml` describes a model and all of its variants in a single portable file. All models in LM Studio's model catalog are implemented using model.yaml.

This abstracts away the underlying format (GGUF, MLX, etc.) and presents a single entry point for a given model. The model.yaml file also supports baking in additional metadata, load and inference options, and even custom logic (e.g. enabling/disabling thinking).
You can clone existing model.yaml files on the LM Studio Hub and even publish your own!
## Core fields

### model

The canonical identifier, in the form `publisher/model`.

```yaml
model: qwen/qwen3-8b
```
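As a quick illustration of the `publisher/model` form, here is a minimal validity check. The regex is an assumption based on the identifiers shown on this page; model.yaml itself does not publish a formal grammar here.

```python
import re

# Hypothetical pattern: one publisher segment, a slash, one model segment.
# This is an assumption for illustration, not part of the model.yaml spec.
MODEL_ID_PATTERN = re.compile(r"^[\w.-]+/[\w.-]+$")

def is_canonical_model_id(model_id: str) -> bool:
    """Check that an identifier looks like publisher/model."""
    return MODEL_ID_PATTERN.fullmatch(model_id) is not None

print(is_canonical_model_id("qwen/qwen3-8b"))  # True
print(is_canonical_model_id("qwen3-8b"))       # False (no publisher)
```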
### base

Points to the "concrete" model files or to other virtual models. Each entry uses a unique `key` and one or more `sources` from which the files can be fetched.
The snippet below demonstrates a case where the model (`qwen/qwen3-8b`) can resolve to one of 3 different concrete models.

```yaml
model: qwen/qwen3-8b
base:
  - key: lmstudio-community/qwen3-8b-gguf
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-GGUF
  - key: lmstudio-community/qwen3-8b-mlx-4bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-4bit
  - key: lmstudio-community/qwen3-8b-mlx-8bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-8bit
```
Concrete model files refer to the actual weights.
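To make the resolution step concrete, here is a sketch of how a downloader might turn a `base` entry into a download location. The parsed YAML is shown as a plain dict, and mapping a `type: huggingface` source to a `huggingface.co/user/repo` URL is an assumption for illustration, not something the spec mandates.

```python
# One `base` entry from the snippet above, as it would look after YAML parsing.
base_entry = {
    "key": "lmstudio-community/qwen3-8b-gguf",
    "sources": [
        {"type": "huggingface", "user": "lmstudio-community", "repo": "Qwen3-8B-GGUF"},
    ],
}

def resolve_source_urls(entry: dict) -> list[str]:
    """Turn each huggingface source into a repo URL, skipping unknown types."""
    urls = []
    for source in entry["sources"]:
        if source["type"] == "huggingface":
            urls.append(f"https://huggingface.co/{source['user']}/{source['repo']}")
    return urls

print(resolve_source_urls(base_entry))
# ['https://huggingface.co/lmstudio-community/Qwen3-8B-GGUF']
```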
### metadataOverrides

Overrides the base model's metadata. This is useful for presentation purposes, for example in LM Studio's model catalog or in-app model search. It is not used for any functional changes to the model.

```yaml
metadataOverrides:
  domain: llm
  architectures:
    - qwen3
  compatibilityTypes:
    - gguf
    - safetensors
  paramsStrings:
    - 8B
  minMemoryUsageBytes: 4600000000
  contextLengths:
    - 40960
  vision: false
  reasoning: true
  trainedForToolUse: true
```
### config

Use this to "bake in" default runtime settings (such as sampling parameters) and even load-time options. This works similarly to Per Model Defaults.

- `operation`: inference-time parameters
- `load`: load-time parameters

```yaml
config:
  operation:
    fields:
      - key: llm.prediction.topKSampling
        value: 20
      - key: llm.prediction.temperature
        value: 0.7
  load:
    fields:
      - key: llm.load.contextLength
        value: 42690
```
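Since baked-in fields act as per-model defaults, one natural way to think about them is as a base layer that user-chosen settings override. The sketch below illustrates that merge; the flat dotted keys mirror the YAML above, but the merge policy and function name are assumptions, not LM Studio's actual implementation.

```python
# Defaults as baked into the model.yaml `config` section above.
baked_in = {
    "llm.prediction.topKSampling": 20,
    "llm.prediction.temperature": 0.7,
    "llm.load.contextLength": 42690,
}

def effective_settings(defaults: dict, user_overrides: dict) -> dict:
    """User-chosen values win over the model's baked-in defaults."""
    return {**defaults, **user_overrides}

settings = effective_settings(baked_in, {"llm.prediction.temperature": 0.2})
print(settings["llm.prediction.temperature"])   # 0.2 (user override wins)
print(settings["llm.prediction.topKSampling"])  # 20  (baked-in default kept)
```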
### customFields

Define model-specific custom fields.

```yaml
customFields:
  - key: enableThinking
    displayName: Enable Thinking
    description: Controls whether the model will think before replying
    type: boolean
    defaultValue: true
    effects:
      - type: setJinjaVariable
        variable: enable_thinking
```
In order for the above example to work, the Jinja template needs to have a variable named `enable_thinking`.
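To show how the `setJinjaVariable` effect connects a custom field to the template, here is a minimal sketch of a runtime collecting template variables. The field definition mirrors the YAML above; the function name and merge behavior are hypothetical, not LM Studio's API.

```python
# The customFields section above, as it would look after YAML parsing.
custom_fields = [
    {
        "key": "enableThinking",
        "type": "boolean",
        "defaultValue": True,
        "effects": [{"type": "setJinjaVariable", "variable": "enable_thinking"}],
    }
]

def jinja_variables(fields: list[dict], user_values: dict) -> dict:
    """Collect Jinja variables, falling back to each field's defaultValue."""
    variables = {}
    for field in fields:
        value = user_values.get(field["key"], field["defaultValue"])
        for effect in field["effects"]:
            if effect["type"] == "setJinjaVariable":
                variables[effect["variable"]] = value
    return variables

print(jinja_variables(custom_fields, {}))                         # {'enable_thinking': True}
print(jinja_variables(custom_fields, {"enableThinking": False}))  # {'enable_thinking': False}
```

The resulting dict would then be passed to the chat template renderer, where the template can branch on `enable_thinking`.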
## Complete example

Taken from https://lmstudio.ai/models/qwen/qwen3-8b

```yaml
# model.yaml is an open standard for defining cross-platform, composable AI models
# Learn more at https://modelyaml.org
model: qwen/qwen3-8b
base:
  - key: lmstudio-community/qwen3-8b-gguf
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-GGUF
  - key: lmstudio-community/qwen3-8b-mlx-4bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-4bit
  - key: lmstudio-community/qwen3-8b-mlx-8bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-8bit
metadataOverrides:
  domain: llm
  architectures:
    - qwen3
  compatibilityTypes:
    - gguf
    - safetensors
  paramsStrings:
    - 8B
  minMemoryUsageBytes: 4600000000
  contextLengths:
    - 40960
  vision: false
  reasoning: true
  trainedForToolUse: true
config:
  operation:
    fields:
      - key: llm.prediction.topKSampling
        value: 20
      - key: llm.prediction.minPSampling
        value:
          checked: true
          value: 0
customFields:
  - key: enableThinking
    displayName: Enable Thinking
    description: Controls whether the model will think before replying
    type: boolean
    defaultValue: true
    effects:
      - type: setJinjaVariable
        variable: enable_thinking
```
The GitHub specification contains further details and the latest schema.