API Changelog
LM Studio API Changelog - new features and updates
CLI log streaming: server + model

lms log stream now supports multiple sources and filters:

- --source server streams HTTP server logs (startup, endpoints, status)
- --source model --filter input,output streams formatted user input and model output
- --json for machine-readable logs
- --stats adds tokens/sec and related metrics (model source)
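As a usage sketch (assuming the flags above can be combined in one invocation), the following follows formatted prompts and responses with throughput stats as machine-readable output:

lms log stream --source model --filter input,output --stats --json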
New model support (API)

Support for the /v1/embeddings endpoint.

Seed-OSS tool-calling and template fixes

Fixes for Seed-OSS tool calling and template handling.

Reasoning content and tool-calling reliability

For gpt-oss on POST /v1/chat/completions, reasoning content moves out of message.content and into choices.message.reasoning (non-streaming) and choices.delta.reasoning (streaming), aligning with o3-mini.
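A sketch of the resulting non-streaming response shape for a gpt-oss model (all field values are illustrative):

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris.",
        "reasoning": "The user asks a factual question; recall that Paris is the capital."
      },
      "finish_reason": "stop"
    }
  ]
}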
(e.g., "reading 'properties'") and non‑streaming tool‑call failures ‡.stream_options
object on OpenAI‑compatible endpoints. Setting stream_options.include_usage
to true
returns prompt and completion token usage during streaming ‡.response_format.type
field now accepts "text"
in chat‑completion requests ‡.$defs
in tool definitions were stripped ‡.parameters
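For example, a streaming request that also asks for token usage could look like this (a sketch; the model name is illustrative):

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [ { "role": "user", "content": "Hello" } ],
    "stream": true,
    "stream_options": { "include_usage": true }
  }'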
Tool-calling reliability and token-count updates

Improvements around the tool-call parameters object and preventing hangs when an MCP server reloads.

Model capabilities in GET /models

The REST API (/api/v0) now returns a capabilities array in the GET /models response. Each model lists its supported capabilities (e.g. "tool_use") so clients can programmatically discover tool-enabled models.
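Each model entry in the GET /models response then includes its capabilities, e.g. (an excerpt sketch; the model key is illustrative and other fields are omitted):

{
  "id": "qwen2.5-7b-instruct",
  "capabilities": ["tool_use"]
}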
Improved Tool Use API Support

The OpenAI-like REST API now supports the tool_choice parameter:

{
  "tool_choice": "auto"  // or "none", "required"
}

- "tool_choice": "none" — Model will not call tools
- "tool_choice": "auto" — Model decides
- "tool_choice": "required" — Model must call tools (llama.cpp only)

Chunked responses now set "finish_reason": "tool_calls" when appropriate.
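Putting this together, a request sketch that offers a single tool and lets the model decide whether to call it (the tool name and schema are illustrative; the tools field follows the OpenAI function-calling format):

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-7b-instruct",
    "messages": [ { "role": "user", "content": "What is the weather in Paris?" } ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": { "city": { "type": "string" } },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'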
[API/SDK] Preset Support

The RESTful API and SDKs support specifying presets in requests.

(example needed)
Speculative Decoding API

Enable speculative decoding in API requests with "draft_model":

{
  "model": "deepseek-r1-distill-qwen-7b",
  "draft_model": "deepseek-r1-distill-qwen-0.5b",
  "messages": [ ... ]
}

Responses now include a stats object for speculative decoding:

"stats": {
  "tokens_per_second": ...,
  "draft_model": "...",
  "total_draft_tokens_count": ...,
  "accepted_draft_tokens_count": ...,
  "rejected_draft_tokens_count": ...,
  "ignored_draft_tokens_count": ...
}
Idle TTL and Auto Evict

Set a TTL (in seconds) for models loaded via API requests (docs article: Idle TTL and Auto-Evict):

curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [ ... ],
    "ttl": 300
  }'

With lms:

lms load --ttl <seconds>
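For example, to load a model and have it unloaded after 300 seconds of inactivity (the model key is illustrative):

lms load deepseek-r1-distill-qwen-7b --ttl 300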
Separate reasoning_content in Chat Completion responses

For DeepSeek R1 models, get reasoning content in a separate field. See more here.

Turn this on in App Settings > Developer.
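With the setting enabled, the assistant message carries the reasoning text in reasoning_content alongside the final answer, roughly like this (values are illustrative):

"message": {
  "role": "assistant",
  "content": "The answer is 9.",
  "reasoning_content": "First add 4 and 5, which gives 9."
}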
Tool and Function Calling API

Use any LLM that supports Tool Use and Function Calling through the OpenAI-like API.

Docs: Tool Use and Function Calling.
Introducing lms get: download models from the terminal

You can now download models directly from the terminal using a keyword:

lms get deepseek-r1

or a full Hugging Face URL:

lms get <hugging face url>

To filter for MLX models only, add --mlx to the command:

lms get deepseek-r1 --mlx