LM Studio REST API
OpenAI /v1/responses and variant listing

- New OpenAI-compatible endpoint: POST /v1/responses.
- Continue from an earlier response by passing previous_response_id.
- Supports reasoning.effort for openai/gpt-oss-20b.
- Supports streaming via stream: true.
- lms ls --variants lists all variants for multi-variant models.
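A request sketch combining these options, following the OpenAI Responses request shape (the response id is an illustrative placeholder):

```
# "resp_123" stands in for the id of a previous response
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Summarize our discussion so far in one sentence.",
    "previous_response_id": "resp_123",
    "reasoning": { "effort": "low" },
    "stream": true
  }'
```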
CLI: model resource estimates, status, and interrupts

- lms load --estimate-only <model> prints estimated GPU and total memory before loading. It honors --context-length and --gpu, and uses an improved estimator that now accounts for flash attention and vision models.
- lms chat: press Ctrl+C to interrupt an ongoing prediction.
- lms ps --json now reports each model's generation status and the number of queued prediction requests.
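For example (the model name is an illustrative placeholder):

```
# Preview memory requirements at a larger context length without loading the model
lms load --estimate-only qwen2.5-7b-instruct --context-length 8192
```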
CLI log streaming: server + model

lms log stream now supports multiple sources and filters:

- --source server streams HTTP server logs (startup, endpoints, status)
- --source model --filter input,output streams formatted user input and model output
- --json emits machine-readable logs; --stats adds tokens/sec and related metrics (model source)

New model support (API)

- Newly supported embedding models are served through the /v1/embeddings endpoint.

Seed-OSS tool-calling and template fixes

- Tool-calling and chat-template fixes for Seed-OSS models.

Reasoning content and tool-calling reliability

- For gpt-oss on POST /v1/chat/completions, reasoning content moves out of message.content and into choices.message.reasoning (non-streaming) and choices.delta.reasoning (streaming), aligning with o3-mini.

Bug fixes for streaming and tool calls

- Fixed streaming errors on POST /v1/chat/completions (e.g., "reading 'properties'") and non-streaming tool-call failures.

Streaming options and tool-calling improvements

- Added support for the stream_options object on OpenAI-compatible endpoints. Setting stream_options.include_usage to true returns prompt and completion token usage during streaming.
- The response_format.type field now accepts "text" in chat-completion requests.

Tool-calling reliability and token-count updates

- Fixed a bug where $defs in tool definitions were stripped.
- Improved handling of tool definitions that omit a parameters object, and prevented hangs when an MCP server reloads.

Model capabilities in GET /models

The REST API (api/v0) now returns a capabilities array in the GET /models response. Each model lists its supported capabilities (e.g. "tool_use") so clients can programmatically discover tool-enabled models.
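For example, a client could filter for tool-capable models (a sketch; it assumes the response wraps models in an OpenAI-style data array):

```
# Print ids of models whose capabilities include "tool_use"
curl -s http://localhost:1234/api/v0/models \
  | jq -r '.data[] | select(.capabilities // [] | index("tool_use")) | .id'
```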
{ "tool_choice": "auto" // or "none", "required" }
"tool_choice": "none" — Model will not call tools"tool_choice": "auto" — Model decides"tool_choice": "required" — Model must call tools (llama.cpp only)Chunked responses now set "finish_reason": "tool_calls" when appropriate.
[API/SDK] Preset Support

The RESTful API and SDKs support specifying presets in requests.
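A sketch of what this can look like; the field name "preset" and the preset identifier are assumptions, so check the preset documentation for the authoritative shape:

```
# "preset" field name and "my-saved-preset" are assumed for illustration
curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
    "preset": "my-saved-preset",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'
```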
Speculative Decoding API

Enable speculative decoding in API requests with "draft_model":

```
{
  "model": "deepseek-r1-distill-qwen-7b",
  "draft_model": "deepseek-r1-distill-qwen-0.5b",
  "messages": [ ... ]
}
```
Responses now include a stats object for speculative decoding:
"stats": { "tokens_per_second": ..., "draft_model": "...", "total_draft_tokens_count": ..., "accepted_draft_tokens_count": ..., "rejected_draft_tokens_count": ..., "ignored_draft_tokens_count": ... }
Idle TTL and Auto Evict

Set a TTL (in seconds) for models loaded via API requests (docs article: Idle TTL and Auto-Evict):

```
curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [ ... ],
    "ttl": 300
  }'
```
With lms:
```
lms load --ttl <seconds>
```
Separate reasoning_content in Chat Completion responses

For DeepSeek R1 models, reasoning content is returned in a separate reasoning_content field. Turn this on in App Settings > Developer.
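An abridged sketch of a non-streaming response with this enabled (surrounding fields follow the standard chat-completion shape):

```
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "First, consider what the user is asking...",
        "content": "Here is the final answer."
      },
      "finish_reason": "stop"
    }
  ]
}
```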
Tool and Function Calling API

Use any LLM that supports Tool Use and Function Calling through the OpenAI-like API.

Docs: Tool Use and Function Calling.
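A sketch of the assistant message a tool-capable model can return in this format (the id and arguments are illustrative):

```
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"Paris\"}"
      }
    }
  ]
}
```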
Introducing lms get: download models from the terminal

You can now download models directly from the terminal using a keyword:

```
lms get deepseek-r1
```

or a full Hugging Face URL:

```
lms get <hugging face url>
```

To filter for MLX models only, add --mlx to the command:

```
lms get deepseek-r1 --mlx
```