# API Changelog
## Improved Tool Use API Support

The OpenAI-like REST API now supports the `tool_choice` parameter:

```json
{
  "tool_choice": "auto" // or "none", "required"
}
```

- `"tool_choice": "none"`: the model will not call tools
- `"tool_choice": "auto"`: the model decides whether to call tools
- `"tool_choice": "required"`: the model must call tools (llama.cpp only)

Chunked responses now set `"finish_reason": "tool_calls"` when appropriate.
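For instance, a request that forces a tool call might look like the sketch below. The `get_weather` tool is a hypothetical example; the tool definition follows the OpenAI function-calling schema that this OpenAI-like API mirrors.

```json
{
  "model": "deepseek-r1-distill-qwen-7b",
  "messages": [
    { "role": "user", "content": "What is the weather in Berlin?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "required"
}
```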
## [API/SDK] Preset Support

The RESTful API and SDKs now support specifying presets in requests.
(example needed)
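The changelog marks an example as needed here; as an unofficial sketch only, assuming the preset is referenced by name via a `preset` field in the request body (both the field name and the preset name are assumptions, not confirmed API):

```json
{
  "model": "deepseek-r1-distill-qwen-7b",
  "preset": "my-custom-preset",
  "messages": [ ... ]
}
```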
## Speculative Decoding API

Enable speculative decoding in API requests with the `"draft_model"` parameter:

```json
{
  "model": "deepseek-r1-distill-qwen-7b",
  "draft_model": "deepseek-r1-distill-qwen-0.5b",
  "messages": [ ... ]
}
```

Responses now include a `stats` object for speculative decoding:

```json
"stats": {
  "tokens_per_second": ...,
  "draft_model": "...",
  "total_draft_tokens_count": ...,
  "accepted_draft_tokens_count": ...,
  "rejected_draft_tokens_count": ...,
  "ignored_draft_tokens_count": ...
}
```
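As an illustration, the draft-token counts in this `stats` object can be used to compute an acceptance rate, a quick gauge of how well the draft model matches the main model. The numeric values below are made up for the example:

```python
# Example `stats` payload as described above; the numbers are illustrative.
stats = {
    "tokens_per_second": 42.5,
    "draft_model": "deepseek-r1-distill-qwen-0.5b",
    "total_draft_tokens_count": 120,
    "accepted_draft_tokens_count": 90,
    "rejected_draft_tokens_count": 25,
    "ignored_draft_tokens_count": 5,
}

# Fraction of draft tokens the main model accepted.
acceptance_rate = (
    stats["accepted_draft_tokens_count"] / stats["total_draft_tokens_count"]
)
print(f"draft acceptance rate: {acceptance_rate:.0%}")  # draft acceptance rate: 75%
```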
## Idle TTL and Auto-Evict

Set a TTL (in seconds) for models loaded via API requests (docs article: Idle TTL and Auto-Evict):

```bash
curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [ ... ],
    "ttl": 300
  }'
```

With `lms`:

```bash
lms load --ttl <seconds>
```
## Separate `reasoning_content` in Chat Completion responses

For DeepSeek R1 models, get reasoning content in a separate field. See more here.

Turn this on in App Settings > Developer.
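Once enabled, the reasoning text arrives alongside the regular answer; a minimal sketch of reading both fields, assuming the message carries `reasoning_content` next to `content` as described above (the sample payload is fabricated):

```python
# Fabricated chat completion response illustrating the separate field.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "First, compare the two numbers digit by digit...",
                "content": "9.11 is smaller than 9.9.",
            }
        }
    ]
}

message = response["choices"][0]["message"]
reasoning = message.get("reasoning_content", "")  # chain-of-thought, kept separate
answer = message["content"]                        # the final answer only
print(answer)
```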
## Tool and Function Calling API

Use any LLM that supports Tool Use and Function Calling through the OpenAI-like API.

Docs: Tool Use and Function Calling.
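When the model decides to call a tool, the response carries a `tool_calls` array and, as noted earlier, `"finish_reason": "tool_calls"`. Below is a sketch of dispatching such a call, assuming the OpenAI-style response shape this API mirrors; the payload and the `get_weather` handler are fabricated for illustration:

```python
import json

# Fabricated response fragment in the OpenAI-style shape.
response = {
    "choices": [
        {
            "finish_reason": "tool_calls",
            "message": {
                "tool_calls": [
                    {
                        "id": "call_1",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": '{"city": "Berlin"}',
                        },
                    }
                ]
            },
        }
    ]
}

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup.
    return f"Sunny in {city}"

choice = response["choices"][0]
results = []
if choice["finish_reason"] == "tool_calls":
    for call in choice["message"]["tool_calls"]:
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        if call["function"]["name"] == "get_weather":
            results.append(get_weather(**args))
print(results)  # ['Sunny in Berlin']
```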
## Introducing `lms get`: download models from the terminal

You can now download models directly from the terminal, using either a keyword:

```bash
lms get deepseek-r1
```

or a full Hugging Face URL:

```bash
lms get <hugging face url>
```

To filter for MLX models only, add `--mlx` to the command:

```bash
lms get deepseek-r1 --mlx
```