# LM Studio REST API
LM Studio now has its own REST API, in addition to OpenAI compatibility mode.
The REST API includes enhanced stats such as Token / Second and Time To First Token (TTFT), as well as rich information about models such as loaded vs unloaded, max context, quantization, and more.
## Supported API Endpoints

- `GET /api/v0/models` - List available models
- `GET /api/v0/models/{model}` - Get info about a specific model
- `POST /api/v0/chat/completions` - Chat Completions (messages → assistant response)
- `POST /api/v0/completions` - Text Completions (prompt → completion)
- `POST /api/v0/embeddings` - Text Embeddings (text → embedding)

## Start the REST API server

To start the server, run the following command:
```bash
lms server start
```
You can run LM Studio as a service and get the server to auto-start on boot without launching the GUI. Learn about Headless Mode.
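If the default port is already in use, you can also pick a different one. The lines below are a sketch and assume your `lms` CLI version supports the `--port` flag and the `server status` subcommand:

```bash
# Assumption: your lms CLI version supports these options/subcommands
lms server start --port 1234   # start the server on a specific port
lms server status              # confirm the server is running
```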
## Endpoints

### GET /api/v0/models

List all loaded and downloaded models.
Example request
```bash
curl http://localhost:1234/api/v0/models
```
Response format
{ "object": "list", "data": [ { "id": "qwen2-vl-7b-instruct", "object": "model", "type": "vlm", "publisher": "mlx-community", "arch": "qwen2_vl", "compatibility_type": "mlx", "quantization": "4bit", "state": "not-loaded", "max_context_length": 32768 }, { "id": "meta-llama-3.1-8b-instruct", "object": "model", "type": "llm", "publisher": "lmstudio-community", "arch": "llama", "compatibility_type": "gguf", "quantization": "Q4_K_M", "state": "not-loaded", "max_context_length": 131072 }, { "id": "text-embedding-nomic-embed-text-v1.5", "object": "model", "type": "embeddings", "publisher": "nomic-ai", "arch": "nomic-bert", "compatibility_type": "gguf", "quantization": "Q4_0", "state": "not-loaded", "max_context_length": 2048 } ] }
### GET /api/v0/models/{model}

Get info about one specific model.
Example request
```bash
curl http://localhost:1234/api/v0/models/qwen2-vl-7b-instruct
```
Response format
{ "id": "qwen2-vl-7b-instruct", "object": "model", "type": "vlm", "publisher": "mlx-community", "arch": "qwen2_vl", "compatibility_type": "mlx", "quantization": "4bit", "state": "not-loaded", "max_context_length": 32768 }
### POST /api/v0/chat/completions

Chat Completions API. You provide a messages array and receive the next assistant response in the chat.
Example request
```bash
curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Introduce yourself." }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
  }'
```
Response format
{ "id": "chatcmpl-i3gkjwthhw96whukek9tz", "object": "chat.completion", "created": 1731990317, "model": "granite-3.0-2b-instruct", "choices": [ { "index": 0, "logprobs": null, "finish_reason": "stop", "message": { "role": "assistant", "content": "Greetings, I'm a helpful AI, here to assist,\nIn providing answers, with no distress.\nI'll keep it short and sweet, in rhyme you'll find,\nA friendly companion, all day long you'll bind." } } ], "usage": { "prompt_tokens": 24, "completion_tokens": 53, "total_tokens": 77 }, "stats": { "tokens_per_second": 51.43709529007664, "time_to_first_token": 0.111, "generation_time": 0.954, "stop_reason": "eosFound" }, "model_info": { "arch": "granite", "quant": "Q4_K_M", "format": "gguf", "context_length": 4096 }, "runtime": { "name": "llama.cpp-mac-arm64-apple-metal-advsimd", "version": "1.3.0", "supported_formats": ["gguf"] } }
### POST /api/v0/completions

Text Completions API. You provide a prompt and receive a completion.
Example request
```bash
curl http://localhost:1234/api/v0/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "prompt": "the meaning of life is",
    "temperature": 0.7,
    "max_tokens": 10,
    "stream": false,
    "stop": "\n"
  }'
```
Response format
{ "id": "cmpl-p9rtxv6fky2v9k8jrd8cc", "object": "text_completion", "created": 1731990488, "model": "granite-3.0-2b-instruct", "choices": [ { "index": 0, "text": " to find your purpose, and once you have", "logprobs": null, "finish_reason": "length" } ], "usage": { "prompt_tokens": 5, "completion_tokens": 9, "total_tokens": 14 }, "stats": { "tokens_per_second": 57.69230769230769, "time_to_first_token": 0.299, "generation_time": 0.156, "stop_reason": "maxPredictedTokensReached" }, "model_info": { "arch": "granite", "quant": "Q4_K_M", "format": "gguf", "context_length": 4096 }, "runtime": { "name": "llama.cpp-mac-arm64-apple-metal-advsimd", "version": "1.3.0", "supported_formats": ["gguf"] } }
### POST /api/v0/embeddings

Text Embeddings API. You provide some text and receive its representation as an embedding vector.
Example request
```bash
curl http://localhost:1234/api/v0/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-nomic-embed-text-v1.5",
    "input": "Some text to embed"
  }'
```
Example response
{ "object": "list", "data": [ { "object": "embedding", "embedding": [ -0.016731496900320053, 0.028460891917347908, -0.1407836228609085, ... (truncated for brevity) ..., 0.02505224384367466, -0.0037634256295859814, -0.04341062530875206 ], "index": 0 } ], "model": "text-embedding-nomic-embed-text-v1.5@q4_k_m", "usage": { "prompt_tokens": 0, "total_tokens": 0 } }
Please report bugs by opening an issue on GitHub.