Use OpenAI's Responses API with local models
LM Studio 0.3.29 is now available as a stable release. Update in‑app or download the latest version at lmstudio.ai/download.
/v1/responses
This release adds support for OpenAI's `/v1/responses` API through the LM Studio REST server.
- Pass `previous_response_id` to continue interactions without needing to manage message history yourself, unlike `/v1/chat/completions`, where the full message history must be resent on every turn.
- Set `reasoning: { effort: "low" | "medium" | "high" }` for `openai/gpt-oss-20b`.
- Set `stream: true` to receive SSE events as the model generates, or omit it for a single JSON response.

To use REST API server endpoints, ensure your LM Studio server is running in the UI (Developer → Status: Running):
Start the server in the UI
Or through the `lms` CLI:

```
→ % lms server start
Success! Server is now running on port 1234
```
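Once the server is up, a quick sanity check is to list the available models via the OpenAI-compatible endpoint:

```bash
curl http://127.0.0.1:1234/v1/models
```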
Example request:
```bash
curl http://127.0.0.1:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
```
Response (shortened):
{ "id": "resp_123", "output": [{"type":"message", ...}] }
Continue the interaction by setting `previous_response_id` to the id above:
```bash
curl http://127.0.0.1:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
```
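The reply to this request carries a fresh id of its own, so longer conversations chain by always passing the most recent id as `previous_response_id` (id illustrative):

```
{ "id": "resp_124", "output": [{ "type": "message", ... }] }
```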
Streaming example request:
```bash
curl http://127.0.0.1:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'
```
You'll receive events as the model generates output, such as `response.created`, `response.output_text.delta`, and `response.completed`. See docs for more details on streaming events.
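On the wire these arrive as standard SSE frames. A rough sketch of the stream for the request above (payloads abbreviated and illustrative; exact fields follow OpenAI's Responses streaming schema):

```
event: response.created
data: {"type":"response.created","response":{"id":"resp_123", ...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hi"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_123", ...}}
```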
Opt in to allow the use of remote MCP servers (Developer → Settings → Allow MCP → Remote):
Allow remote MCP servers
Example request:
```bash
curl http://127.0.0.1:1234/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "tools": [{
      "type": "mcp",
      "server_label": "tiktoken",
      "server_url": "https://gitmcp.io/openai/tiktoken",
      "allowed_tools": ["fetch_tiktoken_documentation"]
    }],
    "input": "What is the first sentence of the tiktoken documentation?"
  }'
```
Output will include a tool discovery and a tool call before the assistant’s reply. See docs for full schema and examples.
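In the non-streaming case that means the `output` array lists the MCP items ahead of the final message. A sketch of the shape, with fields abbreviated (item type names per OpenAI's Responses MCP schema):

```
{
  "output": [
    { "type": "mcp_list_tools", "server_label": "tiktoken", "tools": [ ... ] },
    { "type": "mcp_call", "name": "fetch_tiktoken_documentation", "output": "..." },
    { "type": "message", "content": [{ "type": "output_text", "text": "..." }] }
  ]
}
```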
Quickly inspect every available variant for multi‑variant models:

```bash
lms ls --variants
```
Example output (variants only):
```
google/gemma-3-12b (2 variants)
* google/gemma-3-12b@q3_k_l    12B    gemma3    7.33 GB
  google/gemma-3-12b@4bit      12B    gemma3    8.07 GB
```
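The `@` identifiers from this listing can then be handed to other `lms` commands; for example, to load one specific quantization (assuming `lms load` accepts the variant-qualified key shown in the listing):

```bash
lms load google/gemma-3-12b@q3_k_l
```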
Build 1
- Added support for the `/v1/responses` endpoint, including streaming (`stream=true`)
- New `lms ls` command option: `lms ls --variants` to list all variants for multi-variant models