OpenAI Compatibility API

Send requests to Responses, Chat Completions (text and images), Completions, and Embeddings endpoints.


OpenAI-like API endpoints

LM Studio accepts requests on several OpenAI endpoints and returns OpenAI-like response objects.

Supported endpoints

GET  /v1/models
POST /v1/responses
POST /v1/chat/completions
POST /v1/embeddings
POST /v1/completions
See below for more info about each endpoint.

Re-using an existing OpenAI client

Pro Tip

You can reuse existing OpenAI clients (in Python, JS, C#, etc.) by switching the "base URL" property to point to your LM Studio server instead of OpenAI's servers.

Switching the base URL to point to LM Studio

Note: The following examples assume the server port is 1234
Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # the client library requires a key; LM Studio ignores it
)

# ... the rest of your code ...
TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lm-studio" // the client library requires a key; LM Studio ignores it
});

// ... the rest of your code ...
cURL
- curl https://api.openai.com/v1/chat/completions \
+ curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
-     "model": "gpt-4o-mini",
+     "model": "use the model identifier from LM Studio here",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

Endpoints overview

/v1/models

  • GET request
  • Lists the currently loaded models.
cURL example
curl http://localhost:1234/v1/models
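
Python example

The same request through the OpenAI Python client; a minimal sketch, assuming the server runs on port 1234:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Print the identifier of each currently loaded model
for model in client.models.list():
    print(model.id)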

/v1/responses

cURL example (non‑streaming)
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
Stateful follow‑up

Use the id from the previous response as previous_response_id.

curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
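Python example

The same stateful flow through the OpenAI Python client; a minimal sketch that captures the first response's id and passes it as previous_response_id:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

first = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Provide a prime number less than 50",
)

# The server restores conversation context from the previous response's id
follow_up = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Multiply it by 2",
    previous_response_id=first.id,
)

print(follow_up.output_text)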
Streaming
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'

You will receive SSE events like response.created, response.output_text.delta, and response.completed.
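
Python example

Consuming the stream through the OpenAI Python client; a minimal sketch that prints text deltas as they arrive:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

stream = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Hello",
    stream=True,
)

# Each event carries a type; text arrives in response.output_text.delta events
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)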

Tools and Remote MCP (opt‑in)

Enable remote MCP in the app (Developer → Settings). Example payload using an MCP server tool:

curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "tools": [{
      "type": "mcp",
      "server_label": "tiktoken",
      "server_url": "https://gitmcp.io/openai/tiktoken",
      "allowed_tools": ["fetch_tiktoken_documentation"]
    }],
    "input": "What is the first sentence of the tiktoken documentation?"
  }'

/v1/chat/completions

  • POST request
  • Send a chat history and receive the assistant's response
  • Prompt template is applied automatically
  • You can provide inference parameters such as temperature in the payload. See supported parameters
  • See OpenAI's documentation for more information
  • As always, keep a terminal window open with lms log stream to see what input the model receives
Python example
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
  model="model-identifier",
  messages=[
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."}
  ],
  temperature=0.7,
)

print(completion.choices[0].message)
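
Streaming works through the same client by setting stream=True (see supported parameters below); a minimal sketch, reusing the client from the example above:

stream = client.chat.completions.create(
    model="model-identifier",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

# Each chunk carries a partial delta of the assistant's message
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)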

/v1/embeddings

  • POST request
  • Send a string or array of strings and get back an embedding vector (an array of floating-point numbers) for each input
  • See OpenAI's documentation for more information
Python example
# Make sure to `pip install openai` first
from openai import OpenAI
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def get_embedding(text, model="model-identifier"):
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

print(get_embedding("Once upon a time, there was a cat."))
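
As a usage sketch, embeddings can be compared with cosine similarity. The cosine_similarity helper below is hypothetical (not part of any API) and reuses get_embedding from the example above:

import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Values closer to 1.0 indicate more semantically similar texts
print(cosine_similarity(
    get_embedding("Once upon a time, there was a cat."),
    get_embedding("A short story about a cat."),
))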

/v1/completions

  • POST request
  • Send a string and get the model's predicted continuation of that string

Heads Up

This OpenAI-like endpoint is no longer supported by OpenAI, but LM Studio continues to support it.

Using this endpoint with chat-tuned models might result in unexpected behavior, such as extraneous role tokens being emitted by the model.

For best results, use a base model.
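
Python example

A minimal sketch with the OpenAI Python client; model-identifier is a placeholder for a model identifier from LM Studio, ideally a base model:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.completions.create(
    model="model-identifier",
    prompt="Once upon a time, there was a cat named",
    max_tokens=50,
    temperature=0.7,
)

# The model's continuation of the prompt string
print(completion.choices[0].text)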


Supported payload parameters

For an explanation of each parameter, see https://platform.openai.com/docs/api-reference/chat/create. A sketch showing several of them in use follows the list.

model
top_p
top_k
messages
temperature
max_tokens
stream
stop
presence_penalty
frequency_penalty
logit_bias
repeat_penalty
seed
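
Python example

A sketch passing several of these parameters at once. top_k and repeat_penalty are not part of the OpenAI client's signature, so this example sends them through the client's extra_body option:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    model="model-identifier",
    messages=[{"role": "user", "content": "Write a haiku about llamas."}],
    temperature=0.7,
    max_tokens=100,
    top_p=0.9,
    presence_penalty=0.5,
    seed=42,
    # Parameters LM Studio supports beyond the OpenAI schema
    extra_body={"top_k": 40, "repeat_penalty": 1.1},
)

print(completion.choices[0].message.content)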

Community

Chat with other LM Studio developers, discuss LLMs, hardware, and more on the LM Studio Discord server.
