Text Completions
Use llm.complete(...) to generate text completions from a loaded language model.
Text completion means sending an unformatted string to the model, with the expectation that the model will continue the text.
This is different from multi-turn chat conversations. For more information on chat completions, see Chat Completions.
First, you need to load a model to generate completions from. This can be done using the top-level llm convenience API, or the model method in the llm namespace when using the scoped resource API.
For example, here is how to use Qwen2.5 7B Instruct.
import lmstudio as lms
model = lms.llm("qwen2.5-7b-instruct")
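If you prefer the scoped resource API mentioned above, the model is obtained through a client handle instead. The following is a minimal sketch, assuming the SDK's Client context manager and the model method in its llm namespace; check your installed SDK version for the exact names.

import lmstudio as lms

# Sketch of the scoped resource API: the client owns the connection's lifetime.
with lms.Client() as client:
    model = client.llm.model("qwen2.5-7b-instruct")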
Once you have a loaded model, you can generate completions by passing a string to the complete method on the llm handle.
# The `model` object is created in the previous step.
result = model.complete("My name is", config={"maxTokens": 100})
print(result)
You can also print prediction metadata, such as the model used for generation, number of generated tokens, time to first token, and stop reason.
# `result` is the response from the model.
print("Model used:", result.model_info.display_name)
print("Predicted tokens:", result.stats.predicted_tokens_count)
print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
print("Stop reason:", result.stats.stop_reason)
Here's an example of how you might use the complete method to simulate a terminal.
import lmstudio as lms

model = lms.llm()
console_history = []
while True:
    try:
        user_command = input("$ ")
    except EOFError:
        print()
        break
    if user_command.strip() == "exit":
        break
    console_history.append(f"$ {user_command}")
    # Send the accumulated session transcript so the model continues it.
    history_prompt = "\n".join(console_history)
    prediction_stream = model.complete_stream(
        history_prompt,
        config={"stopStrings": ["$"]},
    )
    for fragment in prediction_stream:
        print(fragment.content, end="", flush=True)
    print()
    console_history.append(prediction_stream.result().content)
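Setting stopStrings to ["$"] ends each prediction as soon as the model starts emitting the next shell prompt marker, so every turn prints only the simulated command output before control returns to the real input() prompt.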