Using lmstudio-python in REPL
To simplify interactive use, lmstudio-python offers a convenience API which manages its resources via atexit hooks, allowing a default synchronous client session to be used across multiple interactive commands. This convenience API is shown in the examples throughout the documentation as the Python (convenience API) tab (alongside the Python (scoped resource API) examples, which use with statements to ensure deterministic cleanup of network communication resources).
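For reference, a minimal session using the scoped resource API looks like the following (a sketch only: it assumes a model is already loaded, and mirrors the asynchronous calls shown later on this page in their synchronous form):

import lmstudio as lms

# The "with" statement closes the client's network resources
# deterministically when the block exits, rather than relying
# on atexit hooks.
with lms.Client() as client:
    loaded_models = client.llm.list_loaded()
    model = loaded_models[0]
    chat = lms.Chat("You answer questions concisely")
    chat.add_user_message("Tell me three fruits")
    print(model.respond(chat, on_message=chat.append))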
The convenience API allows the standard Python REPL, or more flexible alternatives like Jupyter notebooks, to be used to interact with AI models loaded into LM Studio. For example:
>>> import lmstudio as lms
>>> loaded_models = lms.list_loaded_models()
>>> for idx, model in enumerate(loaded_models):
...     print(f"{idx:>3} {model}")
...
  0 LLM(identifier='qwen2.5-7b-instruct')
>>> model = loaded_models[0]
>>> chat = lms.Chat("You answer questions concisely")
>>> chat.add_user_message("Tell me three fruits")
UserMessage(content=[TextData(text='Tell me three fruits')])
>>> print(model.respond(chat, on_message=chat.append))
Banana, apple, orange.
>>> chat.add_user_message("Tell me three more fruits")
UserMessage(content=[TextData(text='Tell me three more fruits')])
>>> print(model.respond(chat, on_message=chat.append))
Mango, strawberry, avocado.
>>> chat.add_user_message("How many fruits have you told me?")
UserMessage(content=[TextData(text='How many fruits have you told me?')])
>>> print(model.respond(chat, on_message=chat.append))
You asked for three initial fruits and three more, so I've listed a total of six fruits.
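When the model of interest is known in advance, the convenience API can also fetch a handle directly by identifier instead of enumerating the loaded models (a brief sketch; the identifier here is assumed to match a model already loaded in LM Studio, and the response output is omitted):

>>> model = lms.llm("qwen2.5-7b-instruct")
>>> print(model.respond("Tell me three fruits"))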
While not primarily intended for use this way, the SDK's asynchronous structured concurrency API is compatible with the asynchronous Python REPL that is launched by python -m asyncio.
For example:
# Note: assumes use of the "python -m asyncio" asynchronous REPL (or equivalent)
# Requires Python SDK version 1.5.0 or later
>>> from contextlib import AsyncExitStack
>>> import lmstudio as lms
>>> resources = AsyncExitStack()
>>> client = await resources.enter_async_context(lms.AsyncClient())
>>> loaded_models = await client.llm.list_loaded()
>>> for idx, model in enumerate(loaded_models):
...     print(f"{idx:>3} {model}")
...
  0 AsyncLLM(identifier='qwen2.5-7b-instruct-1m')
>>> model = loaded_models[0]
>>> chat = lms.Chat("You answer questions concisely")
>>> chat.add_user_message("Tell me three fruits")
UserMessage(content=[TextData(text='Tell me three fruits')])
>>> print(await model.respond(chat, on_message=chat.append))
Apple, banana, and orange.
>>> chat.add_user_message("Tell me three more fruits")
UserMessage(content=[TextData(text='Tell me three more fruits')])
>>> print(await model.respond(chat, on_message=chat.append))
Mango, strawberry, and pineapple.
>>> chat.add_user_message("How many fruits have you told me?")
UserMessage(content=[TextData(text='How many fruits have you told me?')])
>>> print(await model.respond(chat, on_message=chat.append))
You asked for three fruits initially, then three more, so I’ve listed six fruits in total.
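Since the asynchronous example manages its resources through an explicit AsyncExitStack rather than atexit hooks, the client should be closed once the interactive session is finished:

>>> await resources.aclose()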