
Get Context Length

LLMs and embedding models, due to their fundamental architecture, have a property called context length, and more specifically a maximum context length. Loosely speaking, this is how many tokens the models can "keep in memory" when generating text or embeddings. Exceeding this limit will result in the model behaving erratically.

Use the get_context_length() function on the model object

It's useful to be able to check the context length of a model, especially as an extra check before providing potentially long input to the model.

context_length = model.get_context_length()

The model in the above code snippet is an instance of a loaded model obtained from the llm.model method (or the lms.llm() convenience function). See Manage Models in Memory for more information.
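
For example, a minimal sketch (assuming a model is already loaded so that lms.llm() can attach to it):

import lmstudio as lms

model = lms.llm()  # attach to the currently loaded model
print("Context length:", model.get_context_length())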

Example: Check if the input will fit in the model's context window

You can determine if a given conversation fits into a model's context by doing the following:

  • Convert the conversation to a string using the prompt template.
  • Count the number of tokens in the string.
  • Compare the token count to the model's context length.

import lmstudio as lms

def does_chat_fit_in_context(model: lms.LLM, chat: lms.Chat) -> bool:
    # Convert the conversation to a string using the prompt template.
    formatted = model.apply_prompt_template(chat)
    # Count the number of tokens in the string.
    token_count = len(model.tokenize(formatted))
    # Get the current loaded context length of the model
    context_length = model.get_context_length()
    return token_count < context_length

model = lms.llm()

chat = lms.Chat.from_history({
    "messages": [
        { "role": "user", "content": "What is the meaning of life." },
        { "role": "assistant", "content": "The meaning of life is..." },
        # ... More messages
    ]
})

print("Fits in context:", does_chat_fit_in_context(model, chat))