A local AI engineering lab for systematic prompt engineering and model evaluation. Version prompts, build eval datasets, log model outputs, and compare results — all from a single chat session, all persisted locally.
Not a vibe-based workflow. Every prompt change is versioned and every model claim is backed by logged results.
Prompt templates: {{variable}} syntax; fill and preview before running.

Load the built plugin folder in LM Studio.
| Field | Default | Description |
|---|---|---|
| Data Path | ~/ai-lab-data/ | SQLite database location |
| Tool | Kind | What it does |
|---|---|---|
| save_prompt | Store-write | Save a new version of a prompt — auto-increments version, never overwrites |
| get_prompt | Store-read | Get a prompt by name (latest or specific version) |
| list_prompts | Store-read | List all prompts with latest version and description |
| list_prompt_versions | Store-read | List all versions of a specific prompt |
| diff_prompts | Compute | Line-by-line diff between two prompt versions |
| run_prompt_template | Compute | Fill {{variable}} placeholders and return the rendered prompt |
| Tool | Kind | What it does |
|---|---|---|
| create_eval_dataset | Store-write | Create a named dataset for organizing test cases |
| add_eval_case | Store-write | Add a test case with input and expected output |
| list_eval_datasets | Store-read | List all datasets with case counts |
| get_eval_dataset | Store-read | Get a dataset with all its cases and IDs |
| Tool | Kind | What it does |
|---|---|---|
| log_model_result | Store-write | Log a model's output for a case, with an optional score |
| compare_models | Store-read | Leaderboard + case-by-case comparison across models |
| generate_eval_report | Scaffold | Return a report payload for the LLM to write a narrative eval summary |
Every save_prompt call creates a new version. Version 1 is always the baseline.
Use diff_prompts(name, 1, 3) to see what changed between baseline and current.
Use compare_models with both prompt versions to see if the change actually helped.
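save_prompt's never-overwrite behavior boils down to reading the highest existing version for a name and inserting at that number plus one. A minimal sketch, assuming a SQLite prompts(name, version, content, description) table and the better-sqlite3 package; neither assumption is necessarily how the plugin is actually implemented:

```ts
import Database from "better-sqlite3";

// Illustrative sketch of save_prompt's versioning behavior.
// Assumed schema: prompts(name TEXT, version INTEGER, content TEXT, description TEXT)
const db = new Database("ai-lab.db");

function savePrompt(name: string, content: string, description: string): number {
  // Find the highest existing version for this prompt name (0 if none yet).
  const row = db
    .prepare("SELECT COALESCE(MAX(version), 0) AS v FROM prompts WHERE name = ?")
    .get(name) as { v: number };

  const nextVersion = row.v + 1; // version 1 is the baseline

  // Insert a new row; earlier versions are never touched.
  db.prepare(
    "INSERT INTO prompts (name, version, content, description) VALUES (?, ?, ?, ?)"
  ).run(name, nextVersion, content, description);

  return nextVersion;
}
```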
Use {{variable_name}} in prompt content:
run_prompt_template(name, variables={"role": "reviewer", "user_input": "...", "language": "English"}) returns the filled prompt.
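Placeholder filling is plain string substitution. A minimal sketch of the idea in TypeScript (illustrative only, not the plugin's actual code):

```ts
// Fill {{variable}} placeholders; unknown placeholders are left as-is
// so you can spot anything you forgot to supply before running the prompt.
function renderTemplate(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (placeholder, key: string) =>
    key in variables ? variables[key] : placeholder
  );
}

renderTemplate("You are a {{role}}. Respond in {{language}}.", {
  role: "reviewer",
  language: "English",
});
// => "You are a reviewer. Respond in English."
```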
| Score | Meaning |
|---|---|
| 1.0 | Perfect — exactly matches expected or fully correct |
| 0.8 | Good — minor issues or slight deviation |
| 0.5 | Partial — got the right idea but incomplete or partially wrong |
| 0.2 | Poor — attempted but fundamentally wrong |
| 0.0 | Failure — completely wrong or refused |
| null | Not yet scored |
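For example, an answer with minor issues might be logged like this (parameter names here are illustrative, not the tool's confirmed schema):

log_model_result(datasetName="intent_classification", caseId=3, model="local-7b", output="complaint", score=0.8)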
Score labels (set scoreLabel to one of):
- exact_match — exact string comparison
- rubric — scored against a defined rubric
- human — human-judged
- llm_judge — scored by another LLM

Saving a prompt:
"Save this classification prompt"
save_prompt(name="intent_classifier", content="Classify the intent of: {{input}}\nOptions: question, request, complaint", description="Basic intent classification")
Iterating on a prompt:
"Add chain-of-thought to the intent classifier"
save_prompt(name="intent_classifier", content="Let's think step by step...", description="Added CoT reasoning") → version 2 saved
diff_prompts(name="intent_classifier", versionA=1, versionB=2) → see exactly what changed
Setting up an eval:
"Create an eval for intent classification"
create_eval_dataset(name="intent_classification", description="Tests accurate intent labeling across 3 categories") → add_eval_case × 10
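Each case pairs an input with the expected output, for example (parameter names are illustrative):

add_eval_case(datasetName="intent_classification", input="Where is my refund?", expectedOutput="complaint")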
Comparing models:
"Which model is better at intent classification?"
compare_models(datasetName="intent_classification") → leaderboard by avg score, case-by-case breakdown
Writing a report:
"Write up our eval findings"
generate_eval_report(datasetName="intent_classification") → structured report: leaderboard, failure patterns, recommendation
All data is local. Nothing is sent to any external service.
cd ai-lab-plugin
npm install
npm run build
classification_prompt: v1 (baseline) → v2 (tried adding CoT) → v3 (current)
You are a {{role}}.
User: {{user_input}}
Respond in {{language}}.
1. create_eval_dataset("task_name")
2. add_eval_case × 10+
3. get_eval_dataset → note case IDs
4. Run each model manually against each input
5. log_model_result for each model × case
6. compare_models → leaderboard + case breakdown (sketched after this list)
7. generate_eval_report → narrative summary
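Step 6's leaderboard is just the mean logged score per model, ranked best first. A rough TypeScript sketch of that aggregation (the ModelResult fields are assumptions for illustration, not the plugin's schema):

```ts
// Sketch of compare_models' leaderboard: mean score per model, best first.
type ModelResult = { model: string; caseId: number; score: number | null };

function leaderboard(results: ModelResult[]) {
  const byModel = new Map<string, number[]>();
  for (const r of results) {
    if (r.score === null) continue; // unscored cases don't affect the average
    byModel.set(r.model, [...(byModel.get(r.model) ?? []), r.score]);
  }
  return [...byModel.entries()]
    .map(([model, scores]) => ({
      model,
      avgScore: scores.reduce((sum, s) => sum + s, 0) / scores.length,
      casesScored: scores.length,
    }))
    .sort((a, b) => b.avgScore - a.avgScore);
}
```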
~/ai-lab-data/
ai-lab.db ← SQLite: prompts, eval_datasets, eval_cases, model_results