# analyse-image
LM Studio plugin with two tools for fast, precise local image analysis.
| Tool | Model | What it does |
|---|---|---|
| `analyse_image` | FastVLM-7B-MLX | Comprehensive image descriptions + generation metadata extraction |
| `detect_object` | Florence-2-large (default) | Object detection with exact bounding box coordinates |
| `detect_object` | Qwen3-VL-8B-Instruct (alt.) | More versatile object detection via vision-language model |
The plugin manages its own Python environment and server process. Both models load lazily on first use — no restart needed.
## Standalone vs. draw-things-chat
This plugin works standalone: any LM Studio agent can call `analyse_image` and `detect_object` directly. To enable the agent to "see", evaluate, and autonomously correct generated images, combine it with draw-things-chat. Vision Promotion, an exclusive feature of draw-things-chat, automatically routes generated images to the agent for review. Without it, generated images are displayed to the user but the agent itself cannot inspect them.
## analyse_image

Use case: generation metadata extraction. Reads the prompt, model, seed, steps, and all other parameters embedded in PNG files produced by Draw Things and compatible tools. This is something an agent cannot do on its own: the metadata is binary-encoded in the file and not visible to the model.
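For illustration, a minimal sketch of that extraction with Pillow, assuming the parameters sit in standard PNG tEXt/iTXt chunks; the exact chunk keys Draw Things writes are an assumption here, not taken from the plugin:

```python
from PIL import Image

def read_generation_metadata(path: str) -> dict:
    """Return the text chunks embedded in a PNG (prompt, seed, steps, ...)."""
    with Image.open(path) as im:
        # Pillow exposes PNG tEXt/iTXt/zTXt chunks via the .text mapping.
        # Which keys Draw Things actually uses is an assumption.
        return dict(im.text)

for key, value in read_generation_metadata("output.png").items():
    print(f"{key}: {value}")
```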
Use case: vision description. Fast, precise, comprehensive natural-language image descriptions via FastVLM. Useful for batch-captioning large numbers of images for training datasets, with full control over the prompt. Requires macOS (FastVLM is Apple Silicon only).
Accepts attachments (`a1`), generated variants (`v2`), indexed images (`i3`), pictures (`p1`).
## detect_object

Use case: object detection. Returns exact bounding boxes that crop, inpaint, outpaint and zoom-in from the process-image plugin can consume, complementing `analyse_image`.

## Model downloads

Both models must be downloaded manually before the plugin can use them.
### FastVLM-7B (`analyse_image`)

Source: apple/FastVLM-7B-int4 on Hugging Face
```bash
pip install huggingface-hub
huggingface-cli download apple/FastVLM-7B-int4 \
  --local-dir ~/Documents/Models/FastVLM-7B-MLX
```
Verify the download:
```
~/Documents/Models/FastVLM-7B-MLX/
  config.json
  model.safetensors              ← int4-quantized MLX weights (uint32)
  model.safetensors.index.json
  fastvithd.mlpackage/           ← CoreML vision tower
  tokenizer.json
  ...
```
Note: The model on Hugging Face is already in the correct format for this plugin; no conversion step is required. The plugin uses a patched build of `mlx-vlm` (commit `1884b551` + Apple's patch) that loads `apple/FastVLM-7B-int4` directly.
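For orientation, a hypothetical sketch of how a description request runs through mlx-vlm. The plugin pins a patched commit, and the exact `generate()` signature varies across mlx-vlm releases, so the call shape below is an assumption, not the plugin's actual code:

```python
from mlx_vlm import load, generate  # macOS / Apple Silicon only

# Assumption: the pinned build exposes the classic load()/generate() pair.
model, processor = load("~/Documents/Models/FastVLM-7B-MLX")

description = generate(
    model,
    processor,
    prompt="Describe this image in detail.",  # the plugin's "Vision Prompt"
    image="photo.png",
    max_tokens=384,    # plugin default: MLX Vision: Max Tokens
    temperature=0.7,   # plugin default: MLX Vision: Temperature
)
print(description)
```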
### Florence-2-large (`detect_object`)

A more robust alternative to Qwen3-VL-8B for object detection. Produces precise labels, but detection is limited to the dedicated concepts it was trained on; it cannot be prompted for concepts beyond that scope.
Source: microsoft/Florence-2-large on Hugging Face
```bash
huggingface-cli download microsoft/Florence-2-large \
  --local-dir ~/Documents/Models/Florence-2-large
```
Which variant to download? Use the standard `microsoft/Florence-2-large` repository. Do not use a fine-tuned derivative or an MLX-converted variant: this plugin loads Florence-2 via PyTorch/transformers, not via MLX.
Verify the download:
```
~/Documents/Models/Florence-2-large/
  config.json
  model.safetensors              ← main weights
  configuration_florence2.py     ← Remote Code (required)
  modeling_florence2.py          ← Remote Code (required)
  processing_florence2.py        ← Remote Code (required)
  tokenizer.json
  ...
```
Note: Florence-2 uses Hugging Face Remote Code (`trust_remote_code=True`). The Python files above must be present in the local directory.
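For reference, a minimal sketch of Florence-2 object detection via transformers, following the public model card; the plugin's actual invocation may differ:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

path = "~/Documents/Models/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(path, trust_remote_code=True)

image = Image.open("photo.png").convert("RGB")
# "<OD>" is Florence-2's built-in object-detection task prompt.
inputs = processor(text="<OD>", images=image, return_tensors="pt")
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(raw, task="<OD>", image_size=image.size)
print(result)  # {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}
```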
### Qwen3-VL-8B (`detect_object`)

A more versatile alternative to Florence-2 for object detection. Produces precise labels and handles complex scenes, close-ups, and facial detail better. It has comprehensive world knowledge and can be prompted in natural language.
Source: lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit on Hugging Face
```bash
huggingface-cli download lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit \
  --local-dir ~/Documents/Models/Qwen3-VL-8B-Instruct-MLX-4bit
```
Alternatively, download directly via the LM Studio model browser — search for lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit.
Verify the download:
```
~/Documents/Models/Qwen3-VL-8B-Instruct-MLX-4bit/
  config.json
  model.safetensors              ← 4-bit quantized MLX weights
  preprocessor_config.json
  tokenizer.json
  ...
```
To activate, set the Qwen3-VL Model Path setting in the plugin configuration to the absolute path of this directory.
Note: Qwen3-VL and Florence-2 can both be configured at the same time. The active detection backend is selected via the Detection Backend setting (`florence2` or `qwen3_vl`).
## Platform support

| Hardware | Recommended backend | Notes |
|---|---|---|
| Apple Silicon M1–M5 | MLX (default) | Fastest on current Apple Silicon |
| Windows / Linux | MLX backend disabled | Florence-2 via PyTorch CPU/CUDA; FastVLM requires macOS — vision description unavailable, metadata extraction works |
Windows / Linux: the `detect_object` tool should work on Windows and Linux (PyTorch/CPU). `analyse_image` works partially: PNG generation metadata extraction is platform-independent and works everywhere, but the vision description (FastVLM) requires `mlx-vlm` and is macOS-only, so that part will not work. Both configurations are untested.
## Requirements

Python 3.9 or newer must be available on PATH. The plugin creates an isolated virtual environment (`.fastvlm/venv/`) on first run and installs all required packages automatically: fastapi, uvicorn, mlx-vlm, transformers, torch, timm, einops, pillow.
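As an illustrative sketch, that first run amounts to roughly the following; the plugin's real bootstrap logic is not shown here:

```python
import subprocess
import sys

# Create the isolated environment and install the plugin's dependencies,
# mirroring what the plugin does automatically on first run.
subprocess.run([sys.executable, "-m", "venv", ".fastvlm/venv"], check=True)
subprocess.run(
    [".fastvlm/venv/bin/pip", "install",
     "fastapi", "uvicorn", "mlx-vlm", "transformers",
     "torch", "timm", "einops", "pillow"],
    check=True,
)
```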
## Installation

1. Install the plugin from https://lmstudio.ai/ceveyne/analyse-image
2. Activate it in LM Studio.
3. Open the plugin's global settings and set:
| Setting | Value |
|---|---|
| MLX Vision: Model Path | Absolute path to your FastVLM-7B-MLX directory |
| Florence-2 Model Path | Absolute path to your Florence-2-large directory |
All other settings have working defaults.
The plugin starts the server automatically on first tool call. No manual server management needed.
The process-image plugin can use `detect_object` bounding boxes from this plugin directly.

## Settings

| Setting | Default | Description |
|---|---|---|
| Previews in Chat | on | Inline image previews in tool responses. Turn off when used with draw-things-chat — images are shown to the user automatically. Turn on for standalone use. |
| Include Generation Metadata | on | Append Draw Things generation parameters (prompt, model, seed, …) embedded in PNG files to each analysis result |
| MLX Vision: Load Model | on | Disable to skip FastVLM loading entirely |
| Vision Prompt | (built-in) | Default prompt sent to the vision model. Leave empty to use the built-in default. |
| MLX Vision: Model Path | (empty) | Path to FastVLM-7B-MLX model directory |
| MLX Vision: Port | 8765 | Port for the shared local server |
| MLX Vision: Max Tokens | 384 | Maximum response length in tokens (1–4096) |
| MLX Vision: Temperature | 0.7 | Sampling temperature (0.0–2.0) |
| Detection: Load Model | on | Disable to skip detection model loading entirely |
| Florence-2: Model Path | (empty) | Path to Florence-2-large model directory |
| Detection Backend: Use Qwen3-VL | off | Use Qwen3-VL instead of Florence-2 for object detection. Requires Qwen3-VL Model Path. |
| Qwen3-VL: Object Detection Prompt | (built-in) | Instruction sent to Qwen3-VL for object detection. Leave empty to use the built-in default. |
| Qwen3-VL: Model Path | (empty) | Path to the Qwen3-VL MLX model directory |
## Usage

With the plugin active, ask your agent:
"Describe what's in attachment a1"
"Analyse image variants v2 and v3"
"Detect all objects in image i1"
"Find the objects in a1, then crop to the person"
Target notation: `a1` (attachment 1), `v2` (variant 2), `i3` (indexed image 3), `p1` (picture 1).
See CHANGELOG.md for version history.
## License

MIT