# analyse-image
LM Studio plugin with two tools for fast, precise local image analysis.
| Tool | Model | What it does |
|---|---|---|
| `analyse_image` | FastVLM-7B-MLX | Comprehensive image descriptions + generation metadata extraction |
| `detect_object` | Florence-2-large (default) | Object detection with exact bounding box coordinates |
| `detect_object` | Qwen3-VL-8B-Instruct (alt.) | More versatile object detection via vision-language model |
The plugin manages its own Python environment and server process. Both models load lazily on first use — no restart needed.
## Standalone vs. draw-things-chat
This plugin works standalone: any LM Studio agent can call `analyse_image` and `detect_object` directly. To enable the agent to "see", evaluate, and autonomously correct generated images, combine it with draw-things-chat. Vision Promotion, an exclusive feature of draw-things-chat, automatically routes generated images to the agent for review. Without it, generated images are displayed to the user but the agent itself cannot inspect them.
## analyse_image

Use case: generation metadata extraction. Reads the prompt, model, seed, steps, and all other parameters embedded in PNG files produced by Draw Things and compatible tools. This is something an agent cannot do on its own: the metadata is binary-encoded in the file and not visible to the model.
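For illustration, a minimal sketch of that extraction with Pillow, assuming the parameters sit in standard PNG tEXt/iTXt chunks; the exact chunk keys Draw Things writes are an assumption here, not taken from the plugin:

```python
from PIL import Image

def read_generation_metadata(path: str) -> dict:
    """Return the text chunks embedded in a PNG (prompt, seed, steps, ...)."""
    with Image.open(path) as im:
        # Pillow exposes PNG tEXt/iTXt/zTXt chunks via the .text mapping.
        # Which keys Draw Things actually uses is an assumption.
        return dict(im.text)

for key, value in read_generation_metadata("output.png").items():
    print(f"{key}: {value}")
```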
Use case: vision description. Fast, precise, comprehensive natural-language image descriptions via FastVLM. Useful for batch-captioning large numbers of images for training datasets, with full control over the prompt. Requires macOS (FastVLM is Apple Silicon only).
Accepts attachments (`a1`), generated variants (`v2`), indexed images (`i3`), pictures (`p1`).
## detect_object

Use case: object detection. Returns exact bounding boxes that crop, inpaint, outpaint and zoom-in from the process-image plugin can consume, complementing `analyse_image`.

## Model downloads

Both models must be downloaded manually before the plugin can use them.
### FastVLM-7B (`analyse_image`)

Source: apple/FastVLM-7B-int4 on Hugging Face
```bash
pip install huggingface-hub
huggingface-cli download apple/FastVLM-7B-int4 \
  --local-dir ~/Documents/Models/FastVLM-7B-MLX
```
Verify the download:
```
~/Documents/Models/FastVLM-7B-MLX/
  config.json
  model.safetensors              ← int4-quantized MLX weights (uint32)
  model.safetensors.index.json
  fastvithd.mlpackage/           ← CoreML vision tower
  tokenizer.json
  ...
```
Note: The model on Hugging Face is already in the correct format for this plugin; no conversion step is required. The plugin uses a patched build of `mlx-vlm` (commit `1884b551` + Apple's patch) that loads `apple/FastVLM-7B-int4` directly.
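For orientation, a hypothetical sketch of how a description request runs through mlx-vlm. The plugin pins a patched commit, and the exact `generate()` signature varies across mlx-vlm releases, so the call shape below is an assumption, not the plugin's actual code:

```python
from mlx_vlm import load, generate  # macOS / Apple Silicon only

# Assumption: the pinned build exposes the classic load()/generate() pair.
model, processor = load("~/Documents/Models/FastVLM-7B-MLX")

description = generate(
    model,
    processor,
    prompt="Describe this image in detail.",  # the plugin's "Vision Prompt"
    image="photo.png",
    max_tokens=384,    # plugin default: MLX Vision: Max Tokens
    temperature=0.7,   # plugin default: MLX Vision: Temperature
)
print(description)
```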
### Florence-2-large (`detect_object`)

A more robust alternative to Qwen3-VL-8B for object detection. Produces precise labels, but detection is limited to the dedicated concepts it was trained on; it cannot be prompted for concepts beyond that scope.
Source: microsoft/Florence-2-large on Hugging Face
```bash
huggingface-cli download microsoft/Florence-2-large \
  --local-dir ~/Documents/Models/Florence-2-large
```
Which variant to download? Use the standard `microsoft/Florence-2-large` repository. Do not use a fine-tuned derivative or an MLX-converted variant: this plugin loads Florence-2 via PyTorch/transformers, not via MLX.
Verify the download:
```
~/Documents/Models/Florence-2-large/
  config.json
  model.safetensors              ← main weights
  configuration_florence2.py     ← Remote Code (required)
  modeling_florence2.py          ← Remote Code (required)
  processing_florence2.py        ← Remote Code (required)
  tokenizer.json
  ...
```
Note: Florence-2 uses Hugging Face Remote Code (`trust_remote_code=True`). The Python files above must be present in the local directory.
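For reference, a minimal sketch of Florence-2 object detection via transformers, following the public model card; the plugin's actual invocation may differ:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

path = "~/Documents/Models/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(path, trust_remote_code=True)

image = Image.open("photo.png").convert("RGB")
# "<OD>" is Florence-2's built-in object-detection task prompt.
inputs = processor(text="<OD>", images=image, return_tensors="pt")
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(raw, task="<OD>", image_size=image.size)
print(result)  # {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}
```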
### Qwen3-VL-8B (`detect_object`)

A more versatile alternative to Florence-2 for object detection. Produces precise labels and handles complex scenes, close-ups, and facial detail better. It has comprehensive world knowledge and can be prompted in natural language.
Source: lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit on Hugging Face
```bash
huggingface-cli download lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit \
  --local-dir ~/Documents/Models/Qwen3-VL-8B-Instruct-MLX-4bit
```
Alternatively, download directly via the LM Studio model browser — search for lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit.
Verify the download:
```
~/Documents/Models/Qwen3-VL-8B-Instruct-MLX-4bit/
  config.json
  model.safetensors              ← 4-bit quantized MLX weights
  preprocessor_config.json
  tokenizer.json
  ...
```
To activate, set the Qwen3-VL Model Path setting in the plugin configuration to the absolute path of this directory.
Note: Qwen3-VL and Florence-2 can both be configured at the same time. The active detection backend is selected via the Detection Backend setting (`florence2` or `qwen3_vl`).
## Platform support

| Hardware | Recommended backend | Notes |
|---|---|---|
| Apple Silicon M1–M5 | MLX (default) | Fastest on current Apple Silicon |
| Windows / Linux | MLX backend disabled | Florence-2 via PyTorch CPU/CUDA; FastVLM requires macOS — vision description unavailable, metadata extraction works |
Windows / Linux: the `detect_object` tool should work on Windows and Linux (PyTorch/CPU). `analyse_image` works partially: PNG generation metadata extraction is platform-independent and works everywhere, but the vision description (FastVLM) requires `mlx-vlm` and is macOS-only, so that part will not work. Both configurations are untested.
## Requirements

Python 3.9 or newer must be available on PATH. The plugin creates an isolated virtual environment (`.fastvlm/venv/`) on first run and installs all required packages automatically: fastapi, uvicorn, mlx-vlm, transformers, torch, timm, einops, pillow.
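As an illustrative sketch, that first run amounts to roughly the following; the plugin's real bootstrap logic is not shown here:

```python
import subprocess
import sys

# Create the isolated environment and install the plugin's dependencies,
# mirroring what the plugin does automatically on first run.
subprocess.run([sys.executable, "-m", "venv", ".fastvlm/venv"], check=True)
subprocess.run(
    [".fastvlm/venv/bin/pip", "install",
     "fastapi", "uvicorn", "mlx-vlm", "transformers",
     "torch", "timm", "einops", "pillow"],
    check=True,
)
```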
## Installation

1. Install the plugin from https://lmstudio.ai/ceveyne/analyse-image
2. Activate it in LM Studio.
3. Open the plugin's global settings and set:
| Setting | Value |
|---|---|
| MLX Vision: Model Path | Absolute path to your FastVLM-7B-MLX directory |
| Florence-2 Model Path | Absolute path to your Florence-2-large directory |
All other settings have working defaults.
The plugin starts the server automatically on first tool call. No manual server management needed.
The process-image plugin can use `detect_object` bounding boxes from this plugin directly.

## Settings

| Setting | Default | Description |
|---|---|---|
| Previews in Chat | on | Inline image previews in tool responses. Turn off when used with draw-things-chat — images are shown to the user automatically. Turn on for standalone use. |
| Include Generation Metadata | on | Append Draw Things generation parameters (prompt, model, seed, …) embedded in PNG files to each analysis result |
| MLX Vision: Load Model | on | Disable to skip FastVLM loading entirely |
| Vision Prompt | (built-in) | Default prompt sent to the vision model. Leave empty to use the built-in default. |
| MLX Vision: Model Path | (empty) | Path to FastVLM-7B-MLX model directory |
| MLX Vision: Port | 8765 | Port for the shared local server |
| MLX Vision: Max Tokens | 384 | Maximum response length in tokens (1–4096) |
| MLX Vision: Temperature | 0.7 | Sampling temperature (0.0–2.0) |
| Detection: Load Model | on | Disable to skip detection model loading entirely |
| Florence-2: Model Path | (empty) | Path to Florence-2-large model directory |
| Detection Backend: Use Qwen3-VL | off | Use Qwen3-VL instead of Florence-2 for object detection. Requires Qwen3-VL Model Path. |
| Qwen3-VL: Object Detection Prompt | (built-in) | Instruction sent to Qwen3-VL for object detection. Leave empty to use the built-in default. |
| Qwen3-VL: Model Path | (empty) | Path to the Qwen3-VL MLX model directory |
## Usage

With the plugin active, ask your agent:
"Describe what's in attachment a1"
"Analyse image variants v2 and v3"
"Detect all objects in image i1"
"Find the objects in a1, then crop to the person"
Target notation: `a1` (attachment 1), `v2` (variant 2), `i3` (indexed image 3), `p1` (picture 1).
See CHANGELOG.md for version history.
## License

MIT