The plugin first looks for a model loaded under the vlm-tools-vlm identifier, then falls back to a Gemma-first priority list of local VLMs. The default preload is Gemma 4 26B A4B, with Gemma 4 E4B as the smaller fallback. When the plugin loads a VLM itself, it requests the model's maximum context length. If you preload a model under the identifier manually, load it with the maximum context length as well.

Tools:

analyze_image: free-form VLM image analysis.

bounding_boxes: detects bounding boxes. The mode argument is required and must be "gemma" or "qwen". Gemma mode asks the model for box_2d: [y1, x1, y2, x2] on a 0-1000 grid; Qwen mode asks for bbox_2d: [x1, y1, x2, y2] and accepts either model-input pixels or a 0-1000 grid. Both modes convert the boxes to normalized and pixel coordinates, draw labeled boxes on a PNG copy of the image by default, and return markdown for displaying the annotated image in chat.
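The two coordinate conventions are easy to mix up, so here is a minimal Python sketch of the conversion described for bounding_boxes. It is an illustration only, not the plugin's code: the names Box, from_gemma, from_qwen, and the input_size parameter are hypothetical, and it assumes the caller already knows whether a Qwen box is in model-input pixels or on the 0-1000 grid.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

GRID = 1000  # both modes can report coordinates on a 0-1000 grid


def _clamp01(value: float) -> float:
    """Keep a normalized coordinate inside [0, 1]."""
    return min(1.0, max(0.0, value))


@dataclass
class Box:
    """Normalized corners in [0, 1], always stored in x1, y1, x2, y2 order."""
    x1: float
    y1: float
    x2: float
    y2: float

    def to_pixels(self, width: int, height: int) -> Tuple[int, int, int, int]:
        """Scale the normalized corners to integer pixel coordinates."""
        return (
            round(self.x1 * width),
            round(self.y1 * height),
            round(self.x2 * width),
            round(self.y2 * height),
        )


def from_gemma(box_2d: Sequence[float]) -> Box:
    """Gemma mode: box_2d is [y1, x1, y2, x2] on a 0-1000 grid (note y comes first)."""
    y1, x1, y2, x2 = box_2d
    return Box(*(_clamp01(v / GRID) for v in (x1, y1, x2, y2)))


def from_qwen(bbox_2d: Sequence[float],
              input_size: Optional[Tuple[int, int]] = None) -> Box:
    """Qwen mode: bbox_2d is [x1, y1, x2, y2].

    Hypothetical convention for this sketch: pass input_size=(width, height)
    of the model-input image when the values are pixels, or leave it as None
    when the values are on the 0-1000 grid.
    """
    x1, y1, x2, y2 = bbox_2d
    scale_x, scale_y = input_size if input_size is not None else (GRID, GRID)
    return Box(
        _clamp01(x1 / scale_x),
        _clamp01(y1 / scale_y),
        _clamp01(x2 / scale_x),
        _clamp01(y2 / scale_y),
    )


if __name__ == "__main__":
    # The same region expressed in both conventions for a 640x480 image.
    print(from_gemma([100, 200, 500, 600]).to_pixels(640, 480))
    print(from_qwen([128, 48, 384, 240], input_size=(640, 480)).to_pixels(640, 480))
```

Both calls in the usage example describe the same rectangle, which shows why the mode must be explicit: reading a Gemma box as if it were a Qwen box swaps the axes silently.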

To try Qwen mode, preload a Qwen-VL model under the shared VLM identifier, vlm-tools-vlm, using the Qwen lms load command shown below, then call bounding_boxes with mode: "qwen". If no model is preloaded under vlm-tools-vlm, Qwen mode looks for a model in this order: an already-loaded Qwen vision model, then known exact Qwen keys, then downloaded vision models whose key, display name, or path looks like Qwen.
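The Qwen fallback order can be sketched in Python as follows. This is a hedged illustration of the order described above, not the plugin's implementation: ModelInfo, KNOWN_QWEN_KEYS, looks_like_qwen, and pick_qwen_model are hypothetical names, the key list contains only the one key this README mentions, and the "looks like Qwen" check is assumed to be a case-insensitive substring match.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

SHARED_IDENTIFIER = "vlm-tools-vlm"

# Assumption: the real list of exact keys is not given in this README;
# this entry is the only Qwen key the README mentions.
KNOWN_QWEN_KEYS = ("lmstudio-community/Qwen3-VL-4B-Instruct-MLX-8bit",)


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical description of a downloaded or loaded model."""
    key: str
    display_name: str = ""
    path: str = ""
    vision: bool = True
    loaded: bool = False
    identifier: str = ""  # identifier the model was loaded under, if any


def looks_like_qwen(model: ModelInfo) -> bool:
    """Assumed heuristic: 'qwen' appears in the key, display name, or path."""
    return any("qwen" in text.lower()
               for text in (model.key, model.display_name, model.path))


def pick_qwen_model(models: Iterable[ModelInfo]) -> Optional[ModelInfo]:
    """Return the first candidate found, following the documented order."""
    models = list(models)

    # 1. A model preloaded under the shared identifier always wins.
    for model in models:
        if model.loaded and model.identifier == SHARED_IDENTIFIER:
            return model

    # 2. Otherwise, an already-loaded Qwen vision model.
    for model in models:
        if model.loaded and model.vision and looks_like_qwen(model):
            return model

    # 3. Then known exact Qwen keys, in list order.
    by_key = {model.key: model for model in models}
    for key in KNOWN_QWEN_KEYS:
        if key in by_key:
            return by_key[key]

    # 4. Finally, any downloaded vision model that looks like Qwen.
    for model in models:
        if model.vision and looks_like_qwen(model):
            return model

    return None
```

Putting the shared identifier first means a manual preload overrides every heuristic, which keeps the name-matching steps as a convenience rather than something you have to rely on.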

The prompt preprocessor consumes attached image files from the current user message and replaces them with absolute local paths so the model can call these tools without forwarding the image as normal VLM input. It also reminds the model to pass an explicit bounding-box mode.
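As a rough illustration of that preprocessing step, the Python sketch below removes image attachments from a message, appends their absolute paths to the text, and adds the mode reminder. Everything here is an assumption made for the example: UserMessage, preprocess, the set of image suffixes, and the reminder wording are hypothetical, and the plugin's actual message types and text may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

# Assumption: which file suffixes count as images is not stated in the README.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}

# Hypothetical wording for the reminder about the required mode argument.
MODE_REMINDER = 'When calling bounding_boxes, pass an explicit mode: "gemma" or "qwen".'


@dataclass
class UserMessage:
    """Hypothetical stand-in for the current user message."""
    text: str
    attachments: List[Path] = field(default_factory=list)


def preprocess(message: UserMessage) -> UserMessage:
    """Swap attached images for absolute local paths in the message text."""
    images = [p for p in message.attachments if p.suffix.lower() in IMAGE_SUFFIXES]
    if not images:
        return message  # nothing to consume, leave the message untouched

    # Non-image attachments stay attached; images are consumed.
    remaining = [p for p in message.attachments if p not in images]

    lines = [message.text, "", "Attached images (local paths):"]
    lines += [str(p.resolve()) for p in images]  # resolve() yields absolute paths
    lines.append(MODE_REMINDER)

    return UserMessage(text="\n".join(lines), attachments=remaining)
```

Because the images are removed from the attachments, the model only ever sees paths, and it has to go through analyze_image or bounding_boxes to look at the pixels.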

Generated model-input and annotated images are written under .generated/images/, which is gitignored.

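To preload the default Gemma model under the shared identifier:

lms load google/gemma-4-26b-a4b --identifier vlm-tools-vlm

To preload a Qwen-VL model for Qwen mode instead:

lms load lmstudio-community/Qwen3-VL-4B-Instruct-MLX-8bit --identifier vlm-tools-vlm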