254 Downloads
The 4B version of Qwen's latest vision-language model, with comprehensive upgrades to visual perception, spatial reasoning, and image understanding.
Vision Input
The latest generation vision-language model in the Qwen series with comprehensive upgrades to visual perception, spatial reasoning, and video understanding.
Delivers strong vision-language performance across diverse tasks including document analysis, visual question answering, video understanding, and agentic interactions. Suitable for edge deployment with efficient inference on Apple Silicon via MLX quantization.