357 Downloads

qwen/
Qwen3-VL-8B
8B
qwen3_vl

The 8B version of Qwen's latest vision-language model. Includes comprehensive upgrades to visual perception, spatial reasoning, and image understanding.

Vision Input

Last Updated8 hours ago
README

Qwen3 VL 8B

The latest generation vision-language model in the Qwen series with comprehensive upgrades to visual perception, spatial reasoning, and video understanding.

Key Features

  • Visual Agent: Operates PC and mobile GUIs—recognizes elements, understands functions, and completes tasks
  • Visual Coding: Generates Draw.io, HTML, CSS, and JavaScript from images and videos
  • Advanced Spatial Perception: Provides 2D/3D grounding for spatial reasoning and embodied AI applications
  • Enhanced Reasoning: Excels at STEM and math with causal analysis and evidence-based answers
  • Upgraded Recognition: Recognizes celebrities, anime, products, landmarks, flora, fauna, and more
  • Expanded OCR: Supports 32 languages with robust performance in low light, blur, and tilt conditions
  • Pure Text Performance: Text understanding on par with pure LLMs through seamless text-vision fusion

Architecture Highlights

  • 8.77B parameters
  • Interleaved-MRoPE for enhanced video reasoning
  • DeepStack for fine-grained detail capture
  • Text-Timestamp Alignment for precise event localization
  • Context length: 256,000 tokens
  • Vision-enabled multimodal model

Performance

Delivers strong vision-language performance across diverse tasks including document analysis, visual question answering, video understanding, and agentic interactions. Suitable for edge deployment with efficient inference on Apple Silicon via MLX quantization.

sources

The underlying model files this model uses

When you download this model, LM Studio picks the source that will best suit your machine (you can override this)

config

Custom configuration options included with this model

No custom configuration.