qwen2.5-vl

Public

Model

Revisions

Description

a 7B Vision Language Model (VLM) from the Qwen2.5 family

Stats

232.6K Downloads

21 stars

2 forks

Capabilities

Vision Input

Minimum system memory

5GB

Qwen2.5-VL-7B-Instruct

Qwen2.5-VL-7B-Instruct is a vision-language model that processes images, text, and video, supporting structured outputs and visual localization. It can analyze charts, graphics, and layouts, and is capable of temporal reasoning over long video sequences.

The model is intended for use in document analysis, event detection, and extracting structured data from visual content. Outputs include bounding boxes, points, and structured JSON data.

Sources

The underlying model files this model uses

Based on

🤗lmstudio-community/Qwen2.5-VL-7B-Instruct-GGUF→

GGUF