Updated on May 4
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanOCR/blob/main/LICENSE
language:
HunyuanOCR converted to GGUF format for inference with llama.cpp.
This is a GGUF quantization of Tencent's HunyuanOCR, a 1B-parameter OCR expert Vision-Language Model. Native support was added in llama.cpp build b8670 (April 2026).
| Component | Spec |
|---|---|
| Type | Vision-Language Model (VLM) |
| Parameters | ~1.12B |
| Text Model | hunyuan-dense, 24-layer decoder, 1024 dim, GQA (16Q/8KV) |
| Vision Encoder | 27-layer ViT, 1152 dim, perceiver-based projector |
| Features | xdrope RoPE, QK normalization, RMS norm, SiLU SwiGLU |
| File | Size | Description |
|---|---|---|
| `HunyuanOCR-Q4_K_M.gguf` | 339 MB | Text model, Q4_K_M quantized |
| `HunyuanOCR-Q8_0.gguf` | 551 MB | Text model, Q8_0 quantized |
| `HunyuanOCR-F16.gguf` | 1.0 GB | Text model, F16 (unquantized) |
| `mmproj-HunyuanOCR-F16.gguf` | 909 MB | Vision encoder (mmproj), F16, required |
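Because the vision encoder file is needed alongside whichever text model is chosen, the download size is the sum of the two. The following is a rough sketch of that arithmetic using the sizes from the table; treating 1.0 GB as 1000 MB is an assumption, and the result is disk footprint only, not a statement about runtime memory use.

```python
# File sizes in MB, copied from the table above.
# Assumption: the 1.0 GB F16 text model is counted as 1000 MB.
TEXT_MODEL_MB = {"Q4_K_M": 339, "Q8_0": 551, "F16": 1000}
MMPROJ_MB = 909  # the vision encoder is required with every text model


def total_download_mb(quant: str) -> int:
    """Return the combined size of one text model plus the mmproj file."""
    return TEXT_MODEL_MB[quant] + MMPROJ_MB


if __name__ == "__main__":
    for quant in TEXT_MODEL_MB:
        print(f"{quant}: about {total_download_mb(quant)} MB on disk")
```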
```bash
# Via Hugging Face (easiest)
llama-server -hf AnandSingh/hunyuanocr-GGUF

# Or locally
llama-server \
  -m HunyuanOCR-Q4_K_M.gguf \
  --mmproj mmproj-HunyuanOCR-F16.gguf
```
API request:
```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "OCR" },
        { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,..." } }
      ]
    }
  ]
}
```
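As an illustration of how a client might assemble and send this request, here is a minimal Python sketch using only the standard library. It assumes `llama-server` is running locally on its default port and that the request goes to its OpenAI-compatible `/v1/chat/completions` endpoint. The helper names (`build_payload`, `extract_text`, `run_ocr`) and the `SERVER_URL` value are hypothetical and not part of this repository.

```python
import base64
import json
import urllib.request

# Assumption: llama-server is listening on its default host and port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(image_bytes: bytes, prompt: str = "OCR",
                  mime: str = "image/jpeg") -> dict:
    """Build the chat request body with the image embedded as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime};base64,{b64}"}},
                ],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Pull the generated text out of an OpenAI-style chat response."""
    return response["choices"][0]["message"]["content"]


def run_ocr(image_path: str, prompt: str = "OCR",
            url: str = SERVER_URL) -> str:
    """Read an image file, POST it to the server, and return the model output."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

With a server running, `run_ocr("page.jpg")` would return the recognized text, where `page.jpg` is a placeholder path. For PNG input, pass `mime="image/png"` to `build_payload`.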
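| Task | Prompt | English meaning |
|---|---|---|
| Text Spotting | 检测并识别图片中的文字,将文本坐标格式化输出。 | Detect and recognize the text in the image, and output the text coordinates in a formatted form. |
| Document Parsing | 提取文档图片中正文的所有信息用markdown格式表示,其中页眉、页脚部分忽略,表格用html格式表达,文档中公式用latex格式表示,按照阅读顺序组织进行解析。 | Extract all body content from the document image as markdown, ignoring headers and footers; express tables in HTML and formulas in LaTeX, and parse in reading order. |
| Formula Recognition | 识别图片中的公式,用LaTeX格式表示。 | Recognize the formula in the image and express it in LaTeX. |
| Table Extraction | 把图中的表格解析为 HTML。 | Parse the table in the image into HTML. |
| Translation | 先提取文字,再将文字内容翻译为英文。 | First extract the text, then translate it into English. |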
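One way to use these prompts from a client is a small lookup keyed by task name, so the Chinese prompt strings are sent to the model verbatim. This is only a sketch: the `TASK_PROMPTS` dictionary and the `prompt_for` helper are hypothetical names, and the strings are copied unchanged from the table.

```python
# Prompt strings copied verbatim from the task table above.
TASK_PROMPTS = {
    "text_spotting": "检测并识别图片中的文字,将文本坐标格式化输出。",
    "document_parsing": (
        "提取文档图片中正文的所有信息用markdown格式表示,其中页眉、页脚部分忽略,"
        "表格用html格式表达,文档中公式用latex格式表示,按照阅读顺序组织进行解析。"
    ),
    "formula_recognition": "识别图片中的公式,用LaTeX格式表示。",
    "table_extraction": "把图中的表格解析为 HTML。",
    "translation": "先提取文字,再将文字内容翻译为英文。",
}


def prompt_for(task: str) -> str:
    """Return the prompt for a task name such as 'Table Extraction'."""
    key = task.strip().lower().replace(" ", "_")
    if key not in TASK_PROMPTS:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}")
    return TASK_PROMPTS[key]
```

The returned string goes in the `"text"` field of the request in place of `"OCR"`.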
Requires llama.cpp build b8670 or later.

Licensed under the Tencent Hunyuan Community License Agreement.
Original model by Tencent Hunyuan Vision Team. GGUF conversion is not affiliated with or endorsed by Tencent.