license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanOCR/blob/main/LICENSE
language:
HunyuanOCR converted to GGUF format for inference with llama.cpp.
This is a GGUF quantization of Tencent's HunyuanOCR, a vision-language model of roughly 1B parameters specialized for OCR. Native support was added in llama.cpp build b8670 (April 2026).
| Component | Spec |
|---|---|
| Type | Vision-Language Model (VLM) |
| Parameters | ~1.12B |
| Text Model | hunyuan-dense, 24-layer decoder, 1024 dim, GQA (16Q/8KV) |
| Vision Encoder | 27-layer ViT, 1152 dim, perceiver-based projector |
| Features | xdrope RoPE, QK normalization, RMS norm, SiLU SwiGLU |
| File | Size | Description |
|---|---|---|
| HunyuanOCR-Q4_K_M.gguf | 339 MB | Text model, Q4_K_M quantized |
| HunyuanOCR-Q8_0.gguf | 551 MB | Text model, Q8_0 quantized |
| HunyuanOCR-F16.gguf | 1.0 GB | Text model, F16 (unquantized, half precision) |
| mmproj-HunyuanOCR-F16.gguf | 909 MB | Vision encoder (mmproj), F16 required |
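Image input needs the vision encoder file alongside one text-model file, so the disk footprint for a given quantization is the sum of the two. The sketch below only does that arithmetic with the sizes from the table; it treats "1.0 GB" as 1000 MB, which is an assumption about the table's rounding, and it says nothing about runtime memory use.

```python
# File sizes in MB, copied from the table above.
# Assumption: "1.0 GB" is read as 1000 MB; the table does not say GB vs GiB.
TEXT_MODEL_MB = {
    "Q4_K_M": 339,
    "Q8_0": 551,
    "F16": 1000,
}
MMPROJ_MB = 909  # mmproj-HunyuanOCR-F16.gguf


def total_download_mb(quant: str) -> int:
    """Return the combined size of one text-model file plus the vision encoder."""
    return TEXT_MODEL_MB[quant] + MMPROJ_MB


if __name__ == "__main__":
    for quant in TEXT_MODEL_MB:
        print(f"{quant}: about {total_download_mb(quant)} MB on disk")
```

With these numbers the vision encoder dominates the Q4_K_M pairing, so choosing a smaller text quantization saves less than the text-model sizes alone suggest.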
Prompts by task, sent as the text part of the request:
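| Task | Prompt | English gloss |
|---|---|---|
| Text Spotting | 检测并识别图片中的文字,将文本坐标格式化输出。 | Detect and recognize the text in the image, and output the text coordinates in a formatted way. |
| Document Parsing | 提取文档图片中正文的所有信息用markdown格式表示,其中页眉、页脚部分忽略,表格用html格式表达,文档中公式用latex格式表示,按照阅读顺序组织进行解析。 | Extract all information from the body of the document image as markdown; ignore headers and footers, express tables as HTML and formulas as LaTeX, and organize the output in reading order. |
| Formula Recognition | 识别图片中的公式,用LaTeX格式表示。 | Recognize the formula in the image and express it in LaTeX. |
| Table Extraction | 把图中的表格解析为 HTML。 | Parse the table in the image into HTML. |
| Translation | 先提取文字,再将文字内容翻译为英文。 | First extract the text, then translate it into English. |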
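When scripting against the model, it can help to keep these prompts in one place rather than retyping them. The following is a minimal sketch: the prompt strings are copied verbatim from the table, while the dictionary keys and the `prompt_for` helper are hypothetical names chosen for illustration.

```python
# Prompt strings copied verbatim from the table above.
# The task keys are hypothetical identifiers, not names defined by the model.
TASK_PROMPTS = {
    "text_spotting": "检测并识别图片中的文字,将文本坐标格式化输出。",
    "document_parsing": (
        "提取文档图片中正文的所有信息用markdown格式表示,其中页眉、页脚部分忽略,"
        "表格用html格式表达,文档中公式用latex格式表示,按照阅读顺序组织进行解析。"
    ),
    "formula_recognition": "识别图片中的公式,用LaTeX格式表示。",
    "table_extraction": "把图中的表格解析为 HTML。",
    "translation": "先提取文字,再将文字内容翻译为英文。",
}


def prompt_for(task: str) -> str:
    """Look up the prompt for a task key, with a readable error for unknown keys."""
    try:
        return TASK_PROMPTS[task]
    except KeyError:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(prompt_for("table_extraction"))
```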
Requires llama.cpp b8670 or later.

Licensed under the Tencent Hunyuan Community License Agreement.
Original model by Tencent Hunyuan Vision Team. GGUF conversion is not affiliated with or endorsed by Tencent.
Start the server:

    # Via Hugging Face (easiest)
    llama-server -hf AnandSingh/hunyuanocr-GGUF

    # Or locally
    llama-server \
      -m HunyuanOCR-Q4_K_M.gguf \
      --mmproj mmproj-HunyuanOCR-F16.gguf
API request body:

    {
      "messages": [
        {
          "role": "user",
          "content": [
            { "type": "text", "text": "OCR" },
            { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,..." } }
          ]
        }
      ]
    }
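The request body above can be built and sent from a script. This is a sketch under stated assumptions, not a reference client: it assumes llama-server is listening on its default address `127.0.0.1:8080`, that the body is posted to the OpenAI-compatible `/v1/chat/completions` endpoint, and that the reply follows the usual `choices[0].message.content` shape. The helper names `build_payload` and `run_ocr` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: llama-server runs locally on its default host and port.
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_payload(image_bytes: bytes, prompt: str = "OCR", mime: str = "image/jpeg") -> dict:
    """Build the chat request body, embedding the image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
                ],
            }
        ]
    }


def run_ocr(image_path: str, prompt: str = "OCR", url: str = SERVER_URL) -> str:
    """POST one image to the server and return the text of the first choice."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed OpenAI-style response shape.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(run_ocr("page.jpg"))` would send a hypothetical local file `page.jpg` with the default `"OCR"` prompt; pass a different `prompt` to use a task-specific one.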