The olmOCR 2 model is fine-tuned from Qwen2.5-VL-7B-Instruct using the
olmOCR-mix-1025 dataset.
It has been trained on a highly curated set of academic papers, technical documentation, and other reference content. The model was fine-tuned on English documents using a multilingual base VLM, so other languages may work but are not officially supported.
This model expects a single document image as input, rendered so that its longest dimension is 1288 pixels. The prompt must also contain additional metadata from the document; the easiest way to generate this is with the methods provided by the olmOCR toolkit.
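As a rough illustration of the rendering requirement above, the snippet below scales an image so its longest side is exactly 1288 pixels. This is a minimal sketch using Pillow, not the olmOCR toolkit's actual rendering code; the helper name and the synthetic test image are assumptions for the example.

```python
# Sketch: render a document image so its longest side is 1288 px,
# matching what olmOCR 2 expects as input.
# NOTE: this is an illustrative helper, not the olmOCR toolkit API.
from PIL import Image

TARGET_LONGEST_SIDE = 1288

def resize_longest_side(img: Image.Image, target: int = TARGET_LONGEST_SIDE) -> Image.Image:
    """Scale the image so its longest dimension equals `target` pixels."""
    scale = target / max(img.size)
    new_size = (round(img.width * scale), round(img.height * scale))
    return img.resize(new_size, Image.LANCZOS)

# Example with a synthetic page image (in real use, a rendered PDF page).
page = Image.new("RGB", (1700, 2200), "white")
resized = resize_longest_side(page)
print(resized.size)  # longest side is now 1288
```

In practice, the olmOCR toolkit handles this rendering step (along with extracting the document metadata for the prompt), so most users should not need to reimplement it.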
Parameters
Custom configuration options included with this model