olmocr-2-7b

Public

The olmOCR 2 model is a Vision Language Model (VLM) from Allen AI.

430 Downloads

2 stars

Capabilities

Vision Input

Minimum system memory

5GB

Tags

7B
qwen2vl

README

olmOCR 2 by allenai

The olmOCR 2 model is fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-1025 dataset. It has been trained on highly curated set academic papers, technical documentation, and other reference content. The model was fine-tuned on English documents using a multilingual base VLM; other languages may work.

This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit.

Parameters

Custom configuration options included with this model

Repeat Penalty
1
Temperature
0.1

Sources

The underlying model files this model uses