The olmOCR 2 model is a Vision Language Model (VLM) from Allen AI.
To run the smallest olmOCR 2, you need at least 5 GB of RAM.
olmOCR 2 models support vision input and are available in GGUF format.

olmOCR 2 is a fine-tune of Qwen2.5-VL-7B-Instruct (by Alibaba Qwen), trained using the olmOCR-mix-1025 dataset.
It has been trained on a highly curated set of academic papers, technical documentation, and other reference content. The model was fine-tuned on English documents on top of a multilingual base VLM, so other languages may work.
This model expects as input a single document image, rendered so that its longest dimension is 1288 pixels. The prompt must also include metadata extracted from the document, and the easiest way to generate it is with the helpers provided by the olmOCR toolkit, as in the sketch below.
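A minimal sketch of one way to prepare and run a single page, assuming the olmOCR toolkit (`pip install olmocr`), a Hugging Face repo id of `allenai/olmOCR-2-7B-1025` (inferred from the model name on this page), and a hypothetical local `paper.pdf`. The prompt-building helpers below are the ones shipped with earlier olmOCR releases and may differ in newer toolkit versions:

```python
import base64
from io import BytesIO

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

from olmocr.data.renderpdf import render_pdf_to_base64png
from olmocr.prompts import build_finetuning_prompt
from olmocr.prompts.anchor import get_anchor_text

# Render page 1 of a local PDF (hypothetical "paper.pdf") so that its
# longest dimension is 1288 pixels, as the model expects.
image_base64 = render_pdf_to_base64png("paper.pdf", 1, target_longest_image_dim=1288)

# Extract document metadata ("anchor text") and build the prompt from it.
anchor_text = get_anchor_text("paper.pdf", 1, pdf_engine="pdfreport", target_length=4000)
prompt = build_finetuning_prompt(anchor_text)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
        ],
    }
]

# Repo id is an assumption based on the model name given in the table below.
model_id = "allenai/olmOCR-2-7B-1025"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open(BytesIO(base64.b64decode(image_base64)))
inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt")

# Decode only the newly generated tokens (the OCR output).
output = model.generate(**inputs, max_new_tokens=4096)
new_tokens = output[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```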
When used with the olmOCR toolkit, which automatically renders, rotates, and retries pages as needed, the model achieves the following scores on olmOCR-bench (a pipeline invocation sketch follows the table).
| Model | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long tiny text | Base | Overall |
|---|---|---|---|---|---|---|---|---|---|
| olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025 | 82.9 | 82.1 | 84.3 | 48.3 | 95.7 | 84.3 | 81.4 | 99.7 | 82.3 ± 1.1 |
| olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025-FP8 | 83.0 | 82.3 | 84.9 | 47.7 | 96.1 | 83.7 | 81.9 | 99.7 | 82.4 ± 1.1 |
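For batch conversion like the benchmark runs above, the toolkit's pipeline handles rendering, rotation, and retries for you. A hedged sketch of invoking it from Python, assuming the `python -m olmocr.pipeline` CLI documented in the olmOCR repository and the same hypothetical `paper.pdf`:

```python
import subprocess

# Invoke the olmOCR pipeline CLI (documented in the olmOCR repository).
# It renders, rotates, and retries pages, writing results into the workspace.
subprocess.run(
    [
        "python", "-m", "olmocr.pipeline",
        "./localworkspace",     # workspace directory for intermediate results
        "--markdown",           # also emit per-document markdown files
        "--pdfs", "paper.pdf",  # hypothetical input document
    ],
    check=True,
)
```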
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.