← All Models

olmOCR 2

214 Downloads

The olmOCR 2 model is a Vision Language Model (VLM) from Allen AI.

Models
Updated 2 hours ago
4.70 GB

Memory Requirements

To run the smallest olmOCR 2, you need at least 5 GB of RAM.

Capabilities

olmOCR 2 models support vision input. They are available in gguf.

About olmOCR 2

undefined

The olmOCR 2 model is a Vision Language Model (VLM) from Allen AI.

Open weights and open data

The model is a fine-tune of Qwen2.5-VL-7B-Instruct (by Alibaba Qwen) using the olmOCR-mix-1025 dataset.

It has been trained on highly curated set academic papers, technical documentation, and other reference content. The model was fine-tuned on English documents using a multilingual base VLM; other languages may work.

How to use olmOCR in LM Studio

This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit.

olmOCR-Bench Scores

This model scores the following scores on olmOCR-bench when used with the olmOCR toolkit toolkit which automatically renders, rotates, and retries pages as needed.

ModelArXivOld Scans MathTablesOld ScansHeaders and FootersMulti columnLong tiny textBaseOverall
olmOCR pipeline v0.4.0 with olmOCR-2-7B-102582.982.184.348.395.784.381.499.782.3 ± 1.1
olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025-FP883.082.384.947.796.183.781.999.782.4 ± 1.1

License

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.