Description
The olmOCR 2 model is a Vision Language Model (VLM) from Allen AI.
README
The olmOCR 2 model is fine-tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-1025 dataset. It has been trained on a highly curated set of academic papers, technical documentation, and other reference content. Fine-tuning used English documents, but because the base VLM is multilingual, other languages may work.
This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit.
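To make the 1288-pixel requirement concrete, here is a minimal sketch of the resize arithmetic only. The helper name `scaled_size` is hypothetical and is not part of the olmOCR toolkit; the toolkit's own rendering methods remain the recommended way to prepare the image and the prompt metadata.

```python
def scaled_size(width: int, height: int, longest: int = 1288) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side equals `longest`.

    Hypothetical helper for illustration; not an olmOCR toolkit function.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = longest / max(width, height)
    # Pin the longer side to exactly `longest` and round the shorter side,
    # keeping it at least 1 pixel for extreme aspect ratios.
    if width >= height:
        return longest, max(1, round(height * scale))
    return max(1, round(width * scale)), longest


# A US Letter page rendered at 300 DPI is 2550 x 3300 pixels.
print(scaled_size(2550, 3300))  # (995, 1288)
```

The resulting size can then be passed to whatever image library renders the page; the aspect ratio is preserved up to rounding.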