olmOCR

Vision-language-model-based OCR toolkit by AI2 for document understanding.

Open Source · Self Hosted · Offline Capable · GPU Required (8GB+ VRAM)

0.0 (0)

About

olmOCR, from the Allen Institute for AI (AI2), converts PDFs and image-based documents into clean plain text and Markdown using a 7-billion-parameter vision-language model based on Qwen2.5-VL. Rather than performing character-level OCR, it reads whole pages: it preserves natural reading order across multi-column layouts, handles equations, tables, handwriting, and old scans, and strips headers and footers.

The pipeline can run locally on an NVIDIA GPU with at least 12 GB of VRAM, against remote vLLM servers, or across multi-node clusters coordinated through AWS S3; a Docker image is also available. The project estimates the conversion cost at under $200 per million pages.

The project also ships olmOCR-Bench, a benchmark of more than 7,000 test cases across 1,400 documents for comparing OCR systems, on which the current release scores competitively. Code and weights are released under the Apache 2.0 license, and the tool is typically used to build training corpora and RAG ingestion pipelines from messy document collections.
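For the local, single-GPU setup described above, a minimal sketch of driving the pipeline from Python might look like the following. It assumes the `python -m olmocr.pipeline <workspace> --markdown --pdfs ...` invocation shown in the project's README; the helpers `build_olmocr_command` and `convert_folder` are hypothetical, and flags can change between releases, so check `python -m olmocr.pipeline --help` on your installed version before relying on them.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_olmocr_command(workspace: str, pdfs: list[str], markdown: bool = True) -> list[str]:
    """Build argv for one olmOCR pipeline run.

    Hypothetical helper: the module path and the --markdown / --pdfs flags
    are assumed from the project's README and may differ in your version.
    """
    if not pdfs:
        raise ValueError("at least one PDF path is required")
    # Use the current interpreter so the command runs in the same environment.
    cmd = [sys.executable, "-m", "olmocr.pipeline", workspace]
    if markdown:
        cmd.append("--markdown")  # assumed flag: request Markdown output
    cmd += ["--pdfs", *pdfs]  # assumed flag: list of input PDF paths
    return cmd


def convert_folder(folder: str, workspace: str = "./localworkspace", dry_run: bool = True) -> list[str]:
    """Collect *.pdf files in `folder` and either print or run the pipeline command."""
    pdfs = sorted(str(p) for p in Path(folder).glob("*.pdf"))
    cmd = build_olmocr_command(workspace, pdfs)
    if dry_run:
        # Print a shell-quoted version of the command instead of executing it.
        print(shlex.join(cmd))
    else:
        # Requires olmocr to be installed and a GPU that meets its VRAM requirement.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `convert_folder("./scans")` with the default `dry_run=True` only prints the command, which makes it easy to confirm the file list before committing GPU time; pass `dry_run=False` to actually launch the run.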

Reviews (0)

No reviews yet.

Details

Category: OCR & Document Processing
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: Apache-2.0
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Mentioned in

PDF Parsing for RAG in 2026: MinerU, Docling, Marker Compared

A benchmarked comparison of MinerU, Docling, Marker 2, Surya, PDF-Extract-Kit and Zerox for RAG ingestion,...

Billy C

Related Tools

Docling

DocTR

MinerU

PyMuPDF

Tabula

Camelot