
OCRmyPDF

Adds a searchable text layer to scanned PDFs using the Tesseract OCR engine.

Open Source, Self-Hosted, Offline Capable

0.0 (0)

About

OCRmyPDF is a command-line tool that adds a searchable, selectable text layer to scanned PDFs using the Tesseract OCR engine, while leaving the original page images visually unchanged. Along the way it can deskew crooked pages, clean up images before recognition, produce archival PDF/A output, and optimize images so that the result is often smaller than the input. It supports the more than 100 languages that Tesseract covers, validates its input and output files, distributes work across CPU cores, and handles documents thousands of pages long, all without a GPU. It can be installed with pip, Homebrew, distribution packages, or Docker, and runs on Linux, macOS, Windows (natively or through WSL), and the BSDs; the core is released under the MPL-2.0 license. It is a common backbone for document archiving workflows and for document management systems such as Paperless-ngx, and a practical choice for anyone turning boxes of scans into text they can actually search.
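As a minimal sketch of how the features described above map onto the command line, the Python helper below assembles an `ocrmypdf` invocation and can optionally run it over a folder of scans. The helper names `build_ocrmypdf_command` and `ocr_folder` are hypothetical. The flags it uses (`-l`, `--deskew`, `--clean`, `--output-type`, `--optimize`, `--jobs`) are standard OCRmyPDF options, but check `ocrmypdf --help` for your installed version. The sketch assumes the `ocrmypdf` executable is on `PATH`, that the requested Tesseract language packs are installed, and that `unpaper` is available if you use `--clean`.

```python
# Minimal sketch (hypothetical helpers): build and optionally run OCRmyPDF
# command lines. Assumes `ocrmypdf` is on PATH; `--clean` also needs `unpaper`.
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: list[str] | None = None,
    deskew: bool = True,
    clean: bool = False,
    pdfa: bool = True,
    optimize: int = 1,
    jobs: int | None = None,
) -> list[str]:
    """Return an argv list suitable for subprocess.run."""
    cmd = ["ocrmypdf"]
    if languages:
        # Multiple Tesseract language codes are joined with '+', e.g. eng+deu
        cmd += ["-l", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")
    if clean:
        # Cleans images for recognition only; the output keeps the original images
        cmd.append("--clean")
    # 'pdfa' requests archival PDF/A output; 'pdf' produces a regular PDF
    cmd += ["--output-type", "pdfa" if pdfa else "pdf"]
    # Optimization level 0 disables optimization; higher levels optimize harder
    cmd += ["--optimize", str(optimize)]
    if jobs is not None:
        # Number of worker processes to spread pages across CPU cores
        cmd += ["--jobs", str(jobs)]
    cmd += [input_pdf, output_pdf]
    return cmd


def ocr_folder(src_dir: str, dst_dir: str, **options) -> dict[str, int]:
    """OCR every *.pdf in src_dir into dst_dir; return each file's exit code."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    results: dict[str, int] = {}
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        cmd = build_ocrmypdf_command(str(pdf), str(out / pdf.name), **options)
        # A nonzero exit code means this file needs attention
        results[pdf.name] = subprocess.run(cmd).returncode
    return results
```

For example, `ocr_folder("scans", "searchable", languages=["eng"], jobs=4)` would write one searchable PDF/A file per input and return each file's exit code. OCRmyPDF documents what each nonzero code means. Shelling out to the CLI keeps the sketch independent of Python API changes. OCRmyPDF also offers an in-process API, `ocrmypdf.ocr()`, whose keyword arguments mirror these command-line options.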


Details

Category: OCR & Document Processing
Price: Free
Platform: Local/Desktop
Difficulty: Beginner (1/5)
License: MPL-2.0
Added: Apr 3, 2026





Related Tools

Docling

DocTR

MinerU

PyMuPDF

Tabula

Camelot