
Featured Tool

Tesseract

One of the most widely used open-source OCR engines, supporting 100+ languages.

Open Source, Self Hosted, Offline Capable



About

Tesseract is a widely used open-source OCR engine, originally developed at HP and later developed with Google's sponsorship. Version 4 added an LSTM-based engine that recognizes whole text lines, while retaining the legacy engine that matches character patterns. It supports more than 100 languages. It takes image files as input (PDFs must be rasterized to images first) and can write plain text, hOCR, TSV, or searchable PDF. It ships as a library and a command-line program, and it runs on CPU. It is released under the Apache 2.0 license.
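As a rough illustration of the command-line program, the sketch below assembles a `tesseract` invocation from Python and runs it through `subprocess`. The helper names `build_tesseract_command` and `run_ocr` are hypothetical, not part of Tesseract. The flags used (`-l`, `--oem`, `--psm`) and the `pdf`, `hocr`, and `tsv` output configs are standard CLI options, but the file names are placeholders, and the example assumes the `tesseract` executable and the requested language data are installed.

```python
import shutil
import subprocess
from pathlib import Path

# Values for the CLI's --oem flag: 0 selects the legacy engine, 1 the LSTM engine.
OEM_LEGACY = 0
OEM_LSTM = 1


def build_tesseract_command(image_path, output_base, lang="eng",
                            oem=OEM_LSTM, psm=3, output_format="txt"):
    """Return the argument list for one tesseract run (hypothetical helper).

    lang may combine several languages with '+', e.g. 'eng+deu'.
    psm=3 is fully automatic page segmentation, the CLI default.
    output_format is 'txt', 'pdf', 'hocr', or 'tsv'.
    """
    cmd = [
        "tesseract", str(image_path), str(output_base),
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
    ]
    # Plain text is the default output, so only other formats need a config name.
    if output_format != "txt":
        cmd.append(output_format)
    return cmd


def run_ocr(image_path, output_base, **options):
    """Run tesseract and return the path of the file it writes (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, **options),
                   check=True)
    # tesseract appends the format's extension to the output base name.
    return Path(f"{output_base}.{options.get('output_format', 'txt')}")
```

Calling `run_ocr("scan.png", "scan", lang="eng+deu", output_format="pdf")` would, under these assumptions, write a searchable `scan.pdf`. Keeping command construction separate from execution makes the argument list easy to inspect without Tesseract installed.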


Details

Category: OCR & Document Processing
Price: Free
Platform: Local/Desktop
Difficulty: Easy (2/5)
License: Apache-2.0
Added: Apr 3, 2026

Tags

ocr text-extraction multilingual google lstm

Related Tools

Featured

Docling

OCR & Document Processing

Document parsing library by IBM for converting PDFs and documents to structured data.

Open Source, Self Hosted, Offline

Easy


DocTR

OCR & Document Processing

Deep-learning-based OCR library for Python, built on TensorFlow or PyTorch.

Open Source, Self Hosted, Offline

Easy


MinerU

OCR & Document Processing

One-stop tool for high-quality PDF extraction to Markdown or JSON.

Open Source, Self Hosted, Offline

Easy


PyMuPDF

OCR & Document Processing

Python bindings for the MuPDF library, for fast PDF text and image extraction.

Open Source, Self Hosted, Offline

Beginner


Tabula

OCR & Document Processing

Tool for extracting tables from PDF files into CSV or DataFrame format.

Open Source, Self Hosted, Offline

Beginner


Camelot

OCR & Document Processing

Python library for extracting tables from PDF files.

Open Source, Self Hosted, Offline

Beginner

