PyMuPDF

Python bindings for the MuPDF library, for fast PDF text and image extraction.

Open Source, Self Hosted, Offline Capable

About

PyMuPDF provides Python bindings for the MuPDF engine, enabling fast extraction of text, images, and metadata from PDFs. It can also render pages to images and convert or manipulate documents. It is fast and memory-efficient, is widely used in AI data pipelines, and has a companion helper (PyMuPDF4LLM) aimed at producing LLM-ready Markdown. It is released under the AGPL-3.0 license, with a commercial license available.

Details

Category: OCR & Document Processing
Price: Free
Platform: Local/Desktop
Difficulty: Beginner (1/5)
License: AGPL-3.0
Added: Apr 3, 2026

Tags

document pdf text-extraction images python fast

Related Tools

Docling

OCR & Document Processing

Document parsing library by IBM for converting PDFs and other documents to structured data.

Open Source, Self Hosted, Offline

Easy

DocTR

OCR & Document Processing

Deep-learning-based OCR library for Python, built on TensorFlow or PyTorch.

Open Source, Self Hosted, Offline

Easy

MinerU

OCR & Document Processing

One-stop tool for high-quality PDF extraction to Markdown or JSON.

Open Source, Self Hosted, Offline

Easy

Tabula

OCR & Document Processing

Tool for extracting tables from PDF files into CSV or DataFrame format.

Open Source, Self Hosted, Offline

Beginner

EasyOCR

OCR & Document Processing

Ready-to-use OCR library supporting 80+ languages with a simple Python API.

Open Source, Self Hosted, Offline

Beginner

Camelot

OCR & Document Processing

Python library for extracting tables from PDF files.

Open Source, Self Hosted, Offline

Beginner
