pdfplumber

Python library for extracting text, tables, and metadata from PDFs.

Open Source, Self Hosted, Offline Capable

About

pdfplumber gives Python developers fine-grained access to the contents of machine-generated PDFs. Built on pdfminer.six, it exposes every character, line, rectangle, and curve along with its position and font metadata. It extracts text with optional layout preservation, reads form field values, annotations, and hyperlinks, and detects tables by analyzing line intersections, an approach inspired by Tabula and Anssi Nurminen's research. A visual debugging feature renders pages with overlays so users can see exactly what the parser detects and tune table settings for difficult layouts. The library does not perform OCR and is explicitly aimed at text-based rather than scanned documents; it also does not create or modify PDFs. It is tested against Python 3.10 through 3.14, is released under the MIT license, and is a common choice for data journalists, analysts, and engineers extracting structured data from reports, filings, and forms.
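As a rough illustration of the workflow described above, the sketch below reads the first page of a PDF, extracts layout-preserving text, lists the fonts found in the character metadata, pulls one table using line-based detection, and optionally saves a debug overlay. The function names (table_to_records, read_first_page, save_table_debug_image) and the file paths are hypothetical and chosen only for this example; the pdfplumber calls are the ones documented in its README, though the settings a real document needs will likely differ.

```python
from typing import Optional


def table_to_records(table: Optional[list]) -> list:
    """Convert a list-of-rows table (first row is the header) into dicts.

    Hypothetical helper: pdfplumber returns tables as lists of rows whose
    cells are strings or None, so empty cells are normalised to "".
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Line-based detection suits tables with ruled borders; "text" is the
# alternative strategy for tables aligned only by whitespace.
TABLE_SETTINGS = {"vertical_strategy": "lines", "horizontal_strategy": "lines"}


def read_first_page(path: str) -> dict:
    """Collect text, fonts, metadata, and one table from page 1 of a PDF."""
    # Imported here so table_to_records can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[0]
        return {
            "metadata": pdf.metadata,
            # layout=True tries to keep the visual arrangement of the text
            "text": page.extract_text(layout=True),
            # every char is a dict carrying position and font information
            "fonts": sorted({char["fontname"] for char in page.chars}),
            "records": table_to_records(
                page.extract_table(table_settings=TABLE_SETTINGS)
            ),
        }


def save_table_debug_image(path: str, out_path: str) -> None:
    """Save page 1 with the table finder's detected lines and cells overlaid."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        image = pdf.pages[0].to_image(resolution=150)
        image.debug_tablefinder(TABLE_SETTINGS).save(out_path)
```

A call such as read_first_page("report.pdf") (with "report.pdf" standing in for any text-based PDF) returns a plain dictionary that can be inspected or passed on. If no table is detected, extract_table returns None and the records list is simply empty, which is a good moment to look at the debug image and adjust the table settings.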

Details

Category: OCR & Document Processing
Price: Free
Platform: Local/Desktop
Difficulty: Beginner (1/5)
License: MIT
Added: Apr 3, 2026

Related Tools

Docling

DocTR

MinerU

PyMuPDF

Tabula

Camelot