Camelot

Python library for extracting tables from PDF files.

Open Source, Self Hosted, Offline Capable

About

Camelot is a Python library that extracts tables from PDF files into pandas DataFrames. Text-based documents need no OCR, and optional OCR support extends coverage to scanned, image-only PDFs. Several parsing flavors are available: lattice for tables with ruled lines, stream for whitespace-separated layouts, network and hybrid for parsing based on text alignment, and an optional neural backend that uses a Table Transformer model for complex borderless tables. An auto-detection mode is also available. Each extracted table carries accuracy and whitespace metrics, so poor results can be filtered out or re-run with tuned settings; table regions, column separators, and text processing are all configurable. Tables export to CSV, JSON, Excel, HTML, Markdown, and SQLite, and input can come from file paths, URLs, raw bytes, or file-like objects. The default pdfium backend keeps the core install light, and PyTorch is loaded only when the ML parser is used. A command-line interface is included. Camelot is MIT-licensed and is used by analysts to automate data extraction from reports and public documents.
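As a rough illustration of filtering on the accuracy and whitespace metrics, the sketch below keeps only the tables that pass two thresholds. It assumes each table object exposes a `parsing_report` dictionary with `accuracy` and `whitespace` keys given as percentages, as Camelot's table objects do. The helper names `filter_tables` and `extract_reliable_tables`, the threshold values, and the default `lattice` flavor are choices made for this example, not part of the library.

```python
from typing import Any, Iterable, List


def filter_tables(
    tables: Iterable[Any],
    min_accuracy: float = 90.0,    # hypothetical cutoff, in percent
    max_whitespace: float = 40.0,  # hypothetical cutoff, in percent
) -> List[Any]:
    """Keep tables whose parsing report passes both thresholds.

    Each item only needs a `parsing_report` mapping with "accuracy"
    and "whitespace" entries, so this also works on stand-in objects.
    """
    kept = []
    for table in tables:
        report = table.parsing_report
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            kept.append(table)
    return kept


def extract_reliable_tables(
    pdf_path: str, pages: str = "1", flavor: str = "lattice"
) -> List[Any]:
    """Read tables from a PDF and drop the low-quality ones."""
    # Imported lazily so filter_tables stays usable without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return filter_tables(tables)
```

Each table returned by `extract_reliable_tables` carries its data as a pandas DataFrame in the `df` attribute. If too many tables are dropped, trying a different flavor or loosening the thresholds is a reasonable first step.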

Related Tools

DocTR

MinerU

PyMuPDF

Tabula

EasyOCR

Docling