Docling
Document parsing library by IBM for converting PDFs and other documents to structured data.
About
Docling by IBM Research converts PDFs, DOCX, PPTX, and other formats into structured JSON or Markdown for LLM and RAG pipelines. It parses complex layouts, tables, figures, and equations while preserving reading order, and exposes a Python API built on Pydantic models. It runs locally on CPU, with no GPU required, and is hosted by the LF AI & Data Foundation. It is released under the MIT license.
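As a minimal sketch of the Python API described above: the helper below wraps Docling's `DocumentConverter` and returns either Markdown or a JSON-serializable dict. The function name `convert_document`, its `fmt` parameter, and the optional `converter` argument are illustrative assumptions and not part of Docling. The calls it relies on (`DocumentConverter().convert(...)`, `result.document`, `export_to_markdown()`, `export_to_dict()`) follow the library's documented quick-start usage, but check them against the version you install.

```python
def convert_document(source, fmt="markdown", converter=None):
    """Convert one document and return Markdown text or a dict.

    `convert_document`, `fmt`, and `converter` are hypothetical names
    used for this sketch; only the Docling calls below come from the library.
    """
    if converter is None:
        # Imported lazily so this module loads even where Docling is absent.
        # Requires: pip install docling
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()

    # `source` may be a local file path or a URL.
    result = converter.convert(source)
    doc = result.document  # Docling's Pydantic-based document model

    if fmt == "markdown":
        return doc.export_to_markdown()
    if fmt == "json":
        # A plain dict that can be passed to json.dumps().
        return doc.export_to_dict()
    raise ValueError(f"unsupported format: {fmt!r}")
```

With Docling installed, calling `convert_document("report.pdf")` (the file name is a placeholder) would return the document as a Markdown string, and passing `fmt="json"` would return the structured form instead. The optional `converter` argument lets one `DocumentConverter` be reused across many files, which avoids reloading the underlying models for each call.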
Details
- Category: OCR & Document Processing
- Price: Free
- Platform: Local/Desktop
- Difficulty: Easy (2/5)
- License: MIT
- Added: Apr 3, 2026
Related Tools
Python library for extracting text, tables, and metadata from PDFs.
Python bindings for MuPDF library for fast PDF text and image extraction.
Python library for extracting tables from PDF files.
Ready-to-use OCR library supporting 80+ languages with simple Python API.
Turn-key OCR system for historical and non-Latin script documents.
Vision-language model based OCR toolkit by AI2 for document understanding.