MinerU
One-stop tool for high-quality PDF extraction to Markdown or JSON.
About
MinerU, developed by OpenDataLab, converts PDFs into machine-readable Markdown or structured JSON for use in LLM and RAG pipelines. It combines several OCR and layout-analysis models to handle complex layouts, reading order, formulas, tables, and images, and is aimed at clean extraction from scientific and technical documents. It is available as a command-line tool, a web demo, and a Python package, and is released under the AGPL-3.0 license.
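As a rough illustration of the command-line route, the sketch below wraps a conversion run in Python. It assumes the CLI is installed as `mineru` and accepts `-p` for the input path and `-o` for the output directory, as in recent releases; older releases may use a different command name, so check the installed version's help output. The helper names `build_command` and `convert_pdf` and the file names are hypothetical, not part of MinerU.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path: str, output_dir: str) -> list[str]:
    """Build the argument list for one run (assumed form: mineru -p IN -o OUT)."""
    return ["mineru", "-p", str(pdf_path), "-o", str(output_dir)]


def convert_pdf(pdf_path: str, output_dir: str) -> bool:
    """Run the conversion if the CLI is on PATH; return True on a zero exit code."""
    if shutil.which("mineru") is None:
        # The assumed executable is not installed, so nothing is run.
        return False
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    result = subprocess.run(build_command(pdf_path, output_dir), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical input file and output directory.
    ok = convert_pdf("paper.pdf", "output")
    print("converted" if ok else "MinerU CLI missing or conversion failed")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.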
Details
- Category: OCR & Document Processing
- Price: Free
- Platform: Local/Desktop
- Difficulty: Easy (2/5)
- License: AGPL-3.0
- Added: Apr 3, 2026
Related Tools
Python library for extracting text, tables, and metadata from PDFs.
Python bindings for MuPDF library for fast PDF text and image extraction.
Python library for extracting tables from PDF files.
Ready-to-use OCR library supporting 80+ languages with simple Python API.
Turn-key OCR system for historical and non-Latin script documents.
Vision-language model based OCR toolkit by AI2 for document understanding.