Unstructured
Open-source library for preprocessing unstructured documents for LLM pipelines.
Open Source · Self Hosted · Offline Capable
About
Unstructured provides open-source components for ingesting and preprocessing unstructured documents (PDFs, HTML, DOCX, images, and more) for LLM and RAG pipelines. It handles partitioning, chunking, cleaning, and staging. The Python library runs on CPU and is released under the Apache 2.0 license.
Details
- Category: OCR & Document Processing
- Price: Free
- Platform: Local/Desktop
- Difficulty: Easy (2/5)
- License: Apache-2.0
- Added: Apr 3, 2026
Similar Tools
Featured
Most widely used open-source OCR engine, supporting 100+ languages.
Open Source · Self Hosted · Offline
Easy
Featured
Ready-to-use OCR library supporting 80+ languages, with a simple Python API.
Open Source · Self Hosted · Offline
Beginner
Featured
Multilingual OCR toolkit by PaddlePaddle with state-of-the-art accuracy.
Open Source · Self Hosted · Offline
Easy