GOT-OCR
General OCR Theory model with a unified end-to-end architecture for a wide range of OCR tasks.
About
GOT-OCR (General OCR Theory), developed by StepFun, is a unified end-to-end model that reads scene text, document text, formulas, charts, sheet music, and more through a single vision encoder-decoder architecture rather than separate task-specific systems. It supports multi-page and region-based recognition and is positioned as a step toward "OCR 2.0". It is released under the Apache 2.0 license.
Details
- Category: OCR & Document Processing
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: Apache-2.0
- Minimum VRAM: 6 GB
- Added: Apr 3, 2026
Related Tools
Python library for extracting text, tables, and metadata from PDFs.
Python bindings for MuPDF library for fast PDF text and image extraction.
Python library for extracting tables from PDF files.
Ready-to-use OCR library supporting 80+ languages with simple Python API.
Turn-key OCR system for historical and non-Latin script documents.
Vision-language model based OCR toolkit by AI2 for document understanding.