GOT-OCR

General OCR Theory model with unified end-to-end architecture for various OCR tasks.

Open SourceSelf HostedOffline CapableGPU Required (6GB+ VRAM)

0.0 (0)

About

GOT-OCR2.0 implements General OCR Theory, the idea that one unified end-to-end model can replace a pipeline of task-specific OCR systems. A vision encoder built on the Vary model feeds a Qwen-based language decoder, and the same weights read scene text, dense documents, mathematical formulas, tables, charts, and even sheet music. It supports plain text extraction, formatted output that preserves layout, fine-grained region OCR guided by bounding boxes, multi-page documents, and rendered HTML visualization of results. Inference runs through Hugging Face Transformers with batch support, and the community has ported it to ONNX, MNN, llama.cpp GGUF, and vLLM, with LoRA fine-tuning available via ms-swift and distributed training through DeepSpeed. The code is Apache 2.0 licensed while the training data carries CC-BY-NC 4.0 terms. The model passed one million downloads on Hugging Face within months of release and is frequently used for document parsing in RAG pipelines.

Reviews (0)

Leave a Review

No reviews yet. Be the first to review!

Details

Category: OCR & Document Processing
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: Apache-2.0
Minimum VRAM: 6 GB
Added: Apr 3, 2026

0.0 (0)

Website GitHub

Browse all OCR & Document Processing tools

GOT-OCR

About

Reviews (0)

Leave a Review

Details

Tags

Related Tools

Docling

DocTR

MinerU

PyMuPDF

Tabula

Camelot