OCRFlux-3B

End-to-end OCR model using vision-language architecture.

Open Source, Self Hosted, Offline Capable, GPU Required (12 GB+ VRAM)

About

OCRFlux-3B is a 3-billion-parameter vision-language model from the ChatDOC team that converts PDFs and images into clean, readable Markdown. Single-page parsing handles multi-column layouts, tables, equations, and reading order while stripping headers and footers. Its distinguishing feature is cross-page merging: the toolkit detects tables and paragraphs that continue across page boundaries and stitches them back together, which the project describes as a first among open-source OCR tools. On the project's OCRFlux-bench-single benchmark it reports an edit distance similarity of 0.971 for English and 0.962 for Chinese. It requires an NVIDIA GPU with at least 12 GB of VRAM (24 GB recommended) and supports tensor-parallel inference across multiple GPUs. The model weights are published on Hugging Face, and the project is Apache 2.0 licensed, so it can be used commercially in document processing and RAG ingestion pipelines.
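To give a feel for what an edit distance similarity score measures, here is a minimal Python sketch. It assumes the common definition, one minus the Levenshtein distance divided by the length of the longer string; the benchmark's exact normalization and text preprocessing may differ. The function names `levenshtein` and `edit_distance_similarity` are illustrative and not part of the OCRFlux toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def edit_distance_similarity(predicted: str, reference: str) -> float:
    """Assumed definition: 1 - distance / length of the longer string."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(predicted, reference) / longest


if __name__ == "__main__":
    # Hypothetical OCR output compared against hypothetical ground truth.
    print(edit_distance_similarity("| Name | Tota1 |", "| Name | Total |"))
```

Under this assumed definition, a score of 0.971 corresponds to roughly three character-level edits per hundred characters of the longer text.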

Details

Category: OCR & Document Processing
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
Minimum VRAM: 12 GB
Added: Apr 3, 2026

Related Tools

Docling

DocTR

MinerU

PyMuPDF

Tabula

Camelot