Paraformer (FunASR)

Non-autoregressive ASR model by Alibaba achieving fast parallel transcription.

Open Source, Self Hosted, Offline Capable, GPU Required (4GB+ VRAM)

About

Paraformer is the flagship speech recognition model in FunASR, an industrial speech toolkit from Alibaba's DAMO Academy. The toolkit is released under the MIT license; model weights carry their own licenses. Unlike autoregressive recognizers that decode token by token, Paraformer predicts an entire utterance in parallel, which yields fast inference at competitive accuracy: the repository reports Chinese character error rates of around 8 to 10 percent at well over 100x realtime on GPU. FunASR wraps the model in a complete pipeline: FSMN-VAD segments audio, CT-PUNC restores punctuation, CAM++ handles speaker diarization, and a WebSocket service supports streaming recognition with partial results. The toolkit spans offline, streaming, and edge deployment, including GGUF builds that run without a Python runtime, and newer models in the family extend coverage to dozens of languages. It is used in production speech systems, particularly for Chinese transcription workloads where throughput and accuracy both matter.
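To make the two headline figures concrete, here is a minimal Python sketch of how a character error rate and a realtime speedup are usually computed. The helper names (edit_distance, character_error_rate, realtime_speedup, transcribe) are illustrative and do not come from FunASR. The transcribe function assumes the AutoModel interface and the model aliases "paraformer-zh", "fsmn-vad", and "ct-punc" as shown in the FunASR README; treat it as an assumption and check it against the installed version.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete a reference symbol
                curr[j - 1] + 1,          # insert a hypothesis symbol
                prev[j - 1] + (r != h),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over characters, divided by the reference length."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def transcribe(path: str) -> str:
    """Hypothetical wrapper; assumes FunASR's AutoModel API per its README."""
    from funasr import AutoModel  # imported lazily so the helpers above need no install

    model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
    result = model.generate(input=path)
    return result[0]["text"]


if __name__ == "__main__":
    # One wrong character out of six in the reference.
    print(character_error_rate("今天天气很好", "今天天汽很好"))
    # Ten minutes of audio transcribed in five seconds.
    print(realtime_speedup(600.0, 5.0))
```

Because CT-PUNC adds punctuation to the output, a fair comparison would normally strip punctuation from both the reference and the hypothesis before scoring.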

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: MIT
Minimum VRAM: 4 GB
Added: Apr 3, 2026

Related Tools

Conformer (ESPnet)

ESPnet

Insanely Fast Whisper

Kaldi

Wav2Vec 2.0

Canary (NVIDIA NeMo)