NVIDIA NeMo ASR

Production-grade ASR models and toolkit by NVIDIA for speech recognition.

Open Source, Self Hosted, Offline Capable, GPU Required (8 GB+ VRAM)

About

Within the NVIDIA NeMo framework, the ASR collection supplies production-grade speech recognition models together with training and fine-tuning recipes. It covers architectures such as Conformer and FastConformer and two pretrained families: Parakeet, which supports both offline and streaming transcription with configurable latency, and Canary, a multilingual model that both transcribes and translates speech. Checkpoints published on Hugging Face span a wide range of languages, and the toolkit handles data preparation, fine-tuning on custom audio, punctuation and capitalization restoration, and export paths for deployment on NVIDIA GPUs. Everything builds on PyTorch, and training realistically requires NVIDIA hardware. The code and most model checkpoints are released under the Apache 2.0 license, and the collection is widely used in commercial voice products and by research groups that need accurate transcription they can adapt to their own domain.
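
As a rough illustration of the offline transcription workflow described above, the sketch below loads a pretrained checkpoint and transcribes a list of audio files. It is a minimal example under stated assumptions, not a reference implementation: the helper names `hypothesis_text` and `transcribe_files` are hypothetical, the default checkpoint name `nvidia/parakeet-tdt-0.6b-v2` is only one example of a Parakeet model on Hugging Face, and the shape of the `transcribe()` return value is assumed to be either plain strings or objects with a `.text` attribute, depending on the NeMo version installed.

```python
from typing import Any, List


def hypothesis_text(result: Any) -> str:
    """Return the transcript carried by one transcribe() result.

    Assumption: depending on the NeMo version, each result is either a
    plain string or a Hypothesis-like object exposing a `.text` attribute.
    """
    if isinstance(result, str):
        return result
    return str(getattr(result, "text"))


def transcribe_files(
    paths: List[str],
    model_name: str = "nvidia/parakeet-tdt-0.6b-v2",  # example checkpoint (assumption)
) -> List[str]:
    """Transcribe audio files with a pretrained NeMo ASR checkpoint."""
    # Imported lazily so this module still loads where NeMo is not installed.
    import nemo.collections.asr as nemo_asr

    # Fetches the checkpoint on first use and restores the model from it.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)

    # Run offline (non-streaming) transcription over the given files.
    results = model.transcribe(paths)
    return [hypothesis_text(r) for r in results]
```

With NeMo installed and a suitable GPU available, a call such as `transcribe_files(["meeting.wav"])` (the file name is a placeholder) would return one transcript string per input file. Keeping the result normalization in its own helper means the rest of an application does not need to care which NeMo version produced the output.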

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
License: Apache-2.0
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Related Tools

Conformer (ESPnet)

ESPnet

Insanely Fast Whisper

Kaldi

Wav2Vec 2.0

Canary (NVIDIA NeMo)