Canary (NVIDIA NeMo)
Multilingual ASR model from NVIDIA that transcribes and translates speech, initially in 4 languages.
About
Canary by NVIDIA NeMo is a multilingual automatic speech recognition (ASR) model that both transcribes and translates speech. It originally covered English, German, French, and Spanish; later versions added more languages and lowered word error rates. It is built on the NeMo speech framework, with open weights and demos on Hugging Face, and is released under the Apache 2.0 license.
Details
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: Apache-2.0
- Minimum VRAM: 8 GB
- Added: Apr 3, 2026
Related Tools
Whisper extension providing word-level timestamps for transcription.
Convolution-augmented transformer for speech recognition in ESPnet toolkit.
Pre-trained speech models for STT, TTS, and VAD with simple PyTorch integration.
CLI tool that transcribes audio 10x faster using pipeline optimizations.
Self-supervised speech representation model by Meta for ASR.
Open-source speaker diarization and voice activity detection toolkit.