Canary (NVIDIA NeMo)

Multilingual speech recognition and speech translation model family from NVIDIA; the original Canary-1B covers English, German, French, and Spanish.

Open Source, Self Hosted, Offline Capable, GPU Required (8GB+ VRAM)

About

Canary is NVIDIA's family of multilingual speech models built on the NeMo toolkit, combining a FastConformer encoder with a Transformer decoder in an encoder-decoder design. The original Canary-1B has one billion parameters across 24 encoder and 24 decoder layers and covers English, German, French, and Spanish, performing both transcription and speech-to-text translation between English and the other three languages, with or without punctuation and capitalization. Training drew on 85,000 hours of speech, and the model uses concatenated SentencePiece tokenizers, one per language. Later releases extended the family to more languages with lower word error rates. Models load through NeMo, an open source Apache 2.0 speech framework, with tasks and languages specified through prompts or manifest files; weights are published on Hugging Face under their own terms, and the original 1B checkpoint carries a CC-BY-NC-4.0 license. Speech teams reach for Canary when transcription and translation are needed from a single model.
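As an illustration of specifying tasks and languages through a manifest file, here is a minimal sketch that writes a JSON Lines manifest using only the Python standard library. The field names (`audio_filepath`, `duration`, `taskname`, `source_lang`, `target_lang`, `pnc`, `answer`) and the task labels follow the format shown on the Canary-1B model card as best recalled here; treat them as assumptions and check them against the NeMo version you install, since the translation task label in particular has differed between releases. The helper names `manifest_entry` and `write_manifest` are hypothetical.

```python
import json
from pathlib import Path

# Languages covered by the original Canary-1B, per the description above.
SUPPORTED = {"en", "de", "fr", "es"}

# Assumed task labels; some NeMo versions use "ast" for translation instead.
ASR_TASK = "asr"
TRANSLATION_TASK = "s2t_translation"


def manifest_entry(audio_path, source_lang, target_lang, pnc=True, duration=None):
    """Build one manifest record for a single audio file (hypothetical helper)."""
    if source_lang not in SUPPORTED or target_lang not in SUPPORTED:
        raise ValueError(f"unsupported language pair: {source_lang}->{target_lang}")
    translating = source_lang != target_lang
    # Translation runs only between English and the other three languages.
    if translating and "en" not in (source_lang, target_lang):
        raise ValueError("translation needs English on one side")
    return {
        "audio_filepath": str(audio_path),
        "duration": duration,  # seconds; None is serialized as JSON null
        "taskname": TRANSLATION_TASK if translating else ASR_TASK,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "pnc": "yes" if pnc else "no",  # punctuation and capitalization
        "answer": "na",  # placeholder; no reference text at inference time
    }


def write_manifest(entries, path):
    """Write records as JSON Lines, one utterance per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    return path


if __name__ == "__main__":
    entries = [
        manifest_entry("clips/meeting_en.wav", "en", "en"),            # transcription
        manifest_entry("clips/interview_de.wav", "de", "en", pnc=False),  # German to English
    ]
    write_manifest(entries, "canary_manifest.json")
```

The resulting file is the kind of input the model card describes passing to the model's `transcribe` method after loading the checkpoint through NeMo; the audio paths above are placeholders.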

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
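License: Apache-2.0 (NeMo toolkit); model weights are licensed separately (original Canary-1B checkpoint: CC-BY-NC-4.0)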
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Mentioned in

Building Real-Time Voice Agents: TEN, Pipecat, and LiveKit

A working guide to real-time voice agent stacks: latency budgets, turn detection, interruption handling,...

Max P

Beyond Whisper: Parakeet, SenseVoice and ASR in 2026

Whisper is no longer the default: how Parakeet, SenseVoice, Kimi-Audio, Ultravox and Moshi compare on...

Max P

Related Tools

ESPnet

Insanely Fast Whisper

Kaldi

Wav2Vec 2.0

Pyannote Audio

Conformer (ESPnet)
