SpeechBrain

All-in-one conversational AI toolkit for speech recognition, enhancement, and more.

Open Source, Self Hosted, Offline Capable, GPU Required (4 GB+ VRAM)

About

SpeechBrain is an open-source PyTorch toolkit for conversational AI. It covers more than 20 speech and text processing tasks, including speech recognition, speaker and language identification, speech separation and enhancement, text-to-speech and vocoding, emotion recognition, spoken language understanding, and voice activity detection, and it also ships recipes for EEG signal analysis. Training is orchestrated through its Brain class together with YAML-based hyperparameter files, and the toolkit supports dynamic batching, mixed precision, and multi-GPU distributed training. Users can start from over 100 pretrained models published on the Hugging Face Hub or reproduce results with more than 200 training recipes spanning 40+ datasets, backed by extensive tutorials and documentation. Development is community-driven, with roots at Mila and Concordia University, and the code is released under the Apache 2.0 license, which permits commercial use. The audience is largely speech researchers and graduate students, though the pretrained models also see production use as ready-made components.
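
As a rough illustration of starting from a pretrained model, the sketch below loads a speech recognizer from the Hugging Face Hub and transcribes one audio file. It assumes SpeechBrain 1.0 or later is installed and that the `speechbrain.inference` interface is available. The model ID, the cache directory, and the `transcribe` helper are illustrative choices that do not come from this listing; any compatible ASR model on the Hub could be substituted.

```python
# Minimal sketch: transcribe one audio file with a pretrained SpeechBrain model.
# Assumes SpeechBrain 1.0+ is installed (pip install speechbrain) and that the
# first call has network access to download the model from the Hugging Face Hub.
import sys

# Example Hub model; an assumption for illustration, not named on this page.
MODEL_SOURCE = "speechbrain/asr-crdnn-rnnlm-librispeech"
# Hypothetical local directory where the downloaded model files are cached.
SAVE_DIR = "pretrained_models/asr-crdnn-rnnlm-librispeech"


def transcribe(audio_path: str, source: str = MODEL_SOURCE, savedir: str = SAVE_DIR) -> str:
    """Load (or reuse a cached copy of) a pretrained ASR model and transcribe one file."""
    # Imported lazily so this file can be loaded without SpeechBrain installed.
    from speechbrain.inference.ASR import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(source=source, savedir=savedir)
    return asr_model.transcribe_file(audio_path)


if __name__ == "__main__":
    # Usage: python transcribe.py path/to/audio.wav
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

The model is downloaded on first use and read from the cache directory afterwards, which is what makes later runs usable offline.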

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
License: Apache-2.0
Minimum VRAM: 4 GB
Added: Apr 3, 2026

Related Tools

Conformer (ESPnet)

ESPnet

Insanely Fast Whisper

Kaldi

Wav2Vec 2.0

Canary (NVIDIA NeMo)