Conformer (ESPnet)

Convolution-augmented Transformer for speech recognition, as implemented in the ESPnet toolkit.

Open Source | Self Hosted | Offline Capable | GPU Required (8GB+ VRAM)

About

Conformer is a speech recognition architecture that augments transformer encoders with convolution modules, letting a model capture global context through self-attention and local acoustic detail through convolution within the same block. ESPnet, an end-to-end speech processing toolkit built on PyTorch with Kaldi-style data pipelines, provides a widely used open implementation of Conformer alongside the newer E-Branchformer, trained with hybrid CTC/attention objectives or transducer losses. Because ESPnet covers far more than recognition, the same recipe system extends to text-to-speech, speech translation, enhancement and separation, speaker diarization, spoken language understanding, and voice conversion, with integration for pretrained models such as Wav2Vec 2.0, HuBERT, and Whisper and distributed training over Slurm or MPI. Released under the Apache 2.0 license, the toolkit is a standard choice for speech researchers and engineers reproducing or extending published recognition systems.
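
The hybrid CTC/attention objective mentioned above is usually written as a weighted sum of the two losses. The following is a minimal sketch of that interpolation in plain Python, not ESPnet's own code: the function name is hypothetical, the parameter name `ctc_weight` mirrors the option of the same name in ESPnet model configs, and the default of 0.3 is an illustrative value rather than one taken from this page.

```python
def hybrid_ctc_attention_loss(loss_ctc: float, loss_att: float,
                              ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention-decoder loss.

    Hypothetical helper for illustration only.
    ctc_weight = 1.0 gives pure CTC training, 0.0 gives pure attention training.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    # Weighted sum: the CTC term encourages monotonic alignment,
    # the attention term models the output sequence autoregressively.
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # Example with made-up per-batch loss values.
    print(hybrid_ctc_attention_loss(loss_ctc=2.0, loss_att=1.0, ctc_weight=0.3))
```

In a real training loop the two inputs would be tensors produced by the CTC head and the attention decoder for the same batch; scalars are used here only to keep the sketch self-contained.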

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
License: Apache-2.0
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Related Tools

ESPnet

Insanely Fast Whisper

Kaldi

Wav2Vec 2.0

Pyannote Audio

Canary (NVIDIA NeMo)