ESPnet
End-to-end speech processing toolkit covering ASR, TTS, and speech translation.
About
ESPnet is an end-to-end speech processing toolkit that supports ASR, TTS, speech translation, speech enhancement, and other tasks. It includes Conformer, Transformer, and other architectures, and is widely used in speech research. It is developed by researchers at Johns Hopkins, CMU, and other institutions, and is released under the Apache 2.0 license.
Details
- Price: Free
- Platform: Local/Desktop
- Difficulty: Expert (5/5)
- License: Apache-2.0
- Minimum VRAM: 8 GB
- Added: Apr 3, 2026
Related Tools
Whisper extension providing word-level timestamps for transcription.
Multilingual ASR model by NVIDIA supporting 4 languages with translation.
Convolution-augmented Transformer for speech recognition in the ESPnet toolkit.
Pre-trained speech models for STT, TTS, and VAD with simple PyTorch integration.
CLI tool that transcribes audio 10x faster using pipeline optimizations.
Open-source speaker diarization and voice activity detection toolkit.