Silero Models
Pretrained speech models for STT, TTS, and VAD with simple PyTorch integration.
About
Silero Models is a set of pretrained speech models for speech-to-text (STT), text-to-speech (TTS), and voice activity detection (VAD). They load in a single line through PyTorch Hub or a pip package. The models cover several languages, are designed for production use, and stay lightweight and fast on CPU. The available models are listed in a manifest file. The project is released under the MIT license.
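As a rough illustration of the single-line PyTorch Hub loading described above, here is a minimal sketch for the speech-to-text case. The repository id `snakers4/silero-models`, the entry point `silero_stt`, and the three-item return value are assumptions recalled from the project's README, not details given on this page; the helper names `stt_hub_kwargs` and `load_stt` are hypothetical and exist only for this example. Check the current README before relying on any of them.

```python
# Sketch only: loading a Silero STT model through PyTorch Hub.
# ASSUMPTIONS (not stated on this page): the repo id and the entry-point
# name below, and that the entry point returns (model, decoder, utils).

HUB_REPO = "snakers4/silero-models"  # assumed GitHub repo id
STT_ENTRY_POINT = "silero_stt"       # assumed hub entry-point name


def stt_hub_kwargs(language="en", device="cpu"):
    """Build keyword arguments for torch.hub.load without touching the network."""
    return {
        "repo_or_dir": HUB_REPO,
        "model": STT_ENTRY_POINT,
        "language": language,
        "device": device,
    }


def load_stt(language="en", device="cpu"):
    """Fetch the model on first use and return whatever the entry point returns."""
    import torch  # imported lazily so stt_hub_kwargs works without torch installed

    kwargs = stt_hub_kwargs(language, device)
    kwargs["device"] = torch.device(device)  # pass a torch.device, not a string
    return torch.hub.load(**kwargs)
```

With torch installed and network access available, `model, decoder, utils = load_stt()` would be the one-line call; the first call downloads the repository and weights into the local PyTorch Hub cache. The TTS and VAD models are loaded through their own entry points, which are not shown here.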
Details
- Price: Free
- Platform: Local/Desktop
- Difficulty: Beginner (1/5)
- License: MIT
- Added: Apr 3, 2026
Related Tools
Whisper extension providing word-level timestamps for transcription.
Multilingual ASR model by NVIDIA supporting 4 languages with translation.
Convolution-augmented transformer for speech recognition in ESPnet toolkit.
CLI tool that transcribes audio 10x faster using pipeline optimizations.
Self-supervised speech representation model by Meta for ASR.
Open-source speaker diarization and voice activity detection toolkit.