Pyannote Audio
Open-source speaker diarization and voice activity detection toolkit.
About
pyannote.audio is a Python toolkit for speaker diarization, voice activity detection, overlapped-speech detection, and speaker embedding extraction, built on PyTorch. It ships pretrained pipelines that can be fine-tuned on user data and exposes a Pipeline API that loads models directly from the Hugging Face Hub. The community-1 pipeline is the current open-source baseline; a paid precision-2 pipeline is offered alongside. MIT licensed.
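A minimal sketch of the Pipeline API described above. It assumes pyannote.audio 4.x, where `Pipeline.from_pretrained` takes a `token` argument (3.x used `use_auth_token`), and the Hub model ID `pyannote/speaker-diarization-community-1`. The helper names `load_pipeline`, `diarize`, and `speaking_time` are hypothetical, not part of the toolkit. Downloading the model may require accepting its user conditions on the Hugging Face Hub and supplying an access token.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# A diarization turn as (start_seconds, end_seconds, speaker_label).
Turn = Tuple[float, float, str]


def load_pipeline(token: str, use_gpu: bool = False):
    """Load the community-1 diarization pipeline from the Hugging Face Hub.

    Assumption: pyannote.audio 4.x (keyword `token`) and a model ID of
    "pyannote/speaker-diarization-community-1".
    """
    # Deferred imports so the pure-Python helpers below work without pyannote.
    import torch
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-community-1", token=token
    )
    # Move inference to the GPU only if one is actually available.
    if use_gpu and torch.cuda.is_available():
        pipeline.to(torch.device("cuda"))
    return pipeline


def diarize(pipeline, audio_path: str) -> List[Turn]:
    """Run the pipeline and flatten its output into (start, end, speaker) turns."""
    output = pipeline(audio_path)
    # Assumption: 4.x returns an object exposing a `speaker_diarization`
    # Annotation, while 3.x returns the Annotation itself; fall back to `output`.
    annotation = getattr(output, "speaker_diarization", output)
    return [
        (segment.start, segment.end, label)
        for segment, _track, label in annotation.itertracks(yield_label=True)
    ]


def speaking_time(turns: Iterable[Turn]) -> dict:
    """Total seconds attributed to each speaker (overlapping turns count for each speaker)."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)
```

Only `speaking_time` runs without pyannote.audio installed: turns `(0.0, 2.5, "SPEAKER_00")`, `(2.5, 4.0, "SPEAKER_01")`, and `(4.0, 5.0, "SPEAKER_00")` yield `{"SPEAKER_00": 3.5, "SPEAKER_01": 1.5}`. Keeping the heavy imports inside `load_pipeline` lets the post-processing be tested separately from model download.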
Details
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: MIT
- Minimum VRAM: 4 GB
- Added: Apr 3, 2026
Related Tools
Multilingual ASR model by NVIDIA supporting 4 languages with translation.
Convolution-augmented transformer for speech recognition in the ESPnet toolkit.
Pre-trained speech models for STT, TTS, and VAD with simple PyTorch integration.
CLI tool that transcribes audio 10x faster using pipeline optimizations.
Self-supervised speech representation model by Meta for ASR.
Whisper extension providing word-level timestamps for transcription.