
Pyannote Audio

Open-source speaker diarization and voice activity detection toolkit.

Open Source · Self Hosted · Offline Capable · GPU Required (4GB+ VRAM)

About

pyannote.audio is a Python toolkit for speaker diarization, voice activity detection, overlapped-speech detection, and speaker embedding extraction, built on PyTorch. It ships pretrained pipelines that can be fine-tuned on user data and exposes a Pipeline API that loads models directly from the Hugging Face Hub. The community-1 pipeline is the current open-source baseline; a paid precision-2 pipeline is offered alongside. MIT licensed.
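The Pipeline API described above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the project's reference usage. It assumes pyannote.audio 4.x, the Hub ID `pyannote/speaker-diarization-community-1`, and a Hugging Face access token for an account that has accepted the model's access conditions. `to_turns`, `speaking_time`, and `diarize` are hypothetical helper names, not part of pyannote. The helpers rely only on the `itertracks(yield_label=True)` method of a pyannote.core `Annotation`, so they can be exercised without downloading a model.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# (start_seconds, end_seconds, speaker_label)
Turn = Tuple[float, float, str]


def to_turns(output) -> List[Turn]:
    """Flatten a pyannote diarization result into sorted (start, end, speaker) tuples."""
    # Assumption: pyannote.audio 4.x returns an object whose `.speaker_diarization`
    # attribute is a pyannote.core Annotation, while 3.x returns the Annotation itself.
    # getattr() falls back to `output` so both shapes are handled.
    annotation = getattr(output, "speaker_diarization", output)
    turns = [
        (segment.start, segment.end, label)
        for segment, _track, label in annotation.itertracks(yield_label=True)
    ]
    return sorted(turns)


def speaking_time(turns: Iterable[Turn]) -> dict:
    """Total seconds attributed to each speaker (overlapping turns count for each speaker)."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


def diarize(audio_path: str, hf_token: str) -> List[Turn]:
    """Run the community-1 pipeline on one file (needs pyannote.audio and Hub access)."""
    # Imported lazily so the helpers above work without pyannote or torch installed.
    import torch
    from pyannote.audio import Pipeline

    # Assumed Hub ID and 4.x keyword; pyannote.audio 3.x used `use_auth_token=` instead.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-community-1", token=hf_token
    )
    # Move the pipeline to a GPU when one is visible; otherwise stay on CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    pipeline.to(device)
    return to_turns(pipeline(audio_path))
```

For example, `speaking_time(diarize("meeting.wav", hf_token))` would return total seconds per speaker label for a hypothetical `meeting.wav`. Keeping the pyannote import inside `diarize` is a deliberate choice. It lets the pure-Python post-processing be tested without the model weights or a GPU.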

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: MIT
Minimum VRAM: 4 GB
Added: Apr 3, 2026

Tags

stt diarization speaker voice-activity verification

Related Tools

Conformer (ESPnet)

Speech-to-Text / Speech Recognition

Convolution-augmented Transformer for speech recognition, as implemented in the ESPnet toolkit.

Open Source · Self Hosted · Offline · GPU 8GB+

Advanced

ESPnet

Speech-to-Text / Speech Recognition

End-to-end speech processing toolkit covering ASR, TTS, and speech translation.

Open Source · Self Hosted · Offline · GPU 8GB+

Expert

Insanely Fast Whisper

Speech-to-Text / Speech Recognition

CLI tool that speeds up Whisper transcription with batched, fp16 inference and optional Flash Attention 2 in the Hugging Face Transformers pipeline.

Open Source · Self Hosted · Offline · GPU 6GB+

Easy

Kaldi

Speech-to-Text / Speech Recognition

Established speech recognition toolkit used in research and production systems.

Open Source · Self Hosted · Offline

Expert

Wav2Vec 2.0

Speech-to-Text / Speech Recognition

Self-supervised speech representation model by Meta for ASR.

Open Source · Self Hosted · Offline · GPU 8GB+

Advanced

Canary (NVIDIA NeMo)

Speech-to-Text / Speech Recognition

Multilingual ASR model by NVIDIA covering four languages (English, German, Spanish, and French), with speech translation.

Open Source · Self Hosted · Offline · GPU 8GB+

Intermediate

Mentioned in

Beyond Whisper: Parakeet, SenseVoice and ASR in 2026

Whisper is no longer the default: how Parakeet, SenseVoice, Kimi-Audio, Ultravox and Moshi compare on...

Max P