WhisperX
Whisper with word-level timestamps and speaker diarization
About
WhisperX is a Whisper-based automatic speech recognition pipeline. It adds word-level timestamps via forced phoneme alignment and uses voice-activity detection to split audio into batches for fast inference (around 70x real time with the large-v2 model). It supports speaker diarization through external models and runs on GPU with CUDA 12.8 or on CPU. It is useful for subtitles, transcript editing, and meeting-style audio where word offsets matter.
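To illustrate the subtitle use case, here is a minimal sketch that turns word-level timestamps into SRT cues. It does not call WhisperX. It assumes the aligned output has already been flattened into a list of dicts with "word", "start", and "end" keys (times in seconds), and that some words may lack timings when alignment fails. The helper names `format_srt_time` and `words_to_srt` and the `max_words` parameter are hypothetical, chosen for this example.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group words into subtitle cues of at most max_words words each.

    Assumed input shape: [{"word": str, "start": float, "end": float}, ...].
    Words without "start"/"end" keep their text but do not affect cue timing.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        timed = [w for w in chunk if "start" in w and "end" in w]
        if not timed:
            # No usable timing in this chunk, so no cue can be placed.
            continue
        text = " ".join(w["word"].strip() for w in chunk)
        start = format_srt_time(timed[0]["start"])
        end = format_srt_time(timed[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hand-written sample data, not real WhisperX output.
    sample = [
        {"word": "Hello", "start": 0.5, "end": 0.9},
        {"word": "world", "start": 1.0, "end": 1.42},
    ]
    print(words_to_srt(sample, max_words=2))
```

Grouping by a fixed word count is the simplest possible choice; a real subtitle tool would more likely break cues on pauses, punctuation, or a maximum line length.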
Details
- Category: Audio & Speech
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: BSD-4-Clause
- Added: Jan 29, 2026
Related Tools
Free text-to-speech generator with multiple voices, accents, and languages. No signup required.
Deep learning toolkit for text-to-speech synthesis
Qwen Chat is an AI assistant for everyone, powered by the Qwen series models. It’s free to use, open to all, and ready to help with creativity, collaboration, and endless possibilities.
Transformer-based text-to-audio model from Suno
OpenAI's powerful speech recognition model
Fast, local neural text-to-speech for home automation