
WhisperX

Whisper with word-level timestamps and speaker diarization

Open Source · Self Hosted · Offline Capable

0.0 (0)


About

WhisperX is an automatic speech recognition pipeline built on OpenAI's Whisper. It adds word-level timestamps through forced phoneme alignment, and it uses voice-activity detection (VAD) to split and batch audio for fast inference (around 70x realtime with the large-v2 model). Speaker diarization is supported through external models such as pyannote-audio. It runs on a GPU with CUDA 12.8 or on a CPU. It is useful for subtitles, transcript editing, and meeting-style audio where word offsets matter.
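Roughly, as the WhisperX paper describes it, the VAD batching works like this: Whisper processes audio in 30-second windows, so the speech regions found by the VAD are cut down or merged until each chunk fits one window, and the chunks are then transcribed together as a batch. The sketch below is a minimal, hypothetical illustration of that cut-and-merge idea in plain Python. `cut_long` and `merge_segments` are made-up names, not part of the WhisperX API, and cutting long regions into equal pieces is a simplifying assumption (the paper cuts where voice activity is lowest).

```python
import math
from typing import List, Tuple

# A speech region reported by a VAD model: (start_seconds, end_seconds).
Interval = Tuple[float, float]


def cut_long(seg: Interval, max_len: float) -> List[Interval]:
    """Split one speech region longer than max_len into equal-length pieces.

    Assumption: an even split stands in for the paper's cut at points of
    low voice activity.
    """
    start, end = seg
    duration = end - start
    if duration <= max_len:
        return [seg]
    n = math.ceil(duration / max_len)
    step = duration / n
    # Interior boundaries are evenly spaced; the final boundary is the exact end.
    bounds = [start + i * step for i in range(n)] + [end]
    return list(zip(bounds, bounds[1:]))


def merge_segments(speech: List[Interval], max_len: float = 30.0) -> List[Interval]:
    """Greedily merge neighboring speech regions into chunks of at most max_len seconds.

    Each chunk spans from its first region's start to its last region's end,
    so short silences between merged regions are included in the chunk.
    """
    pieces = [p for seg in sorted(speech) for p in cut_long(seg, max_len)]
    chunks: List[Interval] = []
    for start, end in pieces:
        if chunks and end - chunks[-1][0] <= max_len:
            # Extending the current chunk still fits inside one window.
            chunks[-1] = (chunks[-1][0], end)
        else:
            # Otherwise start a new chunk at this piece.
            chunks.append((start, end))
    return chunks


if __name__ == "__main__":
    vad_output = [(0.0, 4.0), (5.0, 12.0), (13.0, 29.0), (31.0, 95.0)]
    for s, e in merge_segments(vad_output):
        print(f"{s:6.2f} -> {e:6.2f}  ({e - s:5.2f} s)")
```

With the sample input, the three short regions between 0 and 29 s collapse into a single 0-29 s chunk, and the 64 s region starting at 31 s is cut into three pieces of about 21.3 s each. Merging short regions means fewer, fuller windows per batch; the cost is that the silences between merged regions are fed to the model as well.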


Details

Category: Audio & Speech
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: BSD-4-Clause
Added: Jan 29, 2026

Tags

speech-to-text whisper timestamps diarization

Related Tools

Featured

TextSpeakPro

Free text-to-speech generator with multiple voices, accents, and languages. No signup required.

Beginner

5.0 (1)

Coqui TTS

Deep learning toolkit for text-to-speech synthesis

Open Source · Self Hosted · Offline

Intermediate

0.0 (0)

Featured

faster-whisper

CTranslate2-based reimplementation of Whisper with up to 4x faster transcription than the original openai/whisper

Open Source · Self Hosted · Offline

Easy

0.0 (0)

BigVGAN

Universal neural vocoder from NVIDIA that converts mel spectrograms into waveforms up to 44 kHz.

Open Source · Self Hosted · Offline · GPU

Intermediate

0.0 (0)

GLM-4-Voice

End-to-end Chinese and English spoken dialogue model from Zhipu AI with streaming speech output.

Open Source · Self Hosted · Offline · GPU

Intermediate

0.0 (0)

Bark

Transformer-based text-to-audio model from Suno

Open Source · Self Hosted · Offline · GPU 8GB+

Easy

0.0 (0)


Mentioned in

Beyond Whisper: Parakeet, SenseVoice and ASR in 2026

Whisper is no longer the default: how Parakeet, SenseVoice, Kimi-Audio, Ultravox and Moshi compare on...

Max P

whisper.cpp vs faster-whisper: Speed and Accuracy Compared

Two leading open source paths to running OpenAI Whisper. One is a CPU-friendly C/C++ port, the other rides...

Billy C