Insanely Fast Whisper

CLI tool that transcribes audio 10x faster using pipeline optimizations.

Open Source, Self Hosted, Offline Capable, GPU Required (6GB+ VRAM)

About

Insanely Fast Whisper wraps optimized Whisper inference in an opinionated command-line tool for on-device transcription. It combines Hugging Face Transformers, Optimum, and Flash Attention 2 with fp16 precision and batched inference (batch size 24 by default), and transcribes about 150 minutes of audio in roughly 98 seconds with Whisper Large v3 on an NVIDIA A100. The CLI supports OpenAI Whisper checkpoints and the distilled distil-whisper variants, handles both transcription and translation, produces chunk-level or word-level timestamps, and adds speaker diarization through Pyannote.audio. It runs on CUDA GPUs and on Apple Silicon via MPS, installs with pipx, and accepts a file path or URL as input. The project began as a benchmarking showcase for how fast Whisper could run with standard Transformers optimizations and grew into a community-maintained tool. Released under the Apache 2.0 license, it suits developers who want fast local transcription without writing pipeline code.
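As a rough illustration of the options described above, the following sketch assembles a command line for the tool from Python. The helper `build_command` is hypothetical and not part of the project. The flag names (`--file-name`, `--model-name`, `--batch-size`, `--device-id`, `--task`, `--timestamp`, `--flash`) are taken from the project README as best recalled and should be checked against `insanely-fast-whisper --help` for the installed version.

```python
import shlex
from typing import List


def build_command(
    file_name: str,
    model_name: str = "openai/whisper-large-v3",
    batch_size: int = 24,          # default batch size mentioned in the description
    device_id: str = "0",          # CUDA device index, or "mps" on Apple Silicon
    task: str = "transcribe",      # "transcribe" or "translate"
    timestamp: str = "chunk",      # "chunk" or "word" level timestamps
    flash: bool = False,           # request Flash Attention 2 (needs a supported GPU)
) -> List[str]:
    """Hypothetical helper: build an argument list for the CLI.

    Flag names are assumptions based on the project README; verify with --help.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    if timestamp not in ("chunk", "word"):
        raise ValueError(f"unsupported timestamp level: {timestamp!r}")

    cmd = [
        "insanely-fast-whisper",
        "--file-name", file_name,      # local path or URL
        "--model-name", model_name,
        "--batch-size", str(batch_size),
        "--device-id", device_id,
        "--task", task,
        "--timestamp", timestamp,
    ]
    if flash:
        cmd += ["--flash", "True"]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command string; pass the list to subprocess.run to execute it.
    print(shlex.join(build_command("audio.mp3", device_id="mps", timestamp="word")))
```

Returning a list rather than a string keeps the arguments safe to hand to `subprocess.run` without shell quoting problems, for example when the input is a URL containing query characters.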

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Easy (2/5)
License: Apache 2.0
Minimum VRAM: 6 GB
Added: Apr 3, 2026

Related Tools

Conformer (ESPnet)

ESPnet

Kaldi

Wav2Vec 2.0

Pyannote Audio

Canary (NVIDIA NeMo)