Whisper

General-purpose speech recognition model by OpenAI trained on 680K hours of multilingual audio.

Open Source, Self Hosted, Offline Capable

About

Few open-source releases have reshaped a field the way Whisper did for speech recognition. OpenAI's model is a Transformer sequence-to-sequence network trained on 680,000 hours of multilingual, multitask audio, which lets a single checkpoint handle transcription, speech translation into English, language identification, and voice activity detection across some 99 languages. Audio is converted to log-Mel spectrograms and processed in 30-second windows, and every task is expressed as a sequence of tokens for the decoder to predict, so no separate pipeline stages are needed. Checkpoints range from tiny, at 39 million parameters and about 1 GB of VRAM, up to large, at 1.55 billion parameters and around 10 GB. English-only variants are also available, as is a faster turbo model that gives up translation. Code and weights are MIT licensed and run locally on CPU or GPU through a simple Python API or CLI. Its accuracy and permissive license spawned an ecosystem of ports and optimizations, and it remains a common starting point for transcription features in both applications and research.
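To make the 30-second windowing concrete, here is a minimal sketch in plain Python. The 16 kHz sample rate and 160-sample hop length are assumptions taken from the reference implementation rather than from the description above, and the helper name split_into_windows is hypothetical. It is also a simplification: it cuts fixed, non-overlapping windows and zero-pads the last one, whereas the actual transcription loop decides how far to advance based on the timestamps the model predicts.

```python
# Sketch of 30-second windowing. Constants other than the window
# length are assumptions based on the reference implementation.
SAMPLE_RATE = 16_000   # Hz (assumed)
WINDOW_SECONDS = 30    # window length from the description above
HOP_LENGTH = 160       # samples between spectrogram frames (assumed)

WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS     # samples per window
FRAMES_PER_WINDOW = WINDOW_SAMPLES // HOP_LENGTH  # spectrogram frames per window


def split_into_windows(samples):
    """Split mono samples into fixed windows, zero-padding the last one.

    Hypothetical helper for illustration; not part of any library.
    """
    windows = []
    for start in range(0, len(samples), WINDOW_SAMPLES):
        chunk = list(samples[start:start + WINDOW_SAMPLES])
        # Pad a short final chunk with silence up to the full window size.
        chunk.extend([0.0] * (WINDOW_SAMPLES - len(chunk)))
        windows.append(chunk)
    return windows


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 70)  # 70 seconds of silence
    windows = split_into_windows(audio)
    # Number of windows, samples in the last window, frames per window
    print(len(windows), len(windows[-1]), FRAMES_PER_WINDOW)
```

Under these assumptions, a 70-second clip yields three windows of 480,000 samples each, and each window maps to 3,000 spectrogram frames. In practice the Python API handles this internally, so callers normally pass a file path and never split the audio themselves.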

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Easy (2/5)
License: MIT
Added: Apr 3, 2026

Related Tools

Conformer (ESPnet)

ESPnet

Insanely Fast Whisper

Kaldi

Wav2Vec 2.0

Canary (NVIDIA NeMo)