Kaldi

Established speech recognition toolkit used in research and production systems.

Open Source, Self Hosted, Offline Capable

About

Written in C++, Kaldi has been a foundation of automatic speech recognition (ASR) research and production systems for well over a decade. The toolkit provides the building blocks for complete recognition pipelines: feature extraction, acoustic modeling with both GMM and neural network approaches, and decoding based on finite-state transducers through OpenFst, with linear algebra handled by LAPACK, OpenBLAS, or ATLAS. Extensive example recipes for standard corpora ship in the repository and have served as starting points for many research papers and commercial systems. Kaldi runs on Linux, macOS, Cygwin, and Windows, and can be cross-compiled for Android and WebAssembly. It predates the current wave of end-to-end speech models and demands more setup and expertise than pip-installable alternatives, but it remains in active use across academia, industry, and teaching. Kaldi is released under the Apache 2.0 license.
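
To give a feel for the first stage of such a pipeline, here is a minimal sketch of the framing step that precedes feature extraction. It is plain Python and does not call Kaldi. The function names (`frame_signal`, `log_energy`) are hypothetical, and the 25 ms window with a 10 ms shift is an assumption chosen to match commonly used ASR front-end settings.

```python
import math

def frame_signal(samples, sample_rate=16000, frame_ms=25.0, shift_ms=10.0):
    """Split a waveform into overlapping frames, dropping the incomplete tail."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    shift = int(sample_rate * shift_ms / 1000)       # samples between frame starts
    if len(samples) < frame_len:
        return []
    num_frames = 1 + (len(samples) - frame_len) // shift
    return [samples[i * shift : i * shift + frame_len] for i in range(num_frames)]

def log_energy(frame, floor=1e-10):
    """Log of the summed squared samples, floored to avoid log(0) on silence."""
    return math.log(max(sum(x * x for x in frame), floor))

# Illustrative input: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

frames = frame_signal(signal, sr)
energies = [log_energy(f) for f in frames]
print(len(frames), len(frames[0]))  # 98 400
```

A real front end would go on to apply a window function, an FFT, and a mel filterbank to each frame; this sketch stops at per-frame log energy to stay short.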

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Expert (5/5)
License: Apache-2.0
Added: Apr 3, 2026

Tags

stt asr research production cpp academic

Related Tools

Conformer (ESPnet)

Speech-to-Text / Speech Recognition

Convolution-augmented transformer for speech recognition in ESPnet toolkit.

Open Source, Self Hosted, Offline, GPU 8GB+

Advanced

ESPnet

Speech-to-Text / Speech Recognition

End-to-end speech processing toolkit covering ASR, TTS, and speech translation.

Open Source, Self Hosted, Offline, GPU 8GB+

Expert

Insanely Fast Whisper

Speech-to-Text / Speech Recognition

CLI tool that transcribes audio 10x faster using pipeline optimizations.

Open Source, Self Hosted, Offline, GPU 6GB+

Easy

Wav2Vec 2.0

Speech-to-Text / Speech Recognition

Self-supervised speech representation model by Meta for ASR.

Open Source, Self Hosted, Offline, GPU 8GB+

Advanced

Pyannote Audio

Speech-to-Text / Speech Recognition

Open-source speaker diarization and voice activity detection toolkit.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

Canary (NVIDIA NeMo)

Speech-to-Text / Speech Recognition

Multilingual ASR model by NVIDIA supporting 4 languages with translation.

Open Source, Self Hosted, Offline, GPU 8GB+

Intermediate
