Speech-to-Text / Speech Recognition AI Tools
Open-source automatic speech recognition (ASR) models and tools for transcribing and translating audio.
General-purpose speech recognition model by OpenAI trained on 680K hours of multilingual audio.
High-performance C/C++ port of Whisper for CPU-based speech recognition.
Offline speech recognition toolkit supporting 20+ languages with small models.
End-to-end speech recognition engine by Mozilla using TensorFlow.
Established speech recognition toolkit used in research and production systems.
Production-grade ASR models and toolkit by NVIDIA for speech recognition.
All-in-one conversational AI toolkit for speech recognition, enhancement, and more.
Pre-trained speech models for STT, TTS, and VAD with simple PyTorch integration.
Distilled version of Whisper that runs about 6x faster than the original model with minimal accuracy loss.
CLI tool that transcribes audio 10x faster using pipeline optimizations.
JAX-based Whisper implementation optimized for TPU/GPU, with a reported 70x+ speedup over the original PyTorch implementation.
Industrial-grade ASR toolkit by Alibaba with Paraformer non-autoregressive models.
End-to-end speech processing toolkit covering ASR, TTS, and speech translation.
Cross-platform speech recognition using ONNX Runtime for on-device ASR.
Self-supervised speech representation model by Meta for ASR.
Multilingual ASR model by NVIDIA supporting 4 languages with translation.
Speech understanding model by Mistral AI for transcription and analysis.
Non-autoregressive ASR model by Alibaba achieving fast parallel transcription.
Convolution-augmented Transformer architecture for speech recognition, available in the ESPnet toolkit.
Whisper extension providing word-level timestamps for transcription.
Open-source speaker diarization and voice activity detection toolkit.
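
Word-level timestamps and speaker diarization, the last two entries above, are often combined to produce a speaker-labeled transcript. The following is a minimal sketch of that merge step in plain Python, assuming the ASR tool has already returned words with start/end times and the diarization tool has already returned speaker turns. The `Word` and `Turn` types and both function names are hypothetical and do not correspond to any listed library's API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its time span in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """One diarization segment: who spoke, and when (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labeled "unknown".
    """
    labeled = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labeled.append((best_speaker, w.text))
    return labeled


def group_by_speaker(labeled):
    """Merge consecutive words from the same speaker into one transcript line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return lines


if __name__ == "__main__":
    # Illustrative data only; real tools would supply these values.
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
    turns = [Turn("A", 0.0, 1.0), Turn("B", 1.0, 2.0)]
    for speaker, text in group_by_speaker(assign_speakers(words, turns)):
        print(f"{speaker}: {text}")
```

Assigning by maximum overlap is only one possible policy; a word that straddles a turn boundary goes to whichever speaker covers more of it, which may not suit overlapping speech.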