Speech-to-Text / Speech Recognition AI Tools
Open-source automatic speech recognition (ASR) models and tools for transcribing and translating audio.
High-performance C/C++ port of Whisper for CPU-based speech recognition.
General-purpose speech recognition model by OpenAI trained on 680K hours of multilingual audio.
Multilingual ASR model by NVIDIA supporting 4 languages with translation.
Convolution-augmented transformer for speech recognition in the ESPnet toolkit.
Pre-trained speech models for STT, TTS, and VAD with simple PyTorch integration.
CLI tool that transcribes audio 10x faster using pipeline optimizations.
Self-supervised speech representation model by Meta for ASR.
JAX-based Whisper implementation optimized for TPU/GPU, with a reported 70x+ speedup over the original PyTorch implementation.
Offline speech recognition toolkit supporting 20+ languages with small models.
End-to-end speech recognition engine by Mozilla using TensorFlow.
Established speech recognition toolkit used in research and production systems.
Production-grade ASR models and toolkit by NVIDIA for speech recognition.
All-in-one conversational AI toolkit for speech recognition, enhancement, and more.
Distilled version of Whisper that runs about 6x faster than the original model with minimal accuracy loss.
Industrial-grade ASR toolkit by Alibaba with Paraformer non-autoregressive models.
End-to-end speech processing toolkit covering ASR, TTS, and speech translation.
Cross-platform speech recognition using ONNX Runtime for on-device ASR.
Speech understanding model by Mistral AI for transcription and analysis.
Open-source speaker diarization and voice activity detection toolkit.
Non-autoregressive ASR model by Alibaba achieving fast parallel transcription.
Whisper extension providing word-level timestamps for transcription.