Voxtral

Speech understanding model by Mistral AI for transcription and analysis.

Open Source, Self Hosted, Offline Capable, GPU Required (8GB+ VRAM)

About

Voxtral is Mistral AI's entry into speech: a pair of models that combine transcription with audio understanding rather than stopping at speech-to-text. Because each model pairs an audio encoder with a Mistral language model (Mistral Small 3.1 for the larger one), it can answer questions about a recording, summarize it, or trigger function calls directly from spoken intent, without chaining a separate ASR system into an LLM. The lineup includes Voxtral Small at 24B parameters for production deployments and Voxtral Mini at 3B for local and edge use, both released as open weights under Apache 2.0 and also served through Mistral's API. A 32,000-token context window handles roughly 30 to 40 minutes of audio, and language detection is automatic across English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, and others. Mistral reports transcription accuracy competitive with or ahead of Whisper. The intended audience is developers building voice agents and enterprises that want private, self-hosted speech intelligence.
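To make the context-window figure concrete, here is a rough back-of-the-envelope sketch of how audio length relates to a 32,000-token budget. The rate of 12.5 audio tokens per second is an assumption used for illustration and does not come from this listing; the helper names (`estimate_audio_tokens`, `fits_in_context`, `plan_chunks`) are hypothetical and not part of any Mistral library. Treat the numbers as estimates and check the model documentation before relying on them.

```python
import math

CONTEXT_TOKENS = 32_000         # context window stated in the listing
AUDIO_TOKENS_PER_SECOND = 12.5  # ASSUMPTION: illustrative rate, not from the listing


def estimate_audio_tokens(duration_s, tokens_per_second=AUDIO_TOKENS_PER_SECOND):
    """Estimate how many context tokens an audio clip of duration_s seconds uses."""
    return math.ceil(duration_s * tokens_per_second)


def fits_in_context(duration_s, prompt_tokens=0, reserved_output_tokens=0,
                    context_tokens=CONTEXT_TOKENS):
    """Check whether audio plus text prompt plus reserved output fits the window."""
    needed = estimate_audio_tokens(duration_s) + prompt_tokens + reserved_output_tokens
    return needed <= context_tokens


def plan_chunks(duration_s, max_chunk_s):
    """Split a recording into equal-length (start, end) spans of at most max_chunk_s."""
    if duration_s <= 0:
        return []
    n_chunks = math.ceil(duration_s / max_chunk_s)
    step = duration_s / n_chunks
    return [(i * step, (i + 1) * step) for i in range(n_chunks)]


if __name__ == "__main__":
    for minutes in (30, 40, 45):
        tokens = estimate_audio_tokens(minutes * 60)
        print(f"{minutes} min -> ~{tokens} tokens, fits: {fits_in_context(minutes * 60)}")
    # A one-hour recording split into spans of at most 30 minutes each.
    print(plan_chunks(3600, 30 * 60))
```

Under the assumed rate, a 40-minute recording leaves little room for a text prompt and a generated answer, which is one plausible reason to chunk longer recordings and summarize each span separately.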

Details

Category: Speech-to-Text / Speech Recognition
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Related Tools

Conformer (ESPnet)

ESPnet

Insanely Fast Whisper

Kaldi

Wav2Vec 2.0

Canary (NVIDIA NeMo)