Voxtral | Mistral AI

July 16, 2025

Mistral launches Voxtral: open-source speech understanding models with top-tier accuracy and deep audio comprehension.

Mistral just dropped Voxtral, a pair of speech AI models designed for both cloud and edge use. There’s a 24B-parameter version for production workloads and a 3B “Mini” variant for local devices. Both are Apache 2.0 licensed and available via API and Hugging Face.

Voxtral is pitched as a way past the usual trade-off in speech AI: open-source systems that are cheap but less accurate, or proprietary APIs that are accurate but costly. Mistral claims state-of-the-art transcription and semantic understanding at less than half the price of comparable APIs.

Key features:

Handles up to 30 minutes of speech for transcription, 40 minutes for understanding
Built-in Q&A and summarization directly from audio—no need to chain separate models
Automatic language detection with support for English, Spanish, French, Hindi, German, and more
Direct voice-triggered function calls for backend workflows without extra parsing
Strong text comprehension thanks to its Mistral Small 3.1 backbone
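
The 30-minute transcription limit in the list above suggests that longer recordings would need to be split before they are submitted. The following is a minimal sketch of that planning step only, with no API calls. The helper name `plan_chunks` and the even-split strategy are assumptions for illustration, not something Mistral documents, and a real pipeline would likely cut at pauses rather than at fixed offsets.

```python
import math

# Limits quoted in the article, converted to seconds.
TRANSCRIPTION_LIMIT_S = 30 * 60
UNDERSTANDING_LIMIT_S = 40 * 60


def plan_chunks(duration_s, limit_s=TRANSCRIPTION_LIMIT_S):
    """Split a recording into equal spans that each fit under limit_s.

    Hypothetical helper: returns a list of (start, end) offsets in seconds.
    """
    if duration_s <= 0:
        return []
    # Smallest number of chunks that keeps every chunk within the limit.
    count = math.ceil(duration_s / limit_s)
    size = duration_s / count
    return [(i * size, min((i + 1) * size, duration_s)) for i in range(count)]


if __name__ == "__main__":
    # A 75-minute recording needs three chunks of 25 minutes each.
    for start, end in plan_chunks(75 * 60):
        print(f"{start / 60:.0f} min to {end / 60:.0f} min")
```

Splitting evenly, rather than filling each chunk to the limit, avoids a very short final chunk. Passing `UNDERSTANDING_LIMIT_S` as `limit_s` applies the same logic to the 40-minute understanding limit.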

Mistral's published benchmarks show Voxtral outperforming OpenAI Whisper large-v3, GPT-4o mini Transcribe, and Gemini 2.5 Flash, and rivaling ElevenLabs Scribe, in transcription accuracy. Mistral also reports state-of-the-art results on European and multilingual datasets.

In audio understanding tests, Voxtral is competitive with GPT-4o mini and Gemini 2.5 Flash, and it comes out on top in speech translation tasks.

Mistral offers enterprise options for private deployment, domain-specific fine-tuning, longer contexts, and speaker/emotion detection.

Both the 24B model and the 3B Mini can be downloaded from Hugging Face. API pricing starts at $0.001 per minute.
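
To put the quoted starting price of $0.001 per minute in perspective, here is a small worked estimate. It assumes billing is strictly linear in audio minutes at that starting rate, with no minimum charge, rounding, or volume tiers. The article does not confirm any of those details, and the function name is made up for this example.

```python
from decimal import Decimal

# Starting API price quoted in the article, in US dollars per audio minute.
PRICE_PER_MINUTE_USD = Decimal("0.001")


def estimate_cost_usd(audio_minutes):
    """Estimate API cost, assuming purely linear per-minute billing."""
    return Decimal(str(audio_minutes)) * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # One hour of audio, then 10,000 minutes of audio.
    print(estimate_cost_usd(60))
    print(estimate_cost_usd(10_000))
```

Under these assumptions, an hour of audio costs about six cents and 10,000 minutes costs about ten dollars.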

Mistral will demo Voxtral with Inworld’s speech-to-speech tech on August 6th, and registration is open.

Sample audio and full benchmarks are live on Mistral’s site.

Voxtral is also coming soon to Le Chat’s voice mode, where you can record audio, transcribe it, ask questions about it, and summarize it on web or mobile.

Mistral promises more audio features in the coming months.
