OpenAI Whisper is a family of open-source automatic speech recognition models, originally trained on 680,000 hours of multilingual audio. Released in 2022, it remains the reference point against which other open-source transcription models are measured.
**Accuracy**
Whisper large-v3 achieves word error rates competitive with paid services like Google Speech-to-Text and Amazon Transcribe, particularly for English. It handles accents, background noise, and technical vocabulary better than most alternatives.
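
Word error rate (WER) is the metric behind comparisons like these: the number of word-level substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The sketch below shows one way to compute it, assuming simple lowercasing and whitespace tokenization (published benchmarks usually apply heavier text normalization). The function name `word_error_rate` is illustrative and not part of Whisper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (word-level Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6, or about 0.33.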
**Language Support**
Whisper supports transcription in 99 languages and can translate speech from any of them into English in a single pass.
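
As a rough illustration of the two modes, the sketch below uses the open-source `openai-whisper` Python package, whose `transcribe` method takes a `task` argument of either `"transcribe"` or `"translate"`. The wrapper `run_whisper` and its defaults are hypothetical, and the code assumes the package and ffmpeg are installed.

```python
VALID_TASKS = ("transcribe", "translate")


def run_whisper(audio_path: str, task: str = "transcribe",
                model_name: str = "large-v3") -> dict:
    """Transcribe audio in its source language, or translate it to English.

    Hypothetical wrapper; assumes `pip install openai-whisper` and ffmpeg on PATH.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")

    import whisper  # imported lazily so this module loads without the package

    model = whisper.load_model(model_name)
    # With no `language` argument, Whisper detects the spoken language itself.
    result = model.transcribe(audio_path, task=task)
    return {"language": result["language"], "text": result["text"].strip()}
```

Calling `run_whisper("interview.mp3", task="translate")` on a non-English recording would return the detected source language alongside English text; the file name here is only a placeholder.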
**Local Execution**
Running locally means no data leaves your infrastructure, which is critical for privacy-sensitive use cases like medical or legal transcription. The large model requires a capable GPU (the Whisper README lists roughly 10 GB of VRAM); the smaller models run on CPU.
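
One way to turn that hardware constraint into a decision is a small lookup keyed on available GPU memory. The sketch below uses the approximate VRAM figures listed in the openai/whisper README; actual usage varies with precision and audio length, so treat the thresholds as rough guidance. The helper `pick_model` and the choice of `base` as the CPU fallback are assumptions made for illustration.

```python
from typing import Tuple

# Approximate VRAM needs in GB, as listed in the openai/whisper README.
# Ordered from largest to smallest so the first fit is the most capable model.
APPROX_VRAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(available_vram_gb: float) -> Tuple[str, str]:
    """Return (model_name, device) for the given GPU memory; 0 means no GPU."""
    for name, needed_gb in APPROX_VRAM_GB:
        if available_vram_gb >= needed_gb:
            return name, "cuda"
    # No usable GPU memory: fall back to a small model on CPU (an assumption).
    return "base", "cpu"
```

With these thresholds, `pick_model(24)` selects the large model on GPU, while `pick_model(0)` falls back to `base` on CPU.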
**Speed**
Whisper large-v3 processes audio at roughly 10-15x real time on a modern GPU. The distil-whisper models are faster still, running roughly 5-8x faster than large-v3 with minimal accuracy loss.
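
A real-time factor translates directly into wall-clock time: processing time is the audio duration divided by the factor. The short sketch below works through that arithmetic for the 10-15x range quoted for large-v3; the function name `processing_minutes` is illustrative.

```python
def processing_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Wall-clock minutes needed to process audio at a given real-time factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_minutes / realtime_factor


for factor in (10, 15):
    minutes = processing_minutes(60, factor)
    print(f"60 min of audio at {factor}x real time: {minutes:.1f} min")
```

At these rates, an hour-long recording takes between 4 and 6 minutes to process.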
**Integration**
Whisper is available through the OpenAI API at $0.006 per minute for hosted inference. The open-source models can be self-hosted with Hugging Face Transformers or faster-whisper.
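
To make the two routes concrete, the sketch below estimates hosted cost from the per-minute price quoted above and shows a minimal self-hosted path using the `faster-whisper` package's `WhisperModel` class. The helper names, the `device="cuda"` and `compute_type="float16"` settings, and the model name are assumptions for a typical GPU setup, not a recommended configuration.

```python
API_PRICE_PER_MINUTE_USD = 0.006  # hosted price quoted above


def hosted_cost_usd(audio_minutes: float) -> float:
    """Estimated hosted-API cost in USD for a given amount of audio."""
    return round(audio_minutes * API_PRICE_PER_MINUTE_USD, 4)


def transcribe_self_hosted(audio_path: str, model_name: str = "large-v3") -> str:
    """Transcribe locally with faster-whisper (assumes `pip install faster-whisper`)."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    # device and compute_type are assumptions for a CUDA GPU; on CPU,
    # device="cpu" with compute_type="int8" is a common alternative.
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path)
    # `segments` is a generator, so transcription runs as it is consumed.
    return " ".join(segment.text.strip() for segment in segments)
```

At the quoted price, `hosted_cost_usd(60)` comes to $0.36 for an hour of audio and `hosted_cost_usd(1000)` to $6.00, which gives a baseline for weighing hosted inference against the cost of running your own GPU.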
**Verdict**
Whisper is the best free speech recognition option available. For most transcription needs, it matches or exceeds paid services.