What Is Grok Voice Think Fast 1.0?
grok-voice-think-fast-1.0 is xAI's new flagship voice model, released April 24, 2026. It is purpose-built for enterprise voice agent deployments: customer support lines, phone sales, appointment booking, and multi-turn workflows that require reliable structured data extraction and high-volume tool calling. It runs on xAI's fully in-house voice stack (custom VAD, tokenizer, audio models), the same infrastructure that serves Grok mobile apps, Tesla vehicles, and Starlink customer support. API access is via WebSocket at wss://api.x.ai/v1/realtime, which is compatible with the OpenAI Realtime API, at a flat $0.05 per minute.
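As a rough illustration of what connecting might involve, the sketch below only prepares the pieces a WebSocket client would need; it does not open a connection. The endpoint is the one quoted above. The Bearer-token header and the `session.update` event shape are assumptions borrowed from the OpenAI Realtime API on the strength of the compatibility claim, and the function names are hypothetical.

```python
import json

REALTIME_URL = "wss://api.x.ai/v1/realtime"  # endpoint quoted in the article


def build_connection(api_key: str) -> dict:
    """Return the URL and headers a WebSocket client would be given.

    Assumption: authentication uses a Bearer token header.
    """
    return {
        "url": REALTIME_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
    }


def build_session_update(instructions: str) -> str:
    """Serialize a `session.update` event in the OpenAI Realtime style.

    Assumption: xAI accepts this event shape unchanged.
    """
    event = {
        "type": "session.update",
        "session": {"instructions": instructions},
    }
    return json.dumps(event)


if __name__ == "__main__":
    conn = build_connection("YOUR_API_KEY")
    print(conn["url"])
    print(build_session_update("You are a support agent for an ISP."))
```

A real client would pass `conn["url"]` and `conn["headers"]` to a WebSocket library and send the serialized event as its first message. Check xAI's own documentation before relying on either detail.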
Benchmark: #1 on τ-voice Bench
τ-voice Bench evaluates full-duplex voice agents under realistic conditions: telephony audio, background noise, heavy accents, and natural interruptions. grok-voice-think-fast-1.0 takes the top position at launch. On phone call entity recognition, Grok STT reports a 5.0% word error rate, versus 12.0% for ElevenLabs, 13.5% for Deepgram, and 21.3% for AssemblyAI.
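For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a transcript and its reference, divided by the number of reference words. The sketch below is the standard textbook computation. How the cited benchmark normalizes or tokenizes text is not stated here, so the lowercase-and-split step is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("my account number is four two",
                          "my account number is for two"))
```

A 5.0% rate therefore means about five word-level errors per hundred reference words.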
Core Capabilities
Background Reasoning With No Latency Impact
The model reasons through edge cases in real time without adding delay to audio output. It checks its work before speaking, catching obvious errors without making the caller wait. xAI's published example: when asked which months contain the letter X, a typical model answers confidently and wrongly (no month name contains an X). This model catches the error before it speaks.
Structured Data Extraction Under Noise
The model collects email addresses, street addresses, phone numbers, full names, and account numbers, even from callers speaking quickly with strong accents. It also handles natural speech corrections mid-sentence: it extracts the intended value, invokes the lookup tool with the corrected parameter, and reads back the normalized result for confirmation.
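To make the correction-then-normalize behavior concrete, here is a toy rule-based sketch. It is not how the model works internally. It only illustrates the input and output of the step described above, assuming the utterance contains just the spoken email address. The marker list and function names are hypothetical.

```python
# Hypothetical phrases a caller might use to restart a value.
CORRECTION_MARKERS = ("no wait", "sorry", "i mean", "actually")

# Spoken words that stand for email punctuation.
SPOKEN_SYMBOLS = {"at": "@", "dot": ".", "underscore": "_", "dash": "-"}


def apply_correction(utterance: str) -> str:
    """Keep only the text after the last correction marker, if any."""
    text = utterance.lower()
    cut = 0
    for marker in CORRECTION_MARKERS:
        idx = text.rfind(marker)
        if idx != -1:
            cut = max(cut, idx + len(marker))
    return text[cut:].strip(" ,.")


def normalize_spoken_email(utterance: str) -> str:
    """Turn a spoken email address into its written form."""
    words = [w.strip(",.") for w in apply_correction(utterance).split()]
    return "".join(SPOKEN_SYMBOLS.get(w, w) for w in words)


if __name__ == "__main__":
    heard = "jon at example dot com, sorry, john dot smith at example dot com"
    print(normalize_spoken_email(heard))  # john.smith@example.com
```

The normalized string is what would be passed to a lookup tool and then read back to the caller for confirmation.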
High-Volume Tool Orchestration
The Starlink production deployment runs 28 distinct tools per session across hundreds of support and sales workflows. The model is trained specifically for continuous, parallel tool calling rather than occasional lookups, and it maintains accuracy under that load throughout a full customer interaction.
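On the application side, parallel tool calling means that several requested calls are dispatched at once rather than one after another. The sketch below shows one way a host application might do that with `asyncio.gather`. The two tools, their arguments, and the call format are hypothetical placeholders, not Starlink's or xAI's actual tools.

```python
import asyncio


async def check_service_status(account_id: str) -> dict:
    """Hypothetical tool; the sleep stands in for a network call."""
    await asyncio.sleep(0.01)
    return {"tool": "check_service_status", "account_id": account_id, "status": "online"}


async def fetch_billing_summary(account_id: str) -> dict:
    """Hypothetical tool; the sleep stands in for a network call."""
    await asyncio.sleep(0.01)
    return {"tool": "fetch_billing_summary", "account_id": account_id, "balance_due": 0.0}


# Registry mapping tool names (as the model would request them) to functions.
TOOLS = {
    "check_service_status": check_service_status,
    "fetch_billing_summary": fetch_billing_summary,
}


async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run all requested tool calls concurrently, preserving request order."""
    tasks = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    requested = [
        {"name": "check_service_status", "arguments": {"account_id": "A-100"}},
        {"name": "fetch_billing_summary", "arguments": {"account_id": "A-100"}},
    ]
    for result in asyncio.run(run_tool_calls(requested)):
        print(result)
```

Because `asyncio.gather` returns results in the order the calls were submitted, each result can be matched back to the request that produced it.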
Starlink Production Results
| Metric | Result |
| --- | --- |
| Phone sales conversion | 20% |
| Autonomous support resolution | 70% |
| Tools per session | 28 |
Vendor-reported figures. The Starlink line (+1 888 GO STARLINK) is publicly callable and independently testable.
Pricing
| Surface | Price | Notes |
| --- | --- | --- |
| Voice Agent API | $0.05 / min | Flat rate. Tools billed separately. |
| OpenAI Realtime API | ~$0.10+ / min | Token-based; blended cost typically higher |
| STT Batch | $0.10 / hr | Standalone transcription, 25+ languages |
| TTS | $4.20 / 1M chars | 5 voices, 20 languages, speech tags |
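Because tools are billed separately from the flat per-minute rate, a per-call estimate has two parts. The sketch below adds them up. The $0.05 rate comes from the table above. The per-tool-call price is a placeholder, since no tool prices are given here, and the function name is hypothetical.

```python
from decimal import Decimal

VOICE_RATE_PER_MIN = Decimal("0.05")  # flat Voice Agent API rate from the table


def estimate_call_cost(minutes, tool_calls=0, price_per_tool_call=Decimal("0")):
    """Estimate one call's cost: connection time plus separately billed tools.

    price_per_tool_call is a placeholder; substitute real tool pricing.
    """
    connection = Decimal(str(minutes)) * VOICE_RATE_PER_MIN
    tools = Decimal(tool_calls) * price_per_tool_call
    return connection + tools


if __name__ == "__main__":
    # A 6-minute call with no tool usage.
    print(estimate_call_cost(6))  # 0.30
    # The same call with 40 tool calls at a made-up $0.005 each.
    print(estimate_call_cost(6, tool_calls=40, price_per_tool_call=Decimal("0.005")))
```

With these made-up numbers the tool charges add 0.20 to a 0.30 connection cost, which shows how tool-heavy sessions can move the effective per-minute price well above the headline rate.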
Limitations
No persistent cross-session memory: The model keeps full context within a session, but each call starts fresh unless prior context is passed in via the API.
Voice only: No computer use, image generation, or document output.
No formal compliance certs: No SOC 2 or ISO 27001 as of April 24, 2026.
Tool costs stack: The $0.05/min rate covers connection time only; sessions with frequent tool calls add meaningful per-call cost.
Verdict
grok-voice-think-fast-1.0 is the strongest purpose-built enterprise voice agent available as of April 2026. It leads τ-voice Bench under real-world conditions, costs roughly half what the OpenAI Realtime API costs in production, and is already running at commercial scale on Starlink's global support line. For any team evaluating AI voice agents for customer support or phone sales, this is the benchmark to beat.