QUICK ANSWER
grok-build-0.1 is available via the xAI API, OpenRouter, and Vercel AI Gateway as of May 27, 2026. Pricing: $1.00 input / $2.00 output per million tokens, with cached input at $0.20/M. No output token cap. 256K context, 100+ tok/s, always-on reasoning, native MCP, tool calling, and image input. Model identifier: grok-build-0.1. Best for: API-embedded coding agents, multi-agent pipelines, and high-volume tool-calling workflows where cost matters.
API vs CLI - Two Different Products Built on the Same Model
This is a common point of confusion, so it is worth clarifying upfront. Grok Build (the CLI app) is xAI's terminal-based coding agent, a TUI available to SuperGrok ($30/month) and X Premium+ subscribers. It launched in beta on May 25, 2026, and it is what you use interactively to work on your own codebase. grok-build-0.1 (the API model) is the underlying model that powers the CLI, now available for developers to call directly. You can embed it in your own applications, pipelines, and agent frameworks without using the Grok Build CLI at all.
The underlying model was released on May 20, 2026. API availability was officially confirmed on May 27 on xAI's news page, with simultaneous listings on OpenRouter and Vercel AI Gateway. The xAI release notes confirm the model slug is grok-build-0.1 and state that it "performs best in agentic harnesses like Grok Build, Cursor, Hermes Agent, OpenClaw, Kilo Code, or OpenCode", which gives you xAI's own list of the agent frameworks it recommends.
Model Specs and Capabilities
| Spec | grok-build-0.1 |
|---|---|
| Input price | $1.00 / million tokens |
| Output price | $2.00 / million tokens |
| Cached input price | $0.20 / million tokens |
| Context window | 256,000 tokens |
| Output token cap | None |
| Throughput | 100+ tokens/second |
| Reasoning | Always-on (not configurable, not disableable) |
| Input modalities | Text + image |
| Tool calling | Yes (native) |
| MCP support | Yes (native) |
| Structured outputs | Yes |
| Model identifier | grok-build-0.1 |
Always-on reasoning is an important implementation detail. Unlike Grok 4.3, where reasoning effort is configurable (none, low, medium, high), grok-build-0.1 always runs reasoning before producing its final output. Every response includes structured analysis before the answer. That means somewhat higher latency per request than a non-reasoning model, in exchange for more reliable results on complex multi-step coding tasks. It is not adjustable: you cannot turn it off to save tokens on simpler tasks.
The absence of an output token cap is a practical advantage for long autonomous coding sessions that generate large amounts of code. Models with output caps (many default to 4K or 8K tokens) require workarounds in agentic pipelines, such as continuation requests, to handle tasks that produce large outputs. grok-build-0.1 removes that constraint.
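Even without a provider-imposed cap, a generation can still end early if it reaches a max_tokens limit you set or runs out of context window, so agent pipelines usually check why a response ended. A minimal sketch, assuming you call grok-build-0.1 through an OpenAI-compatible Chat Completions endpoint (such as OpenRouter) and have the response as parsed JSON; finish_reason "length" is the standard Chat Completions truncation signal, and the helper name was_truncated is hypothetical.

```python
def was_truncated(response: dict) -> bool:
    """Return True if the first choice stopped because it hit a length limit.

    Hypothetical helper: assumes an OpenAI-style Chat Completions response,
    where choices[0]["finish_reason"] is "length" on truncation.
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    return choices[0].get("finish_reason") == "length"


if __name__ == "__main__":
    complete = {"choices": [{"finish_reason": "stop", "message": {"content": "..."}}]}
    cut_off = {"choices": [{"finish_reason": "length", "message": {"content": "..."}}]}
    print(was_truncated(complete))  # False
    print(was_truncated(cut_off))   # True
```

If it returns True, the usual workaround is to append the partial output to the conversation and ask the model to continue.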
How to Access It on Each Platform
xAI API (direct)
```bash
curl https://api.x.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -d '{
    "model": "grok-build-0.1",
    "input": [
      {"role": "user", "content": "Refactor this Python function to be async"}
    ]
  }'
```
Get your API key at console.x.ai. Note that this example uses the Responses API format (an "input" field rather than "messages"), not Chat Completions.
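For readers who prefer Python over curl, here is a minimal standard-library sketch of the same request. It assumes the endpoint and payload shown in the curl example and an XAI_API_KEY environment variable; the helper name build_responses_request is hypothetical, and the response is printed raw rather than parsed, since its exact schema is not covered here.

```python
import json
import os
import urllib.request

XAI_RESPONSES_URL = "https://api.x.ai/v1/responses"


def build_responses_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Responses API request for grok-build-0.1."""
    payload = {
        "model": "grok-build-0.1",
        "input": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        XAI_RESPONSES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_responses_request(
        "Refactor this Python function to be async", os.environ["XAI_API_KEY"]
    )
    # Reasoning runs on every request, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        print(json.dumps(json.loads(resp.read()), indent=2))
```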
OpenRouter
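```python
import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="x-ai/grok-build-0.1",
    messages=[{"role": "user", "content": "Write a FastAPI endpoint"}],
)
print(response.choices[0].message.content)
```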
OpenAI-compatible interface. Use x-ai/grok-build-0.1 as the model identifier on OpenRouter.
Vercel AI Gateway
```typescript
// Requires AI_GATEWAY_API_KEY in the environment (your Vercel AI Gateway key)
// Model identifier: xai/grok-build-0.1
// OpenAI-compatible endpoint, if you are not using the AI SDK: https://ai-gateway.vercel.sh/v1
import { generateText } from 'ai'
import { gateway } from '@ai-sdk/gateway'

const { text } = await generateText({
  model: gateway('xai/grok-build-0.1'),
  prompt: 'Review this code for security issues',
})
console.log(text)
```
Vercel AI SDK integration. Handles provider routing automatically. Free tier: $5 credits every 30 days.
Pricing Comparison - Why $1/$2 Is Significant
| Model | Input (per 1M) | Output (per 1M) | Context | Reasoning |
|---|---|---|---|---|
| grok-build-0.1 | $1.00 | $2.00 | 256K | Always-on |
| GPT-5.4 (OpenAI) | $2.00 | $8.00 | 128K | Optional |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $15.00 | 1M | Optional |
| Gemini 3.5 Flash (Google) | $1.50 | $9.00 | 1M | Optional (tiered) |
| Claude Haiku 4.5 (Anthropic) | $1.00 | $5.00 | 200K | None |
At $1.00/$2.00, grok-build-0.1 is the cheapest coding-specialized model with always-on reasoning available via API today. Claude Haiku 4.5 matches it on input at $1.00 but costs 2.5x more on output ($5.00 vs $2.00) and has no reasoning capability. Gemini 3.5 Flash has a larger context window (1M vs 256K) but costs 50% more on input and 4.5x more on output. For output-heavy workloads (long code generation, large file refactors, extended agent sessions), grok-build-0.1's $2.00 output rate is the most competitive number in this tier.
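To make the per-token comparison concrete, here is a small cost estimator that uses only the list prices from the table above, plus grok-build-0.1's $0.20/M cached-input rate. The workload numbers are illustrative, and cached-input prices for the other models are not listed here, so the sketch refuses to estimate cached tokens for them rather than guess.

```python
# List prices in USD per million tokens, taken from the comparison table.
PRICES = {
    "grok-build-0.1": {"input": 1.00, "output": 2.00, "cached_input": 0.20},
    "GPT-5.4": {"input": 2.00, "output": 8.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00},
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimated USD cost; cached_input_tokens are billed at the cached rate."""
    p = PRICES[model]
    if cached_input_tokens and "cached_input" not in p:
        raise ValueError(f"no cached-input price listed for {model}")
    cost = (input_tokens * p["input"]
            + output_tokens * p["output"]
            + cached_input_tokens * p.get("cached_input", 0.0))
    return cost / 1_000_000


if __name__ == "__main__":
    # Illustrative output-heavy agent session: 2M input, 1M output tokens.
    for name in PRICES:
        print(f"{name:18s} ${estimate_cost(name, 2_000_000, 1_000_000):6.2f}")
```

For that 2M-input / 1M-output session, the estimates come to $4.00 for grok-build-0.1, $7.00 for Claude Haiku 4.5, $12.00 for both GPT-5.4 and Gemini 3.5 Flash, and $21.00 for Claude Sonnet 4.6.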
Native MCP Support - What It Means in Practice
Native MCP support means grok-build-0.1 can work with any MCP server without a translation layer. Where many models require a middleware adapter to convert MCP tool definitions into the model's native tool-calling format, grok-build-0.1 handles MCP tools directly. For developers building agent pipelines on existing MCP infrastructure (servers for GitHub, Linear, Notion, Google Workspace, file systems, or custom internal tools), the model plugs in without changes to the MCP server configuration.
xAI shipped a Connectors layer alongside Grok Build that includes pre-built integrations with GitHub, Notion, Linear, Google Workspace, Microsoft 365, Vercel, and Canva, plus Bring-Your-Own-MCP support for connecting any MCP-compatible tool. This integration catalog is narrower than Claude Code's 3,000+ MCP server ecosystem, but the pre-built connectors for the most common developer and professional tools cover most production agent workflows.
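The Bring-Your-Own-MCP idea can also be sketched at the API level. This minimal sketch assumes xAI's Responses endpoint accepts an OpenAI-style remote MCP tool entry ("type": "mcp" with "server_url" and "server_label"); treat those field names as an assumption to verify against xAI's API documentation, and the server URL and label as hypothetical placeholders. It only builds the request body.

```python
import json


def build_mcp_payload(prompt: str, server_url: str, server_label: str) -> dict:
    """Responses API body that attaches one remote MCP server as a tool.

    Assumption: the endpoint accepts OpenAI-style {"type": "mcp", ...} tool
    entries; verify the field names against xAI's documentation.
    """
    return {
        "model": "grok-build-0.1",
        "input": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "mcp",
                "server_url": server_url,      # hypothetical MCP server URL
                "server_label": server_label,  # hypothetical label for this server
            }
        ],
    }


if __name__ == "__main__":
    body = build_mcp_payload(
        "List the open issues labelled 'bug'",
        server_url="https://mcp.example.com/mcp",
        server_label="issue-tracker",
    )
    # POST this JSON to https://api.x.ai/v1/responses, as in the curl example.
    print(json.dumps(body, indent=2))
```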
Decision Framework - When to Use grok-build-0.1 vs Claude Code vs Codex
Use grok-build-0.1 when:
- Cost is the primary constraint - $2.00 output is cheaper than every comparable reasoning model
- You need always-on reasoning without paying for a premium tier
- Your pipeline runs high output volume (no output token cap is a practical advantage)
- You are building on Vercel and want native AI Gateway integration
- You need 100+ tok/s throughput for latency-sensitive agent loops
Use Claude Code (API) when:
- You need 1M token context for large codebase work
- Output quality on complex architecture tasks is more important than cost
- You need the full 3,000+ MCP server ecosystem
- Your workload is SWE-bench Pro class - hard multi-file real-world problems
Use Codex (API / GPT-5.5) when:
- You need async cloud execution sandboxes with computer use
- You want desktop computer use integrated with coding
- You are already in the OpenAI ecosystem and consolidation matters
- You need the SWE-bench Verified benchmark leader (88.7%)
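If you route requests between providers programmatically, the framework above can be encoded as a simple rule-based router. This is a hypothetical sketch: the Requirements fields and the 256K threshold come from the criteria listed here (context size, computer use, quality vs cost), and the returned strings are labels, not API model identifiers.

```python
from dataclasses import dataclass

GROK_BUILD_CONTEXT = 256_000  # tokens, from the spec table


@dataclass
class Requirements:
    """Hypothetical per-task requirements used to pick a model."""
    context_tokens: int
    needs_computer_use: bool = False
    quality_over_cost: bool = False


def choose_model(req: Requirements) -> str:
    """Apply the decision framework, most restrictive criteria first."""
    if req.needs_computer_use:
        return "Codex"           # cloud sandboxes with computer use
    if req.context_tokens > GROK_BUILD_CONTEXT or req.quality_over_cost:
        return "Claude Code"     # 1M context, hard multi-file work
    return "grok-build-0.1"      # cheapest output, always-on reasoning


if __name__ == "__main__":
    print(choose_model(Requirements(context_tokens=120_000)))
    print(choose_model(Requirements(context_tokens=600_000)))
    print(choose_model(Requirements(context_tokens=50_000, needs_computer_use=True)))
```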
Frequently Asked Questions
Do I need a SuperGrok subscription to use grok-build-0.1 via API?
No. The API model is separate from the CLI subscription. You pay per token at $1.00/$2.00 per million using an xAI API key from console.x.ai. SuperGrok and X Premium+ subscriptions give you access to the Grok Build CLI app; the API model is a separate pay-per-use product billed directly to your xAI account.
Does grok-build-0.1 use the Chat Completions API or the Responses API?
The xAI direct API uses the Responses API format (as in the curl example above, which sends an "input" field rather than "messages"). OpenRouter and Vercel AI Gateway both provide OpenAI-compatible Chat Completions interfaces, so if you route through those platforms you can use standard client.chat.completions.create code from the OpenAI SDK.
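Because both gateways speak Chat Completions, switching between them is mostly a matter of base URL and model identifier. A standard-library sketch, assuming OpenRouter's https://openrouter.ai/api/v1 base URL (as in the OpenRouter example) and Vercel AI Gateway's OpenAI-compatible base URL https://ai-gateway.vercel.sh/v1 (check Vercel's docs in case it has changed); the GATEWAYS table and build_chat_request helper are hypothetical.

```python
import json
import os
import urllib.request

# Base URL and model identifier per gateway (both OpenAI-compatible).
GATEWAYS = {
    "openrouter": ("https://openrouter.ai/api/v1", "x-ai/grok-build-0.1"),
    "vercel": ("https://ai-gateway.vercel.sh/v1", "xai/grok-build-0.1"),
}


def build_chat_request(gateway: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request for grok-build-0.1."""
    base_url, model = GATEWAYS[gateway]
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("vercel", "Write a FastAPI endpoint",
                             os.environ["AI_GATEWAY_API_KEY"])
    with urllib.request.urlopen(req, timeout=300) as resp:
        data = json.loads(resp.read())
    print(data["choices"][0]["message"]["content"])
```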
Can grok-build-0.1 access the internet or X/Twitter data in real time?
Not automatically via the API. Real-time X data access is a feature of the Grok consumer app and the Grok Build CLI app, where it is built into the product. Via the raw API, grok-build-0.1 has no internet or X data access unless you give it a tool that provides one, either by attaching an MCP server or by defining a web search (or X API) function through function calling.
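Function calling is the simpler of the two routes. A minimal sketch, assuming the OpenAI-compatible Chat Completions tool format that OpenRouter accepts; the web_search tool, its schema, and the search_backend stub are hypothetical, and you would replace the stub with a real search API. It shows the tool definition sent to the model and the local dispatch step that executes a returned tool call and produces the "tool" message to send back.

```python
import json

# Tool definition sent in the request's "tools" array (hypothetical tool).
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return the top results as text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def search_backend(query: str) -> str:
    """Hypothetical stub; replace with a call to a real search API."""
    return f"(no live results; stub search for: {query})"


def run_tool_call(tool_call: dict) -> dict:
    """Execute one entry from choices[0].message.tool_calls and return
    the "tool" message to append to the conversation."""
    fn = tool_call["function"]
    if fn["name"] != "web_search":
        raise ValueError(f"unknown tool: {fn['name']}")
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": search_backend(args["query"]),
    }


if __name__ == "__main__":
    example_call = {
        "id": "call_1",
        "type": "function",
        "function": {"name": "web_search",
                     "arguments": json.dumps({"query": "grok-build-0.1 release"})},
    }
    print(run_tool_call(example_call))
```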
Is there a rate limit on the xAI API?
xAI does not publish specific rate limits. The practical limits depend on your API tier and usage history. For high-volume production workloads, contact xAI's enterprise team at enterprise.x.ai to discuss dedicated capacity. If you prefer to route through OpenRouter, its dashboard gives you visibility into rate limits.
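Because the limits are not published, production clients typically retry when they receive HTTP 429 (Too Many Requests). A minimal sketch using exponential backoff with jitter around any urllib call; honoring a numeric Retry-After header is an assumption about what the server may send, and the helper name with_backoff is hypothetical.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on HTTP 429 with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # base_delay * 1, 2, 4, ... plus up to base_delay of random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after is not None:
                try:
                    # Wait at least as long as the server asks, if it says.
                    delay = max(delay, float(retry_after))
                except ValueError:
                    pass  # Retry-After may be an HTTP date; keep computed delay
            sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

Wrap any request in a zero-argument callable, for example with_backoff(lambda: urllib.request.urlopen(req, timeout=300)), where req is a prepared urllib.request.Request.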