May 2026 — What Actually Changed
- OpenAI / AWS — GPT-5.5 lands on Amazon Bedrock Apr 28, one day after Microsoft exclusivity ends. 4M weekly Codex users confirmed.
- DeepSeek — V4 drops Apr 24. V4-Flash at $0.14/$0.28/M tokens is the cheapest frontier-adjacent model in history. V4-Pro (1.6T params) is now the largest open-weight model ever.
- Kimi / Moonshot AI — K2.6 released Apr 20. 262K context, $0.60/$2.50/M tokens. Apache 2.0 CLI + modified MIT weights. 300-agent swarms now GA.
- xAI / Grok — Imagine Agent Mode (beta) Apr 30. Infinite canvas: brainstorm → images → video in one prompt. SuperGrok $30/mo required.
- Microsoft — Agent 365 launches May 1 at $15/user/mo. Enterprise governance for all AI agents across Copilot Studio, Foundry, and third-party systems.
- Anthropic — Claude Sonnet 4.8 confirmed in leaked source code, expected May 2026. Claude Mythos Preview confirmed for 11 cybersecurity partners only.
- Google — Gemini 3.1 Flash-Lite launches as first Flash-Lite in Gemini 3 series. Veo 3.1 adds video extension. Claude 3.7 Sonnet shutting down on Vertex May 11.
April ended with more significant model releases than most years manage in an entire quarter. GPT-5.5, DeepSeek V4, Kimi K2.6, Claude Opus 4.7, and Grok Imagine Agent Mode all shipped within the final two weeks of April. May opened immediately with Microsoft Agent 365 on May 1, and Claude Sonnet 4.8 is expected before the end of the month. This is every confirmed update that matters, organised by company, with the practical implication for each one.
Prices verified against official documentation and OpenRouter as of May 3, 2026. This page updates as new launches are confirmed throughout the month.
OpenAI — GPT-5.5, AWS Bedrock, and the End of Azure Exclusivity
Three things landed in the same week at the end of April. On April 23, OpenAI released GPT-5.5 to Plus, Pro, Business, and Enterprise ChatGPT users. On April 27, Microsoft and OpenAI amended their partnership, ending seven years of Azure exclusivity and making OpenAI models available to any cloud provider. On April 28, GPT-5.5, Codex, and a new Amazon Bedrock Managed Agents product powered by OpenAI went live on AWS in limited preview.
GPT-5.5 matches GPT-5.4 per-token latency while reaching 93.6% on GPQA Diamond and 82.7% on Terminal-Bench 2.0. On SWE-Bench Pro it scores 58.6% — trailing Claude Opus 4.7's 64.3% on that specific benchmark, but leading on agentic terminal workflows. OpenAI says GPT-5.5 uses meaningfully fewer tokens for the same Codex tasks than GPT-5.4, meaning more work per dollar despite higher list prices. API pricing: $5 per million input tokens, $30 per million output tokens. The top-tier GPT-5.5 Pro variant runs $30/$180 per million tokens — roughly 12x GPT-5.4 for the highest-end reasoning work.
The AWS move is strategically more significant than the model release. More than 4 million people use Codex every week. AWS customers can now invoke GPT-5.5 through the same Amazon Bedrock APIs already used for Anthropic, Meta, and Amazon's own Nova family — and apply usage toward existing AWS spending commitments. Azure remains the primary OpenAI partner through 2032 under the restructured agreement, but the lock-in is gone. Enterprises with pre-negotiated AWS spending can now route OpenAI workloads through a single cloud account without establishing a separate Azure relationship.
What this means: The cloud AI market is now a three-hyperscaler competition. Every enterprise using Claude on Bedrock now has GPT-5.5 available through the same account, billing, and compliance stack. Model switching costs just dropped significantly.
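For teams already on Bedrock, the switch is mostly a request-shape question. The sketch below builds a request for Bedrock's Converse API in the form existing Anthropic/Nova integrations already use; the GPT-5.5 model ID is an assumption (confirm the real identifier in the Bedrock model catalog once preview access reaches your account).

```python
MODEL_ID = "openai.gpt-5.5-preview"  # placeholder, NOT a confirmed Bedrock model ID

def build_converse_request(prompt: str, model_id: str = MODEL_ID) -> dict:
    """Assemble the kwargs for bedrock-runtime's converse() call."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

# To send for real (requires AWS credentials and preview access):
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.converse(**build_converse_request("Summarise this incident report."))
#   print(resp["output"]["message"]["content"][0]["text"])
```

Because Converse normalises the message shape across providers, swapping between Claude, Nova, and (now) OpenAI models on Bedrock reduces to changing `modelId`.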
DeepSeek — V4 Drops the Price Floor for Frontier-Adjacent AI
DeepSeek released V4-Pro and V4-Flash preview models on April 24. V4-Pro has 1.6 trillion total parameters with 49 billion active per request — the largest open-weight model available, overtaking Kimi K2.6 (1.1T) and more than doubling DeepSeek V3.2 (685B). V4-Flash runs 284 billion total parameters with 13 billion active. Both use a 1 million token context window and MIT licensing.
V4-Flash at $0.14 per million input tokens and $0.28 per million output tokens undercuts every major small model: GPT-5.4 Nano ($0.20/$1.25), Gemini 3.1 Flash-Lite ($0.25/$1.50), and Claude Haiku 4.5. V4-Pro at $1.74/$3.48 per million tokens is cheaper than Gemini 3.1 Pro ($2/$12), GPT-5.4 ($2.50/$15), Claude Sonnet 4.6 ($3/$15), and Claude Opus 4.7 ($5/$25). V4-Pro-Max outperforms GPT-5.2 and Gemini 3.0 Pro on reasoning benchmarks but trails GPT-5.4 and Gemini 3.1 Pro by approximately 3–6 months on knowledge tasks, per DeepSeek's own documentation. In Vals AI's Vibe Code Benchmark, V4 ranked first among open-weight models — approximately 10x V3.2's score — and outperformed closed-source Gemini 3.1 Pro.
DeepSeek says V4 has already replaced Claude Sonnet 4.5 as its primary internal agentic coding model, and describes delivery quality as close to Opus 4.6 in non-thinking mode, with a gap remaining against Opus 4.6 in thinking mode.
What this means: The cost floor for frontier-adjacent AI dropped dramatically. V4-Flash is genuinely the cheapest option at its performance tier. Teams routing high-volume routine API tasks to V4-Flash can cut per-token costs by 80–90% versus Claude Sonnet 4.6 without a proportional quality loss on well-defined tasks. V4-Pro is the most credible option for enterprises that need self-hosting at near-frontier quality.
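A quick sanity check on the savings claim, using the list prices quoted in this section (USD per million tokens). The workload mix is illustrative; for read-heavy mixes the per-token saving lands at or above the top of the quoted range.

```python
# List prices from this section, USD per million tokens.
PRICES = {
    "deepseek-v4-flash": {"in": 0.14, "out": 0.28},
    "claude-sonnet-4.6": {"in": 3.00, "out": 15.00},
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """List-price cost for a workload measured in millions of tokens."""
    p = PRICES[model]
    return input_mtok * p["in"] + output_mtok * p["out"]

# Example workload: 500M input + 100M output tokens per month.
flash = monthly_cost("deepseek-v4-flash", 500, 100)   # $98
sonnet = monthly_cost("claude-sonnet-4.6", 500, 100)  # $3,000
print(f"saving: {1 - flash / sonnet:.0%}")            # prints "saving: 97%"
```

The caveat stays the same as in the text: the saving only holds on well-defined tasks where quality at V4-Flash's tier is actually sufficient; retries and escalations to a frontier model eat into the margin.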
Kimi / Moonshot AI — K2.6 and the 300-Agent Swarm
Moonshot AI released Kimi K2.6 on April 20. The 1-trillion-parameter MoE architecture, with 32 billion parameters active per request and a 262,144-token context window, runs at $0.60/$2.50 per million tokens on the official Moonshot API, and $0.44/$2.00 on OpenRouter. In real-world coding benchmarks, K2.6 scored 87/100 (Tier A), the only Chinese-developed model to reach that tier. On SWE-Bench Pro it outperformed GPT-5.4 at roughly one-sixth the output token cost.
The open-source story is practically useful for compliance-constrained teams: the CLI ships under Apache 2.0, and the model weights are available on HuggingFace under a modified MIT license, deployable with vLLM, SGLang, or KTransformers. Agent Swarm mode, which coordinates up to 300 parallel subagents, is now generally available on paid plans. The Kimi Code CLI installs via npm, supports MCP servers configured for Claude Code without modification, and can be set as the backend for Claude Code by swapping the API endpoint. Subscription tiers start at Moderato ($19/mo) for K2.6 access plus Kimi Code; Agent Swarm unlocks from Allegretto ($39/mo).
What this means: Kimi K2.6 is the most credible cost-reduction lever for teams spending heavily on Claude Opus 4.7 API calls. Routing routine coding tasks to Kimi via the OpenAI-compatible endpoint cuts per-token cost by roughly 8–10x. The Apache 2.0 CLI and MIT-licensed weights put it alongside DeepSeek V4 among the few frontier-adjacent models teams can self-host and modify freely.
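The endpoint swap described above can be sketched in a few lines. The base URL and model name below are assumptions; confirm both against Moonshot's API documentation before wiring anything up.

```python
import os

# Assumed values -- verify the exact endpoint and model identifier in Moonshot's docs.
KIMI_BASE_URL = "https://api.moonshot.ai/v1"
KIMI_MODEL = "kimi-k2.6"

def kimi_client_kwargs() -> dict:
    """Kwargs for openai.OpenAI(...) so existing OpenAI-client code targets Kimi."""
    return {
        "base_url": KIMI_BASE_URL,
        "api_key": os.environ.get("MOONSHOT_API_KEY", ""),
    }

# Usage (requires the `openai` package and a Moonshot API key):
#   from openai import OpenAI
#   client = OpenAI(**kimi_client_kwargs())
#   resp = client.chat.completions.create(
#       model=KIMI_MODEL,
#       messages=[{"role": "user", "content": "Write unit tests for utils.py"}],
#   )
```

Because the endpoint is OpenAI-compatible, existing retry, streaming, and logging code built around the OpenAI client carries over unchanged; only the base URL, key, and model name differ.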
xAI / Grok — Imagine Agent Mode Launches on an Infinite Canvas
xAI began rolling out Grok Imagine Agent Mode in beta on April 30. It replaces the standard Grok chat interface with an infinite canvas where one text prompt triggers a full creative pipeline: brainstorm, image generation, editing, image-to-video conversion, video stitching, trim controls, fade effects, and export. Four preset workflow templates ship in the sidebar: Create Worlds, Short Film, UGC Product Stories, and Brand Identity.
The Short Film template handles prompts like "generate a 1-minute short film" end-to-end — the agent drafts a scenario, generates individual scene clips, stitches them, and produces a companion poster. Shareable custom Imagine templates are also rolling out, with three chain types: Photo → Video, Photo → Style Edit, and Photo → Edit → Video. A fourth Image Reference Edit type introducing @ mention syntax for reference images mid-prompt is in development.
Access requires SuperGrok ($30/mo) or X Premium+ ($40/mo). SuperGrok delivers 720p at up to 30 seconds per clip; the rollout is progressive and not all eligible accounts have the canvas interface yet. In March 2026 video benchmarks, Grok Imagine ranked first in Multi Image to Video Arena, Image-to-Video, and Video-to-Video categories — ahead of Veo 3.1, Sora, and Kling on those tasks. For the full breakdown, see our Grok Imagine Agent Mode guide.
What this means: This is the first live agentic creative canvas from a major AI lab available to general consumers. For content creators and brand teams, the UGC Product Stories template alone collapses what was previously a multi-tool, multi-subscription workflow into a single canvas session at $30/month.
Microsoft — Agent 365 and the Enterprise Governance Layer
Microsoft launched Agent 365 on May 1 at $15 per user per month — a dedicated governance and security control plane for AI agents, separate from Microsoft 365 Copilot. It manages agents built on Copilot Studio, Azure Foundry, and third-party systems. The same day, Microsoft launched the E7 "Frontier Suite" at $99 per user per month, bundling E5, Copilot, Agent 365, and the Entra Suite.
Agent 365 gives each deployed agent a unique identity, generates a full audit log of its actions, and integrates with enterprise security and compliance controls. It is the governance answer to a problem that has been slowing agentic AI adoption in large organisations: IT teams had no systematic way to audit, approve, or revoke access for autonomous AI agents running across their infrastructure. For organisations already running Copilot Wave 3 — which launched March 9 with Excel, Word, and PowerPoint agents plus multi-model orchestration across Claude, GPT, and Microsoft models — Agent 365 is the missing policy layer.
What this means: Enterprise-scale agentic AI deployment now has a compliance product. The primary IT blocker — "how do we audit what our agents are doing?" — has a $15/user/month answer from Microsoft. Expect adoption acceleration in regulated industries that have been waiting for this before deploying agents beyond pilot stage.
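To make the governance idea concrete, here is a toy identity-plus-audit-log registry showing the pattern Agent 365 productises: per-agent identity, append-only action logging, and revocation. This is purely illustrative and is not Microsoft's API; Agent 365 exposes these controls through its own admin tooling.

```python
import uuid
from datetime import datetime, timezone

class AgentRegistry:
    """Toy sketch of agent identity + audit logging (NOT the Agent 365 API)."""

    def __init__(self):
        self.agents = {}      # agent_id -> metadata
        self.audit_log = []   # append-only action records

    def register(self, name: str, owner: str) -> str:
        """Issue a unique identity for a new agent."""
        agent_id = str(uuid.uuid4())
        self.agents[agent_id] = {"name": name, "owner": owner, "revoked": False}
        return agent_id

    def record(self, agent_id: str, action: str) -> None:
        """Log an action; unknown or revoked agents are rejected."""
        meta = self.agents.get(agent_id)
        if meta is None or meta["revoked"]:
            raise PermissionError("unknown or revoked agent")
        self.audit_log.append({
            "agent_id": agent_id,
            "action": action,
            "ts": datetime.now(timezone.utc).isoformat(),
        })

    def revoke(self, agent_id: str) -> None:
        """Cut off an agent without deleting its audit trail."""
        self.agents[agent_id]["revoked"] = True
```

The point of the sketch is the shape of the answer to "how do we audit what our agents are doing?": identity first, then an action log keyed to that identity, then revocation that preserves history.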
Anthropic — Sonnet 4.8 Incoming, Mythos Confirmed for Cyber Research
Anthropic has not officially announced Sonnet 4.8, but its existence is confirmed via the March 2026 Claude Code source code leak, which exposed internal version strings including sonnet-4.8. Based on Anthropic's release pattern — Sonnet ships 1–4 weeks after each Opus — the expected window is May 2026 following Opus 4.7's April 16 launch. Leaked references indicate Sonnet 4.8 will carry Opus 4.7's 3x vision resolution, xhigh effort level, and improved coding capabilities at the Sonnet pricing tier of $3/$15 per million tokens. That would represent a meaningful step up from Sonnet 4.6 for teams who cannot justify Opus pricing.
Claude Mythos Preview was confirmed in April. It is not a commercial model: Anthropic made it available to 11 companies for cybersecurity vulnerability research only and has stated no plans for general availability. The UK AI Security Institute found Mythos solved a 32-step cyber range called "The Last Ones" in 3 of 10 attempts, making it the first model to complete it end-to-end. The next best performer was Claude Opus 4.6, followed by GPT-5.4 and GPT-5.3 Codex, which tied.
On infrastructure: MCP crossed 97 million installs in Q1 2026 and moved under Linux Foundation open governance. Every major AI provider now ships MCP-compatible tooling — cementing it as the default inter-agent connection standard across the industry. Developers building multi-agent systems can treat MCP as settled infrastructure rather than an experimental protocol.
What this means: Sonnet 4.8 is the near-term Anthropic release most production teams should watch. If it delivers Opus 4.7-level vision and coding at Sonnet pricing, the cost-performance equation for Claude-based applications improves significantly. Check our Claude plan guide for updates when the release is confirmed.
Google — Gemini 3.1 Flash-Lite, Veo 3.1, and a Vertex Deprecation to Action Now
Google launched Gemini 3.1 Flash-Lite as the first Flash-Lite model in the Gemini 3 series at $0.25/$1.50 per million tokens — positioned for high-volume, latency-sensitive workloads where throughput matters more than top-end reasoning. Veo 3.1 added video extension in preview on Vertex AI, enabling generated videos to be extended frame-by-frame up to 4K output resolution with portrait video support across all resolutions. Gemini 2.5 Flash with Gemini Live API Native Audio reached General Availability, bringing Proactive Audio and Affective Dialog into production-ready deployments.
The operational item that requires immediate action: Claude 3.7 Sonnet is being shut down on Vertex AI on May 11, 2026. Any production applications using that model string on Google Cloud must migrate before that date. The recommended replacement is Claude Sonnet 4.6 (claude-sonnet-4-6), which is significantly more capable at the same $3/$15 per million token pricing. This shutdown affects Vertex AI only — the model remains on Anthropic's direct API. Gemini 1.5 models are already discontinued on Vertex; Gemini 3 Pro Preview was shut down and the endpoint now points to Gemini 3.1 Pro Preview.
What this means: Gemini 3.1 Flash-Lite fills the price tier below Flash for teams running high-volume classification or summarisation tasks. The Veo 3.1 extension is meaningful for teams building long-form video pipelines on Google infrastructure. But the most urgent item in this section is operational: if you run Claude 3.7 Sonnet on Vertex, migrate before May 11.
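A migration like the Vertex one above is usually a model-string sweep. The sketch below flags and replaces the deprecated ID in a config dict; the config shape is illustrative, so adapt it to however your deployment actually stores model names.

```python
# Model strings from the section above; adapt config handling to your stack.
DEPRECATED = "claude-3-7-sonnet"
REPLACEMENT = "claude-sonnet-4-6"

def migrate_model_ids(config: dict) -> dict:
    """Return a copy of config with deprecated Claude model strings swapped."""
    return {
        key: (REPLACEMENT
              if isinstance(value, str) and value.startswith(DEPRECATED)
              else value)
        for key, value in config.items()
    }

cfg = {"summariser_model": "claude-3-7-sonnet", "region": "us-east5"}
print(migrate_model_ids(cfg))
# prints {'summariser_model': 'claude-sonnet-4-6', 'region': 'us-east5'}
```

Run the same sweep across every environment (staging, CI fixtures, infra-as-code) rather than only production; stale model strings tend to hide in test configs.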
The Pattern Across All of It
Every update this month points in the same direction. Open-weight models such as DeepSeek V4-Pro and Kimi K2.6 are not merely "good for open-source": they are competitive on many production tasks at 5–15x lower cost than closed-source frontier models. Cloud choice is no longer a strategic constraint: AWS, Azure, and GCP all host the same frontier models following OpenAI's exclusivity expiry. And enterprise governance tooling has finally arrived: Agent 365 is the first purpose-built, major-vendor product in the category.
The practical implication: model-agnostic infrastructure is no longer a design principle — it is a cost-management necessity. The best model for a task in May 2026 may not be the best model in August 2026, and the cost of switching between cloud providers and model vendors just dropped significantly. Build your stack so that swapping a model is a parameter change, not a refactor.
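One way to make "a parameter change, not a refactor" literal is a single routing table keyed by task type, so rerouting a workload is a one-line edit. The providers, model names, and tiering below are illustrative, not recommendations.

```python
# Illustrative routing table: task type -> provider/model. Swapping a model
# for a task class means editing one entry here, not touching call sites.
ROUTES = {
    "bulk_summarise": {"provider": "deepseek", "model": "deepseek-v4-flash"},
    "routine_coding": {"provider": "moonshot", "model": "kimi-k2.6"},
    "hard_reasoning": {"provider": "anthropic", "model": "claude-opus-4-7"},
}

def pick_model(task_type: str) -> dict:
    """Resolve a task type to a provider/model pair; reroute by editing ROUTES."""
    try:
        return ROUTES[task_type]
    except KeyError:
        raise ValueError(f"no route for task type {task_type!r}")
```

Call sites ask for a task type, never a model name; when August's benchmarks reshuffle the leaderboard, only `ROUTES` changes.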
Frequently Asked Questions
What is the biggest AI news in May 2026?
The most consequential development is the end of OpenAI's Azure exclusivity on April 27, 2026, followed immediately by GPT-5.5 launching on Amazon Bedrock on April 28. This opens every major enterprise cloud to OpenAI models for the first time, fundamentally changing how organisations procure and deploy AI. On the model pricing side, DeepSeek V4-Flash at $0.14 per million input tokens is the most aggressive cost drop in frontier-adjacent AI to date — MIT-licensed, open-weight, and available now.
Is GPT-5.5 available now and how much does it cost?
Yes. GPT-5.5 rolled out to ChatGPT Plus, Pro, Business, and Enterprise users from April 23, 2026, and is now the default model on paid ChatGPT tiers. API access launched April 24 at $5 per million input tokens and $30 per million output tokens. GPT-5.5 Pro — the highest-capability reasoning variant — costs $30 input / $180 output per million tokens. On Amazon Bedrock, GPT-5.5 is in limited preview as of April 28 and not yet available to all AWS accounts.
What is DeepSeek V4 and should I switch to it from Claude?
DeepSeek V4 comes in two variants. V4-Flash (284B parameters, 13B active) at $0.14/$0.28 per million tokens is cheaper than every major small model, including Claude Haiku 4.5. V4-Pro (1.6T parameters, 49B active) at $1.74/$3.48 per million tokens undercuts Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6, and Claude Opus 4.7 on cost. For well-defined tasks — code generation, summarisation, classification — V4-Flash delivers competitive performance at 80–90% lower cost than Claude Sonnet 4.6. For complex multi-file architecture work and the hardest reasoning tasks, Claude Opus 4.7 (87.6% SWE-Bench Verified) and GPT-5.5 remain stronger. The practical move: use V4-Flash for volume tasks, frontier models for complexity.
What is Grok Imagine Agent Mode and who can access it?
Grok Imagine Agent Mode is a beta feature that launched April 30 on the Grok web app. It replaces the chat interface with an infinite canvas where a single prompt triggers the full creative pipeline — image generation, editing, image-to-video, video stitching, trimming, and export — without leaving the page. Four preset templates cover Create Worlds, Short Film, UGC Product Stories, and Brand Identity. Access requires SuperGrok ($30/mo) or X Premium+ ($40/mo); the rollout is gradual and not all eligible accounts have the canvas yet. SuperGrok Lite ($10/mo) has limited Imagine access but Agent Mode availability at that tier is unconfirmed. Full details in our dedicated guide.
What is Microsoft Agent 365 and is it worth it?
Microsoft Agent 365 ($15/user/mo, launched May 1) is a governance and audit layer for AI agents — not a Copilot upgrade. It gives each deployed agent a unique identity, logs every action it takes, and integrates with enterprise security and compliance controls. It is relevant for organisations that have been holding back agentic AI deployment because IT teams had no way to see or control what autonomous agents were doing across company infrastructure. If your organisation is already running Copilot Wave 3 agents in Excel, Word, or SharePoint, Agent 365 is the compliance layer that makes those deployments enterprise-policy-compliant. The E7 Frontier Suite at $99/user/mo bundles it with E5, Copilot, and Entra.
When is Claude Sonnet 4.8 coming out?
No official date. The model was confirmed via the March 2026 Claude Code source code leak. Based on Anthropic's Sonnet-after-Opus release pattern and the April 16 Opus 4.7 launch, May 2026 is the expected window. Leaked code context suggests it will bring Opus 4.7's 3x vision resolution and xhigh effort level to the $3/$15 per million token Sonnet pricing. Monitor Anthropic's announcements and our Claude plan guide for confirmation.
Should I migrate off Claude 3.7 Sonnet on Google Vertex AI?
Yes, immediately. Claude 3.7 Sonnet is being shut down on Vertex AI on May 11, 2026. Migrate to Claude Sonnet 4.6 (claude-sonnet-4-6) before that date. Sonnet 4.6 scores 79.6% on SWE-Bench Verified, significantly outperforms 3.7 Sonnet on coding and reasoning tasks, and carries the same $3/$15 per million token API pricing. The shutdown affects Vertex AI only — the model remains available on Anthropic's direct API and on other platforms. If you are uncertain whether your applications use this model string, check your Vertex AI model configurations now.
How does Kimi K2.6 compare to Claude Code for coding tasks?
Kimi K2.6 scored 76.8% on SWE-Bench Verified versus Claude Opus 4.7's 87.6% — a real gap on complex multi-file work. For routine coding tasks, the cost difference is the deciding factor: Kimi K2.6 API at $0.60/$2.50 per million tokens versus Claude Opus 4.7 at $5/$25. Kimi Code CLI supports the same MCP servers as Claude Code without reconfiguration, and can be set as Claude Code's backend model by changing the API endpoint. The effective setup for most teams: Kimi K2.6 for high-volume routine tasks, Claude Opus 4.7 for architecture-level refactors and complex reasoning. See our full comparison in the Kimi vs Claude Code vs Codex guide.