Best AI Models May 2026: Ranked Leaderboard
GPT-5.5, Claude Opus 4.7, DeepSeek V4 — full ranking and benchmark comparison.
Google didn't announce it. Users just found it. Here's everything we know about the most important quiet drop Google has made in years.
Gemini 3.2 Flash is Google's next unreleased Flash-tier AI model — spotted in leaked builds before any official announcement. It sits above Gemini 3.1 Flash-Lite in the model hierarchy and is positioned as a faster, cheaper alternative to Gemini 3.1 Pro, while delivering near-Pro performance on coding and creative tasks.
The Flash branding has always meant speed and efficiency. Gemini 3 Flash — the currently released model — delivers Pro-level intelligence at Flash-level latency and cost. Gemini 3.2 Flash appears to push that ratio even further.
Google has not officially confirmed the model. Everything below is based on leaks, user reports, and data extracted from AI Studio API logs as of May 5–6, 2026.
Two discovery channels surfaced simultaneously on May 5, 2026.
First: A Reddit user on r/GeminiAI noticed their iOS Gemini app cycling through model versions in real time over 24 hours — shifting from Gemini 3 Flash to 3.1, then landing on 3.2 Flash. Alongside the model, they spotted a completely redesigned interface called 'Liquid Glass' — a pill-shaped prompt box, pulsating gradient background, and a model picker moved to a top-left dropdown.
Second: Gemini 3.2 Flash was found running silent benchmarks on LM Arena, a third-party model evaluation platform. Google has historically used Arena for pre-launch stress testing, making this a strong signal of imminent release.
The leak also revealed a new Agents (Beta) tab in the Gemini sidebar — currently leading to a black screen, but clearly a placeholder for upcoming agentic features.
Leaked data from Google AI Studio puts Gemini 3.2 Flash at $0.25 per million input tokens and $2.00 per million output tokens. Here's how that stacks up against the current Gemini Flash family:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 3.2 Flash | $0.25 | $2.00 |
| Gemini 3 Flash | $0.50 | $3.00 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 |
| Gemini 3.1 Pro | $1.50 | $5.00 |
If accurate, Gemini 3.2 Flash would be priced identically to 3.1 Flash-Lite on input tokens but $0.50 per million higher on output ($2.00 vs. $1.50), suggesting it's positioned as a mid-tier model, not a pure cost play. Both its input and output prices sit below Gemini 3 Flash's ($0.50 and $3.00), making it a compelling upgrade path for teams currently on that model.
Important caveat: these numbers come from AI Studio metadata in an unreleased build. Google has not confirmed them, and pricing could change before launch.
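To make the price gaps concrete, here is a minimal sketch that applies the per-million rates from the table to a single workload. The workload size (100M input tokens and 20M output tokens per month) and the `monthly_cost` helper are illustrative assumptions, not figures from the leak, and the Gemini 3.2 Flash row remains unconfirmed.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
# The Gemini 3.2 Flash row is leaked and unconfirmed.
PRICES = {
    "Gemini 3.2 Flash": (0.25, 2.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3.1 Pro": (1.50, 5.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100M input tokens and 20M output tokens per month.
    for model in PRICES:
        cost = monthly_cost(model, 100_000_000, 20_000_000)
        print(f"{model}: ${cost:,.2f}")
```

For that assumed workload, the leaked pricing works out to $65 per month for Gemini 3.2 Flash, against $110 for Gemini 3 Flash, $55 for Gemini 3.1 Flash-Lite, and $250 for Gemini 3.1 Pro. The more output-heavy the workload, the wider the gap between 3.2 Flash and Flash-Lite becomes, since the two differ only on output price.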
Gemini 3.2 Flash's unofficial debut on LM Arena produced the most concrete performance signal we have. Early testers ran it through a range of tasks — and the results were striking.
The most-shared test: an ASCII animation benchmark. One user asked the model to generate a full-screen HTML ASCII animation of a detailed city on a hill with moving elements. Gemini 3.2 Flash produced a working animated city skyline, complete with a functioning windmill and building lights, in under two minutes. For comparison, Gemini 3.1 Pro reportedly produced broken code on the same task.
On LM Arena's structured evaluations, Gemini 3.2 Flash showed particular strength in three areas:
My honest take: a Flash model outperforming 3.1 Pro on creative coding tasks is the real headline here. If those Arena results hold up at general availability, this won't just be a cost-efficient alternative to 3.1 Pro — it may be the better model for certain workloads.
Where does 3.2 Flash fit in Google's increasingly crowded model lineup? Here's a positioning summary:
One noteworthy signal from the iOS leak: the new 3.1 Lite tier now replaces the dedicated Thinking option. Instead of a separate model, thinking is now a global toggle across all models. This mirrors what Google did with Gemini 3.1 Flash-Lite's expanded reasoning support — suggesting a broader platform shift toward thinking as a universal dial, not a model-level feature.
The naming is actually the most strategically important part of this leak. Most observers expected Google to jump from Gemini 3.1 to Gemini 3.5 — following the pattern set by OpenAI and others. Instead, Google appears to be moving to incremental versioning: 3.0, 3.1, 3.2, and presumably beyond.
This is a deliberate signal. It suggests Google is shifting to a more software-like release cadence — smaller, more frequent model updates rather than infrequent major releases. For developers, that's actually good news: it means improvements ship faster, migration paths are more predictable, and you're less likely to get blindsided by a massive capability jump that breaks your prompts.
The contrarian read: incremental versioning can also mask stagnation. If the gap between 3.1 and 3.2 is smaller than the leap between 3.0 and 3.1, calling it '3.2' might be Google buying time while the real next-gen model (Gemini 4?) remains in development.
Either way, Google's track record of shipping meaningful Flash updates is strong. The jump from Gemini 2.5 Flash to Gemini 3 Flash represented a generational leap in quality for everyday tasks. If 3.2 Flash does the same for coding and agentic workloads, it will matter.
Google I/O 2026 is shaping up to be one of the most AI-dense conferences in the company's history. Based on leaks, official teasers, and product roadmap signals, here's what we're expecting:
Gemini 3.2 Flash is the most likely candidate for a formal announcement. Whether Google reveals additional 3.2 variants (Pro? Deep Think?) remains unclear, but the iOS leak strongly suggests 3.2 Flash will be the first out the door.
Google confirmed AI glasses with Warby Parker and Gentle Monster as eyewear partners, with Gemini AI and Project Astra handling the visual intelligence layer. A Q4 2026 consumer launch is expected.
Project Astra — Google's universal AI assistant with vision, memory, and tool use — is expected to get significant updates. It's the AI layer powering the most advanced Gemini Live features.
Google Cloud CEO Thomas Kurian teased 'a new version... very, very soon' on April 24. Whether that's Gemini 3.2 or an early Gemini 4 preview is still unclear.
Google is also expected to preview Aluminum OS (Android for PCs), Android 17, and agentic AI features across the Workspace suite.
Gemini 3.2 Flash is Google's next unreleased Flash-tier AI model, spotted in leaked iOS app builds and AI Studio metadata on May 5, 2026. It reportedly delivers performance above Gemini 3 Flash and near or exceeding Gemini 3.1 Pro on coding tasks, at a significantly lower price point.
No official date has been announced. Leaks list May 5, 2026 as the internal target date, though no public launch occurred. Google I/O 2026 (May 19–20) is the most likely window for an official reveal.
Based on leaked AI Studio data: $0.25 per million input tokens and $2.00 per million output tokens. This would make it cheaper than Gemini 3 Flash ($0.50/$3.00) on both input and output, and priced identically to Gemini 3.1 Flash-Lite on input.
Based on early Arena results, Gemini 3.2 Flash outperforms Gemini 3.1 Pro on certain creative coding tasks — including the well-circulated ASCII animation benchmark where 3.1 Pro produced broken code while 3.2 Flash succeeded in under two minutes.
It is not yet officially available. A small number of iOS users on version 1.2026.1710205 of the Gemini app have seen it appear via A/B testing. Developers can monitor Google AI Studio for early access.
Liquid Glass is a major visual overhaul spotted alongside the Gemini 3.2 Flash leak. It introduces a pill-shaped prompt input box, pulsating gradient background, and a top-left model picker dropdown. The rollout appears to be an A/B test.
Based on the Gemini 3 series architecture, a 1M token context window is expected — consistent with Gemini 3 Flash and 3.1 Flash-Lite. This has not been officially confirmed for 3.2 Flash specifically.