A Flash-Tier Model That Outruns Pro
Gemini 3.5 Flash signals a shift in how AI model tiers are defined. Announced at Google I/O, it is officially a Flash-tier system, historically the label for faster, cheaper versions of flagship Pro models. Yet on multiple benchmarks, Gemini 3.5 Flash now beats Gemini 3.1 Pro, Google’s mainline model launched only a few months earlier. That includes coding benchmarks and real-world agentic AI tasks, where the new model routinely posts higher scores while maintaining much lower latency. Google describes Gemini 3.5 Flash as delivering frontier-level intelligence without the traditional trade-off between quality and speed. It is also positioned as the strongest agentic and coding model in the Gemini lineup to date. This inversion of the usual hierarchy—where an efficiency-focused model leapfrogs a recent Pro release—highlights how quickly AI model performance is accelerating and how unstable old product tiers have become.

Benchmark Results: Coding, Tools, and Agentic AI Tasks
On coding benchmarks, Gemini 3.5 Flash posts a 76.2 percent score on Terminal-Bench 2.1, ahead of Gemini 3.1 Pro’s 70.3 percent. For scaled tool use, it reaches 83.6 percent on MCP Atlas, again surpassing the Pro model’s 78.2 percent. In multimodal reasoning, it records 84.2 percent on the CharXiv Reasoning benchmark, putting its understanding and reasoning in the same conversation as larger frontier models. The largest jump appears in agentic AI tasks. On GDPval-AA, a benchmark designed to measure real-world agentic behavior and multi-step problem-solving, Gemini 3.5 Flash scores 1656 Elo, far above Gemini 3.1 Pro’s 1314. That gap represents more than incremental tuning—it is a step-change in the model’s ability to plan, act, and reliably complete complex workflows, giving developers a more capable engine for automation and long-running agents.
Four Times Faster Than Frontier Models
Speed is the second part of Gemini 3.5 Flash’s story. Google says the model can output around 289 tokens per second, roughly four times faster than other frontier models. That throughput dramatically alters the economics and feasibility of AI agents that must operate continuously, handle long conversations, or work across massive codebases. Tasks that previously took developers days or auditors weeks can now be run in a fraction of the time while retaining frontier-level AI model performance. Crucially, this speed does not come at the expense of capability. Google positions Gemini 3.5 Flash as rivaling large flagship models on several dimensions, particularly in coding benchmarks and agentic workflows. By narrowing the quality gap with Pro-tier systems while operating at a substantially higher token rate, the model challenges the assumption that organizations must choose between latency and intelligence when deploying advanced AI applications.

Built for Long-Horizon, Action-Oriented AI
Gemini 3.5 Flash is engineered for long-horizon, agentic AI tasks rather than simple question-answering. It is designed to plan and reason across large codebases, orchestrate multiple subagents in parallel, and sustain multi-step workflows over extended periods. Under supervision, it can execute complex coding tasks, manage tooling via platforms like MCP Atlas, and support real-world automation scenarios where reliability and persistence matter as much as raw intelligence. Google’s Antigravity platform plays a central role here, providing an agent-first environment where Gemini 3.5 Flash can spin up specialized subagents to tackle parts of a workflow simultaneously. Early enterprise use cases include banks and fintechs automating multi-week processes. The model’s improved safeguards, including strengthened cyber and CBRN controls, are meant to keep this new level of autonomy within acceptable safety boundaries while still enabling practical, high-impact deployment.
New Default Model and a Compressed AI Hierarchy
Gemini 3.5 Flash is now Google’s default AI model, powering the Gemini app and AI Mode in Search worldwide, as well as the always-on personal agent Gemini Spark for early testers. Developers can access it via the Gemini API, Google AI Studio, Android Studio, and the Gemini Enterprise Agent Platform, placing frontier-level performance within reach across consumer, developer, and enterprise stacks. This release also illustrates how rapidly AI model tiers are collapsing. It took only a few months for a Pro-class model like Gemini 3.1 Pro to be overtaken by a Flash-tier successor. With Gemini 3.5 Pro still in internal testing, the pattern suggests a future where "Flash" no longer implies compromise, and frontier models are judged less by their label and more by their ability to act—planning, coding, and executing complex workflows at scale and speed.
