Google Gemini rate limits reset and new caps

What Google’s Latest Gemini Rate Limit Reset Means

Google’s latest reset of Gemini rate limits is a broad quota wipe and rule update: it clears usage counters for free and paid users and introduces clearer, more predictable caps on how much API capacity developers consume. The change builds on the newer compute-based system, which measures usage by prompt complexity, model choice, tools, and chat length rather than by simple prompt counts. After developers reported hitting Gemini rate limits too quickly, especially with complex Gemini 3.1 Pro and 3.5 Flash workloads, Google reset all API quota counters to zero and adjusted how much a single request can consume. For teams building on Gemini, the reset is both a clean slate and a signal that Google is still tuning its limits so that opaque or unexpectedly strict caps do not block experimentation, debugging, and day‑to‑day development.

From Prompt Counts to Compute: How Limits Are Changing

Google’s shift to compute-based Gemini rate limits means every call is scored by how demanding it is, not by a flat request count. Complex prompts, large file uploads, and long chats all consume more of a developer’s quota than short, text‑only questions. After the Google I/O 2026 rollout, many users complained that a handful of heavy Gemini 3.1 Pro prompts could drain most of their allowance. In response, Google is “capping the amount of quota a single prompt can use so you get more out of the Pro model,” according to coverage of Josh Woodward’s comments. The company has also clarified that failed requests do not count toward usage, easing concerns from teams stress‑testing uploads or advanced tools. Planned improvements include more detailed usage dashboards and notifications so developers can see which models and workloads are driving their quota consumption.
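
To make the accounting concrete, here is a minimal Python sketch of compute-based scoring with a per-prompt cap. Google has not published its formula, so the weights, the `PER_PROMPT_CAP` value, and the `Request` fields are all illustrative assumptions. The sketch only mirrors the two rules reported above: a failed request costs nothing, and no single prompt can consume more than a fixed ceiling.

```python
from dataclasses import dataclass

# Every number here is an illustrative assumption, not Google's real weighting.
MODEL_WEIGHT = {"pro": 4.0, "flash": 1.0}
PER_PROMPT_CAP = 50.0  # hypothetical ceiling on what one prompt may consume


@dataclass
class Request:
    model: str               # "pro" or "flash" (placeholder labels)
    prompt_tokens: int
    attached_files: int = 0
    chat_turns: int = 1
    succeeded: bool = True


def raw_cost(req: Request) -> float:
    """Score a request by how demanding it is (hypothetical formula)."""
    base = (
        req.prompt_tokens / 1000    # longer prompts cost more
        + 5 * req.attached_files    # each uploaded file adds a fixed charge
        + 0.5 * req.chat_turns      # long chats accumulate cost
    )
    return base * MODEL_WEIGHT[req.model]


def charged_cost(req: Request) -> float:
    """Failed requests are free; successful ones are capped per prompt."""
    if not req.succeeded:
        return 0.0
    return min(raw_cost(req), PER_PROMPT_CAP)


short = Request("flash", prompt_tokens=1000)
heavy = Request("pro", prompt_tokens=20000, attached_files=3, chat_turns=10)
print(charged_cost(short), raw_cost(heavy), charged_cost(heavy))
```

Under these made-up weights, the heavy Pro request would score 160 units uncapped but is charged only 50, which is the effect the per-prompt cap is meant to have.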

Gemini 3.5 Flash Fix and the Antigravity Quota Reset

Alongside these limit tweaks, Google has deployed an updated Gemini 3.5 Flash model inside its Antigravity environment to correct a performance regression. The earlier “Low-effort” variant generated roughly 45% fewer tokens than the original Gemini 3.5 Flash, but developers reported sharp drop‑offs in output structure and reliability when tasks demanded deeper reasoning. The new iteration aims to close that gap: Varun Mohan of Google DeepMind says it delivers higher endurance on harder software engineering tasks and performs better on difficult reasoning jobs. To encourage testing, Google has fully reset Gemini rate limits for all Antigravity users, free and paid, so teams can benchmark the refined Gemini 3.5 Flash without quota already spent on the previous model counting against them. Effort-level variants such as Low, Medium, and High remain confined to Antigravity and are not exposed in the consumer Gemini app.

Free Tier Limits, Flash-Lite, and New Usage Caps

For free tier users, Google is easing the pressure by letting Gemini 3.1 Flash-Lite prompts run without consuming quota, giving developers a quota‑free way to offload simpler tasks. This lighter path preserves remaining allowance for heavier calls to Pro or Gemini 3.5 Flash while keeping experimentation open for basic coding, drafting, and summarization. On the paid side, stricter per‑request caps on Gemini 3.1 Pro reduce the risk that a single multi‑file or long‑context query will exhaust a developer’s allowance for an entire period. These changes sit alongside a fix for an Omni video bug that previously allowed one or two generations to drain large portions of quota. Together, the new free tier limits, Flash-Lite access, and Pro caps are designed to make usage costs more predictable while still encouraging developers to push the models on realistic workloads.

How Developers Should Adapt Their Gemini Usage Patterns

With a fresh API quota reset and more explicit Gemini rate limits, developers now have a window to tune their workflows. Teams should route lightweight tasks, such as formatting code snippets or answering short questions, to Flash-Lite or the lighter Gemini 3.5 Flash variant where available, and save Pro calls for complex reasoning or large‑context operations. Monitoring the upcoming detailed usage breakdowns will be key to spotting which prompts over‑consume compute; refactoring those into smaller, staged calls can stretch quotas further. In Antigravity, developers should re‑benchmark the updated Gemini 3.5 Flash on their hardest tasks, comparing quality and token usage against earlier runs while quotas are fully reset. Many users are also asking for a weekly usage bar that shows remaining allowance and reset timing at a glance. Google has indicated it is watching that feedback, so more transparent tracking tools could arrive soon.
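
The routing and staging advice above can be sketched in a few lines of Python. This is only one possible heuristic under stated assumptions: the model labels are placeholders rather than verified API model IDs, and the thresholds (500 characters, two attached files) are arbitrary values that a team would tune against its own usage data.

```python
# Placeholder labels for illustration only; not verified API model identifiers.
LITE, FLASH, PRO = "flash-lite", "flash", "pro"


def pick_model(prompt: str, attached_files: int = 0,
               needs_deep_reasoning: bool = False) -> str:
    """Route a task to the lightest tier that plausibly handles it."""
    if needs_deep_reasoning or attached_files > 2:
        return PRO    # reserve Pro for complex or large-context work
    if attached_files == 0 and len(prompt) < 500:
        return LITE   # short, text-only tasks go to the lightest tier
    return FLASH      # everything in between


def split_into_stages(items: list[str], batch_size: int) -> list[list[str]]:
    """Break one oversized job into smaller batches for staged calls."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


print(pick_model("Reformat this snippet to follow PEP 8."))
print(split_into_stages(["a.py", "b.py", "c.py", "d.py", "e.py"], batch_size=2))
```

Keeping the routing rule in one small function makes it easy to adjust once detailed usage breakdowns show which prompts actually consume the most quota.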