From Daily Prompts to Compute Credits: A Limits Overhaul Gone Wrong
Google’s recent overhaul of Gemini token limits replaced a simple daily prompt count with a complex, compute-based quota system. Instead of just counting calls, Gemini now estimates the “cost” of each interaction based on prompt complexity, enabled features, and the length of the conversation thread. Quotas reset every five hours and are also governed by broader weekly caps, especially on paid tiers tied to models like Gemini 3.5 Flash and 3.1 Pro. On paper, this shift should encourage fair sharing of resources. In practice, it has created new token limit issues that hit power users hardest. Heavy researchers, coders, and anyone using large context windows report burning through allowances in 40 to 60 minutes. For users paying for AI Pro-style plans, the move has made Gemini token limits feel less like a professional tool and more like a meter constantly running in the background.
Single Prompts, Five-Hour Caps: How Miscalibrated Limits Break Workflows
The most dramatic complaints focus on how quickly Gemini’s AI usage caps can be exhausted. One AI Pro subscriber demonstrated hitting a full five-hour limit with a single avatar-based video generation prompt that ran for just a few minutes and failed to deliver a result at all. Starting from 0% usage, the attempt raced to 100%, leaving the user locked out for hours despite getting nothing usable in return. Elsewhere, users testing Gemini 3.1 Pro report that a handful of complex messages can consume 60% of their short-term allowance, effectively ending serious work sessions well before they are done. These experiences suggest that the credit pricing for multimodal and long-context tasks is poorly calibrated. When one failed request can burn an entire window, Gemini’s token limits stop being a guardrail and start becoming a liability for anyone relying on stable, repeatable AI workflows.
Quota Whiplash: Antigravity’s 9x Gemini Quota Boost Reveals a Misfire
Backlash over Gemini’s nerfed limits has been especially loud among Antigravity users, who rely on Gemini for coding, research, and multi-hour projects. After a quiet reduction in AI Pro-style limits, Reddit quickly filled with accusations of a bait-and-switch as subscribers hit weekly caps after just a few sessions. Google’s response was telling. Varun Mohan from Google DeepMind first announced a tripling of Gemini rate limits and a reset of weekly quotas inside Antigravity. Shortly afterward, he announced another 3x increase, amounting to a 9x Gemini quota boost from the post-nerf baseline. Usage reportedly surged as soon as limits were raised, confirming that demand had been badly underestimated. Yet these improvements apply only inside Antigravity, leaving broader Gemini usage caps unchanged and, according to many users, still tighter than before. The rapid reversals underscore how far initial limits were from real-world needs.
Parity with Claude — But at a Shared Disadvantage
As the dust settles, many in the community note that Gemini’s new token limits now feel as restrictive as Claude’s. Instead of gaining an edge, Google appears to have reached parity at the level of a shared disadvantage: both ecosystems make sustained, high-intensity work harder than users would like. Gemini 3.5 Flash, promoted as fast and capable, is undercut by compute rules that throttle extended sessions, while access to older, trusted models like 3.1 Pro is heavily rationed. This mirrors complaints long leveled at Claude’s own usage caps, particularly for premium, high-context models. The result is a competitive landscape where leading AI tools converge not on capability alone, but on similarly aggressive ceilings. For professionals comparing platforms, the question becomes less which system is smarter and more which one interferes least with uninterrupted work — a framing that benefits none of the vendors.
A Reactive Pattern: Incremental Fixes Instead of User-Tested Limits
Google’s handling of Gemini token limits fits a familiar pattern: launch an ambitious update, underestimate real usage, then patch over the pain with incremental fixes. The new subscription hierarchy and compute-based caps arrived alongside model updates, only to be met with immediate backlash over throttling and perceived drops in reliability, especially with Gemini 3.5 Flash compared with 3.1 Pro. The fast, public responses from executives like Josh Woodward and Varun Mohan show that Google is listening, but also that its initial thresholds were set without enough real-world validation. Instead of debuting user-tested limits, the company is iterating in production while paying customers hit walls mid-project. Until Gemini usage caps are calibrated to support sustained professional workloads by default, every quota tweak will feel like another temporary fix—proof that Google is still chasing, rather than anticipating, how people actually use its most powerful AI models.
