From Daily Prompts to Compute Credits: How Gemini’s Limits Changed
Google’s overhaul of Gemini doesn’t just add new models and features; it fundamentally changes how access is rationed. Instead of simple daily prompt counts, Gemini now uses a compute-based system that scores each interaction by its complexity, the features enabled, and the length of the ongoing chat thread. Heavy tasks like video generation with Gemini 3.5 Flash or multi-step coding sessions on 3.1 Pro can burn through this allowance quickly. Quotas refresh every five hours, but usage also counts toward a stricter weekly cap, meaning intensive users can be locked out after a few long sessions. Subscribers who stick to light prompts (recipe ideas, short emails, quick fact checks) may never notice. But for people who chose Gemini specifically for its long-context reasoning and coding support, the new limits feel like a downgrade disguised as modernization.
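To see how a five-hour allowance and a weekly cap can interact, here is a minimal Python sketch. Google has not published how its quotas are computed, so everything in it is an assumption made for illustration: the `QuotaTracker` class, the budget numbers, the abstract "cost" units, and the choice of a rolling window rather than a fixed reset time.

```python
from dataclasses import dataclass, field

FIVE_HOURS = 5 * 60 * 60        # seconds
ONE_WEEK = 7 * 24 * 60 * 60     # seconds


@dataclass
class QuotaTracker:
    """Hypothetical two-level quota: a short window nested inside a weekly cap."""
    window_budget: float                         # illustrative units, not Google's
    weekly_budget: float
    events: list = field(default_factory=list)   # (timestamp, cost) pairs

    def used(self, now: float, span: float) -> float:
        # Sum the cost of every request made within the last `span` seconds.
        return sum(cost for ts, cost in self.events if now - ts < span)

    def try_spend(self, now: float, cost: float) -> bool:
        # A request must fit under BOTH limits to be accepted.
        if self.used(now, FIVE_HOURS) + cost > self.window_budget:
            return False
        if self.used(now, ONE_WEEK) + cost > self.weekly_budget:
            return False
        self.events.append((now, cost))
        return True


tracker = QuotaTracker(window_budget=100, weekly_budget=250)
first = tracker.try_spend(0, 100)                 # uses the whole window
blocked = tracker.try_spend(60, 1)                # window is full
second = tracker.try_spend(FIVE_HOURS, 100)       # window has refreshed
third = tracker.try_spend(2 * FIVE_HOURS, 100)    # window is free, weekly cap is not
```

In this toy setup the third heavy session is refused even though the five-hour window is empty, which is the lockout pattern intensive users describe.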
Real-World Frustration: Hitting a Five-Hour Cap with One Prompt
The backlash isn’t theoretical; users are hitting Gemini’s usage caps within minutes. One AI Pro subscriber shared a video showing how a single avatar-based video generation request took their five-hour quota from 0% to 100% in about three to four minutes, and the video still failed to generate. That incident, which drew a “Yikes, let us take a look!” response from Gemini lead Josh Woodward, has become a rallying example of how poorly calibrated the new system can be. Elsewhere, power users report hitting limits on Gemini 3.1 Pro after just 40 to 60 minutes of focused coding or research, or consuming most of their allowance in a handful of long, context-heavy messages. Workarounds like starting fresh chats help a bit, but they undermine one of Gemini’s core advantages: persistent, rich context across a long conversation.
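Why would starting a fresh chat help? One plausible explanation, and it is only an assumption since Google has not disclosed its scoring formula, is that each turn is charged in proportion to the whole conversation carried along with it. The hypothetical `thread_cost` function below sketches that linear model in Python, with message sizes in arbitrary units.

```python
def thread_cost(message_sizes, per_unit=1.0):
    """Total cost of a thread if every turn pays for the full context so far.

    Assumed linear model for illustration only; not Google's actual formula.
    """
    total = 0.0
    context = 0
    for size in message_sizes:
        context += size               # the thread keeps growing
        total += context * per_unit   # each turn is billed on all of it
    return total


one_long_thread = thread_cost([10] * 6)        # six messages in one chat
two_fresh_threads = 2 * thread_cost([10] * 3)  # same messages, split in two
```

Under this assumption the single long thread costs 210 units while the two shorter ones cost 120 in total, so splitting saves quota, but only by discarding the accumulated context.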
Gemini vs. Claude and Others: A Tough Token Limit Comparison
As Google tightens Gemini’s quotas, users are inevitably comparing them with rivals’ limits. Many now describe Gemini’s limits as on par with, or even more awkward than, Claude’s well-known constraints on large-context work. Claude also enforces strict ceilings, but Gemini’s compute-based model can feel less predictable: two prompts of the same length may consume vastly different amounts of quota depending on prior chat history or whether video and advanced features are enabled. Early testers say Gemini 3.5 Flash often feels less reliable than the older 3.1 Pro model, compounding frustration when a costly request produces weak or failed output. Gemini’s paid tiers (AI Plus, AI Pro, and AI Ultra) are meant to serve different kinds of users, but reports of rapid throttling on Pro make some subscribers question whether the mid-tier plan really supports sustained coding, research, and creative workflows as advertised.
Google’s Partial Rollbacks and the Bigger Industry Trend
Public criticism has already forced Google to tweak its approach. After quietly reducing Gemini AI Pro limits, the company faced accusations of a bait-and-switch from users who suddenly hit weekly caps after only a few work sessions. In response, Google DeepMind’s Varun Mohan announced multiple quota boosts inside the Antigravity environment, effectively raising those limits ninefold compared to the post-nerf baseline and repeatedly resetting weekly usage. The move shows Google recognizes that some users are pushing Gemini harder than expected. Yet these increases apply only within Antigravity; the broader Gemini usage caps and compute model remain. Taken together with similar moves by other AI providers, Gemini’s strategy signals a wider industry shift: high-intensity, long-context AI access is becoming heavily metered, even for paying users, as companies balance costs, system stability, and the pressure to keep subscription prices attractive.
