From Flat Fees to Token Billing in AI Coding
Token billing in AI coding tools is a usage-based pricing model in which developers pay according to the number of tokens an AI model reads and writes. It ties costs directly to the underlying API and compute consumption rather than to a fixed subscription. On 1 June, GitHub Copilot dropped its flat-rate plans and moved every user to token-based billing built around GitHub AI Credits. A token is a small chunk of text, often a word or part of a word. More complex interactions consume more tokens and therefore more credits. Code completions and Next Edit Suggestions remain unlimited, but chat, agent mode, and code review are now fully metered. The change ends predictable, all-you-can-use subscriptions for heavy users. It also signals a wider shift toward usage-based AI tools, in which the true cost of powerful models is passed closer to the customer.
Why GitHub Copilot Pricing Had to Change
GitHub itself framed the Copilot change in blunt terms: the product is no longer the simple editor helper it was at launch. Copilot now runs complex, agentic workflows that loop over your codebase, run tests, and retry for hours, consuming compute with no natural ceiling. Under the old GitHub Copilot pricing, a single flat fee meant light users were subsidising power users who ran long agent sessions. According to GitHub's explanation quoted by Developer Tech, "GitHub Copilot simply is not the same product it was a year ago–it now powers far more complex, agentic workflows that consume far more compute." Microsoft replaced Premium Request Units with GitHub AI Credits priced at USD 0.01 (approx. RM0.05) each and tied charges to model API rates. As a result, every chat and code review now carries a metered cost that pays for its own infrastructure.
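Tying credits to API rates means the cost of a request can be estimated from its token counts. The sketch below assumes that credits equal the API-rate cost in USD divided by the USD 0.01 credit value. It ignores any markup, rounding, or caching discounts GitHub may apply. The function names are hypothetical. The example rates of USD 10 and USD 50 per million input and output tokens are the Claude Fable 5 list prices cited later in this article.

```python
from fractions import Fraction

# Value of one GitHub AI Credit in USD, as stated in the article.
CREDIT_USD = Fraction(1, 100)
TOKENS_PER_MILLION = 1_000_000


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_usd_per_million: Fraction,
                     output_usd_per_million: Fraction) -> Fraction:
    """API-rate cost of one request, priced per million input and output tokens."""
    return (input_tokens * input_usd_per_million
            + output_tokens * output_usd_per_million) / TOKENS_PER_MILLION


def request_credits(input_tokens: int, output_tokens: int,
                    input_usd_per_million: Fraction,
                    output_usd_per_million: Fraction) -> Fraction:
    """Credits for one request.

    Assumption: credits = API-rate cost / credit value, with no markup,
    rounding or caching discount applied.
    """
    cost = request_cost_usd(input_tokens, output_tokens,
                            input_usd_per_million, output_usd_per_million)
    return cost / CREDIT_USD


if __name__ == "__main__":
    # Hypothetical agent turn: 20,000 tokens of context in, 2,000 tokens out,
    # at USD 10 / USD 50 per million input / output tokens.
    credits = request_credits(20_000, 2_000, Fraction(10), Fraction(50))
    print(f"{float(credits):.2f} credits")  # prints "30.00 credits"
    # How many such turns fit in Copilot Pro's 1,500 monthly credits.
    print(int(Fraction(1_500) / credits))   # prints 50
```

Exact fractions keep the arithmetic free of floating-point drift. Under these assumptions, a single 30-credit agent turn costs USD 0.30, and Copilot Pro's 1,500 monthly credits would cover about 50 such turns.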
The New Reality: Usage-Based AI Tools and Uncertain Bills
Token billing makes AI coding costs transparent, but not always comfortable. Each Copilot plan now includes a monthly pool of GitHub AI Credits: 1,500 on Copilot Pro, 7,000 on Pro+, and 20,000 on Max. Business and Enterprise customers share 1,900 and 3,900 credits respectively through organisational pools. Once you reach those limits, you pay overages only if you have explicitly set an extra budget. If you leave that budget at zero, Copilot stops when the credits run out. That behaviour turns the cloud-style meter into either a safety brake or a productivity cliff. Elsewhere in the ecosystem, Cursor and other usage-based AI tools have adopted similar pricing. Anthropic's new Claude Fable 5, for example, lists at USD 10 (approx. RM46) per million input tokens and USD 50 (approx. RM230) per million output tokens. It is already available inside Copilot under the same token-driven rules.
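The "safety brake or productivity cliff" behaviour can be modelled as a simple meter: a monthly credit pool plus an optional extra budget. The sketch below is hypothetical and is not GitHub's implementation. Its class and function names are invented for illustration. It assumes a request is refused outright if it does not fit in the remaining credits, and that extra budget converts to credits at USD 0.01 each. It also assumes a fixed, made-up cost of 30 credits per agent turn.

```python
from dataclasses import dataclass
from fractions import Fraction

# Monthly included credits for individual plans, as listed in the article.
PLAN_CREDITS = {"Pro": 1_500, "Pro+": 7_000, "Max": 20_000}
CREDIT_USD = Fraction(1, 100)  # USD 0.01 per GitHub AI Credit


@dataclass
class CreditMeter:
    """Hypothetical model of one month's included pool plus an optional extra budget."""
    included_credits: Fraction
    extra_budget_usd: Fraction = Fraction(0)
    used_credits: Fraction = Fraction(0)

    def remaining_credits(self) -> Fraction:
        # The extra budget is converted to credits at USD 0.01 each.
        return (self.included_credits
                + self.extra_budget_usd / CREDIT_USD
                - self.used_credits)

    def try_charge(self, credits: Fraction) -> bool:
        """Charge a request if it fits; otherwise refuse it, i.e. the tool stops."""
        if credits > self.remaining_credits():
            return False
        self.used_credits += credits
        return True

    def overage_usd(self) -> Fraction:
        """USD billed beyond the included pool (always 0 with a zero extra budget)."""
        over = max(Fraction(0), self.used_credits - self.included_credits)
        return over * CREDIT_USD


def turns_until_stop(meter: CreditMeter, credits_per_turn: Fraction,
                     max_turns: int = 100_000) -> int:
    """Count identical requests served before the meter refuses one."""
    served = 0
    while served < max_turns and meter.try_charge(credits_per_turn):
        served += 1
    return served


if __name__ == "__main__":
    per_turn = Fraction(30)  # hypothetical 30-credit agent turn
    brake = CreditMeter(Fraction(PLAN_CREDITS["Pro"]))
    print(turns_until_stop(brake, per_turn), float(brake.overage_usd()))
    # prints: 50 0.0
    topped_up = CreditMeter(Fraction(PLAN_CREDITS["Pro"]), extra_budget_usd=Fraction(5))
    print(turns_until_stop(topped_up, per_turn), float(topped_up.overage_usd()))
    # prints: 66 4.8
```

With a zero extra budget, the meter stops after 50 turns and bills nothing extra. A hypothetical USD 5 budget buys 16 more turns and an overage of USD 4.80 before the same cliff arrives.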
Investor Pressure and the End of Subsidised AI Coding Costs
Behind the pricing shift is a simple problem: the AI boom has been fuelled by investor subsidies that cannot last. On TechCrunch's Equity podcast, Anthony Ha said, "This whole ecosystem is heavily, heavily subsidized by investor money… stuff that seems like it has no cost is, in fact, incredibly expensive." As AI labs head toward public listings and file S‑1 documents, they must show paths to profit while their infrastructure bills climb. Token billing lets AI coding vendors match revenue to compute consumption instead of hiding limits behind vague "fair use" policies. Heavy users are already feeling the impact. Some developers report Copilot bills jumping from tens of dollars to hundreds after they turned on long-running agents. The flat-rate era gave developers psychological comfort, but it masked a reality that investors, and now customers, can no longer ignore.
Budget Strategies for Developers in a Token World
Usage-based AI tools do not have to wreck your budget if you treat tokens like any other cloud resource. First, watch the dashboard: Copilot's preview bill feature and user-level budget controls show your projected spend before the month ends. Second, set strict spending limits. With Copilot, leaving the additional budget at zero ensures the tool stops instead of silently charging your card. Third, optimise prompts and workflows to cut wasted tokens: shorter, clearer instructions and narrower context windows cost less than sprawling, open-ended chats. For heavy agent use, reserve the strongest models for tasks that need them and rely on cheaper options for routine queries. As Cursor and other tools make similar pricing changes, the old habit of treating AI as an unlimited flat-rate perk is over. Treating AI coding costs as a line item is now part of the job.
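Two of these habits, projecting spend and routing tasks by model strength, can be sketched in a few lines. This is a hypothetical illustration and does not use any Copilot API. It uses a naive linear month-end projection, which is only a stand-in for whatever Copilot's preview bill calculates. The "cheap" model rates, the routine-task labels, and the USD 30 cap are invented for illustration. Only the "strong" rates reuse the Claude Fable 5 prices quoted earlier.

```python
from fractions import Fraction

TOKENS_PER_MILLION = 1_000_000

# Hypothetical (input, output) USD rates per million tokens. Only the "strong"
# tier reuses a figure from the article (the Claude Fable 5 list price).
MODEL_RATES = {
    "strong": (Fraction(10), Fraction(50)),
    "cheap": (Fraction(1, 2), Fraction(2)),
}

# Hypothetical task labels treated as routine.
ROUTINE_TASKS = {"explain", "rename", "docstring", "small_fix"}


def project_month_end(spend_so_far: Fraction, day_of_month: int,
                      days_in_month: int) -> Fraction:
    """Naive linear projection: assumes the remaining days look like the days so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must fall inside the month")
    return spend_so_far * days_in_month / day_of_month


def pick_model(task_kind: str) -> str:
    """Route routine tasks to the cheap tier; everything else gets the strong one."""
    return "cheap" if task_kind in ROUTINE_TASKS else "strong"


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> Fraction:
    """API-rate cost of one request on the given model tier."""
    in_rate, out_rate = MODEL_RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / TOKENS_PER_MILLION


if __name__ == "__main__":
    limit = Fraction(30)  # hypothetical monthly cap in USD
    projected = project_month_end(Fraction(12), day_of_month=10, days_in_month=30)
    print(float(projected), projected > limit)  # prints: 36.0 True
    for task in ("docstring", "agent_refactor"):
        model = pick_model(task)
        print(task, model, float(cost_usd(model, 20_000, 2_000)))
    # prints: docstring cheap 0.014
    #         agent_refactor strong 0.3
```

In this example, spending USD 12 by day 10 projects to USD 36, which already exceeds the USD 30 cap. Routing the same 22,000-token request to the cheaper tier cuts its cost from USD 0.30 to USD 0.014. A real router would need better task classification than a fixed label set.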