AI Compute Costs Have Quietly Overtaken Payroll
For many enterprises, AI is no longer a cheap software upgrade; it is a new line item that can rival or exceed headcount. Microsoft’s decision to cancel most of its direct Anthropic Claude Code licenses and push staff toward GitHub Copilot CLI came after internal usage rapidly blew through its token budget. Uber hit a similar wall, exhausting its entire 2026 budget for AI coding tools in just four months as teams competed to consume more tokens. Inside other tech giants, informal leaderboards and “tokenmaxxing” cultures have had the same effect: usage is growing faster than per-token prices can fall. Nvidia executive Bryan Catanzaro now says AI compute costs significantly exceed employee payroll, undercutting the idea that replacing human work with AI is inherently cheaper. For finance leaders, AI infrastructure spending is starting to look less like a software tool and more like a second salary bill.
When GPUs Eat the Production Budget
Nowhere is the imbalance clearer than in cutting-edge AI content production. Hell Grind, an AI-generated film premiering at a major festival, cost USD 500,000 (approx. RM2,300,000) to make, of which USD 400,000 (approx. RM1,840,000), or 80 percent, went purely to compute. Every character, set and explosion was rendered by AI models, which required thousands of long, detailed prompts per clip and more than 16,000 generated video snippets for the opening segment alone. That workflow made GPU time the dominant expense, overshadowing traditional costs such as location shoots and physical sets. It previews where many high-intensity AI projects are headed: budgets dominated by GPU compute, with human roles shifting toward prompt engineering, curation and orchestration. For enterprises, it is a warning that sophisticated AI workloads can quickly turn from an innovation experiment into a project's single largest cost driver.

Rising Prices, Agentic AI, and the End of Flat-Rate Comfort
Model providers are responding to demand by raising prices and abandoning customer-friendly flat-rate plans. As new, more capable models launch, token prices are rising, even though newer hardware promises to make each token cheaper to serve. The gap comes from how AI is used: agentic AI systems, meaning autonomous agents that plan, call tools and iterate, consume tokens orders of magnitude faster than classic chatbots. Some enterprises were consuming hundreds or thousands of dollars' worth of tokens while paying only a modest seat fee, prompting providers such as Microsoft to drop seat-based pricing for GitHub Copilot in favor of usage-based billing. Analysts now warn that lower per-token rates do not translate into cheaper deployments when total token volume explodes, because total spend is the rate multiplied by the volume. The economics are shifting toward metered AI, in which every task, workflow and agent is monitored for its compute footprint.
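The rate-versus-volume argument comes down to one multiplication: spend equals tokens consumed times price per token. The following is a minimal illustration under stated assumptions. The $20 seat fee, the monthly token volumes and the per-million-token rates are all hypothetical figures chosen for readability, not any vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    tokens_per_month: int          # hypothetical monthly token volume per user
    usd_per_million_tokens: float  # hypothetical blended per-token rate


def monthly_token_cost(w: Workload) -> float:
    """Metered cost: token volume multiplied by unit price."""
    return w.tokens_per_month / 1_000_000 * w.usd_per_million_tokens


SEAT_FEE_USD = 20.0  # hypothetical flat seat price per user per month

chat = Workload("chatbot user", tokens_per_month=2_000_000, usd_per_million_tokens=5.0)
# Agentic user: unit price 40% lower, but 100x the volume (illustrative only).
agent = Workload("agentic user", tokens_per_month=200_000_000, usd_per_million_tokens=3.0)

for w in (chat, agent):
    cost = monthly_token_cost(w)
    print(f"{w.name}: token cost ${cost:,.2f} vs seat fee ${SEAT_FEE_USD:,.2f}")
```

With these made-up inputs, the chatbot user costs about $10 a month against a $20 seat, while the agentic user costs about $600 despite the lower unit price. That mismatch is what pushes providers away from flat seats and toward metered billing.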
Guaranteed Capacity: OpenAI Turns Scarcity into a Product
With demand for advanced models outstripping available hardware, OpenAI has started selling predictability itself. Its new Guaranteed Capacity program lets customers lock in access to AI compute for one, two or three years, with larger discounts for longer commitments. Sam Altman says the world will remain capacity-constrained for some time, and enterprises want assurances that their AI products and workflows will not stall when demand spikes. Guaranteed Capacity helps OpenAI plan a compute build-out it expects to reach roughly USD 600 billion (approx. RM2.76 trillion) by 2030, while customers secure a defined slice of its infrastructure. This does not mean cheaper AI for end users, however. Instead, it formalizes AI infrastructure as a scarce, premium utility, much like reserved cloud instances, and bakes long-term AI compute costs directly into enterprise budgets and product planning.
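The comparison to reserved cloud instances has a simple arithmetic consequence: committed capacity is paid for whether or not it is used, so a commitment discount only pays off above a certain utilization. A minimal sketch, assuming a single list price and a flat percentage discount per term; the 10, 20 and 30 percent tier discounts are hypothetical, since no actual Guaranteed Capacity rates are given here.

```python
def effective_cost_per_used_unit(list_price: float, discount: float, utilization: float) -> float:
    """Cost of each unit of compute actually consumed under a commitment.

    list_price:  hypothetical on-demand price per unit of compute
    discount:    hypothetical commitment discount (0.2 means 20% off)
    utilization: fraction of the committed capacity actually used, in (0, 1]
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    committed_price = list_price * (1 - discount)  # paid for every reserved unit
    return committed_price / utilization           # spread over the units used


def break_even_utilization(discount: float) -> float:
    """Below this utilization, paying on demand would have been cheaper."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


# Hypothetical tiers: longer terms earn larger discounts (illustrative numbers).
TIER_DISCOUNTS = {1: 0.10, 2: 0.20, 3: 0.30}
for years, discount in TIER_DISCOUNTS.items():
    print(f"{years}-year term: {discount:.0%} off, "
          f"break-even utilization {break_even_utilization(discount):.0%}")
```

Under these assumptions, a 20 percent discount breaks even at 80 percent utilization; at 50 percent utilization the same commitment makes each unit actually used cost 1.6 times the on-demand price.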

Hardware Efficiency Will Fatten Vendor Margins, Not Slash Your Bills
GPU and accelerator vendors are racing to make each token cheaper to serve. Nvidia has spent heavily, including a USD 20 billion (approx. RM92 billion) acquihire of Groq, while AMD, Intel, the major cloud providers and others re-architect hardware and systems to cut the cost per token. However, most of this new gear will not be widely deployed until around 2027, and the savings are unlikely to be passed through in full. Cheaper tokens will mainly widen provider margins and help providers absorb skyrocketing inference demand without being crushed by their own infrastructure expenses. For enterprises, the strategic choice is stark: absorb rising AI compute costs in the hope of productivity and revenue gains, or pass them downstream through higher prices, new usage-based tiers or premium AI add-ons. Either way, AI is evolving from a novelty into a core utility that must be budgeted as carefully as payroll itself.
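Whether cheaper hardware lowers customer bills depends on how much of the saving a provider passes through. This sketch models gross margin per million tokens. The $10 price, the $6 serving cost falling to $3, and the pass-through fractions are illustrative assumptions, not real vendor economics.

```python
def provider_margin(price_per_m: float, cost_per_m: float) -> float:
    """Gross margin as a fraction of price, per million tokens served."""
    return (price_per_m - cost_per_m) / price_per_m


def new_price(old_price: float, old_cost: float, new_cost: float, pass_through: float) -> float:
    """Price after passing a fraction of the per-token cost saving to customers.

    pass_through=1.0 hands the whole saving to customers; 0.0 keeps it all as margin.
    """
    if not 0.0 <= pass_through <= 1.0:
        raise ValueError("pass_through must be between 0 and 1")
    return old_price - pass_through * (old_cost - new_cost)


# Hypothetical figures in USD per million tokens (illustrative only).
OLD_PRICE, OLD_COST, NEW_COST = 10.0, 6.0, 3.0

print(f"before new hardware: margin {provider_margin(OLD_PRICE, OLD_COST):.0%}")
for share in (0.0, 0.5, 1.0):
    price = new_price(OLD_PRICE, OLD_COST, NEW_COST, share)
    margin = provider_margin(price, NEW_COST)
    print(f"pass-through {share:.0%}: price ${price:.2f}, margin {margin:.0%}")
```

With these inputs the provider's margin starts at 40 percent and rises to about 57 percent even if every cent of the saving is passed on, and to 70 percent if none is.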
