AI Cost Reduction: Why Big Tech Is Pulling Back

From AI Gold Rush to Cost-Conscious Reality

AI cost reduction is the emerging phase in which enterprises shift from aggressive, hype-driven deployment of large AI systems toward disciplined spending based on measurable productivity and clear business value. This shift is driven by rising token usage bills, unclear links between AI activity and customer outcomes, and growing skepticism among executives and engineers. Companies such as Uber, Microsoft, and Klarna rushed to adopt AI tools across engineering, customer service, and internal workflows, often encouraged by internal leaderboards or usage targets. Now, many are discovering that enterprise AI ROI is harder to prove than early dashboards suggested. Token-based pricing means that the more employees rely on AI assistants, the more unpredictable the costs become. This has triggered an AI spending pullback, not because AI fails outright, but because its benefits arrive unevenly, while invoices arrive every month.

Uber’s Tokenmaxxing Backlash and Budget Shock

Uber’s experience has become a reference point for the tokenmaxxing backlash. The company rolled out Anthropic’s Claude Code to around 5,000 engineers and saw enthusiastic adoption: 95% used AI tools monthly, 70% of code commits were AI-driven, and agentic AI feature usage climbed from 32% to 84% in a single month. Yet Uber burned through its entire annual AI tools budget in four months, with per‑engineer monthly API costs between USD 500 and USD 2,000 (approx. RM2,300–RM9,200). Uber’s COO Andrew Macdonald said, “It’s very hard to draw a line between one of those stats and ‘Okay, now we’re actually producing 25% more useful consumer features.’” Variable token pricing has turned into a budgeting headache, and the company’s experience is fueling wider concerns that tokenmaxxing leads to AI spend with little visible enterprise AI ROI.

Why Tech Giants Are Pumping the Brakes on Expensive AI Experiments

Microsoft, Salesforce and the Agentic Coding Cost Squeeze

Microsoft, long positioned as an AI leader, is also tightening the reins. Internally, the company began revoking Claude Code licenses and steering engineers toward its own GitHub Copilot CLI, particularly in its Experiences and Devices division. Claude Code had become “perhaps a little too popular,” highlighting a problem: when third‑party tools win engineer loyalty, they can inflate token bills and weaken control over AI cost reduction efforts. Other enterprise adopters face similar pressure. Salesforce, which pushed agentic coding widely among its engineers, found its initial token budget “an almost absurd underestimate” as agents generated far more calls than expected. Across these cases, agentic coding’s promise of faster output runs straight into token‑priced reality. The result is a careful AI spending pullback, as leaders focus on which workflows justify high‑intensity AI and which should revert to cheaper, simpler tools.

Customer Service Experiments: Efficiency at the Expense of Quality

While engineering teams wrestle with tokenmaxxing, customer service operations are discovering the limits of AI-driven efficiency. Klarna removed about 700 roles and relied on an OpenAI‑powered chatbot that at its peak handled two‑thirds to three‑quarters of all customer interactions. The result was a 22% drop in customer satisfaction and generic responses that failed on complex queries. Klarna later rehired human agents, with its CEO admitting that a focus on cost and efficiency led to lower quality. A similar pattern emerged at a major bank that replaced 45 call‑centre agents with a voice bot, only to see call volumes rise and queues expand until managers returned to the phones. These reversals show how enterprise AI ROI can turn negative when efficiency metrics ignore empathy, nuance, and problem resolution, pushing companies to rebalance automation with human judgment.

From Hype Metrics to Hard ROI and Smarter AI Use

The broader AI spending pullback marks a shift from usage‑driven vanity metrics to measured outcomes. Many firms set internal goals to “use AI everywhere,” tracked token consumption, or rewarded teams for building faster with AI. Now, leaders and engineers are asking when and how to measure returns. Reports on Claude Code usage show that the top 10% of users consumed about ten times more tokens than the median developer while producing only about twice the output, suggesting diminishing returns from heavy tokenmaxxing. Executives hear from peers that budgets are being blown, and some investors warn that the current surge in token demand may be a “crazy, rushed, temporary phase.” Rather than abandon AI, companies are moving toward targeted deployments, tighter guardrails on token spend, and clearer benchmarks linking AI activity to shipped features, customer satisfaction, and revenue.