AI ROI Measurement: From Token Maxxing to Value

The New AI Paradox: Measurable Usage, Invisible Impact

The modern enterprise AI paradox is the gap between what companies can measure and what they can prove. They can count every token, prompt, and code change with precision, yet they cannot show whether that activity improves products, margins, or customer value. The gap mirrors the old problem of using “lines of code” as a productivity metric: counting output says little about value. Organizations celebrate dashboards filled with AI usage graphs, but when executives ask which features, savings, or revenue came from that usage, the answer is vague. The result is a growing tension between the visible cost of AI and its hidden, often unproven, return.

From Token Maxxing to Leaderboards: Usage as a Status Symbol

AI ROI measurement has been distorted by “token maxxing,” where consumption becomes a status symbol instead of a cost tied to value. Tech leaders such as Jensen Huang have even suggested that an engineer or AI researcher paid $500,000 should consume at least $250,000 (approx. RM1,150,000) in tokens a year, turning usage into an implicit performance goal. At Google, executives highlight processing more than 3.2 quadrillion tokens a month across products. Inside Meta, an internal leaderboard called Claudeonomics ranked more than 85,000 employees by token consumption and handed out titles like “Token Legend.” According to reporting on that leaderboard, the top 250 users burned through about 60 trillion tokens in 30 days. Yet as Amazon’s leadership later warned when it shut down its own leaderboard, these numbers say more about enthusiasm than about business impact.

Why Companies Can’t Prove AI Works Even as Usage Soars

Uber and Meta: Productivity Booms Without Clear ROI

Uber’s experience shows how enterprise AI adoption can deliver visible productivity gains while leaving ROI undefined. The company estimates that about 10% of code changes now come from autonomous agents, and internal teams report faster experimentation and higher throughput. Uber’s CEO speaks of “employees with superpowers,” and the company slowed hiring growth to redirect money into AI. Yet President and COO Andrew Macdonald concedes that Uber cannot connect higher token consumption to customer-facing outcomes, saying it is “very hard to draw a line” between usage stats and “25% more useful consumer features.” Meta’s internal token races display a similar pattern: massive AI-generated activity without a clear link to features, margins, or retention. The core problem is that organizations can count AI work in minute detail, but still lack mechanisms to track which of that work creates real business value.

Escaping AI Pilot Purgatory and Token Addiction

Many enterprises are trapped in AI pilot purgatory, where eye-catching demos and proofs of concept keep multiplying but rarely reach production. Kore.ai’s Cathal McCarthy argues that firms become “addicted to pilots,” chasing low-hanging fruit and quick wins that feel impressive but teach little about operating AI at scale. The excitement around early experiments, boosted by leaderboards and internal contests, hides the absence of production value metrics such as feature adoption, ticket deflection, or margin impact. Domo’s Ben Schein notes that you can “vibe-code a slick prototype” in an afternoon, but governance, security, and distribution cannot be improvised. The crucial shift is from measuring activity—tokens, prompts, experiments—to designing pilots that are wired into real workflows, with clear before-and-after baselines so that any AI impact on revenue, cost, or customer behavior can be measured instead of assumed.
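A before-and-after baseline of this kind can be sketched in a few lines of Python. The example below is a minimal illustration, not a method described by any of the companies above: the metric names (ticket counts, support cost), the PeriodMetrics type, and the pilot_impact function are all hypothetical, and a simple before/after comparison does not control for seasonality or other changes that happened in the same period.

```python
from dataclasses import dataclass


@dataclass
class PeriodMetrics:
    """Outcome metrics for one measurement window (hypothetical fields)."""
    tickets_total: int        # support tickets opened in the window
    tickets_self_served: int  # tickets resolved without a human agent
    support_cost: float       # total support cost for the window


def deflection_rate(m: PeriodMetrics) -> float:
    """Share of tickets resolved without a human agent."""
    if m.tickets_total == 0:
        return 0.0
    return m.tickets_self_served / m.tickets_total


def pilot_impact(baseline: PeriodMetrics, pilot: PeriodMetrics, ai_spend: float) -> dict:
    """Compare a pilot window against its pre-pilot baseline.

    Savings are estimated as the drop in cost per ticket, applied to the
    pilot's ticket volume, minus what the AI itself cost to run.
    """
    if baseline.tickets_total == 0 or pilot.tickets_total == 0:
        raise ValueError("both windows need at least one ticket")
    cost_before = baseline.support_cost / baseline.tickets_total
    cost_after = pilot.support_cost / pilot.tickets_total
    gross_savings = (cost_before - cost_after) * pilot.tickets_total
    return {
        "deflection_change": deflection_rate(pilot) - deflection_rate(baseline),
        "gross_savings": gross_savings,
        "net_savings": gross_savings - ai_spend,
    }


# Illustrative numbers only; they are not taken from any company above.
baseline = PeriodMetrics(tickets_total=1000, tickets_self_served=200, support_cost=50_000.0)
pilot = PeriodMetrics(tickets_total=1200, tickets_self_served=480, support_cost=48_000.0)
print(pilot_impact(baseline, pilot, ai_spend=5_000.0))
```

The point of the sketch is that token counts appear only once, as a cost (ai_spend). Everything else is an outcome the business already tracked before the pilot started, which is what makes the comparison possible.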

Designing AI for Production Value, Not Token Burn

Executives from Domo and Kore.ai suggest that escaping AI pilot purgatory starts with redefining success. Rather than rewarding token consumption or the number of pilots, they advocate focusing on production value metrics aligned with business outcomes: time-to-release for features, customer self-service rates, ticket resolution times, or error reductions in back-office tasks. According to enterprise survey data, 79% of organizations report productivity gains from AI at the individual level, yet only 29% see significant ROI, and only 21% of S&P 500 companies can cite any measurable AI benefit. The 50-point gap between individual productivity gains (79%) and significant ROI (29%) reflects organizations that accelerate individual tasks without fixing bottlenecks at handoffs, approvals, and system integration. To close it, leaders need cross-functional ownership of AI, fewer but deeper pilots that go into production, and a measurement framework that follows AI work from tokens to features to customer outcomes and, finally, to profit and loss.
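One way to picture a framework that follows AI work from tokens to profit and loss is a simple roll-up over work items. The Python sketch below is an assumption-laden illustration rather than a framework proposed by Domo, Kore.ai, or the survey: the AIWorkItem record, its fields, and the roll_up function are hypothetical, and attributing a P&L figure to a single feature is itself a hard measurement problem that the code takes as given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIWorkItem:
    """One piece of AI-assisted work (hypothetical record)."""
    name: str
    token_cost: float                            # AI spend attributed to this work
    shipped: bool                                # did it reach production?
    measured_pnl_impact: Optional[float] = None  # None means nobody measured it


def roll_up(items: list) -> dict:
    """Follow spend from tokens, to shipped features, to measured P&L."""
    total_spend = sum(i.token_cost for i in items)
    shipped_spend = sum(i.token_cost for i in items if i.shipped)
    measured = [i for i in items if i.shipped and i.measured_pnl_impact is not None]
    measured_spend = sum(i.token_cost for i in measured)
    measured_impact = sum(i.measured_pnl_impact for i in measured)
    return {
        "total_spend": total_spend,
        # How much of the token bill ever reached customers
        "share_of_spend_shipped": shipped_spend / total_spend if total_spend else 0.0,
        # How much of the token bill has a measured outcome attached
        "share_of_spend_measured": measured_spend / total_spend if total_spend else 0.0,
        # Return on the measured portion only; None if nothing was measured
        "roi_on_measured": (
            (measured_impact - measured_spend) / measured_spend if measured_spend else None
        ),
    }


# Illustrative numbers only.
items = [
    AIWorkItem("checkout assistant", 40_000.0, shipped=True, measured_pnl_impact=100_000.0),
    AIWorkItem("internal code agent", 30_000.0, shipped=True),
    AIWorkItem("demo prototype", 30_000.0, shipped=False),
]
print(roll_up(items))
```

In this made-up example, 70% of spend reached production but only 40% has a measured outcome, so the reported ROI covers less than half of the token bill. Surfacing those two shares next to the ROI figure is a deliberate choice: it keeps unmeasured and unshipped work visible instead of letting it hide behind a healthy-looking return.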