AI ROI Measurement: From Token Usage to Real Value

The new paradox of AI ROI measurement

AI ROI measurement is the effort by enterprises to connect detailed data on AI usage (such as token consumption, AI-generated code, and experiment volume) to clear changes in revenue, profit, product quality, or customer satisfaction. Enterprises now see AI activity at a granular level: tokens processed, prompts submitted, and code shipped by AI agents. Yet this visibility has not solved the core problem of measuring enterprise AI value. Usage dashboards display surging token counts and agent sessions, while product teams still struggle to answer a basic question: what changed for customers? The gap between AI activity metrics and business outcomes creates a new kind of measurement trap, where organizations optimize for tokens, tasks, and throughput rather than features, margins, or retention. As executives are discovering, more AI motion does not guarantee more value.

From token maxxing to cost anxiety

Early enterprise AI programs celebrated volume. Token consumption became a status symbol, fueled by calls for heavy usage and public slides showing quadrillions of tokens processed each month. Inside large firms, internal leaderboards crowned “Token Legends” and “Session Immortals,” turning AI usage into a competitive game among tens of thousands of employees. The effect was predictable: soaring activity, unclear outcomes. According to reporting on one major platform company, 250 top users burned through about 60 trillion tokens in 30 days. Another large firm shut down its token leaderboard after leaders told employees to solve customer and business problems instead of chasing usage. An AI executive summed up the eventual reaction: when the bill arrives, leaders ask, “What are we even doing with it?” Usage had become detached from value.

Why AI Investments Generate Activity but Struggle to Prove Value

Uber’s productivity boom, ROI crisis, and the activity trap

Uber shows how advanced AI adoption can still fall into AI pilot purgatory. Roughly 10% of code changes now come from autonomous agents, and CEO Dara Khosrowshahi argues that higher throughput per person should support slower headcount growth and higher AI investment. Yet President and COO Andrew Macdonald points to a missing link: the company cannot tie higher AI usage to customer-facing gains. He notes that it is “very hard to draw a line” from token stats or agent output to “25% more useful consumer features.” This is the classic AI productivity metrics trap. AI speeds up task completion and code generation, while organizations remain constrained by workflows, approvals, and dependencies. Individual productivity rises, but friction shifts to the handoff layer between teams. Activity accelerates; measurable AI impact on products and P&L remains murky.

Why pilots and prototypes rarely translate into enterprise AI value

Enterprises have become addicted to AI pilots. Rapid prototypes and “wow” demos spread across departments, from coding assistants to internal chatbots, creating the appearance of progress. Kore.ai’s Cathal McCarthy argues that this pattern confuses novelty with learning: organizations take low-hanging use cases and call them wins, but real learning happens only at production scale. At the same time, leaders are haunted by statistics: 79% of organizations report individual productivity gains from AI, yet only 29% see significant ROI, and 95% of pilots show no measurable P&L impact. The problem is not that AI fails. It is that activity volume—more prompts, more tokens, more experiments—is a weak proxy for business impact. Without clear metrics that tie AI work to product outcomes, margins, or customer satisfaction, pilots pile up while strategy stalls.

Escaping AI pilot purgatory: from tokens to outcomes

Escaping AI pilot purgatory means treating measurement design as seriously as model selection. Counting tokens, agent sessions, or GPU utilization is no longer enough. Enterprises need AI productivity metrics that track value, not motion: cycle time from idea to feature in production, defect rates after AI-generated code changes, customer task completion rates in AI-assisted support, or margin shifts in AI-influenced workflows. Leaders also need governance that ties AI usage to accountable owners and explicit hypotheses: which customer metric, which cost line, which experience is meant to change? Industry executives now admit that the measurement problem is as important as the technology itself. Until organizations can link AI adoption to clear improvements in products, margins, or customer satisfaction, AI will keep generating colorful dashboards of activity while the business case remains unresolved.
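As a rough illustration of two of the outcome metrics named above (cycle time from idea to production, and defect rate after AI-generated changes), the sketch below compares AI-assisted changes with the rest. It is a minimal example under stated assumptions, not a definitive method: the `Change` record, its field names, and the sample dates are all hypothetical, and a real program would pull these fields from its own issue tracker and incident data.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Change:
    """One shipped change. All fields are hypothetical, for illustration only."""
    opened: date         # when work on the idea started
    shipped: date        # when the change reached production
    ai_assisted: bool    # whether an AI agent or assistant produced most of it
    caused_defect: bool  # whether a post-release defect was traced back to it


def outcome_metrics(changes):
    """Return median cycle time in days and defect rate for a group of changes."""
    if not changes:
        return None
    cycle_days = [(c.shipped - c.opened).days for c in changes]
    defects = sum(c.caused_defect for c in changes)
    return {
        "count": len(changes),
        "median_cycle_days": median(cycle_days),
        "defect_rate": defects / len(changes),
    }


def compare(changes):
    """Split changes by AI assistance and compute the same metrics for each group."""
    ai = [c for c in changes if c.ai_assisted]
    other = [c for c in changes if not c.ai_assisted]
    return {
        "ai_assisted": outcome_metrics(ai),
        "not_ai_assisted": outcome_metrics(other),
    }


# Made-up sample records, not real data.
sample = [
    Change(date(2025, 1, 6), date(2025, 1, 10), True, False),
    Change(date(2025, 1, 6), date(2025, 1, 16), True, True),
    Change(date(2025, 1, 7), date(2025, 1, 21), False, False),
    Change(date(2025, 1, 8), date(2025, 1, 20), False, False),
]
report = compare(sample)
print(report)
```

A side-by-side comparison like this shows association, not causation: AI-assisted changes may differ in size or risk from the rest. It is a starting point for the explicit hypotheses described above, with a named owner deciding in advance which metric is expected to move.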