From Token Counts to an AI ROI Measurement Crisis
AI ROI measurement is the challenge of linking detailed model-usage metrics (tokens consumed, agent activity, generated code) to clear improvements in the margins, products, or customer outcomes that matter to an organization. Enterprises can now see AI activity with microscopic precision: token dashboards, GPU utilization charts, and reports showing how many pull requests include machine-generated code. Yet these activity indicators fail to answer the question executives care about most: what did this change for customers and the P&L? Leaders can point to colorful usage graphs, but not to better unit economics or higher customer satisfaction. This gap between granular AI productivity metrics and hard business results is turning into a measurement crisis that keeps projects stuck at the demo stage instead of delivering reliable value in production.
Uber’s Productivity Boom, ROI Fog
Uber is a prime example of the new enterprise AI implementation puzzle. CEO Dara Khosrowshahi says AI tools are creating “employees with superpowers,” and Uber can see that roughly 10% of code changes are generated by autonomous agents. AI-driven experimentation is up, token consumption is high, and teams across legal, marketing, and engineering report faster throughput. Yet President and COO Andrew Macdonald has admitted that the company “cannot draw a clear line” from these stats to more useful consumer features. He notes that without a direct link between AI usage and shipped customer value, the trade-off between headcount and AI spending is hard to defend. Productivity is visible on internal dashboards, but measuring AI value at the customer and profit level remains a black box, mirroring a wider market where only 29% of organizations report significant ROI from generative AI.
Token Maxxing: When Usage Becomes the Scoreboard
The measurement trap deepens when organizations treat tokens as the scoreboard. Inside tech firms, token consumption has morphed into a prestige metric, with internal leaderboards ranking “power users” and titles like “Token Legend” awarded to the heaviest consumers. At Meta, one such dashboard tracked tens of trillions of tokens over a month. But chasing usage has real costs: Amazon shut down its internal leaderboard after leadership urged teams to solve customer problems instead of optimizing for usage, and Uber reportedly burned through its entire 2026 AI coding budget in four months after gamifying consumption. High token counts, like the old “lines of code” metric, confuse volume with value. As Domo’s Ben Schein notes, many executives now open their AI cost reports and ask, “What are we even doing with it?” That is a question usage graphs cannot answer.

Pilot Addiction and the Handoff Bottleneck
Even when they avoid token maxxing, many companies get stuck in “pilot purgatory.” Kore.ai’s Cathal McCarthy says firms become “addicted to pilots,” mistaking impressive demos for progress. Most pilots chase low-hanging fruit, like faster drafting or basic chatbots, which show local productivity gains but do not rewire core processes. At the same time, AI accelerates task-level work while organizational structures remain slow. Developers ship code faster, marketers test more variations, and analysts produce more reports, but friction shifts to the handoffs between teams, systems, and approvals. Local optimization collides with outdated workflows, so throughput increases in one function while integration becomes the constraint. This helps explain why 79% of organizations report individual productivity gains yet only 29% see significant AI ROI: execution speeds up, but coordination and measurement at the cross-team layer stay stuck.

From Activity Tracking to Measuring AI Value in Production
Moving from token maximization to production value demands new measurement approaches. Counting tokens, code changes, or model calls is useful for cost control, but it does not say whether AI systems improve products, margins, or customer outcomes. Enterprises need AI ROI measurement tied to real business levers: uplift in feature usage per AI-generated release, reduction in cycle time from idea to deployment, lower support contact rates after AI-driven improvements, or better unit economics in operations where agents take on work. That means designing pilots with explicit, P&L-relevant hypotheses and running them through production workflows, not isolated sandboxes. It also means tracking how AI-induced speed interacts with dependencies and approvals, so leaders can remove new bottlenecks. Until organizations measure value at this production layer, AI will look busy on dashboards while failing to prove its worth in the income statement.
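
As a rough illustration of what measuring at the production layer could look like, the sketch below compares a baseline cohort of releases with an AI-assisted cohort on the business levers named above: feature usage per release, idea-to-deployment cycle time, support contacts, and cost per release. It is a minimal sketch, not an established method. The `CohortStats` fields, the function names, and the idea of a two-arm comparison are all assumptions made for illustration, and a real analysis would also need to control for differences between the cohorts.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    """Aggregated results for one arm of a pilot. All fields are hypothetical."""
    releases: int            # features shipped to production
    feature_sessions: int    # customer sessions that used those features
    cycle_days_total: float  # summed idea-to-deployment time, in days
    support_contacts: int    # support contacts tied to those features
    cost_usd: float          # people, tooling, and AI spend for this arm


def per_release(stats: CohortStats) -> dict[str, float]:
    """Normalize raw totals to per-release figures so the arms are comparable."""
    if stats.releases <= 0:
        raise ValueError("need at least one release to compute per-release metrics")
    n = stats.releases
    return {
        "usage_per_release": stats.feature_sessions / n,
        "cycle_days": stats.cycle_days_total / n,
        "contacts_per_release": stats.support_contacts / n,
        "cost_per_release": stats.cost_usd / n,
    }


def relative_change(baseline: float, treated: float) -> float:
    """Fractional change of treated vs. baseline; positive means an increase."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treated - baseline) / baseline


def compare(baseline: CohortStats, ai_assisted: CohortStats) -> dict[str, float]:
    """Relative change in each per-release metric for the AI-assisted arm."""
    base = per_release(baseline)
    treat = per_release(ai_assisted)
    return {name: relative_change(base[name], treat[name]) for name in base}
```

Calling `compare(baseline, ai_assisted)` returns one relative change per metric, so a pilot hypothesis can be stated up front in those terms (for example, usage per release should rise while cost per release falls). Token counts appear only indirectly, as part of `cost_usd`, which keeps usage volume on the cost side of the ledger instead of treating it as an outcome.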






