AI API pricing and the new economics of inference

Defining the AI Model Price War

The AI model price war is the rapid, aggressive reduction of AI API pricing by major model providers. It is turning high-end reasoning and agentic models from premium tools into low-cost infrastructure services, and it is reshaping how developers budget for inference, choose providers, and design products around falling per-token costs rather than scarcity. DeepSeek’s permanent price cut on its V4 Pro API is the clearest signal of this shift. The company has turned a promotional 75% discount into its new baseline, charging 6 renminbi per 1,000,000 output tokens. According to DeepSeek, the new rates are 25% of the original standard price and will not revert. That is far below OpenAI’s GPT 5.5, listed at 30 USD (approx. RM138) per 1,000,000 output tokens at the standard tier and 180 USD (approx. RM828) at the premium tier.

The Great AI Model Price War and the New Economics of Inference

DeepSeek Pricing Sets a New Floor

DeepSeek V4 Pro now acts as a reference point for AI API pricing and model cost comparison across the industry. By fixing its rate at 6 renminbi per 1,000,000 output tokens, and stating that this equals 25% of its prior standard price, DeepSeek is treating high-end reasoning as a low-margin, high-volume business. The move lands at a time when many providers are raising rates because of compute shortages and higher hardware costs, which makes DeepSeek a pricing outlier. The gap with GPT 5.5 is especially stark: the standard GPT 5.5 rate of 30 USD (approx. RM138) per 1,000,000 output tokens is described as about 30 times higher than DeepSeek V4 Pro, while the 180 USD (approx. RM828) premium tier is said to be more than 200 times higher. That spread forces other vendors either to follow prices down or to defend a clear quality or compliance edge.
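The multiples quoted above depend on the exchange rate used to compare a renminbi price with a dollar price. The following is a minimal sketch of that arithmetic in Python. The exchange rate is an illustrative assumption and does not come from the article, and the function and constant names are hypothetical.

```python
# Assumption: an illustrative exchange rate, NOT a figure from the article.
RMB_PER_USD = 7.2

# Output-token prices per 1,000,000 tokens, as quoted in the article.
DEEPSEEK_RMB_PER_M_OUTPUT = 6.0
GPT_STANDARD_USD_PER_M_OUTPUT = 30.0
GPT_PREMIUM_USD_PER_M_OUTPUT = 180.0


def price_ratio(usd_price: float, rmb_price: float,
                rmb_per_usd: float = RMB_PER_USD) -> float:
    """Return how many times higher a USD price is than an RMB price."""
    rmb_price_in_usd = rmb_price / rmb_per_usd
    return usd_price / rmb_price_in_usd


standard_ratio = price_ratio(GPT_STANDARD_USD_PER_M_OUTPUT,
                             DEEPSEEK_RMB_PER_M_OUTPUT)
premium_ratio = price_ratio(GPT_PREMIUM_USD_PER_M_OUTPUT,
                            DEEPSEEK_RMB_PER_M_OUTPUT)

# The article says 6 RMB is 25% of the prior standard price,
# which implies a prior price of 6 / 0.25 RMB per million output tokens.
implied_original_rmb = DEEPSEEK_RMB_PER_M_OUTPUT / 0.25

print(f"standard tier multiple: {standard_ratio:.1f}x")
print(f"premium tier multiple:  {premium_ratio:.1f}x")
print(f"implied prior DeepSeek price: {implied_original_rmb:.0f} RMB per 1M output tokens")
```

At the assumed rate, the standard tier works out to roughly 36 times the DeepSeek price and the premium tier to roughly 216 times, the same order of magnitude as the multiples cited above. A different exchange rate shifts both figures proportionally.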

MiMo Joins the Low-Cost Reasoning Battle

MiMo V2.5 Pro’s pricing shows how fast inference costs are falling for reasoning models that target coding, agentic workflows and long-context tasks. Xiaomi’s MiMo API lists V2.5 Pro at about 1 USD (approx. RM5) per million input tokens and 3 USD (approx. RM14) per million output tokens for prompts up to 256,000 tokens, with higher long-context pricing beyond that. While the exact price comparison with DeepSeek V4 Pro depends on cache use, context length and provider markup, MiMo is clearly entering the same budget conversation for production-grade reasoning workloads. The key shift is that models built for heavy planning, tool use and long sessions are no longer priced like luxury software. Instead, they approach infrastructure-style economics, where competitive differentiation rests on quality, latency, reliability and features rather than the ability to pay steep per-token fees.
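To make the listed rates concrete, here is a small sketch that estimates the cost of a single request at the quoted MiMo V2.5 Pro prices. The function name and the example token counts are hypothetical. Because the article does not state the long-context rate, the sketch refuses prompts above 256,000 tokens rather than guessing a price, and it ignores cache discounts and provider markup.

```python
# Rates quoted in the article for prompts up to 256,000 tokens.
INPUT_USD_PER_M = 1.0
OUTPUT_USD_PER_M = 3.0
STANDARD_PROMPT_LIMIT = 256_000


def mimo_request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the standard-context rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > STANDARD_PROMPT_LIMIT:
        # The long-context rate is not given in the article, so do not guess.
        raise ValueError("prompt exceeds 256,000 tokens; long-context rate unknown")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return input_cost + output_cost


# Hypothetical agentic request: 100,000 prompt tokens, 20,000 generated tokens.
example_cost = mimo_request_cost_usd(100_000, 20_000)
print(f"estimated cost: {example_cost:.2f} USD")
```

For the hypothetical request shown, the estimate comes to about 0.16 USD, which illustrates why long sessions and repeated tool calls are now affordable at production volumes.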

How Falling Inference Costs Transform Startup Economics

Lower inference costs are changing AI startup economics by reducing barriers to entry and making experimentation less risky. Teams building AI research assistants, coding workflows, legal review tools or data-cleaning agents can now run multiple iterations without hitting unsustainable token bills. Longer context windows, richer prompts and repeated tool calls become financially feasible earlier in a product’s life. For most startups, owning a foundation model is not necessary; they need reliable access and predictable AI API pricing. Competition among DeepSeek, MiMo, Qwen, Kimi and others gives founders a wider menu of tradeoffs in latency, context length and cost. It also makes multi-model routing attractive, since the cost of testing and switching between endpoints falls. Startups that understand these new economics can design pricing, usage caps and product scope around clear margins instead of discovering uncomfortable bills after launch.
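One way to design usage caps around a clear margin is to work backward from the subscription price. The sketch below is an illustration only: the plan price, target margin and blended token cost are hypothetical inputs, not figures from the article, and the model ignores non-inference costs such as hosting and support.

```python
import math


def monthly_token_cap(plan_price_usd: float,
                      target_gross_margin: float,
                      blended_usd_per_m_tokens: float) -> int:
    """Return the most tokens a user can consume per month at the target margin.

    target_gross_margin is a fraction between 0 and 1 (0.7 means 70%).
    blended_usd_per_m_tokens is an assumed average cost across input and output.
    """
    if not 0 <= target_gross_margin < 1:
        raise ValueError("target_gross_margin must be in [0, 1)")
    if blended_usd_per_m_tokens <= 0:
        raise ValueError("blended_usd_per_m_tokens must be positive")
    inference_budget_usd = plan_price_usd * (1 - target_gross_margin)
    return math.floor(inference_budget_usd / blended_usd_per_m_tokens * 1_000_000)


# Hypothetical plan: 20 USD per month, 70% target margin, 3 USD per 1M tokens.
cap = monthly_token_cap(20.0, 0.70, 3.0)
print(f"monthly cap: about {cap:,} tokens per user")
```

With these hypothetical inputs the inference budget is about 6 USD per user, which buys roughly two million tokens a month. Halving the blended token cost doubles the cap, which is the mechanism by which falling prices widen product scope.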

Pressure on Incumbents and the Path to Market Consolidation

Incumbent providers now face a squeeze between customer expectations and their own cost structures. As capable open-weight or open-access models narrow the quality gap while undercutting incumbents on AI API pricing, procurement teams are more likely to question renewals or ask for discounts on expensive contracts. Middleware aggregators such as routing platforms face a similar test: when base model prices drop, their margins are harder to defend unless they offer clear value in routing, observability, fallback logic and governance. Yet falling prices also drive higher usage, which can benefit these intermediaries. Over time, the contest may shift from whose model tops leaderboards to who offers the lowest total cost of useful work, factoring in cache pricing, long-context tiers, rate limits and reliability. The likely outcome is tighter competition in the short term and consolidation around providers and platforms that can sustain low inference costs at scale.
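The idea of "total cost of useful work" can be sketched as the expected cost per successfully completed task rather than the cost per token. The example below is a simplified model under stated assumptions: failed tasks are retried independently until one succeeds, so the expected number of attempts is 1 divided by the success rate, and cached input tokens are billed at a lower rate. The provider names, rates and success rates are all hypothetical and chosen only to show that the cheapest list price does not always win.

```python
from dataclasses import dataclass


@dataclass
class ProviderQuote:
    """Hypothetical price sheet and reliability estimate for one provider."""
    name: str
    input_usd_per_m: float
    cached_input_usd_per_m: float
    output_usd_per_m: float
    success_rate: float  # fraction of attempts that complete the task acceptably


def cost_per_successful_task(quote: ProviderQuote, input_tokens: int,
                             cached_fraction: float, output_tokens: int) -> float:
    """Expected USD cost to get one acceptable result, assuming independent retries."""
    if not 0 < quote.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    if not 0 <= cached_fraction <= 1:
        raise ValueError("cached_fraction must be in [0, 1]")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    cost_per_attempt = (
        fresh_tokens * quote.input_usd_per_m
        + cached_tokens * quote.cached_input_usd_per_m
        + output_tokens * quote.output_usd_per_m
    ) / 1_000_000
    # Expected attempts until the first success is 1 / success_rate.
    return cost_per_attempt / quote.success_rate


# Hypothetical quotes: a cheaper but less reliable provider versus a pricier one.
budget = ProviderQuote("budget-model", 1.0, 0.2, 3.0, success_rate=0.60)
steady = ProviderQuote("steady-model", 1.5, 0.3, 4.5, success_rate=0.95)

budget_cost = cost_per_successful_task(budget, 50_000, 0.5, 5_000)
steady_cost = cost_per_successful_task(steady, 50_000, 0.5, 5_000)
cheapest = min((budget, steady),
               key=lambda q: cost_per_successful_task(q, 50_000, 0.5, 5_000))
print(f"{budget.name}: {budget_cost:.4f} USD per successful task")
print(f"{steady.name}: {steady_cost:.4f} USD per successful task")
print(f"lowest cost of useful work: {cheapest.name}")
```

In this hypothetical case the provider with the higher list price comes out slightly cheaper per successful task, because fewer retries are needed. A fuller model would also account for long-context tiers, rate limits and latency.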