AI API Pricing Trends: Cheap Tokens vs Premium Models

What the AI price war is and why it matters now

The AI price war is the intensifying competition among model providers to win developers through sharply different pricing moves, with some cutting API prices and others raising them. Teams are left to choose between ultra-cheap scale, premium capabilities, or a risky mix of both. For years, AI API pricing moved in one direction: every new model promised more intelligence at equal or lower cost, especially in the budget tiers. That pattern has broken. DeepSeek V4 Pro and Xiaomi’s MiMo V2.5 Pro are pushing inference pricing toward an infrastructure-like utility model, priced for constant, high-volume agent workloads. Google, by contrast, is using Gemini 3.5 Flash price changes to probe how much value customers place on speed, tools, and ecosystem integration. The result is a split market where the same workload can be affordable or prohibitively expensive depending on which side of the price war a builder chooses.

DeepSeek V4 Pro sets a new floor for AI API pricing

DeepSeek has turned a temporary promotion into a structural shock to AI API pricing. The company announced that DeepSeek V4 Pro’s API rate will remain at 25% of its original standard price, or 6 renminbi per 1,000,000 output tokens. According to TechNetBooks, the move “has essentially turned a temporary 75% price reduction into permanent rates” for developers. The contrast with frontier rivals is stark: OpenAI’s GPT 5.5 is listed at USD 30 (approx. RM138) per 1,000,000 output tokens on its standard tier and USD 180 (approx. RM828) per 1,000,000 output tokens on its premium tier, which gives DeepSeek a large cost advantage on output-heavy workloads. For agents that loop, write long documents, or run batch summarization, DeepSeek V4 Pro’s pricing makes previously cost-prohibitive experiments affordable. The permanent cut signals a deliberate bid to win market share on price, even as hardware and compute costs rise.


MiMo V2.5 Pro joins the low-cost reasoning fight

MiMo V2.5 Pro shows how quickly price competition is spreading into demanding reasoning use cases. Xiaomi prices MiMo V2.5 Pro at about USD 1 (approx. RM5) per 1,000,000 input tokens and USD 3 (approx. RM14) per 1,000,000 output tokens for prompts up to 256,000 tokens. This makes MiMo a direct alternative to DeepSeek V4 Pro for teams building coding tools, research agents, and other reasoning-heavy systems, where long contexts, frequent tool calls, and large outputs used to make the token bill “become the business model.” While DeepSeek undercuts it on pure output price, MiMo’s open-weight design and its focus on reasoning and agentic work offer a different kind of value to developers who care about flexibility, model behavior, or deployment options as much as raw cost. The message to startups is clear: capable reasoning models are starting to be priced like infrastructure rather than luxury software.
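To make the quoted rates concrete, here is a minimal Python sketch that estimates the cost of a single request at the MiMo V2.5 Pro prices given above. The function name and the example token counts are hypothetical and chosen only for illustration; the sketch also assumes the quoted rates stop applying past the 256,000-token prompt limit, since the article gives no pricing beyond it.

```python
# Rates quoted in the article for MiMo V2.5 Pro (USD per 1,000,000 tokens).
MIMO_INPUT_USD_PER_MILLION = 1.0
MIMO_OUTPUT_USD_PER_MILLION = 3.0
# The article quotes these rates only for prompts up to this size.
MIMO_MAX_PROMPT_TOKENS = 256_000


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted rates (illustrative helper)."""
    if input_tokens > MIMO_MAX_PROMPT_TOKENS:
        # Assumption: pricing above the quoted prompt size is unknown, so refuse to guess.
        raise ValueError("quoted rates only cover prompts up to 256,000 tokens")
    total = (input_tokens * MIMO_INPUT_USD_PER_MILLION
             + output_tokens * MIMO_OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical long-context agent step: 200,000 tokens in, 20,000 tokens out.
cost = request_cost_usd(200_000, 20_000)
print(f"USD {cost:.2f}")  # prints "USD 0.26"
```

At these rates even a near-maximum prompt costs well under a dollar per call, which is the sense in which long-context agent loops start to look like an infrastructure expense.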

Gemini 3.5 Flash price hikes signal the end of cheap tiers

While DeepSeek and MiMo race to the bottom, Google is pulling in the opposite direction. Gemini 3.5 Flash costs USD 1.50 (approx. RM7) per 1,000,000 input tokens and USD 9 (approx. RM41) per 1,000,000 output tokens, three times the USD 0.50 (approx. RM2) and USD 3 (approx. RM14) rates of Gemini 3 Flash. Gemini 2.5 Flash was cheaper still at USD 0.30 (approx. RM1) per 1,000,000 input tokens and USD 2.50 (approx. RM11) per 1,000,000 output tokens, so prices have climbed with each consecutive release. XDA Developers reports that an independent benchmark suite found Gemini 3.5 Flash cost roughly 5.5 times more to run than the previous Flash model, because it both charges more per token and tends to generate more tokens on multi-step agent work. At these levels, the gap between Gemini 3.5 Flash and the higher-tier Gemini 3.1 Pro shrinks, suggesting Google is testing how much customers will pay for “cheap” models with better features.
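The two effects in that benchmark figure can be separated with some back-of-the-envelope arithmetic. The Python sketch below assumes that "the previous Flash model" means Gemini 3 Flash and reads "5.5 times more" as a 5.5x cost ratio; neither reading is confirmed by the article, and the helper name is hypothetical. Because input and output prices both tripled, the token mix does not affect the price ratio, so the remainder of the increase can be attributed to higher token volume.

```python
# Per-1,000,000-token prices quoted in the article (USD).
GEMINI_3_FLASH = {"input": 0.50, "output": 3.00}
GEMINI_3_5_FLASH = {"input": 1.50, "output": 9.00}

# Reported run-cost ratio from the benchmark cited by XDA Developers.
# Assumption: the baseline is Gemini 3 Flash and the figure is a plain ratio.
REPORTED_COST_RATIO = 5.5


def implied_token_growth(cost_ratio: float, price_ratio: float) -> float:
    """Split a total cost ratio into price ratio x token-volume ratio."""
    return cost_ratio / price_ratio


input_price_ratio = GEMINI_3_5_FLASH["input"] / GEMINI_3_FLASH["input"]
output_price_ratio = GEMINI_3_5_FLASH["output"] / GEMINI_3_FLASH["output"]

# Both ratios are 3.0, so one price ratio describes any input/output mix.
token_growth = implied_token_growth(REPORTED_COST_RATIO, output_price_ratio)
print(f"price ratio: {output_price_ratio:.1f}x, implied token growth: {token_growth:.2f}x")
```

Under those assumptions the model would be consuming a little under twice as many tokens per task, which is a rough estimate rather than a figure reported by the benchmark.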

A split market and how developers should choose

These diverging moves create a clear bifurcation. On one side are cost-leadership providers like DeepSeek and Xiaomi, racing to offer inference prices low enough that startups can run agentic workflows without blowing their budgets. On the other are performance-premium vendors such as Google, which is pushing the Gemini 3.5 Flash price higher while relying on richer tools, long context, and ecosystem lock-in to justify the spend. For a founder, the trade-offs are concrete: ultra-cheap tokens enable long sessions, aggressive iteration, and generous user quotas, while premium models may reduce development time through better reasoning, integrations, or reliability. A practical strategy is to treat model choice as a portfolio: use low-cost engines for routine workflows and A/B test premium options on the narrow slices of your product where accuracy, tools, or latency clearly pay for themselves.
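The portfolio idea can be sketched as a blended-cost estimate. The Python example below uses the MiMo V2.5 Pro and Gemini 3.5 Flash prices quoted earlier as stand-ins for a low-cost and a premium tier; the workload numbers (request count, tokens per request, premium share) and all names are hypothetical, and the sketch ignores any difference in how many tokens each model generates for the same task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    input_usd_per_million: float
    output_usd_per_million: float

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD of one request at this tier's per-token rates."""
        return (input_tokens * self.input_usd_per_million
                + output_tokens * self.output_usd_per_million) / 1_000_000


# Prices as quoted in the article; the low-cost/premium pairing is illustrative.
LOW_COST = Tier("MiMo V2.5 Pro", 1.0, 3.0)
PREMIUM = Tier("Gemini 3.5 Flash", 1.5, 9.0)


def blended_cost(requests: int, premium_share: float,
                 input_tokens: int, output_tokens: int) -> float:
    """Total cost when a fraction of requests is routed to the premium tier."""
    if not 0.0 <= premium_share <= 1.0:
        raise ValueError("premium_share must be between 0 and 1")
    premium_requests = requests * premium_share
    routine_requests = requests - premium_requests
    return (routine_requests * LOW_COST.request_cost(input_tokens, output_tokens)
            + premium_requests * PREMIUM.request_cost(input_tokens, output_tokens))


# Hypothetical workload: 100,000 requests of 4,000 input and 1,000 output tokens.
for share in (0.0, 0.1, 1.0):
    total = blended_cost(100_000, share, 4_000, 1_000)
    print(f"{share:.0%} premium: USD {total:,.0f}")
```

For this made-up workload, routing a tenth of traffic to the premium tier adds a modest amount to the bill compared with sending everything there, which is why testing premium models on narrow slices is cheap to try.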