DeepSeek V4-Pro pricing slashed 75%

What DeepSeek’s 75% V4-Pro Price Cut Actually Means

DeepSeek’s permanent 75 percent price cut for its V4-Pro model is a major reset in AI economics. A frontier-grade, long-context system is now priced at a quarter of its launch cost, which changes how developers, enterprises, and competitors think about the floor for AI API pricing and long-term model affordability. V4-Pro’s token rates have dropped from about USD 1.74 (approx. RM8.00) per million uncached input tokens and USD 3.48 (approx. RM16.00) per million output tokens to roughly USD 0.435 (approx. RM2.00) and USD 0.87 (approx. RM4.00), respectively. According to The Tech Portal, “DeepSeek V4-Pro will continue to operate at only 25% of its launch cost.” The company has also cut cache-hit input pricing far more steeply, which matters for repeated prompts and persistent agents. Instead of lapsing as a short promotion on May 31, these rates are now the standard rate card, giving buyers a durable baseline.
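As a rough illustration of what those rates mean for a monthly bill, the sketch below compares the launch and current prices for one workload. The per-million-token rates are the approximate USD figures quoted above; the token volumes are hypothetical and chosen only to make the arithmetic easy to follow. Cache-hit pricing is ignored here.

```python
# Approximate USD rates per million tokens, as quoted in the article.
LAUNCH_RATES = {"input": 1.74, "output": 3.48}
CURRENT_RATES = {"input": 0.435, "output": 0.87}


def monthly_cost(input_tokens, output_tokens, rates):
    """Return the USD cost of a workload, treating all input as uncached."""
    input_cost = (input_tokens / 1_000_000) * rates["input"]
    output_cost = (output_tokens / 1_000_000) * rates["output"]
    return input_cost + output_cost


# Hypothetical workload (an assumption, not a figure from any source):
# 2 billion input tokens and 500 million output tokens per month.
input_tokens = 2_000_000_000
output_tokens = 500_000_000

before = monthly_cost(input_tokens, output_tokens, LAUNCH_RATES)
after = monthly_cost(input_tokens, output_tokens, CURRENT_RATES)

print(f"Launch pricing:  USD {before:,.2f} per month")
print(f"Current pricing: USD {after:,.2f} per month")
print(f"Current cost as a share of launch cost: {after / before:.0%}")
```

Because both the input and output rates fell by the same proportion, the bill for any mix of uncached input and output tokens comes out at a quarter of its launch-price equivalent.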

Why Long-Context AI Workloads Are the Big Winners

V4-Pro is built for long-context AI workloads, and the new pricing directly targets that segment. The model supports a one-million-token context window and can output up to 384,000 tokens in one request, so teams can feed entire codebases, long legal archives, or scientific datasets into a single session. DeepSeek’s API rate card now ranges from 0.025 to 6 yuan per million tokens, down from 0.1 to 24 yuan, which sharply lowers recurring costs for chatbots, coding assistants, and retrieval-heavy systems that consume billions of tokens each month. WinBuzzer notes that V4-Pro can be “20 to 35 times cheaper than premium offerings from OpenAI, Anthropic, and Google” for some workloads, though exact savings depend on prompt design and output volume. Because cache-hit tokens are discounted further, repeated context in support bots or long-running AI agents becomes far less expensive than before.
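To show why the cache-hit discount matters for agents that resend the same context, the sketch below blends a cached and an uncached input rate according to how often a prompt hits the cache. The uncached rate is the approximate USD figure reported for V4-Pro; the cache-hit rate is a placeholder chosen for illustration, not a published DeepSeek price, and the function name is hypothetical.

```python
def effective_input_rate(uncached_rate, cache_hit_rate, hit_ratio):
    """Blended price per million input tokens for a given cache-hit ratio.

    hit_ratio is the fraction of input tokens served from the cache (0 to 1).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_hit_rate + (1.0 - hit_ratio) * uncached_rate


UNCACHED_RATE = 0.435          # approximate USD per million tokens, as reported
PLACEHOLDER_CACHE_RATE = 0.05  # ASSUMPTION: illustrative value, not a real price

for hit_ratio in (0.0, 0.5, 0.9):
    rate = effective_input_rate(UNCACHED_RATE, PLACEHOLDER_CACHE_RATE, hit_ratio)
    print(f"cache-hit ratio {hit_ratio:.0%}: about USD {rate:.3f} per million input tokens")
```

The higher the share of repeated context, the closer the effective input price moves toward the cache-hit rate, which is why support bots and long-running agents stand to gain the most.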

How Huawei’s AI Chips May Be Enabling Aggressive Pricing

DeepSeek has not fully explained how it can sustain a permanent 75 percent discount, but attention has shifted to Huawei’s Ascend AI chips. Earlier, DeepSeek admitted that limited access to high-end compute capacity forced it to price V4-Pro far above its cheaper Flash model, at times as much as 12 times higher. As Huawei’s Ascend 950 hardware becomes a key option after restrictions on other advanced chips, that constraint appears to be easing. Digital Trends reports that usage costs for V4-Pro now span 0.025 to 6 yuan per million tokens, a steep drop from the earlier 0.1 to 24 yuan range. For API buyers, this suggests a supply-side improvement rather than a temporary subsidy: if DeepSeek can secure reliable access to domestic AI silicon, it can pass lower inference costs through as a permanent price reduction without depending on short-lived promotions or loss-making tactics.

Implications for Developers, Businesses, and Rival AI Providers

For developers and enterprises, the permanent DeepSeek V4-Pro pricing change removes a major uncertainty. WinBuzzer highlights that the shift “prevents the lower API tier from reverting after May 31,” giving finance teams a stable cost curve and engineering teams confidence to scale long-context workloads. For high-volume users processing billions of tokens every month, The Tech Portal notes that the savings can add up to millions of dollars annually. This puts pressure on both regional rivals such as Kimi, Qwen, and MiniMax and global leaders whose premium models still carry far higher list prices. If V4-Pro remains between 20 and 35 times cheaper for some workloads, other AI providers may need to rethink tiering, context-window strategy, and discount structures. The competitive question now is whether they match DeepSeek on price, differentiate on quality, or double down on specialized features.

A Structural Shift in AI Pricing Strategy

The most important signal in DeepSeek’s move is that the 75 percent cut is permanent, not a time-limited discount. That suggests confidence in its cost structure, from a Mixture-of-Experts design (1.6 trillion total parameters, of which only about 49 billion are activated per inference) to access to more affordable hardware. The price reset also shows that long-context AI workloads are no longer a niche, high-margin corner of the market but a mainstream battleground. For buyers, this may mark the start of a new expectation: frontier-level models with million-token context windows do not have to carry premium API pricing. For providers, it hints at a future in which competitive advantage comes from efficient architectures, strategic chip partnerships, and predictable pricing rather than short-lived promotions. If others respond in kind, the entire AI stack could become far cheaper than many had budgeted.