DeepSeek V4-Pro pricing shake-up for AI costs

What DeepSeek’s Permanent V4-Pro Price Cut Means

DeepSeek has made the 75% price cut on its V4-Pro AI model permanent, turning a temporary promotion into a new baseline that can change how developers and enterprises plan long-context workloads and compare API pricing across providers. Instead of reverting to higher rates once the initial discount window closed, the company has locked in a quarter of the original launch price. According to Technology.org, V4-Pro API prices now range from 0.025 to 6 yuan per million tokens, down from 0.1 to 24 yuan. WinBuzzer notes that this makes V4-Pro a standing lower-cost option for teams that were bracing for a reset after May 31. By turning the discount into standard pricing rather than a marketing promotion, DeepSeek signals that its underlying compute economics, and possibly its supply chain, have changed in a lasting way.
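As a quick sanity check on the figures reported above, the short sketch below confirms that both ends of the quoted yuan range fell by the same proportion. The function name `reduction_percent` is illustrative only; the four prices are the ones attributed to Technology.org.

```python
from fractions import Fraction

# Price bounds quoted in the article, in yuan per million tokens.
ORIGINAL_RANGE = (Fraction("0.1"), Fraction("24"))
CURRENT_RANGE = (Fraction("0.025"), Fraction("6"))


def reduction_percent(old: Fraction, new: Fraction) -> Fraction:
    """Return the price reduction from old to new as a percentage."""
    return (old - new) / old * 100


# Compare the low end with the low end and the high end with the high end.
for old, new in zip(ORIGINAL_RANGE, CURRENT_RANGE):
    pct = reduction_percent(old, new)
    print(f"{float(old)} -> {float(new)} yuan: {float(pct)}% lower")
```

Both ends of the range work out to a 75% reduction, which matches the "quarter of the original launch price" description.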


DeepSeek V4-Pro Pricing and Long-Context Workloads

For developers, the most direct impact is on long-context AI workloads that consume high token volumes. According to WinBuzzer, DeepSeek’s rate card keeps V4-Pro at roughly USD 0.435 (approx. RM2.00) per million uncached input tokens and USD 0.87 (approx. RM4.00) per million output tokens, with a lower cache-hit rate for reused context. DeepSeek has positioned V4-Pro as a one-million-token context model, which means a single request can cover extensive documents or long multi-turn conversations. Those characteristics normally drive token counts, and bills, upward. By moving V4-Pro to a permanent quarter-price tier, DeepSeek turns what used to be a high-end option into something that can underpin daily tools such as coding assistants, retrieval-heavy document systems, and support bots. The headline is not only cheaper tokens but also pricing that teams planning high-volume deployments can predict and compare across providers.
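To make the rate card concrete, here is a minimal sketch of a per-request cost estimate using the two USD figures attributed to WinBuzzer. The function name and the sample token counts are hypothetical, and the sketch deliberately ignores the cache-hit rate because the article does not give its USD value.

```python
# Rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_M = 0.435   # uncached input tokens
OUTPUT_USD_PER_M = 0.87   # output tokens


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming no cache hits."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: a full one-million-token context and a 2,000-token reply.
cost = estimate_request_cost(1_000_000, 2_000)
print(f"{cost:.4f} USD")
```

Under these assumptions, a request that fills the entire context window costs a little under half a US dollar, and almost all of that comes from input tokens. Real bills would be lower wherever reused context qualifies for the cache-hit rate.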

Huawei’s Ascend Chips and the New Cost Curve

The pricing shift points toward changes in compute supply, with Huawei’s Ascend AI chips likely playing a central role. DeepSeek previously warned that V4-Pro could cost up to 12 times more than its Flash model because of “constraints in high-end compute capacity.” Technology.org reports that V4-Pro now leans on Huawei’s Ascend 950 chips, and DeepSeek had forecast lower prices once those chips shipped in volume later in the year. Digital Trends adds that Ascend 950 hardware has become increasingly important as a substitute for more restricted AI accelerators. If Huawei hits its 2026 shipment targets mentioned by WinBuzzer, DeepSeek may be able to sustain aggressive pricing over time. In practice, this suggests that AI infrastructure deals are now tightly tied to chip ecosystems, not only to model quality or software optimizations.

Budget Predictability for Developers and Enterprises

Replacing an expiring discount with a standard V4-Pro price changes financial planning more than it changes model features. WinBuzzer points out that API buyers now avoid a hard reset that would have raised costs after May 31, giving finance teams a single, durable rate card to model. For enterprises running document search, code generation, or knowledge assistants, long-context features can drive millions of tokens per day, so the difference between promotional and permanent pricing is material. With V4-Pro now at 0.025 to 6 yuan per million tokens across usage types, as highlighted by Technology.org and Digital Trends, teams can commit to larger rollouts without fearing near-term price whiplash. This stability also makes it easier to compare API pricing against rivals, shifting vendor evaluations from “what if prices jump next month?” toward long-term total cost of ownership.
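The budgeting impact can be sketched with a simple monthly projection. The daily volume below is a made-up figure for illustration, and billing every token at the top of the quoted yuan range is a simplifying assumption rather than how a real workload would be priced; only the 24 and 6 yuan rates come from the article.

```python
from decimal import Decimal

# Top-of-range prices quoted in the article, in yuan per million tokens.
ORIGINAL_MAX_RATE = Decimal("24")
PERMANENT_MAX_RATE = Decimal("6")


def monthly_cost_yuan(tokens_per_day: int, rate_per_million: Decimal, days: int = 30) -> Decimal:
    """Project spend in yuan for a steady daily token volume at one flat rate."""
    return Decimal(tokens_per_day) * days * rate_per_million / Decimal(1_000_000)


# Hypothetical workload: 5 million tokens per day, all billed at the top rate.
DAILY_TOKENS = 5_000_000
before = monthly_cost_yuan(DAILY_TOKENS, ORIGINAL_MAX_RATE)
after = monthly_cost_yuan(DAILY_TOKENS, PERMANENT_MAX_RATE)
print(f"original: {before} yuan, permanent: {after} yuan, difference: {before - after} yuan")
```

For this hypothetical volume the projection falls from 3,600 to 900 yuan over a 30-day month. A finance team would substitute its own traffic mix and the rate that applies to each usage type.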

Competitive Pressure on Global AI Model Pricing

DeepSeek’s move lands in a market where API prices were already trending downward, but V4-Pro’s permanent discount raises the stakes. WinBuzzer notes that for some workloads, V4-Pro now appears 20 to 35 times cheaper than premium offerings from OpenAI, Anthropic, and Google, though exact savings depend on prompt structure and output volume. That gap will be hard for competing providers to ignore, especially as buyers scrutinize AI infrastructure deals and total inference cost rather than brand recognition alone. Digital Trends suggests this could intensify a global AI price war as other firms seek ways to scale performance while cutting token rates. If Huawei’s chip ecosystem continues to support DeepSeek’s economics, rivals may have to respond with their own cost-saving architectures, creative caching, or new volume tiers to stay credible in budget-sensitive API comparisons.