DeepSeek’s Permanent 75% API Price Cut Is Rewriting the Economics of AI Development

From Short-Term Promotion to Structural DeepSeek API Pricing Reset

DeepSeek has converted what looked like a temporary promotion into a structural reset of its API pricing. The company confirmed that the 75% V4-Pro discount will become standard after the current promotional period ends on May 31, effectively fixing prices at one quarter of the original launch level. The updated rate card lists DeepSeek-V4-Pro at USD 0.435 (approx. RM2.00) per million uncached input tokens and USD 0.87 (approx. RM4.00) per million output tokens, down from reference prices of USD 1.74 (approx. RM8.00) and USD 3.48 (approx. RM16.00). Cached input pricing drops as well, and the lighter DeepSeek-V4-Flash is cheaper still. This shift is more than a sale; it sets a new floor for long-context model costs. For developers, a permanent V4-Pro discount resets cost expectations and makes building on high-capacity models far more affordable.
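The arithmetic behind the rate card can be checked with a short Python sketch. The four per-million-token rates are the ones quoted above; the function names and the example workload (2 million uncached input tokens and 0.5 million output tokens) are illustrative assumptions, not part of DeepSeek's API.

```python
# Rates in USD per million tokens, as quoted in the article for DeepSeek-V4-Pro.
REFERENCE_RATES = {"input": 1.74, "output": 3.48}    # launch reference prices
DISCOUNTED_RATES = {"input": 0.435, "output": 0.87}  # permanent discounted prices


def discount_fraction(old_rate: float, new_rate: float) -> float:
    """Return the fractional price cut, e.g. 0.75 for a 75% discount."""
    return 1.0 - new_rate / old_rate


def workload_cost(input_tokens: int, output_tokens: int, rates: dict) -> float:
    """Cost in USD for uncached input tokens plus output tokens."""
    per_token_total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return per_token_total / 1_000_000


# Hypothetical workload, chosen only to illustrate the calculation.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

before = workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, REFERENCE_RATES)
after = workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, DISCOUNTED_RATES)

print(f"input discount:  {discount_fraction(1.74, 0.435):.0%}")
print(f"output discount: {discount_fraction(3.48, 0.87):.0%}")
print(f"cost before: USD {before:.3f}, after: USD {after:.3f}")
```

For this assumed workload the bill falls from about USD 5.22 to about USD 1.305, which is the same one-quarter ratio the rate card implies.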

Why a Permanent 75% Cut Matters for AI Startup Budgeting

For AI-native companies, usage-based model costs often erode margins in ways that look more like infrastructure spending than software. Every support interaction, generated report, or coding task carries a token bill. DeepSeek’s decision to make the V4-Pro discount permanent changes those spreadsheets. Instead of planning around an expiring promotion, teams can build multi-year budgets on the new, lower baseline. The reduced cache-hit input prices, cut to one tenth of launch levels across the lineup, are especially important for agents, coding assistants, and document-heavy workflows that repeatedly reuse the same context. Lower costs can make small-ticket AI features viable for students, solo workers, and small businesses that cannot absorb premium pricing. They also let teams keep more context in prompts instead of aggressively trimming or summarizing it or downgrading the experience, which can improve product quality while keeping development affordable at scale.
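A budgeting spreadsheet of the kind described above might reduce to a function like the following Python sketch. The uncached input and output rates are the V4-Pro figures quoted in this article. The cache-hit rate is not stated here, so it is left as a required parameter, and the value used in the example call is a placeholder assumption, not a published price. The function name and token volumes are likewise hypothetical.

```python
# USD per million tokens, as quoted in the article for DeepSeek-V4-Pro.
UNCACHED_INPUT_USD = 0.435
OUTPUT_USD = 0.87


def monthly_cost(
    input_tokens: int,
    output_tokens: int,
    cache_hit_share: float,
    cache_hit_usd: float,
) -> float:
    """Estimate a monthly bill in USD.

    cache_hit_share: fraction of input tokens served from cache (0.0 to 1.0).
    cache_hit_usd:   price per million cached input tokens. This is an input
                     the caller must supply from the current rate card.
    """
    if not 0.0 <= cache_hit_share <= 1.0:
        raise ValueError("cache_hit_share must be between 0 and 1")

    cached_tokens = input_tokens * cache_hit_share
    uncached_tokens = input_tokens - cached_tokens

    total = (
        cached_tokens * cache_hit_usd
        + uncached_tokens * UNCACHED_INPUT_USD
        + output_tokens * OUTPUT_USD
    )
    return total / 1_000_000


# ASSUMPTION: placeholder cache-hit price for illustration only.
PLACEHOLDER_CACHE_HIT_USD = 0.05

# Hypothetical agent workload: 500M input tokens and 50M output tokens a month.
for share in (0.0, 0.5, 0.9):
    cost = monthly_cost(500_000_000, 50_000_000, share, PLACEHOLDER_CACHE_HIT_USD)
    print(f"cache-hit share {share:.0%}: USD {cost:,.2f}")
```

Running the loop with different cache-hit shares shows why context-reusing workloads benefit most: the more input that hits the cache, the larger the share of the bill priced at the cheaper rate.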

Huawei’s Ascend Chips and the Infrastructure Behind Cheaper Models

A price cut this aggressive and this permanent implies that something has shifted in the underlying infrastructure. DeepSeek previously acknowledged that limited access to high-end compute pushed V4-Pro pricing far above that of its cheaper Flash sibling. Now, industry attention is turning to Huawei’s Ascend AI chips as a likely enabler of the new economics. As Huawei’s Ascend 950 and related hardware spread through the ecosystem, they offer an alternative supply of AI compute that can bring inference costs down. Reports suggest that for some workloads, V4-Pro now undercuts premium models from major Western providers by a factor of 20 to 35, which shows how dramatic the shift is. While future chip shipment targets may still constrain how broadly DeepSeek can extend access, the current pricing signals a new phase in which model performance can improve even as per-token costs fall sharply.

Predictable Budgets and the New Competitive Baseline for Long-Context Models

Locking in the discount gives enterprises and startups something they value more than a short-lived bargain: predictability. With the risk of a post-promotion price snapback removed, teams planning coding copilots, document search, and other high-volume long-context workloads can treat V4-Pro’s lower rates as a stable part of their cost models. That clarity will shape multi-quarter contracts, roadmap decisions, and capacity planning. At the same time, rivals like Kimi, Qwen, and MiniMax now face tougher comparisons in budget-sensitive RFPs. DeepSeek is effectively trading margin for reach, forcing other providers to justify higher invoices on grounds like reliability, latency, data policy, and ecosystem fit. DeepSeek API pricing thus becomes a reference point: even companies that never integrate V4-Pro will benchmark against it when negotiating, pushing the broader market toward more aggressive pricing for comparable long-context capabilities.

How Lower AI Costs Could Transform Consumer and Business Applications

Cheaper, predictable access to powerful models reshapes how products are conceived. When each million tokens of input and output costs far less, teams can build richer agents, more persistent copilots, and deeper retrieval systems without fearing runaway bills. That can accelerate the adoption of advanced AI features in both consumer and enterprise apps, from continuous research assistants to always-on customer support and learning tools. Lower cache-hit pricing particularly benefits document-centric and support workloads, where repeated context reuse is the norm. Instead of minimizing context to save on tokens, teams can optimize for usefulness and accuracy. Not every company will shift wholesale to DeepSeek, since concerns such as latency, trust, and data handling still matter, but the new pricing baseline widens the space for experimentation. In practice, pricing at this level could move advanced capabilities from premium upsells into default, everyday app experiences.
