DeepSeek API Pricing Cut and the New AI Cost Curve

What DeepSeek’s Permanent API Discount Really Means

DeepSeek has made its 75% V4-Pro API discount permanent, turning a short promotion into a structural price change. Developers, startups, and rival providers now have to rethink how they budget, build, and deploy AI-powered products. Instead of reverting on May 31, 2026, the discounted DeepSeek API pricing becomes the official rate. The company states that V4-Pro now costs $0.435 per million uncached input tokens and $0.87 per million output tokens (approx. RM2.01 and RM4.02), with cached input priced at $0.003625 (approx. RM0.02) per million tokens. DeepSeek-V4-Flash is lower still, at $0.14 per million input tokens and $0.28 per million output tokens (approx. RM0.65 and RM1.30). According to Startup Fortune, “the model will be officially adjusted to one quarter of its original price,” making the discount the new baseline.
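To make the per-token figures concrete, the following is a minimal sketch of how a team might estimate the cost of a single request from the rates quoted above. The tier labels ("v4-pro", "v4-flash"), the function name, and the sample token counts are illustrative assumptions, not official API identifiers or measured workloads. No cached-input rate for V4-Flash is quoted here, so the sketch refuses to guess one.

```python
# Rates in USD per million tokens, copied from the figures quoted in this article.
# The dictionary keys are hypothetical labels, not official model identifiers.
PER_MILLION = 1_000_000

PRICES_USD = {
    "v4-pro": {"input": 0.435, "cached_input": 0.003625, "output": 0.87},
    "v4-flash": {"input": 0.14, "output": 0.28},  # no cached rate quoted in the article
}


def request_cost(tier, uncached_input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request on the given tier."""
    prices = PRICES_USD[tier]
    if cached_input_tokens and "cached_input" not in prices:
        # Avoid inventing a rate the article does not state.
        raise ValueError(f"no cached-input rate quoted for {tier}")
    cost = uncached_input_tokens * prices["input"]
    cost += output_tokens * prices["output"]
    cost += cached_input_tokens * prices.get("cached_input", 0.0)
    return cost / PER_MILLION


if __name__ == "__main__":
    # Hypothetical support reply: 6,000 prompt tokens in, 500 tokens out.
    for tier in PRICES_USD:
        print(tier, f"${request_cost(tier, 6_000, 500):.6f}")
```

Multiplying the per-request figure by expected monthly request volume gives a rough budget line. Real bills depend on actual token counts and on whatever rates the provider publishes at the time.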

How Lower AI Model Costs Reshape Developer Budgets

The permanent 75% price cut is more than a cheaper bill; it alters the AI development budget for anyone building apps, agents, or internal tools. AI-native products often look like software but behave like metered infrastructure, where every support reply, research run, or code-generation task consumes tokens that show up as hard costs. When DeepSeek API pricing drops to a quarter of launch levels, low-margin or low-ticket use cases become more realistic. Startups serving students, small firms, or solo professionals can afford richer features without wiping out gross margins. The move also encourages developers to keep more context in prompts instead of aggressively trimming or summarizing to save tokens, which can improve answer quality. For early-stage teams debating between external APIs, fine-tuned open models, or narrow in-house systems, the cost side of that comparison now looks very different.

Why Cache Pricing and Model Tiers Matter for Startups

DeepSeek’s changes go beyond the headline discount on V4-Pro. Its pricing page shows that input cache-hit costs across the model lineup have dropped to one tenth of launch levels, which is significant for agents, coding assistants, support bots, and document-heavy workflows that reuse the same instructions or reference files many times. In parallel, the company now presents a clear two-step ladder: V4-Flash as the cheaper everyday workhorse, and V4-Pro for harder tasks such as long-document reasoning or higher-value automation. When applications route traffic between those tiers and lean on cheap cache hits, the effective cost reduction can be far larger than the headline 75%. This setup lets startups design for both performance and cost, instead of constantly pruning features to protect margins, and could normalize multi-model routing as a standard product pattern.
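The combined effect of tier routing and cache hits can be sketched as a blended input price. The example below is a rough illustration under stated assumptions, not a model of any real workload. It uses the input rates quoted earlier in this article, takes the launch V4-Pro input price to be four times the current rate (derived from the "one quarter of its original price" statement), and prices all Flash traffic at the uncached rate because no Flash cache-hit rate is quoted here. The function names, the 70% Flash share, and the 80% cache-hit rate are hypothetical.

```python
# Input rates in USD per million tokens, from the figures quoted in this article.
PRO_INPUT = 0.435
PRO_CACHED_INPUT = 0.003625
FLASH_INPUT = 0.14

# Derived assumption: launch price = 4 x current price ("one quarter" quote).
PRO_INPUT_AT_LAUNCH = PRO_INPUT * 4


def blended_input_price(flash_share, pro_cache_hit_rate):
    """Average input price per million tokens across a two-tier mix.

    flash_share: fraction of input tokens routed to V4-Flash (0..1).
    pro_cache_hit_rate: fraction of V4-Pro input tokens served from cache (0..1).
    Flash tokens are priced at the uncached rate, a conservative simplification.
    """
    if not (0 <= flash_share <= 1 and 0 <= pro_cache_hit_rate <= 1):
        raise ValueError("shares and rates must be between 0 and 1")
    pro_price = (pro_cache_hit_rate * PRO_CACHED_INPUT
                 + (1 - pro_cache_hit_rate) * PRO_INPUT)
    return flash_share * FLASH_INPUT + (1 - flash_share) * pro_price


def reduction_vs_launch(blended_price):
    """Fractional saving relative to all-Pro, uncached traffic at launch pricing."""
    return 1 - blended_price / PRO_INPUT_AT_LAUNCH


if __name__ == "__main__":
    # Hypothetical mix: 70% of tokens on Flash, 80% cache hits on the Pro remainder.
    price = blended_input_price(flash_share=0.7, pro_cache_hit_rate=0.8)
    print(f"blended input price: ${price:.5f} per million tokens")
    print(f"reduction vs launch: {reduction_vs_launch(price):.1%}")
```

With no routing and no cache hits the function returns the plain V4-Pro rate, so the reduction is exactly the headline 75%. Any Flash share or cache-hit rate above zero pushes the saving higher, which is the arithmetic behind the claim in the paragraph above. Output-token costs are left out of this sketch and would need the same treatment in a full estimate.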

Chips, Competition, and the New AI Price War

DeepSeek’s permanent discount also reflects a shift in the underlying hardware landscape. The company has tied its V4 models to Huawei’s Ascend 950 chips, and earlier said that limited access to high-end compute forced V4-Pro pricing as high as 12 times the Flash tier at launch. As Ascend 950 supernodes begin arriving in volume, that constraint seems to be easing, enabling lower inference costs and a more aggressive stance on pricing. Technology.org notes that “V4-Pro API costs now range from 0.025 to 6 yuan per million tokens, or about $0.0035 to $0.83 (approx. RM0.02 to RM3.84).” If other providers cannot match this curve, they will need to justify higher prices with reliability, latency, tooling, or trust. In the meantime, DeepSeek’s move signals that the AI price war is entering a new phase shaped by chip ecosystems as much as by model quality.

Opportunities and Trade-offs for Developers and Startups

For smaller companies and independent developers, DeepSeek’s permanent discount could accelerate adoption of AI features that once looked too expensive to run at scale. Lower per-token costs make it feasible to add AI into more workflows, from internal dashboards to customer-facing tools, without committing to enterprise-level pricing. However, price is only one axis of choice. Teams still have to weigh reliability, latency, data policies, regional restrictions, and user expectations when picking a provider. Some enterprise buyers may be cautious about centralizing sensitive operations on a single external model, regardless of cost. The practical outcome is that many teams will treat DeepSeek as a serious option in a multi-provider stack, testing it for cost-sensitive workloads while keeping alternatives for critical paths. In that sense, the permanent API discount makes the AI market more competitive and more flexible at the same time.