DeepSeek API pricing slash reshapes AI costs

What DeepSeek’s Permanent 75% Price Cut Actually Means

DeepSeek’s decision to make its 75% V4-Pro API discount permanent turns a temporary promotion into a new baseline for inference pricing, and with it for what AI startups and developers can expect to pay. Instead of reverting to earlier rates after May 31, DeepSeek has locked in the V4-Pro price at one quarter of its original level, and the company’s pricing page confirms that the sale price is now the regular price. The page lists V4-Pro at USD 0.435 (approx. RM2.00) per million uncached input tokens and USD 0.87 (approx. RM4.00) per million output tokens, down from crossed-out reference prices of USD 1.74 (approx. RM8.00) and USD 3.48 (approx. RM16.00). Cached input falls to USD 0.003625 (approx. RM0.02) per million tokens, and DeepSeek-V4-Flash is cheaper again, at USD 0.14 (approx. RM0.65) for input and USD 0.28 (approx. RM1.30) for output per million tokens.
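To see what the price list implies for a bill, the arithmetic can be sketched as below. The rates are the USD per-million-token figures quoted above. The workload (2 billion input tokens and 500 million output tokens a month, none of them cached) is invented for illustration, and `Rates` and `monthly_cost` are hypothetical helper names, not part of any DeepSeek SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens, taken from the figures quoted in the article."""
    input_uncached: float
    output: float


V4_PRO_NOW = Rates(input_uncached=0.435, output=0.87)
V4_PRO_REFERENCE = Rates(input_uncached=1.74, output=3.48)  # crossed-out prices
V4_FLASH = Rates(input_uncached=0.14, output=0.28)

# Assumption: an invented monthly workload with no cache hits.
INPUT_TOKENS = 2_000_000_000
OUTPUT_TOKENS = 500_000_000


def monthly_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload billed per million tokens."""
    return (input_tokens * rates.input_uncached
            + output_tokens * rates.output) / 1_000_000


if __name__ == "__main__":
    for label, rates in [
        ("V4-Pro (reference)", V4_PRO_REFERENCE),
        ("V4-Pro (current)", V4_PRO_NOW),
        ("V4-Flash", V4_FLASH),
    ]:
        cost = monthly_cost(rates, INPUT_TOKENS, OUTPUT_TOKENS)
        print(f"{label:<20} USD {cost:,.2f}")
```

On this invented workload the bill comes to about USD 5,220 at the reference prices, USD 1,305 at the current V4-Pro prices, and USD 420 on V4-Flash.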

How Lower DeepSeek API Pricing Reshapes Startup Economics

For AI-native startups, DeepSeek’s move goes far beyond a smaller invoice. Many young companies face a margin squeeze because their products look like software subscriptions but behave like metered infrastructure: every support reply, coding task, or generated report adds to the token bill. According to Startup Fortune, “AI-native startups have been living with a difficult margin problem” as usage scales. A 75% permanent cut in V4-Pro costs lets founders redraw their spreadsheets when choosing between external APIs, fine-tuned open models, or narrow in-house systems. Low-ticket AI features aimed at students, small businesses, or solo operators become more viable, because the per-request expense falls sharply. Teams can also keep more context inside prompts instead of spending engineering time trimming instructions or summarizing documents to stay within tight cost limits.
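The margin squeeze can be made concrete with a back-of-the-envelope sketch. Everything about the product here is an assumption made up for illustration: a USD 5 monthly plan, 300 requests per user, 6,000 input and 800 output tokens per request, and no caching. Only the V4-Pro rates come from the pricing figures cited earlier in the article.

```python
# USD per million tokens for V4-Pro, as quoted in the article.
REFERENCE = {"input": 1.74, "output": 3.48}   # pre-cut reference prices
CURRENT = {"input": 0.435, "output": 0.87}    # permanent discounted prices

# Assumptions: a hypothetical low-ticket plan and usage profile.
PLAN_PRICE_USD = 5.00
REQUESTS_PER_MONTH = 300
INPUT_TOKENS_PER_REQUEST = 6_000
OUTPUT_TOKENS_PER_REQUEST = 800


def cost_per_request(rates: dict) -> float:
    """USD cost of one request with the assumed token counts."""
    return (INPUT_TOKENS_PER_REQUEST * rates["input"]
            + OUTPUT_TOKENS_PER_REQUEST * rates["output"]) / 1_000_000


def gross_margin(rates: dict) -> float:
    """Fraction of the plan price left after paying for inference only."""
    monthly_cost = cost_per_request(rates) * REQUESTS_PER_MONTH
    return (PLAN_PRICE_USD - monthly_cost) / PLAN_PRICE_USD


if __name__ == "__main__":
    for label, rates in (("reference", REFERENCE), ("current", CURRENT)):
        print(f"{label:<10} per request: USD {cost_per_request(rates):.6f}, "
              f"margin after inference: {gross_margin(rates):.1%}")
```

Under these assumptions inference consumes almost 80% of the plan price at the reference rates and about 20% at the current ones. The margin here ignores every cost other than tokens, so it is an upper bound rather than a forecast.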

Caching, Long Context, and Product Design Freedom

DeepSeek’s price reset is not only about base token rates; it also includes steep reductions for cache hits, which changes how certain products can be designed. The company cut input cache-hit prices across its lineup to one tenth of launch pricing, a shift that directly benefits agents, coding assistants, customer support systems, and document-heavy workflows that resend the same instructions and files many times. WinBuzzer notes that developers planning “high-volume long-context AI workloads” now gain a steadier budget baseline with the permanent V4-Pro tier. Lower cache pricing encourages teams to keep richer histories, larger knowledge bases, and more detailed instructions in their systems without constantly pruning to save tokens. That flexibility can improve answer quality and reliability for end users while keeping inference costs under control.
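How much cache hits matter depends on the share of input tokens served from cache. The sketch below blends the cached and uncached input rates quoted earlier in the article. It assumes both figures apply to V4-Pro and that billing is a simple linear mix of hit and miss tokens; the article does not describe DeepSeek's actual caching rules, and `blended_input_rate` is a hypothetical helper.

```python
# USD per million input tokens, as quoted earlier in the article.
# Assumption: both figures apply to V4-Pro.
UNCACHED_INPUT = 0.435
CACHED_INPUT = 0.003625


def blended_input_rate(cache_hit_rate: float) -> float:
    """Average USD per million input tokens at a given cache-hit share."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (cache_hit_rate * CACHED_INPUT
            + (1.0 - cache_hit_rate) * UNCACHED_INPUT)


if __name__ == "__main__":
    for hit_rate in (0.0, 0.5, 0.8, 0.95):
        rate = blended_input_rate(hit_rate)
        print(f"hit rate {hit_rate:.0%}: USD {rate:.4f} per million input tokens")
```

At an assumed 80% hit rate the blended input rate is about USD 0.09 per million tokens, roughly a fifth of the uncached rate, which is why workloads that resend the same prefix benefit most.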

Competitive Pressure and Huawei’s Growing AI Chip Role

DeepSeek’s decision also sends a competitive shock through the AI market. V4-Pro now carries a 75% price cut that, according to WinBuzzer, can make it 20 to 35 times cheaper than some premium offerings from OpenAI, Anthropic, and Google for certain workloads, turning V4-Pro and V4-Flash into aggressive benchmarks in budget-sensitive comparisons. At the same time, Technology.org reports that V4-Pro API costs now range from 0.025 to 6 yuan per million tokens, down from 0.1 to 24 yuan, with the model leaning on Huawei’s Ascend 950 chips. Digital Trends adds that earlier shortages of high-end compute made Pro access cost up to 12 times more than the Flash tier, but Huawei’s Ascend ecosystem is easing that constraint. If Ascend 950 shipments keep scaling, DeepSeek’s pricing could signal a new normal in inference costs and intensify the AI pricing battle.
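The figures in this section can be cross-checked against each other with simple ratios. The sketch below uses only numbers quoted in the article: the yuan ranges attributed to Technology.org and the USD list prices cited earlier. `percent_cut` is a hypothetical helper name.

```python
def percent_cut(old: float, new: float) -> float:
    """Return the price reduction as a fraction of the old price."""
    return 1.0 - new / old


# Yuan per million tokens: low and high ends of the reported range.
YUAN_OLD = (0.1, 24.0)
YUAN_NEW = (0.025, 6.0)

# USD per million tokens from the price list quoted earlier in the article.
PRO_REFERENCE = {"input": 1.74, "output": 3.48}
PRO_CURRENT = {"input": 0.435, "output": 0.87}
FLASH = {"input": 0.14, "output": 0.28}

if __name__ == "__main__":
    for old, new in zip(YUAN_OLD, YUAN_NEW):
        print(f"{old} -> {new} yuan: {percent_cut(old, new):.0%} cut")
    for kind in ("input", "output"):
        before = PRO_REFERENCE[kind] / FLASH[kind]
        after = PRO_CURRENT[kind] / FLASH[kind]
        print(f"{kind}: Pro/Flash was {before:.1f}x, now {after:.1f}x")
```

Both ends of the yuan range work out to a 75% cut. The reference prices put Pro at about 12.4 times Flash, which appears consistent with the "up to 12 times" figure, while at current prices the gap is about 3.1 times. The 20 to 35 times comparison with other providers cannot be checked this way, because the article does not quote their prices.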

What Startups Should Watch Next in the AI Price War

Even with the V4-Pro discount made permanent, price is only one factor in choosing an AI provider. Startups still need to test reliability, latency, data policy, model behavior, tool calling, and any regional limits before committing. DeepSeek’s move, however, resets the baseline for startup AI budgets: teams building coding tools, document search, or agentic workflows can now plan around long-term DeepSeek API pricing instead of assuming promotions will vanish. Competing providers must either justify higher prices with clear advantages in quality or reliability, or respond with cuts of their own. As Huawei’s AI chip ecosystem matures, chip supply will influence how long DeepSeek can hold these prices, but the signal to the market is already clear: high-end AI capabilities are no longer tied to premium-only price points, and experimenting with more ambitious, context-heavy products is now financially safer for smaller players.