AI coding agents and the new developer pricing race

AI coding agents and the week pricing dropped again

AI coding agents are software assistants that connect large language models to developer workflows so they can read codebases, run tools, edit files, and manage multi‑step programming tasks with less manual effort from humans. Over a single 72‑hour window in May, three AI coding agent vendors made moves that put downward pressure on developer pricing and sharpened the value debate. Cursor released Composer 2.5, its latest in‑house model tuned for code. Anthropic announced new infrastructure for Claude Managed Agents at its Code with Claude London developer event. Alibaba brought its Qwen 3.7 Max API online for general coding use. In parallel, DeepSeek‑native projects like the Reasonix terminal agent highlighted a different tactic: cache‑first designs that cut API cost on long sessions. Together, these moves give teams clearer cost‑benefit comparisons and show how quickly the AI coding agent market is changing.

Cursor’s Composer 2.5 pushes per‑task prices down

Cursor’s Composer 2.5 is a third‑generation proprietary coding model built on the open‑source Kimi K2.5 base and trained on 25 times as many synthetic coding tasks as its March predecessor. The standard tier is priced at USD 0.50 (approx. RM2.30) per million input tokens and USD 2.50 (approx. RM11.50) per million output tokens. A faster default variant costs USD 3.00 (approx. RM13.80) per million input tokens and USD 15.00 (approx. RM69.00) per million output tokens. According to Developer‑Tech, Cursor’s own CursorBench v3.1 shows Composer 2.5 scoring around 63% accuracy at roughly USD 0.50 (approx. RM2.30) per task, while Claude Opus 4.7 at its default setting lands near USD 7 (approx. RM32.20) per task. Even allowing for vendor bias, that gap sets a new pricing reference point for high‑end AI coding agents and pressures rivals to defend their premium.
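To make the per‑million‑token rates above concrete, here is a minimal Python sketch that turns them into a per‑task cost. The rates are the ones quoted in this article; the `Rate` class, the constant names, and the example token counts are hypothetical, chosen only for illustration, and do not describe how Cursor actually meters usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per one million tokens."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the USD cost of one request or task."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Rates as quoted in the article for Composer 2.5.
STANDARD = Rate(input_per_million=0.50, output_per_million=2.50)
FAST = Rate(input_per_million=3.00, output_per_million=15.00)

# Hypothetical task size, not taken from any benchmark.
example_input_tokens = 400_000
example_output_tokens = 60_000

standard_cost = STANDARD.cost(example_input_tokens, example_output_tokens)
fast_cost = FAST.cost(example_input_tokens, example_output_tokens)
print(f"standard: USD {standard_cost:.2f}")
print(f"fast:     USD {fast_cost:.2f}")
```

For this hypothetical task the standard tier comes to USD 0.35 and the faster variant to USD 2.10. Because both the input and the output rate of the faster variant are six times the standard rate, the faster variant costs six times as much for any mix of input and output tokens.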

Anthropic and Alibaba focus on infrastructure and access

Anthropic’s Code with Claude London event focused less on raw model benchmarks and more on the infrastructure around Claude Managed Agents. Self‑hosted sandboxes, now in public beta, let teams run agents and execute tools inside their own infrastructure while Anthropic keeps the orchestration loop. MCP tunnels, in research preview, allow Claude agents to connect to private internal systems without exposing public endpoints, routing encrypted traffic through an on‑premises gateway. Both features arrive with clear caveats about beta status and as‑is terms, but they narrow the gap for enterprises that worry about data leaving their perimeter. At the same time, Alibaba’s Qwen 3.7 Max API went live on Model Studio, adding another frontier‑level option. That reinforces the trend toward more accessible AI coding agents at lower effective price points, even when vendors emphasize platform integration as much as token rates.

Reasonix and DeepSeek show a cache‑first path to API cost reduction

While Cursor, Anthropic and Alibaba compete on frontier capability and platform reach, Reasonix targets a more specific lever: API cost reduction for long‑running terminal sessions. Reasonix is an open‑source DeepSeek‑native AI coding agent for the terminal that uses DeepSeek prefix caching to avoid reprocessing the same codebase context and instructions on every turn. The project describes its identity as “MCP first‑class · plan mode · cache‑first loop · MIT licensed,” pairing Model Context Protocol support with a workflow designed around repeated shell use. Its savings claim rests on a single‑day study from May 1, 2026, reporting about USD 12 (approx. RM55.20) of usage instead of about USD 61 (approx. RM280.60) under a frontier‑model baseline, a reduction of roughly 80%. With only one day of data behind it, the evidence is early. Even so, cache‑first designs like this show how DeepSeek terminal agents can compete by making long sessions cheaper instead of chasing ever‑larger models.
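The reason a cache‑first loop helps on long sessions can be sketched with a small cost model. This is a simplified illustration, not Reasonix’s implementation or DeepSeek’s billing logic. It assumes that each turn resends the whole conversation, that everything sent on the previous turn is served from the prefix cache at a discounted rate, and that only newly appended tokens pay the full input rate. The function name, the token counts, and both rates are hypothetical placeholders.

```python
def session_input_cost(prefix_tokens: int, new_tokens_per_turn: int, turns: int,
                       full_rate: float, cached_rate: float) -> tuple[float, float]:
    """Return (stateless_cost, cache_first_cost) in USD for input tokens only.

    Rates are USD per million tokens. The model ignores output-token cost
    and assumes every previously sent token is a cache hit on later turns.
    """
    stateless = 0.0
    cache_first = 0.0
    context = prefix_tokens  # stable instructions plus codebase context
    for turn in range(turns):
        sent = context + new_tokens_per_turn
        # Without caching, every token is billed at the full rate each turn.
        stateless += sent * full_rate / 1_000_000
        if turn == 0:
            # First turn: nothing is cached yet.
            cache_first += sent * full_rate / 1_000_000
        else:
            # Later turns: the old context is a cached prefix, and only
            # the newly appended tokens pay the full rate.
            cache_first += (context * cached_rate
                            + new_tokens_per_turn * full_rate) / 1_000_000
        context = sent  # the next turn resends everything so far
    return stateless, cache_first


# Hypothetical session: 50k-token stable prefix, 2k new tokens per turn, 40 turns.
# Hypothetical rates: USD 1.00 full and USD 0.10 cached per million tokens.
stateless, cache_first = session_input_cost(50_000, 2_000, 40, 1.00, 0.10)
print(f"stateless:   USD {stateless:.2f}")
print(f"cache-first: USD {cache_first:.2f}")
```

The design point the model captures is that the saving depends on keeping the start of the prompt byte‑for‑byte stable between turns. Anything that rewrites early context, such as reordering instructions or injecting a timestamp at the top, would turn cache hits back into full‑price tokens.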

What these pricing shifts mean for developer bottom lines

For active users of frontier‑model coding agents, monthly spending can reach USD 150 to 250 (approx. RM690 to RM1,150) by Reasonix’s estimate, so pricing changes have a direct impact on team budgets. Composer 2.5 resets expectations for what a high‑accuracy coding model can cost per task, while Anthropic’s sandboxes and MCP tunnels focus on making that spend acceptable in stricter environments. DeepSeek‑native caches move savings upstream by avoiding repeated full‑price processing of context the model has already seen. Developers now compare not just headline rates but full developer pricing models: IDE integrations versus terminal agents, managed workspaces versus open‑source tools, and cache‑first sessions versus stateless prompts. The pace of releases over 72 hours shows that AI coding agents are evolving on startup timelines, and teams that review their stack only once a year risk overpaying for intelligence that became cheaper last week.
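As a rough illustration of what those numbers mean for a budget, the sketch below divides the monthly spending range by the per‑task figures reported earlier from CursorBench v3.1. Combining the two sources is an assumption made here for illustration only: the budget range is Reasonix’s framing, the per‑task costs are vendor‑reported, and real tasks vary widely in size. The function and variable names are hypothetical.

```python
def tasks_per_budget(budget_usd: float, cost_per_task_usd: float) -> int:
    """Return how many whole tasks fit into a budget at a flat per-task cost."""
    return int(budget_usd // cost_per_task_usd)


# Per-task figures as reported in the article (vendor benchmark, approximate).
per_task_usd = {
    "Composer 2.5": 0.50,
    "Claude Opus 4.7 (default)": 7.00,
}

# Monthly budget range from the article, in USD.
for budget in (150, 250):
    for model, cost in per_task_usd.items():
        count = tasks_per_budget(budget, cost)
        print(f"USD {budget}: {model}: {count} tasks")
```

At those reported rates, a USD 150 to 250 budget covers 300 to 500 benchmark‑style tasks at about USD 0.50 each, against 21 to 35 at about USD 7 each.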