AI coding agents: Cursor vs Anthropic vs Alibaba

AI coding agents defined and a 72‑hour market shock

AI coding agents are software systems that pair large language models with tools, memory, and execution sandboxes so they can understand repositories, edit code, run tests, and iterate on tasks with minimal human prompting. In effect, they turn raw model capability into practical, semi‑autonomous tools for everyday development work. Within 72 hours in May, Cursor, Anthropic, and Alibaba each shipped significant AI coding agent updates, cutting what developers pay for frontier‑level capability more sharply than any single release earlier in the year. Cursor released its Composer 2.5 model on May 18, Anthropic announced new infrastructure features for Claude Managed Agents at its Code with Claude London event on May 19, and Alibaba’s Qwen 3.7 Max API went live the same weekend. Together, these launches did more than add incremental features: they pushed price floors down and exposed clear differences in developer pricing models and deployment assumptions.

Cursor’s Composer 2.5: capability claims and a low price floor

Cursor’s Composer 2.5 is its third‑generation proprietary coding model, built on the open‑source Kimi K2.5 base and trained on 25 times as many synthetic coding tasks as its March predecessor. This time Cursor named the base model upfront after earlier criticism about disclosure. The standard tier comes in at USD 0.50 (approx. RM2.30) per million input tokens and USD 2.50 (approx. RM11.50) per million output tokens, with a faster default variant at USD 3.00 (approx. RM13.80) input and USD 15.00 (approx. RM69.00) output. On Cursor’s own CursorBench v3.1, Composer 2.5 scores around 63% accuracy at roughly USD 0.50 (approx. RM2.30) per task, while Claude Opus 4.7 at its default setting scores comparably at approximately USD 7 (approx. RM32.20) per task. That gap, if it holds in third‑party tests, turns Cursor vs Anthropic into a cost‑per‑task story as much as a raw quality comparison.
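
To make the per‑token and per‑task figures above concrete, here is a minimal Python sketch of the arithmetic. The rates and per‑task costs are the ones quoted in this article; the token counts in the example and the names TokenPrice, request_cost, and cost_ratio are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """USD per one million tokens, as quoted in the article."""
    input_per_million: float
    output_per_million: float


# Composer 2.5 rates quoted above.
COMPOSER_STANDARD = TokenPrice(input_per_million=0.50, output_per_million=2.50)
COMPOSER_FAST = TokenPrice(input_per_million=3.00, output_per_million=15.00)


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the given per-million rates."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


def cost_ratio(expensive_per_task: float, cheap_per_task: float) -> float:
    """Return how many times more the expensive option costs per task."""
    return expensive_per_task / cheap_per_task


if __name__ == "__main__":
    # Hypothetical task size (an assumption, not a measured figure).
    input_tokens, output_tokens = 400_000, 40_000
    standard = request_cost(COMPOSER_STANDARD, input_tokens, output_tokens)
    fast = request_cost(COMPOSER_FAST, input_tokens, output_tokens)
    print(f"standard tier: USD {standard:.2f}, fast variant: USD {fast:.2f}")
    # Per-task costs quoted from CursorBench v3.1: about USD 7 vs USD 0.50.
    print(f"per-task gap: {cost_ratio(7.0, 0.50):.0f}x")
```

On the quoted benchmark figures, the per‑task gap works out to about 14 times, which is why the comparison reads as a cost story even when accuracy is similar.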

Anthropic focuses on secure agent infrastructure, not headline prices

Anthropic used its Code with Claude London event to address the biggest gap for many enterprise and regulated teams: keeping AI coding agents close to their own infrastructure. Self‑hosted sandboxes, now in public beta, let teams run Claude Managed Agents and execute tools inside their own environments, while Anthropic still controls the agent orchestration loop. Launch partners include Cloudflare, Daytona, Modal, and Vercel, with a bring‑your‑own‑sandbox path for custom setups. MCP tunnels, in research preview, allow Claude agents to connect to private internal systems and databases without public endpoints, sending encrypted traffic through a lightweight gateway on the private network. Both features are early and ship with explicit caveats, so teams that need strong stability guarantees are not the main audience yet. But they show Anthropic prioritising secure, large‑scale multi‑agent workflows over raw token price, aiming to anchor developer pricing models around workflow value instead of simple usage meters.

Alibaba’s Qwen 3.7 Max: closed weights, caching discounts, and hidden costs

Alibaba’s Qwen 3.7 Max API went live on Alibaba Cloud Model Studio ahead of its formal summit announcement, and it stands out for being closed‑weight, unlike many of the company’s earlier open‑weight releases. Pricing is set at USD 2.50 (approx. RM11.50) per million input tokens and USD 7.50 (approx. RM34.50) per million output tokens, with a 90% discount on cached input tokens that brings cached input down to USD 0.25 (approx. RM1.15) per million tokens. It scores 56.6 on the Artificial Analysis Intelligence Index and 72.5 on SWE‑Bench Verified. There is a practical trade‑off: extended thinking is enabled by default, which makes the model verbose in long coding loops. Developers report effective costs running three to four times the headline rate unless they cap max_tokens. One notable compatibility perk is that Qwen 3.7 Max supports the Anthropic Messages protocol, enabling drop‑in use inside existing Claude Code integrations.
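
The interaction between the caching discount and verbose output is easier to see with numbers. The sketch below uses the Qwen 3.7 Max rates quoted above; the token counts, the cached fraction, and the assumption that thinking tokens are billed at the output rate are illustrative guesses, not figures from Alibaba. The helper names are hypothetical.

```python
from typing import Optional

INPUT_RATE = 2.50      # USD per million uncached input tokens (from the article)
OUTPUT_RATE = 7.50     # USD per million output tokens (from the article)
CACHE_DISCOUNT = 0.90  # 90% off cached input tokens (from the article)


def cached_input_rate() -> float:
    """USD per million cached input tokens after the discount."""
    return INPUT_RATE * (1 - CACHE_DISCOUNT)


def billed_output(generated_tokens: int, max_tokens: Optional[int] = None) -> int:
    """Output tokens billed when generation is capped at max_tokens.

    Assumption: thinking tokens count toward output and toward the cap.
    """
    if max_tokens is None:
        return generated_tokens
    return min(generated_tokens, max_tokens)


def call_cost(input_tokens: int, cached_fraction: float, output_tokens: int) -> float:
    """USD cost of one call, given the share of input served from cache."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (
        fresh * INPUT_RATE
        + cached * cached_input_rate()
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical loop iteration: 20k input tokens, 75% cached,
    # 2k tokens of answer plus 6k tokens of extended thinking.
    generated = 2_000 + 6_000
    uncapped = call_cost(20_000, 0.75, billed_output(generated))
    capped = call_cost(20_000, 0.75, billed_output(generated, max_tokens=3_000))
    print(f"uncapped: USD {uncapped:.5f}, capped: USD {capped:.5f}")
```

In this hypothetical case output tokens dominate the bill even with most of the input cached, which is consistent with the reports that a cap matters more than the headline input rate. A cap can also cut a response short, so the right value depends on the task.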

What the new competition means for developer pricing models

Taken together, these three launches show how fast AI coding tools are evolving and how competitive the economics have become. Cursor is pressing the price floor down with in‑house models tuned for code, while Anthropic is racing on cadence and infrastructure, pairing Opus 4.8 and the forthcoming Mythos with tools for multi‑hundred‑agent enterprise workflows. According to Michael Parekh’s AI‑RTZ newsletter, Anthropic shipped Opus 4.8 only 41 days after 4.7, underscoring this rapid cycle. For developers, the question is shifting from “which model is smartest?” to “what does a solved task cost, and where does the code run?” Teams sensitive to spend may start mixing agents: low‑cost models for routine refactoring, higher‑end managed agents for complex, regulated workflows, and cached calls wherever possible. The competitive sprint of Cursor vs Anthropic vs Alibaba signals a future where pricing is not static but an actively tuned part of engineering practice.
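
One way a team might encode that mix is a small routing rule plus a cost‑per‑solved‑task estimate. The sketch below is illustrative only: the Tier, Task, choose_tier, and cost_per_solved_task names are hypothetical, the routing rule is a simplification of the split described above, and treating a benchmark score as a per‑attempt success rate with independent retries is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOW_COST = "low-cost model"
    MANAGED = "higher-end managed agent"


@dataclass(frozen=True)
class Task:
    description: str
    regulated: bool  # touches regulated data or systems
    complex: bool    # multi-step work beyond routine refactoring


def choose_tier(task: Task) -> Tier:
    """Route routine work to the cheap tier, everything else to managed agents."""
    if task.regulated or task.complex:
        return Tier.MANAGED
    return Tier.LOW_COST


def cost_per_solved_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD spent per solved task.

    Assumes each attempt succeeds independently with probability
    success_rate and failed attempts are retried until one succeeds.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in the interval (0, 1]")
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    tasks = [
        Task("rename a helper across modules", regulated=False, complex=False),
        Task("migrate a payments ledger", regulated=True, complex=True),
    ]
    for task in tasks:
        print(f"{task.description}: {choose_tier(task).value}")
    # Figures quoted earlier for Composer 2.5: about USD 0.50 per task at 63%.
    print(f"USD {cost_per_solved_task(0.50, 0.63):.2f} per solved task")
```

The estimate makes the trade‑off explicit: a cheaper model with a lower success rate can still win on cost per solved task, but only while the retries stay cheap enough.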