Claude Opus 4.8 release: coding and reasoning gains

What the Claude Opus 4.8 Release Changes for Developers

Claude Opus 4.8 is Anthropic’s flagship large language model release focused on measurable gains in coding reliability, AI reasoning capabilities, and long-running agent workflows while keeping the same pricing as Claude Opus 4.7. For developers, this means a drop in silent errors, faster responses, and more dependable progress tracking on complex tasks. The model is available through claude.ai, Claude Code, and the Claude API under the claude-opus-4-8 identifier, and has also been integrated into platforms like B.AI with full API and web chat access. Anthropic positions Opus 4.8 as a model that can use tools in context, check its own work, and support large-scale software engineering, finance, law, and research tasks. This release is not a cosmetic update; it focuses on practical reliability improvements that reduce time spent debugging AI output and monitoring autonomous workflows.

Claude Opus 4.8 Cuts Coding Defects While Holding the Line on Price

AI Coding Improvements: Fourfold Reduction in Missed Defects

The headline upgrade in Claude Opus 4.8 is its stronger performance on code defect detection. Anthropic reports that the model is about four times less likely than Opus 4.7 to pass flawed code without comment, reducing a common failure mode where AI-generated code looks plausible but hides subtle bugs. According to Anthropic, "Opus 4.8 was nearly four times less likely than Claude Opus 4.7 to overlook flaws in code it generated." Benchmark results back this up: Opus 4.8 scores 69.2% on an agentic coding benchmark and 74.6% on Terminal Bench 2.1, both focused on real-world coding workflow tasks. Early testers like CursorBench also observed that Opus 4.8 needed fewer tool steps to reach comparable outcomes, indicating not only better correctness but also more efficient problem-solving for iterative development and refactoring.

Deeper AI Reasoning Capabilities and Long-Running Task Execution

Beyond coding, Claude Opus 4.8 strengthens general reasoning and autonomous task execution, areas that matter for analytics, research, and automation-heavy enterprises. Anthropic highlights improved performance on multi-discipline benchmarks such as Humanity’s Last Exam, where Opus 4.8 reaches 49.8% without tools and 57.9% with tools, as well as OS World Verified, where it records 83.4% in agentic computer use. B.AI notes that the model’s ability to carry out complex, long-duration tasks independently has been further enhanced, which supports automation development and deep reasoning scenarios like end-to-end codebase migrations or multi-step financial analysis. In knowledge work, Opus 4.8 reaches 1890 on GDPval-AA and 53.9% on Finance Agent v2, pointing to more reliable judgment in structured decision tasks. These changes make Opus 4.8 more suitable as a semi-autonomous collaborator rather than a short-answer assistant.

More Accurate Progress Feedback, Effort Control and Dynamic Workflows

A frequent weakness of AI agents is vague or overconfident status reporting. B.AI reports that Claude Opus 4.8 improves task progress feedback, making it more objective and accurate so teams can trust updates on long-running operations. Anthropic’s effort control system on claude.ai and Claude Work lets users tune how much computation the model spends: higher effort unlocks deeper reasoning, while lower effort favors speed and reduced token use. Opus 4.8 defaults to high effort but, on coding tasks, aims to stay within Opus 4.7–like token ranges while performing better. In Claude Code, Dynamic Workflows (in research preview) can plan work, spawn parallel sub-agents, verify outputs against an existing test suite, and report back in a single session. These workflows are designed for large codebases, including migrations involving hundreds of thousands of lines of code, and are available on Enterprise, Team, and Max plans.

Faster Performance, Safety Alignment and Stable Pricing

For enterprises watching both latency and cost, Claude Opus 4.8 pushes performance without changing headline pricing from Opus 4.7. Anthropic keeps the standard mode at USD 5 (approx. RM23) per million input tokens and USD 25 (approx. RM115) per million output tokens, while Opus 4.8 fast mode is priced at USD 10 (approx. RM46) per million input and USD 50 (approx. RM230) per million output, running at about 2.5x speed. Anthropic says fast mode is now three times cheaper than comparable earlier options. On the safety side, internal testing indicates substantially lower rates of harmful or misaligned behavior, with fewer cases of deception or uncritical agreement with misuse compared to Opus 4.7. Alignment performance is reported as comparable to Claude Mythos Preview, a higher-capability model limited to trusted partners under Project Glasswing, suggesting that Opus 4.8 combines stronger capabilities with tighter safeguards.