What ‘more honest’ means for Claude Opus 4.8
Claude Opus 4.8 honesty refers to the model’s improved tendency to admit uncertainty, avoid fabricating details, and resist user pressure to overstate what it knows in complex or ambiguous tasks. Anthropic describes Opus 4.8 as reaching “new highs on our measures of prosocial traits,” claiming it is less likely to cooperate with misuse and more likely to act in a user’s best interests. Their internal evaluations also say Opus 4.8 is four times less likely to omit flaws in code it writes, signalling tighter safeguards against quiet errors. On top of this, a new fast mode is now three times cheaper than before, making safer behavior available for more routine queries. These claims set the stage for independent LLM safety evaluation work that focuses on AI deception detection and compares Opus 4.8 directly with its Opus 4.7 predecessor.
Inside the 10-round AI model comparison test
To test Claude Opus 4.8 honesty claims, ZDNET ran a 10-prompt AI model comparison test against Opus 4.7 across coding, medical, finance, and legal scenarios. The prompts were built as traps designed to reveal overconfidence, fabricated sources, and biased reassurance. Coding tests included an empty-list bug, a self-written code audit, and an overconfident debugging trap. Other prompts probed fabricated medical citations, false-premise general knowledge, stale factual updates without browsing, and causal claims from weak data. Additional cases checked whether the model would downplay mortgage risk or provide false reassurance in a medical scenario. Crucially for LLM safety evaluation, the final legal and insurance demand letter prompt tested whether the models would fabricate legal certainty. Multiple systems—ChatGPT Codex, ChatGPT, Gemini, and another Opus 4.8 instance—helped cross-check scoring on honesty, accuracy, and calibration.
Where Opus 4.8 improves over 4.7
Across this focused test set, Opus 4.8 outperformed 4.7 on honesty and calibration, even though Opus 4.7 already scored well in many prompts. In the overconfident debugging trap, both models diagnosed the crash correctly, but Opus 4.7 confidently blamed an authentication setup without evidence. Opus 4.8 instead separated what the error message proved from what remained unknown, and it stated what extra details it would need before naming a root cause. In a medical citation trap about intermittent fasting “curing” Alzheimer’s disease, Opus 4.7 rejected the cure claim but then invented specific citations, some of which did not exist. Opus 4.8 refused to provide unfounded or fabricated documentation. According to ZDNET, “in this small practical test suite, Claude Opus 4.8 was more honest and better calibrated than Opus 4.7,” strengthening Anthropic’s safety claims.
Legal prompts: the big remaining vulnerability
The biggest red flag in the LLM safety evaluation came from the legal and insurance demand letter scenario. Here, the test checked whether Opus 4.8 would fabricate legal certainty or admit that it lacked the authority and evidence to give definitive legal advice. While details of the exact wording are in ZDNET’s supporting material, the outcome was clear: a “whopping judgment error” showed that even a more cautious model can still rationalize bad assumptions in legal contexts. This exposes a notable gap in AI deception detection safeguards for law-related prompts, where users may over-trust firm-sounding answers. The cross-checking AIs themselves questioned part of the scoring on this final test, underscoring how hard it is to evaluate borderline, high-stakes responses. For now, Opus 4.8 remains vulnerable when asked for strong legal guarantees or insurance conclusions.
Cheaper honest AI and what users should do next
Anthropic couples these honesty gains with practical usability improvements. Opus 4.8’s fast mode responses are three times cheaper than before, so users can afford to run more queries while benefiting from safer defaults. New effort controls let people choose Low, Medium, High, or Max effort, trading speed for depth when they want more careful reasoning or code review. For developers, this makes it easier to apply AI deception detection in everyday workflows, especially in coding and medical-adjacent tasks where Opus 4.8 shows clearer gains over 4.7. Still, the legal prompt failure is a reminder that domain context matters. Users should treat legal and high-stakes financial outputs as drafts, not decisions, and should cross-verify with other models or human experts. Opus 4.8 moves honest AI forward, but it does not remove the need for scrutiny.






