Mythos Meets cURL: One Low-Severity Win, Plenty of Hype
Anthropic’s Mythos has been framed as a powerful, even too-dangerous-to-release AI for code vulnerability detection. Yet when it was pointed at cURL’s extensively tested codebase through Project Glasswing, the results looked underwhelming. According to cURL creator Daniel Stenberg, the Mythos-driven scan initially flagged five supposed “confirmed security vulnerabilities.” After several hours of manual triage by the cURL security team, that list shrank to a single genuine issue, now slated for disclosure as a low-severity CVE alongside the cURL 8.21.0 release. The other items were either already-documented limitations or plain bugs, not security flaws. Stenberg credits Mythos with clearly written reports for the non-security bugs, but concludes that its performance is incremental at best. In his view, the Mythos bug finder did not uncover issues at a rate or depth surpassing the other AI security testing tools already used on cURL, making Anthropic’s dramatic marketing claims look more like branding than breakthrough.
Firefox’s Bug Surge: Mythos, Opus, and the Power of the Harness
Mozilla’s experience with Mythos paints a more flattering picture, but one that still complicates the hype narrative. In April, Firefox logged fixes for 423 security bugs, up sharply from 76 in March and far above its previous monthly average. Mozilla says Mythos Preview identified 271 of these issues in Firefox 150, with additional help from Anthropic’s Opus 4.6 model. The standout wins include a high-severity, 20-year-old heap use-after-free flaw reachable via the XSLTProcessor DOM API and a set of sandbox escape bugs that traditional fuzzing struggled to uncover. Yet Mozilla’s engineers emphasize that the real shift came from their “agentic harness”—middleware that structures tasks, guides prompts, and filters AI output to boost the signal-to-noise ratio. In other words, the model’s security capabilities mattered, but so did the glue code around them, suggesting tooling and workflow may be as important as model quality.
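Mozilla has not published its harness, so the filtering stage it describes can only be sketched abstractly. Everything below—the `Finding` shape, its field names, and the 0.8 confidence threshold—is a hypothetical illustration of how such middleware might raise the signal-to-noise ratio before findings reach human triage, not Mozilla’s actual code:

```typescript
// Hypothetical sketch of one stage of an agentic harness: deduplicate
// raw model findings and keep only those that clear a signal bar.

interface Finding {
  file: string;
  title: string;
  confidence: number; // model-reported confidence, 0..1 (illustrative)
  hasRepro: boolean;  // did the agent produce a concrete reproduction?
}

// Keep a finding only if it is not a duplicate AND either carries a
// reproduction or a high model-reported confidence.
function filterFindings(raw: Finding[], minConfidence = 0.8): Finding[] {
  const seen = new Set<string>();
  const kept: Finding[] = [];
  for (const f of raw) {
    const key = `${f.file}:${f.title}`;
    if (seen.has(key)) continue; // drop duplicate reports of the same issue
    seen.add(key);
    if (f.hasRepro || f.confidence >= minConfidence) kept.push(f);
  }
  return kept;
}
```

The design choice this sketch captures is the one Mozilla emphasizes: the model generates candidates liberally, and deterministic glue code, not the model, decides what a human ever sees.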

Is Mythos Really Special, or Just Better Packaged?
Comparing cURL and Firefox highlights an awkward question: is Mythos truly a step-change in AI security testing, or simply a competent model wrapped in strong marketing? cURL’s long history with tools like AISLE, Zeropath, and OpenAI Codex Security has already yielded hundreds of bug fixes and roughly a dozen CVEs in under a year. Against that baseline, Mythos delivered a single low-impact vulnerability plus some routine bugs. Meanwhile, commentators skeptical of Anthropic’s “too powerful to release” framing have shown that more accessible models, such as Sonnet 4.6 and Haiku 4.5, can find meaningful issues when paired with well-designed harnesses like Wirken and auditing skills such as Lyrik. Some of these results overlap with Mythos findings, blurring the line between special-access models and off-the-shelf AI. The evidence so far suggests Mythos is capable, but its edge may come more from integration and workflow than from raw model superiority.
AI Bug Hunting’s Marketing Gap and What Comes Next
Taken together, these deployments expose a growing gap between AI marketing narratives and measurable security outcomes. Mythos has demonstrated real value—helping Mozilla uncover hard-to-fuzz sandbox escapes and validating prior hardening against attacks like prototype pollution. Yet cURL’s experience underscores that not every mature project will see dramatic gains from a new AI bug finder, especially when it already runs multiple analyzers and fuzzers. More broadly, Mythos illustrates that AI security stories cannot hinge solely on model mystique. Success depends on disciplined middleware, careful triage, and teams willing to iterate on harness design. As AI-driven code vulnerability detection matures, the industry will likely move away from “magic model” promises toward transparent benchmarks, reproducible workflows, and shared tooling. For now, Mythos is less a singular breakthrough than a case study in how powerful models, structured agents, and human reviewers must work together to deliver tangible security improvements.
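For readers unfamiliar with the attack class mentioned above: prototype pollution occurs when attacker-controlled keys such as `__proto__` survive a recursive object merge, letting untrusted input write to `Object.prototype` and thereby inject properties into every plain object in the process. The `naiveMerge` function below is a generic textbook illustration of the flaw, not code from Firefox, cURL, or any Mythos finding:

```typescript
// Illustrative prototype pollution: a naive recursive merge copies
// attacker-controlled JSON keys, including "__proto__", into the target.
function naiveMerge(target: any, source: any): any {
  for (const key of Object.keys(source)) {
    if (typeof source[key] === "object" && source[key] !== null) {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      // Recursing through target["__proto__"] reaches Object.prototype.
      naiveMerge(target[key], source[key]);
    } else {
      target[key] = source[key];
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an ordinary own property, so the
// merge loop walks into it and pollutes the global prototype.
const attacker = JSON.parse('{"__proto__": {"isAdmin": true}}');
naiveMerge({}, attacker);
console.log(({} as any).isAdmin); // true — every fresh object now inherits it
```

Hardening against this class typically means rejecting `__proto__`, `constructor`, and `prototype` keys during merges or building objects with `Object.create(null)`—the kind of prior mitigation the article says Mythos helped validate.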
