AI Vulnerability Detection, False Positives and Trust

What AI Vulnerability Detection Is—and Why False Positives Matter

AI vulnerability detection is the use of machine learning and large language models to scan code for potential security flaws, flagging bugs, misconfigurations, and exploitable patterns before attackers can take advantage of them. Anthropic’s Mythos model is a leading example: in early testing, it scanned more than 1,000 open source projects and flagged 6,202 bugs as high or critical severity, including complex, multi-step issues that traditional code scanning tools often miss. But the same probabilistic behavior that makes these systems powerful also makes them noisy. When a model is asked to find bugs, it tends to label uncertain patterns as possible vulnerabilities, generating false positive alerts. These alerts point at benign code, yet they still land in security queues and demand triage and manual review. The result is alert fatigue, slower response times, and growing skepticism about how much security teams can safely rely on automation.

Mythos: Impressive Scale, Imperfect Accuracy

Anthropic’s Mythos illustrates both the promise and the limits of current AI vulnerability detection. After its Project Glasswing preview launch, Anthropic reported that Mythos had scanned more than 1,000 open source projects and passed 1,752 of its most severe findings to six independent security research firms. Those reviewers found a 9.4% false positive rate and confirmed 62.4% of the findings as genuinely high or critical severity. On paper, that false positive rate is comparable to that of many automated code scanning tools. In practice, scale magnifies the burden: at 9.4% of the 1,752 reviewed findings, roughly 165 alerts were false positives, and hundreds of incorrect or lower-severity alerts still required human verification. Mythos can chain multi-step attacks and even uncover vulnerabilities like the wolfSSL issue CVE-2026-5194, which Anthropic rated CVSS 9.1. Yet each speculative multi-step finding can take longer to validate, turning a statistical error rate into a significant operational load for already stretched security teams.
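To make the scale argument concrete, here is a rough back-of-the-envelope sketch in Python. It applies the reported percentages to the 1,752 reviewed findings. The minutes-per-finding figure is a purely hypothetical assumption chosen for illustration; it does not come from Anthropic or the reviewers.

```python
# Figures quoted in the article for the independently reviewed findings.
REVIEWED_FINDINGS = 1_752
FALSE_POSITIVE_RATE = 0.094
CONFIRMED_HIGH_OR_CRITICAL_RATE = 0.624

# Hypothetical assumption: average analyst minutes to validate one finding.
ASSUMED_MINUTES_PER_FINDING = 45


def review_burden(total, fp_rate, confirmed_rate, minutes_per_finding):
    """Turn percentage rates into absolute counts and analyst hours."""
    false_positives = round(total * fp_rate)
    confirmed = round(total * confirmed_rate)
    # Everything reviewers did not confirm as high or critical:
    # false positives plus real but lower-severity findings.
    not_confirmed = total - confirmed
    hours_on_false_positives = false_positives * minutes_per_finding / 60
    return {
        "false_positives": false_positives,
        "confirmed_high_or_critical": confirmed,
        "not_confirmed_high_or_critical": not_confirmed,
        "hours_on_false_positives": hours_on_false_positives,
    }


burden = review_burden(
    REVIEWED_FINDINGS,
    FALSE_POSITIVE_RATE,
    CONFIRMED_HIGH_OR_CRITICAL_RATE,
    ASSUMED_MINUTES_PER_FINDING,
)
print(burden)
```

Under these assumptions, about 165 findings are false positives and about 659 were not confirmed as high or critical. The hours figure scales directly with the assumed validation time, so teams would substitute their own measured value.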

Why Models Hallucinate Bugs in Benign Code

False positives in AI vulnerability detection are not accidents; they follow from how these models work. Large language models are trained to predict likely patterns, not to guarantee factual accuracy. When pointed at code and asked for weaknesses, they generate plausible attack paths, sometimes hallucinating bugs where none exist. As Cloudflare’s Grant Bourzikas put it, “Ask a model to find bugs, and it will find them, whether the code has any or not.” This probabilistic nature produces findings hedged with words like “possibly,” “potentially,” and “could in theory,” and those hedged findings outnumber solid, reproducible vulnerabilities. For a triage queue, the caveats do not reduce the workload: analysts must still verify each claim. And because the same query can produce different findings at different times, deterministic workflows become harder to maintain. Instead of the clean, repeatable output of conventional code scanning tools, security teams get a stream of high-variance suggestions that demand careful filtering.
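One way to cope with run-to-run variance is to scan the same code several times and note which findings recur. The Python sketch below illustrates that idea; it is not a technique the article attributes to any vendor. The finding keys (file, line, category) and the 60% threshold are hypothetical choices. Agreement across runs is only a prioritization signal, since a model can repeat the same hallucination and a real bug can appear in a single run.

```python
from collections import Counter


def stable_findings(runs, min_fraction=0.6):
    """Return findings reported in at least `min_fraction` of the runs.

    `runs` is a list of sets, one set of finding keys per scan of the
    same code. Keys are hypothetical (file, line, category) tuples.
    """
    if not runs:
        return set()
    # Count in how many runs each finding appears.
    counts = Counter(finding for run in runs for finding in set(run))
    threshold = min_fraction * len(runs)
    return {finding for finding, seen in counts.items() if seen >= threshold}


# Three hypothetical scans of the same repository.
runs = [
    {("auth.c", 42, "buffer-overflow"), ("util.c", 7, "use-after-free")},
    {("auth.c", 42, "buffer-overflow")},
    {("auth.c", 42, "buffer-overflow"), ("net.c", 99, "integer-overflow")},
]

stable = stable_findings(runs, min_fraction=0.6)
unstable = set().union(*runs) - stable
print("review first:", stable)
print("exploratory:", unstable)
```

Findings that fall below the threshold are kept in an exploratory bucket, not discarded, because low agreement does not prove a finding is wrong.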

Enterprise Interest: Cisco’s Experiment and the Hybrid Model

Despite accuracy limits, large enterprises are moving quickly to test AI code scanning tools. Cisco reported that Anthropic’s Claude Mythos Preview and OpenAI’s GPT 5.5-Cyber scanned 1.8 billion lines of code in eight weeks, covering more than 25 programming languages. According to Cisco’s Anthony Grieco, the same work would have taken its advanced security team about eight years using earlier methods. Cisco claims a false positive rate under 3% by pairing these models with a “human-guided harness,” in which AI suggestions are validated before they reach engineering teams. This approach treats AI as a force multiplier, not an autonomous decision-maker. Mythos is offered through Anthropic’s Project Glasswing partner program, which now spans about 200 organizations. Early partners like Palo Alto Networks reported finding dozens of CVEs in a month, far more than their usual disclosure pace, but those results were still filtered through human review to protect developers from a wall of warnings.

What Security Teams Should Do Next

For security teams, the lesson is not to reject AI vulnerability detection but to use it carefully. Mythos and similar code scanning tools can uncover serious vulnerabilities at a speed and scale that manual review cannot match, yet false positive alerts will remain part of the package. Teams should plan for a hybrid model: AI for broad, high-speed scanning and humans for prioritization, validation, and final decisions. That means building workflows that separate exploratory findings from actionable vulnerabilities, and tracking metrics like investigation time per alert, not only raw false positive percentages. It also means treating AI outputs as hypotheses to test, not as verdicts to act on. As more companies join programs like Project Glasswing, the most successful teams will be those that gain speed without overwhelming their analysts or eroding trust in their security automation.
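The workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's harness: the Finding fields, the verdict labels, and the rule that a finding becomes actionable once it has been reproduced are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Finding:
    finding_id: str
    reproduced: bool                # has a human or a test reproduced it?
    verdict: Optional[str] = None   # "confirmed", "false_positive", or None while open
    minutes_spent: float = 0.0      # analyst time spent investigating


def split_queue(findings):
    """Separate actionable (reproduced) findings from exploratory ones."""
    actionable = [f for f in findings if f.reproduced]
    exploratory = [f for f in findings if not f.reproduced]
    return actionable, exploratory


def triage_metrics(findings):
    """Report false positive rate and mean investigation time for closed findings."""
    closed = [f for f in findings if f.verdict is not None]
    if not closed:
        return {"false_positive_rate": None, "mean_minutes_per_alert": None}
    false_positives = sum(1 for f in closed if f.verdict == "false_positive")
    return {
        "false_positive_rate": false_positives / len(closed),
        "mean_minutes_per_alert": mean(f.minutes_spent for f in closed),
    }


# Hypothetical queue: one confirmed bug, one false positive, one still open.
findings = [
    Finding("F-1", reproduced=True, verdict="confirmed", minutes_spent=30),
    Finding("F-2", reproduced=False, verdict="false_positive", minutes_spent=90),
    Finding("F-3", reproduced=False),
]

actionable, exploratory = split_queue(findings)
metrics = triage_metrics(findings)
print(len(actionable), len(exploratory), metrics)
```

Tracking minutes per alert alongside the false positive rate shows the pattern the article warns about: in this toy queue the single false positive consumed three times as much analyst time as the confirmed bug.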