Microsoft’s MDASH AI Found 16 Windows Flaws Humans Missed—Here’s How Agentic Security Works

From Research Curiosity to Production-Grade Agentic AI Security

Microsoft’s MDASH system signals a major shift in how Windows vulnerability detection is conducted. Instead of relying on a single large model, MDASH orchestrates more than 100 specialized AI agents to hunt for bugs across complex codebases. Microsoft describes it as a multi-model agentic scanning harness: a structured pipeline that takes real-world code, identifies attack surfaces, probes them for weaknesses, and then filters and validates the results. In internal testing, MDASH helped uncover 16 previously unknown Windows vulnerabilities in networking and authentication components, including four Critical remote code execution flaws in the TCP/IP stack and IKEv2 service. Microsoft argues that this marks the point where AI threat hunting has moved from experimental research into enterprise-ready defense, as MDASH can approximate the reasoning patterns of professional security researchers while operating at machine speed and scale.
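The pipeline Microsoft describes can be sketched in miniature. The stage names, heuristics, and stub logic below are illustrative assumptions standing in for what MDASH delegates to specialized LLM agents, not MDASH's actual interfaces:

```python
# Toy three-stage scanning harness: find attack surfaces, probe them,
# then filter and validate findings. All logic here is a stand-in for
# work that MDASH hands to specialized AI agents.
from dataclasses import dataclass

@dataclass
class Finding:
    surface: str
    description: str
    validated: bool = False

def identify_attack_surfaces(code: dict) -> list:
    # Stage 1: a crude heuristic for externally reachable entry points.
    return [name for name, body in code.items()
            if "recv" in body or "parse" in body]

def probe(surface: str) -> list:
    # Stage 2: a discovery agent would inspect the surface; stubbed here.
    return [Finding(surface, f"possible memory-safety issue in {surface}")]

def validate(findings: list) -> list:
    # Stage 3: auditor agents deduplicate and confirm reproducibility.
    seen, confirmed = set(), []
    for f in findings:
        if f.description not in seen:
            seen.add(f.description)
            f.validated = True
            confirmed.append(f)
    return confirmed

codebase = {
    "tcp_input":  "void tcp_input(pkt*) { parse_header(...); }",
    "log_rotate": "void log_rotate(void) { /* local only */ }",
}
results = validate([f for s in identify_attack_surfaces(codebase)
                    for f in probe(s)])
# Only the network-facing tcp_input entry point survives to Stage 3.
```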

How 100+ AI Agents Work Together to Find Real Bugs

Traditional scanners often drown security teams in noisy results. MDASH tackles this by assigning different roles to its fleet of AI agents. Some agents specialize in pattern-based bug discovery, scanning for risky memory operations or suspicious control flows. Others act as auditors and debaters, attempting to reproduce issues, challenge earlier findings, and eliminate duplicates. Structured debate between models becomes a useful signal: when one agent flags a vulnerability and the others fail to refute it, MDASH raises its confidence in that finding, while successful refutations weed out false positives. This agentic AI security approach mirrors the workflow of human experts who brainstorm, test, and cross-check each other’s conclusions. Microsoft notes that no single model is optimal at every stage, so MDASH dynamically combines cutting-edge frontier models with smaller, efficient ones to cover the entire pipeline, from initial detection through proof-of-exploitability, with a strong emphasis on minimizing false positives.
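One way to picture the refutation loop is below. The scoring rule and the agent stubs are invented for illustration; real MDASH refuters are LLM agents, and Microsoft has not published its confidence formula:

```python
# Each proposed finding is handed to refuter agents; one successful
# refutation kills it, and every failed refutation raises confidence.
def debate_confidence(finding: str, refuters) -> float:
    confidence = 0.5                       # prior for a raw detection
    for refute in refuters:
        if refute(finding):                # refuter produced a counterexample
            return 0.0                     # finding discarded as a false positive
        confidence = min(1.0, confidence + 0.25)
    return confidence

cannot_refute  = lambda f: False           # stub: fails to disprove the finding
always_refutes = lambda f: True            # stub: disproves it immediately

strong = debate_confidence("UAF in packet reassembly", [cannot_refute, cannot_refute])
weak   = debate_confidence("false alarm in logging",   [always_refutes])
# strong -> 1.0 (survived two refutation attempts); weak -> 0.0
```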

The 16 Hidden Windows Flaws MDASH Brought to Light

MDASH’s most visible impact so far is its contribution to Microsoft’s May Patch Tuesday release, where it helped identify 16 vulnerabilities in Windows networking and authentication components. Many of these issues were reachable over the network without credentials, increasing their potential impact. Among the most serious were four Critical remote code execution flaws affecting tcpip.sys—a key part of the Windows TCP/IP stack—and the IKEEXT service that supports IKEv2 IPsec connections. One vulnerability, tracked as CVE-2026-33827, involved a use-after-free bug triggered by crafted IPv4 packets, while another, CVE-2026-33824, stemmed from a double-free in specific IKEv2 responder configurations. These bugs required reasoning across multiple files, code paths, and ownership patterns—areas where standard scanners and single-model tools often struggle—highlighting how agentic AI security can expose subtle, multi-step weaknesses before attackers discover them.
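To see why these bug classes demand reasoning across code paths, consider a toy object-lifecycle model. Python stands in for the C kernel code here, and the class and identifiers are invented for illustration; real C has no such runtime guard, which is exactly why a free on one path can silently invalidate state used, or freed again, on another:

```python
# Toy lifecycle checker for the two bug classes named above.
class Allocator:
    def __init__(self):
        self.live = set()

    def alloc(self, obj_id):
        self.live.add(obj_id)

    def free(self, obj_id):
        if obj_id not in self.live:
            raise RuntimeError(f"double-free of {obj_id}")
        self.live.remove(obj_id)

    def use(self, obj_id):
        if obj_id not in self.live:
            raise RuntimeError(f"use-after-free of {obj_id}")

heap = Allocator()
heap.alloc("ikev2_session")   # path A: responder sets up state
heap.free("ikev2_session")    # path B: error handler tears it down
# A second free on path C would now raise "double-free of ikev2_session",
# and a use on path D would raise "use-after-free": the hazard is only
# visible when all the paths are reasoned about together.
```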

Benchmarks, Recall Scores, and What They Mean for Defenders

To gauge MDASH’s reliability, Microsoft tested it against both synthetic and historical vulnerabilities. On a private driver seeded with 21 known bugs, MDASH reportedly found every single one with zero false positives. When measured against five years of confirmed Microsoft Security Response Center cases, it reached 96 percent recall in clfs.sys and 100 percent recall in tcpip.sys, suggesting it would have caught most past flaws in those components. On CyberGym, a public benchmark with 1,507 real-world vulnerability reproduction tasks, MDASH achieved an 88.45 percent success rate and outperformed other AI models, including Claude Mythos and GPT 5.5. While Microsoft avoids claiming that these numbers will automatically generalize to every codebase, they indicate that AI threat hunting is maturing into a consistent, repeatable capability—potentially giving defenders an edge in the ongoing arms race with adversaries using AI to find new exploits.
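Recall here is the standard fraction of known bugs rediscovered. The figures above can be checked with a line of arithmetic; note that the CyberGym solved-task count of roughly 1,333 is inferred from the stated percentage, not reported directly:

```python
def recall(found: int, total: int) -> float:
    """Fraction of known vulnerabilities a scanner rediscovers."""
    return found / total

seeded_driver = recall(21, 21)         # all 21 seeded driver bugs found -> 1.0
# 88.45% of CyberGym's 1,507 tasks corresponds to roughly 1,333 successes:
cybergym_rate = round(1333 / 1507, 4)  # 0.8845
```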

What Enterprise Security Teams Can Expect Next

MDASH is already in use within Microsoft’s own security engineering organizations and is being piloted with a limited set of enterprise customers via private preview. Because the system can approximate professional offensive researchers, Microsoft is intentionally controlling access to reduce the risk of abuse. For security teams, the long-term implications are significant. Instead of focusing mainly on reactive patching after incidents or disclosures, organizations could integrate agentic AI security into their development and test pipelines to continuously discover vulnerabilities before they reach production. MDASH’s design—combining multi-model reasoning, automated validation, and prioritization—suggests future tools will act as always-on AI threat hunting companions. While broader availability and integration details are still emerging, enterprises that prepare now by modernizing their code review workflows and tooling will be better positioned to take advantage of this next generation of Windows vulnerability detection.
