How Microsoft’s MDASH Agentic AI System Finds Windows Vulnerabilities Before Attackers Do

From Patch Tuesday to Proactive Defense

Microsoft’s MDASH system has already proven its value by uncovering 16 previously unknown Windows vulnerabilities, including four Critical remote code execution (RCE) flaws. Fixes for these issues, found in components such as the Windows TCP/IP stack and IKEv2 service, shipped as part of the May 12 Patch Tuesday security update. Rather than waiting for external researchers or attackers to surface weaknesses, Microsoft’s Autonomous Code Security and Windows Attack Research and Protection teams are using MDASH to sweep Windows networking and authentication code for exploitable conditions. The company frames this as a response to an increasingly asymmetric security landscape, where attackers are adopting AI to escalate the speed and sophistication of intrusions. By embedding an AI-driven vulnerability discovery system directly into its engineering pipelines, Microsoft is attempting to shift from reactive patching toward continuous, proactive Windows security hardening.

Inside MDASH: A Multi-Model Agentic AI Security Harness

MDASH is described as a multi-model agentic scanning harness, meaning it coordinates more than 100 specialized AI agents rather than relying on a single model. Each agent has a focused role: some scan Windows code for patterns that resemble known bugs, others validate or refute those findings, while additional agents merge duplicates and attempt to prove exploitability. This pipeline mirrors the workflow of human security researchers—triaging suspicious areas, verifying evidence, and reducing noise before results reach engineers. Microsoft emphasizes that the surrounding orchestration is as important as the models themselves. Disagreement between agents is treated as a signal; if an auditing agent flags a potential issue and a debating agent cannot disprove it, the finding’s credibility increases. The result is an AI cybersecurity tool designed to surface genuine Windows security flaws without overwhelming enterprise teams with false positives or low-value alerts.
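
The audit, debate, and merge stages described above can be pictured with a minimal Python sketch. This is not Microsoft’s implementation: the Finding fields, the scoring constants, the file names, and the function names are all hypothetical, chosen only to illustrate how a finding that survives an attempted refutation might gain credibility while duplicate reports are collapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One report from a hypothetical auditing agent."""
    file: str        # illustrative source location, not a real Windows file
    line: int
    bug_class: str   # e.g. "integer-overflow"
    refuted: bool    # True if a debating agent managed to disprove the report


def credibility(finding: Finding, base: float = 0.5, boost: float = 0.3) -> float:
    """Score a finding; surviving the debate raises confidence (assumed constants)."""
    if finding.refuted:
        return 0.0
    return min(1.0, base + boost)


def merge_duplicates(findings):
    """Collapse reports that share a location and bug class, keeping the first."""
    merged = {}
    for f in findings:
        key = (f.file, f.line, f.bug_class)
        merged.setdefault(key, f)
    return list(merged.values())


def triage(findings, threshold: float = 0.7):
    """Return deduplicated findings whose credibility clears the threshold."""
    return [f for f in merge_duplicates(findings) if credibility(f) >= threshold]


if __name__ == "__main__":
    reports = [
        Finding("example_a.c", 10, "integer-overflow", refuted=False),
        Finding("example_a.c", 10, "integer-overflow", refuted=False),  # duplicate
        Finding("example_b.c", 5, "use-after-free", refuted=True),      # disproved
    ]
    for f in triage(reports):
        print(f.file, f.line, f.bug_class)
```

In this toy version, two agents reporting the same location count once, and a report that the debating agent disproves never reaches an engineer. A real system would presumably weigh evidence far more finely than a single boolean.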

Performance Benchmarks: Recall, CyberGym Scores, and RCEs

Beyond the 16 live Windows vulnerabilities, MDASH has been benchmarked against controlled and historical cases. Microsoft reports that in a planted-driver test, MDASH found all 21 seeded vulnerabilities with zero false positives. It achieved 96% recall on 28 historical cases in clfs.sys and 100% recall on seven tcpip.sys cases, indicating that its coverage is not limited to a single driver. On the CyberGym benchmark, which is designed to evaluate AI agents’ ability to uncover software bugs, MDASH posted an 88.45% score, outperforming systems such as Anthropic’s Claude Mythos and OpenAI’s GPT 5.5, according to Microsoft. These metrics matter because security teams care less about model counts and more about whether a vulnerability discovery system consistently finds real, exploitable flaws. The four Critical RCE vulnerabilities reinforce that MDASH is not only winning synthetic benchmarks but also surfacing high-impact issues in production Windows components.
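
For readers less familiar with the metrics, recall is the share of known vulnerabilities that a tool finds, and precision is the share of its reports that are real. The short Python sketch below works through the reported figures. One input is an inference rather than a reported number: the article gives only "96%" for the 28 clfs.sys cases, and 27 of 28 is the only whole-number count that rounds to that value.

```python
def recall(found: int, total: int) -> float:
    """Fraction of known vulnerabilities that were rediscovered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that were genuine."""
    reported = true_positives + false_positives
    if reported <= 0:
        raise ValueError("at least one finding must be reported")
    return true_positives / reported


if __name__ == "__main__":
    # 27 is inferred from the reported 96%; the source does not state the count.
    print(f"clfs.sys recall:  {recall(27, 28):.1%}")   # 96.4%
    print(f"tcpip.sys recall: {recall(7, 7):.1%}")     # 100.0%
    # Planted-driver test: 21 seeded bugs found, zero false positives.
    print(f"planted recall:    {recall(21, 21):.1%}")    # 100.0%
    print(f"planted precision: {precision(21, 0):.1%}")  # 100.0%
```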

What MDASH Means for Enterprise Security Teams

MDASH is currently in a limited private preview, with Microsoft’s own security engineering teams using it alongside a small group of enterprise customers. For defenders, the system promises to automate noisy early-stage triage, allowing human analysts to focus on validating and mitigating confirmed vulnerabilities instead of sifting through speculative findings. Because most of the MDASH-discovered flaws were reachable over the network without credentials, the tool is already demonstrating value in spotting dangerous bugs on the exposed attack surface. For enterprises exploring agentic AI security, MDASH signals a shift toward embedded, autonomous vulnerability discovery systems that integrate directly into development and security workflows. While broad rollout and independent validation are still pending, early results suggest that adopting such AI cybersecurity tools could become standard practice for organizations that rely heavily on Windows and want to identify RCE vulnerabilities before threat actors do.

The Broader Shift to Agentic AI Security

MDASH’s launch highlights a broader industry movement toward agentic AI security, where multiple cooperating agents continuously inspect codebases rather than serving as one-off scanners. Microsoft positions MDASH as evidence that AI vulnerability discovery has moved from research curiosity to production-grade defense. The system’s multi-model architecture, debate mechanisms, and high CyberGym score suggest that orchestrated AI can rival or enhance human-led bug hunting at enterprise scale. At the same time, Microsoft is keeping MDASH in a restricted preview, echoing how other major AI providers are limiting access to powerful defensive tools. This cautious rollout reflects concerns about dual-use capabilities and the need to validate reliability in real-world environments. For security leaders and educators, MDASH offers a concrete example of how AI can be used not just to generate code, but to systematically inspect, challenge, and harden it against emerging threats.
