MilikMilik

Microsoft’s MDASH AI Uses Swarms of Agents to Unearth Hidden Windows Flaws

From Zero-Days to Patch Tuesday: What MDASH Just Found

Microsoft’s new MDASH AI security system has already delivered concrete results: it identified 16 previously unknown Windows vulnerabilities, all of which were patched in the May 12 Patch Tuesday release. Among them were four critical remote code execution (RCE) flaws affecting core networking and authentication components, including the Windows TCP/IP stack (tcpip.sys) and the IKEv2 service (IKEEXT). Many of these issues could be reached over the network without credentials, which significantly raises their risk for enterprise environments. Microsoft’s security teams argue this marks a turning point in Windows vulnerability discovery. Instead of waiting for attackers or external researchers to stumble on weaknesses, MDASH continuously probes code for exploitable paths before they can be abused. The move reflects a broader strategic response to what Microsoft calls an “increasingly asymmetric battle,” in which attackers are already leveraging AI to accelerate discovery and exploitation of software bugs.

Inside MDASH: Over 100 Agentic AI Models Working in Concert

MDASH is built as a multi-model, agentic AI scanning harness rather than a single monolithic tool. Microsoft has developed more than 100 specialized AI agents, each tuned for a different task in Windows vulnerability discovery. Some agents comb through code to flag suspicious patterns, others validate whether an apparent bug is real, and still others de-duplicate overlapping reports and attempt to prove exploitability. The system runs these agents across both large frontier models and smaller distilled models, orchestrating them as a pipeline: codebases are ingested, high-risk areas are identified, and findings are progressively filtered, challenged, and escalated. Microsoft emphasises that this structured workflow mirrors how human security researchers operate, but at machine scale and speed. Crucially, MDASH incorporates a “debate” stage in which auditor and debater agents argue over findings; when a disagreement remains unresolved, confidence increases that the suspected issue is a genuine vulnerability worth human attention.
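The staged workflow described above (scan, validate, de-duplicate, debate, escalate) can be illustrated with a toy sketch. Nothing here reflects MDASH’s actual code or interfaces, which Microsoft has not published: the `Finding` type, the string-matching “agents”, and the sample sources are all hypothetical stand-ins, and the debate rule (escalate only when the debater has no rebuttal) is one possible reading of the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen makes findings hashable, which deduplicate() relies on
class Finding:
    location: str  # "file:line"
    kind: str      # short label for the suspected bug class
    evidence: str  # the flagged source line


# Toy patterns only; a real scanner agent would reason about code, not match strings.
RISKY_CALLS = {"memcpy": "unchecked copy", "strcpy": "unbounded string copy"}


def scan(sources: dict) -> list:
    """Hypothetical scanner agent: flag every line that calls a risky function."""
    findings = []
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for call, kind in RISKY_CALLS.items():
                if f"{call}(" in line:
                    findings.append(Finding(f"{path}:{lineno}", kind, line.strip()))
    return findings


def validate(finding: Finding) -> bool:
    """Hypothetical validator agent: treat copies bounded by sizeof as not a bug."""
    return "sizeof" not in finding.evidence


def deduplicate(findings: list) -> list:
    """Collapse identical reports while keeping first-seen order."""
    return list(dict.fromkeys(findings))


def debater_rebuttal(finding: Finding) -> Optional[str]:
    """Hypothetical debater agent: return a rebuttal, or None if it has none."""
    if '"' in finding.evidence:
        return "source is a fixed string literal"
    return None


def run_pipeline(sources: dict) -> list:
    # Two overlapping scanner agents are simulated by scanning twice.
    candidates = scan(sources) + scan(sources)
    validated = [f for f in candidates if validate(f)]
    unique = deduplicate(validated)
    # Assumed debate rule: findings the debater cannot rebut are escalated.
    return [f for f in unique if debater_rebuttal(f) is None]


# Made-up input for illustration; not real Windows code.
SAMPLE = {
    "driver.c": 'memcpy(dst, src, len);\nmemcpy(dst, src, sizeof(dst));\nstrcpy(name, "fixed");\n',
    "net.c": "strcpy(buf, packet);\n",
}

if __name__ == "__main__":
    for finding in run_pipeline(SAMPLE):
        print(finding.location, finding.kind)
```

In this sketch each stage only narrows the candidate list, so the volume reaching a human analyst shrinks at every step, which is the property the article attributes to the pipeline design.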

Benchmarking MDASH: CyberGym Scores and Real-World Validation

To convince sceptical security teams, Microsoft has coupled MDASH’s live vulnerability finds with controlled testing results. In internal experiments with planted driver bugs, MDASH reportedly detected 21 of 21 injected vulnerabilities with zero false positives. Across 28 historical MSRC cases in clfs.sys, it achieved 96% recall, and it posted 100% recall on seven historical tcpip.sys cases. On the CyberGym benchmark, which evaluates AI agents’ ability to uncover software bugs, MDASH reached an 88.45% score, outperforming other AI models including Anthropic’s Claude Mythos and OpenAI’s GPT 5.5, according to Microsoft. The company frames these metrics as evidence that MDASH is more than a lab curiosity: it can repeatably surface serious issues in complex, real-world codebases. For enterprise defenders, the combination of benchmark performance and tangible Windows vulnerability discovery suggests that AI-driven triage is beginning to deliver production-grade value in security testing workflows.
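For readers unfamiliar with the metric, recall is the share of known bugs that a tool actually finds. The short sketch below works through the reported figures. The helper names are illustrative, and the count of 27 out of 28 clfs.sys cases is an inference from the rounded 96% figure, not a number stated by Microsoft.

```python
def recall(found: int, total: int) -> float:
    """Fraction of known vulnerabilities that were detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


def counts_matching(total: int, reported_percent: int) -> list:
    """All detection counts whose recall rounds to the reported percentage."""
    return [k for k in range(total + 1) if round(100 * k / total) == reported_percent]


# Figures reported in the article.
planted = recall(21, 21)  # 21 of 21 injected driver bugs
tcpip = recall(7, 7)      # seven historical tcpip.sys cases, 100% recall

# Inference only: which whole-number count out of 28 is consistent with "96%"?
clfs_candidates = counts_matching(28, 96)

if __name__ == "__main__":
    print(f"planted-bug recall: {planted:.0%}")
    print(f"tcpip.sys recall:   {tcpip:.0%}")
    print(f"clfs.sys counts consistent with 96%: {clfs_candidates}")
```

Recall says nothing about false alarms on its own, which is why the zero-false-positive result on the planted bugs is reported alongside it.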

What MDASH Means for Enterprise Security Strategies

For enterprises, MDASH signals a shift from reactive patching toward proactive Windows vulnerability discovery powered by AI agents. Microsoft is currently using the system internally across its security engineering teams and has opened a limited private preview to a small set of customers. That cautious rollout reflects both demand and concern: tools capable of approximating professional security research could be misused if widely accessible. Still, the strategic message is clear. As attackers adopt AI to scale reconnaissance and exploit development, defenders will need comparable automation to keep pace. MDASH is designed to handle noisy early-stage triage, reducing false positives before issues reach human analysts. Organisations that integrate similar AI-driven pipelines into their own security workflows can potentially shorten exposure windows, improve prioritisation of critical RCE flaws, and free scarce human experts to focus on complex, high-impact investigations rather than manual code review.
