From Point Models to Agentic AI Vulnerability Discovery
AI vulnerability discovery has moved beyond single chatbots toward orchestrated systems built for automated code auditing and security testing. Instead of relying on one large model, leading vendors are building multi-agent architectures that coordinate scanning, reasoning, and validation across massive codebases and attack surfaces. The goal is to automate vulnerability detection so that discovery takes minutes rather than months, without sacrificing rigor. Microsoft, Google, and Tenable are converging on a similar pattern: agentic AI wrapped in strict workflows, extensive telemetry, and explicit guardrails. These platforms plug directly into existing engineering and security pipelines, treating AI as a security-testing fabric rather than a standalone assistant. Yet even as automation accelerates, all three providers stress that human review remains essential for confirming exploitable issues, deciding on patches, and maintaining overall security integrity.
Microsoft MDASH: Multi-Model Agents for Deep Code Audits
Microsoft’s MDASH platform illustrates how automated code auditing is being re-architected around coordinated agents. MDASH brings together more than 100 specialized AI agents that collaboratively scan, validate, debate, and prove vulnerabilities across complex products such as Windows, Hyper-V, and Azure. Rather than a single prompt chain, MDASH runs as a multi-stage pipeline, with different agents responsible for detection, lifecycle analysis, concurrency reasoning, deduplication, and exploitability checks. Microsoft reports that MDASH scored 88.45% on the public CyberGym benchmark of 1,507 real-world vulnerabilities and reached 96% recall on historical clfs.sys issues and 100% recall on tcpip.sys cases reviewed internally. The company positions MDASH as model-agnostic, emphasizing that orchestration, validation, and automated proof generation matter more than any individual model. This architecture is designed to reduce false positives while surfacing practically exploitable flaws that security teams can prioritize for action.
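To make the idea of a staged pipeline concrete, here is a minimal Python sketch of three of the stages named above: detection, deduplication, and a validation pass. It is not MDASH code. The `Finding` type, the function names, the string-matching "detectors", and the toy exploitability rule are all assumptions made for illustration, and a real system would replace each stage with model-driven agents.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    """Hypothetical record passed between pipeline stages."""
    file: str
    line: int
    kind: str
    confirmed: bool = False


def scan(sources: dict[str, str], needles: list[str]) -> list[Finding]:
    """Toy detection stage: flag any line containing one of the needles."""
    findings = []
    for name, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if any(needle in line for needle in needles):
                findings.append(Finding(name, number, "unsafe-call"))
    return findings


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Deduplication stage: drop repeated findings, keeping first-seen order."""
    return list(dict.fromkeys(findings))


def validate(findings: list[Finding], sources: dict[str, str]) -> list[Finding]:
    """Toy validation stage: discard copies whose source is a string literal."""
    confirmed = []
    for finding in findings:
        line = sources[finding.file].splitlines()[finding.line - 1]
        if ', "' not in line:
            confirmed.append(replace(finding, confirmed=True))
    return confirmed


def run_pipeline(sources: dict[str, str]) -> list[Finding]:
    """Run two overlapping detectors, then deduplicate and validate."""
    raw = scan(sources, ["strcpy("]) + scan(sources, ["strcpy(", "gets("])
    return validate(deduplicate(raw), sources)


# Invented sample input, used only to exercise the sketch.
SAMPLE = {"a.c": 'strcpy(dst, src);\nstrcpy(dst, "ok");\ngets(buf);'}

for finding in run_pipeline(SAMPLE):
    print(finding)
```

In this toy run the two detectors report five raw findings, deduplication reduces them to three, and validation discards the copy from a string literal. Placing cheap filtering stages before the expensive ones is one plausible way such a pipeline keeps false positives away from reviewers.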

Google CodeMender: Guarded Rollout with Mandatory Human Review
Google’s CodeMender takes a different but complementary approach, focusing on controlled access and strict human oversight. Introduced as a security-focused AI agent, CodeMender is designed to find software vulnerabilities, trace root causes, and propose patches that are tested extensively before they are put in front of a human for approval. It combines Gemini Deep Think models with static and dynamic analysis, differential testing, fuzzing, and SMT solvers to form a layered security-testing stack. Google is now expanding API access to vetted security experts, allowing them to integrate CodeMender into existing engineering and security pipelines while still keeping the tool out of general release. Every AI-generated patch remains subject to mandatory human review, including validation, rollback checks, policy assessment, and production-readiness testing. By keeping access gated and enforcing human control over remediation, Google aims to harness automated vulnerability detection without enabling misuse or over-trusting automated fixes.
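Of the techniques listed above, differential testing is the easiest to illustrate in a few lines: run the original and the patched code on the same inputs and flag every behavioral difference that is not the intended fix. The Python sketch below assumes a hypothetical `clamp` function with a missing upper bound. None of the names come from CodeMender, and the sketch says nothing about how Google implements the technique.

```python
import random


def original_clamp(value, low, high):
    """Hypothetical buggy code: the upper bound is never applied."""
    return max(value, low)


def patched_clamp(value, low, high):
    """Hypothetical candidate patch: applies both bounds."""
    return min(max(value, low), high)


def bad_patch(value, low, high):
    """A flawed candidate: fixes the upper bound but drops the lower one."""
    return min(value, high)


def intended_fix(args, before, after):
    """The only change we expect: values above `high` now map to `high`."""
    value, _low, high = args
    return value > high and after == high


def differential_test(old, new, inputs, is_expected_change):
    """Return every input where old and new disagree in an unintended way."""
    regressions = []
    for args in inputs:
        before, after = old(*args), new(*args)
        if before != after and not is_expected_change(args, before, after):
            regressions.append((args, before, after))
    return regressions


# Seeded random inputs plus explicit edge cases, so the run is reproducible.
rng = random.Random(0)
INPUTS = [(rng.randint(-50, 50), -10, 10) for _ in range(200)]
INPUTS += [(-50, -10, 10), (50, -10, 10), (-10, -10, 10), (10, -10, 10)]

good = differential_test(original_clamp, patched_clamp, INPUTS, intended_fix)
bad = differential_test(original_clamp, bad_patch, INPUTS, intended_fix)
print(f"good patch regressions: {len(good)}")
print(f"bad patch regressions found: {len(bad) > 0}")
```

A patch that produces no unexpected differences has only passed an automated screen. In the workflow described above it would still go to a human reviewer before being merged.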
Tenable Hexa AI: Automating Exposure Management and Risk Prioritization
Tenable Hexa AI extends AI vulnerability discovery beyond code into full exposure management. As the agentic AI engine of the Tenable One Exposure Management Platform, Hexa AI uses multi-step reasoning and Model Context Protocol support to build custom agents and workflows that operate at machine speed. The system draws on Tenable’s Exposure Data Fabric to turn fragmented technical signals into prioritized, business-aligned intelligence, bridging the gap between vulnerability discovery and remediation. Hexa AI orchestrates automated remediation workflows, from creating and routing tickets to generating custom policies and audit-ready reports, so that teams can act quickly on critical exposures. It integrates directly with existing security and IT tools, and organizations can use built-in agents or deploy their own to automate end-to-end workflows. In effect, Tenable is using AI-driven security testing not just to find issues, but to operationalize response across the entire attack surface.
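The step from raw findings to business-aligned priorities can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `Exposure` fields, the scoring formula, the 1.5 multiplier, the threshold, and the queue names are invented, and nothing here reflects how Hexa AI or the Exposure Data Fabric actually score or route issues.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    """Hypothetical exposure record combining technical and business signals."""
    asset: str
    cvss: float              # technical severity, 0.0 to 10.0
    asset_criticality: int   # invented scale: 1 (low) to 5 (business critical)
    internet_facing: bool


def priority(exposure: Exposure) -> float:
    """Toy score: severity weighted by business criticality and reachability."""
    score = exposure.cvss * exposure.asset_criticality
    if exposure.internet_facing:
        score *= 1.5  # arbitrary illustrative multiplier
    return score


def route(exposures: list[Exposure], threshold: float = 30.0) -> list[dict]:
    """Rank exposures and assign each to a hypothetical ticket queue."""
    ranked = sorted(exposures, key=priority, reverse=True)
    return [
        {
            "asset": e.asset,
            "score": priority(e),
            "queue": "urgent" if priority(e) >= threshold else "backlog",
        }
        for e in ranked
    ]


# Invented sample data, used only to exercise the sketch.
EXPOSURES = [
    Exposure("wiki", cvss=9.8, asset_criticality=1, internet_facing=False),
    Exposure("payments-api", cvss=6.5, asset_criticality=5, internet_facing=True),
]

for ticket in route(EXPOSURES):
    print(ticket)
```

The sample shows why severity alone is a poor ordering: the lower-severity flaw on a critical, internet-facing asset outranks the higher-severity flaw on an internal wiki.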

Balancing Automation with Human Judgment in Security Operations
Across MDASH, CodeMender, and Hexa AI, a common pattern emerges: AI-driven systems are making vulnerability discovery dramatically faster, yet none of them removes humans from the loop. Microsoft relies on multi-agent validation and exploitability checks to cut false positives before they reach security teams, but engineering and response staff still make the final decisions. Google explicitly mandates human review of every CodeMender-generated patch and keeps access restricted to vetted security experts to reduce the risk of misuse. Tenable focuses on automating exposure management workflows and prioritization, while leaving risk acceptance, policy changes, and production deployments under human control. As frontier models accelerate automated vulnerability detection, enterprises are learning that the biggest gains come when AI handles scale, correlation, and orchestration, and humans handle context, governance, and accountability. The future of automated code auditing will be measured not only in findings, but in how safely those findings translate into reliable fixes.

