From Static Analyzers to Autonomous LLM Security Testing
Traditional static analyzers overwhelm engineers with long lists of warnings, many of which turn out to be false positives. Sandyaa, an open-source bug hunter from SecureLayer7, takes a different route by using large language models to read entire codebases, trace how data flows, and autonomously spot exploitable flaws. Instead of just flagging suspicious patterns, the tool builds context across files, splitting repositories into token-aware chunks and revisiting them through multiple recursive passes. This approach lets Sandyaa reason about call chains, state transitions, and data transformations in a way that more closely resembles a human code auditor. For development teams, it represents a new class of LLM security testing tool: one that not only detects weaknesses, but also understands how they might be practically abused. The result is a streamlined path from raw source code to a prioritized list of concrete, high-confidence security issues.
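The token-aware chunking mentioned above can be pictured with a minimal sketch. This is a simplified illustration of the general technique, not Sandyaa's actual implementation; the 4-characters-per-token heuristic and the function names are assumptions.

```python
# Simplified sketch of token-aware chunking: pack whole source lines
# into chunks that each fit a model's context budget. Sandyaa's real
# chunker is more sophisticated; the ~4-chars-per-token estimate used
# here is a common rough heuristic, not the tool's actual tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for code."""
    return max(1, len(text) // 4)

def chunk_source(lines: list[str], max_tokens: int = 2000) -> list[str]:
    """Greedily pack whole lines into chunks under the token budget."""
    chunks, current, current_tokens = [], [], 0
    for line in lines:
        line_tokens = estimate_tokens(line)
        # Flush the current chunk if adding this line would overflow it.
        if current and current_tokens + line_tokens > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Chunking on line boundaries like this keeps each piece syntactically coherent, which matters when a sub-model must reason about the code in isolation before its conclusions are stitched back together across passes.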
Recursive Analysis and AI Exploit Generation in Practice
Sandyaa’s analysis pipeline is built around what the project calls Recursive Language Models. A controlling model orchestrates a Python REPL that chunks files, runs regex filters, and spawns sub-LLM queries, allowing it to handle repositories larger than a single context window. Eight recursive phases drive the audit: call-chain tracing, data-flow expansion, self-verification, vulnerability chaining, proof-of-concept refinement, contradiction detection, assumption validation, and exploitability proof. For every confirmed bug, the tool writes a detailed report into a findings directory, including an analysis narrative, an evidence.json file that maps each claim to specific file paths and line numbers, and a Python proof-of-concept exploit. Crucially, an attacker-control analyzer drops issues that are unreachable from untrusted input, cutting down on purely theoretical flaws. Proof-of-concept execution is opt-in and disabled by default, but when enabled the system can run its own exploits to validate impact, giving security teams and developers concrete, working demonstrations of each vulnerability.
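An evidence file of the kind described above might look roughly like the following. This is an illustrative sketch only: the field names, file paths, and line numbers are all assumptions, not Sandyaa's documented schema.

```json
{
  "finding": "SQL injection via unsanitized filter expression",
  "severity": "high",
  "evidence": [
    {
      "claim": "User-controlled filter string reaches query construction",
      "file": "src/filter/converter.py",
      "line": 142
    },
    {
      "claim": "No parameterization applied before execution",
      "file": "src/db/executor.py",
      "line": 57
    }
  ]
}
```

Tying every claim to a concrete file and line is what makes the reports reviewable: an engineer can jump straight to the cited code instead of re-deriving the model's reasoning.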
Coverage, Noise Reduction, and Building Trust in Automated Findings
Sandyaa targets a broad spectrum of weaknesses, from memory-safety bugs such as use-after-free, buffer overflow, type confusion, and double-free to higher-level logic issues such as authentication bypass, TOCTOU, and state-machine errors. It also hunts for injection flaws (SQL, command, XSS, SSRF, and path traversal) alongside cryptographic misuse, concurrency races, integer overflows, signedness errors, and unsafe API usage, including insecure deserialization, XXE, and prototype pollution. To earn trust, SecureLayer7 tightened the verification stack until reviewing Sandyaa’s output became more efficient than auditing the code from scratch. Self-verification, contradiction detection, vulnerability chaining, and the attacker-control filter all work to reduce false positives. This emphasis on precision has already surfaced real-world issues, including a SQL injection in MariaDBFilterExpressionConverter and a JSONPath injection in the PgVectorStore AbstractFilterExpressionConverter within the Spring AI project, demonstrating that automated vulnerability detection coupled with AI exploit generation can produce actionable, high-impact findings.
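The filter-expression injection class mentioned above follows a familiar shape, sketched here in Python. This is a generic illustration of the vulnerability pattern, not the actual Spring AI code; the function names and schema are invented for the example.

```python
# Generic illustration of the filter-expression injection class: a
# user-supplied value concatenated into SQL text versus the same value
# passed as a bound parameter. Not the actual Spring AI code.
import sqlite3

def build_filter_unsafe(field: str, value: str) -> str:
    # Vulnerable: attacker-controlled `value` is spliced directly into
    # SQL, so a payload like "x' OR '1'='1" rewrites the query logic.
    return f"SELECT * FROM docs WHERE {field} = '{value}'"

def query_safe(conn: sqlite3.Connection, value: str) -> list:
    # Safe: the value travels as a bound parameter, never as SQL text,
    # so the same payload matches nothing instead of everything.
    return conn.execute(
        "SELECT * FROM docs WHERE tag = ?", (value,)
    ).fetchall()
```

A tool that traces whether untrusted input can actually reach the unsafe path, as Sandyaa's attacker-control analyzer does, is what separates a reportable injection from dead code that merely looks dangerous.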
Open-Source Release and Implications for Security Workflows
By releasing Sandyaa under an MIT license on GitHub, SecureLayer7 has effectively democratized access to advanced LLM security testing. The tool runs end-to-end with no interactive prompts, accepting either a local directory or a Git URL, and integrates with existing developer workflows through the Claude Code CLI. Some phases can use Gemini via the gemini CLI, with configuration handled through a .sandyaa/config.yaml file that controls target paths, chunk sizes, severity thresholds, and output options. This open-source bug hunter lowers the barrier for smaller teams to adopt automated security auditing that was previously the domain of specialized tooling and experts. It also invites community scrutiny and extension of the Recursive Language Models architecture, which may help further refine noise reduction, coverage, and interoperability across platforms such as macOS, Linux, and WSL2-based setups.
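A configuration file covering the options described above might look something like this. The key names are assumptions inferred from the article's description (target paths, chunk sizes, severity thresholds, output options), not Sandyaa's documented schema; consult the repository for the real keys.

```yaml
# Illustrative sketch of a .sandyaa/config.yaml; key names are
# assumptions, not documented configuration options.
target:
  path: ./my-repo            # local directory or Git URL
analysis:
  chunk_size_tokens: 4000    # token-aware chunk budget per pass
report:
  severity_threshold: medium # suppress findings below this severity
  output_dir: findings/      # where reports and PoCs are written
  run_poc: false             # PoC execution stays opt-in by default
```

Keeping exploit execution off unless explicitly enabled in configuration mirrors the guardrail posture the project takes elsewhere: the default run produces evidence and PoC code without ever executing it.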
Balancing Automation, Human Oversight, and Future Security Testing
Sandyaa illustrates both the power and the tension inherent in AI-driven security tooling. On one hand, automated vulnerability detection and AI exploit generation can drastically reduce the time from discovering a bug to demonstrating its impact, making security reviews more targeted and efficient. On the other, autonomous exploit creation and optional proof-of-concept execution raise questions about safe deployment, governance, and access control within organizations. SecureLayer7’s choice to make exploit execution opt-in and to filter out unreachable paths shows an effort to embed guardrails, but human oversight remains essential. Security engineers still need to validate findings, interpret context, and decide remediation priorities. As development teams adopt tools like Sandyaa, the most effective workflows will likely blend automated LLM-driven analysis with expert review, using AI to surface high-value issues while humans handle risk assessment, architectural improvements, and long-term security strategy.
