Search as Code for AI Agents

What Search as Code Is and Why It Matters

Search as Code is a Perplexity architecture in which AI agents generate and execute Python search workflows instead of calling a fixed search endpoint. Retrieval steps become editable code, so each task can get its own automated search workflow. Instead of repeating the same API call, an agent writes a small search program: retrieve candidates, filter noisy pages, remove duplicates, and rerank the survivors. The design aims to bridge AI reasoning and search execution so agents can adapt their approach as research problems grow more complex. It also makes Perplexity agents Python-first: Python is the initial runtime, following internal tests with Rust, TypeScript, and Bash. For developers, the appeal is clear: Search as Code promises more controllable logic, potential token savings, and clearer visibility into how the model turned raw web pages into a final answer.
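The four steps named above (retrieve, filter, deduplicate, rerank) can be sketched as a small Python pipeline. This is a minimal illustration, not Perplexity's SDK: the `Page` type, every function name, the term-overlap scoring, and the in-memory `CORPUS` are assumptions made up for this example, and a real agent would call a search backend instead.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Page:
    url: str
    text: str


def overlap(query, page):
    """Count query terms that also appear in the page text."""
    return len(set(query.lower().split()) & set(page.text.lower().split()))


def retrieve(query, corpus):
    """Broad recall: keep any page sharing at least one term with the query."""
    return [p for p in corpus if overlap(query, p) > 0]


def filter_noisy(query, pages, min_overlap=2):
    """Drop weak matches (hypothetical threshold on shared terms)."""
    return [p for p in pages if overlap(query, p) >= min_overlap]


def dedupe(pages):
    """Keep the first page per (host, path), ignoring a trailing slash."""
    seen, unique = set(), []
    for p in pages:
        parts = urlsplit(p.url)
        key = (parts.netloc.lower(), parts.path.rstrip("/"))
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique


def rerank(query, pages):
    """Order pages by term overlap, highest first (stable for ties)."""
    return sorted(pages, key=lambda p: overlap(query, p), reverse=True)


def run_pipeline(query, corpus):
    return rerank(query, dedupe(filter_noisy(query, retrieve(query, corpus))))


# Hypothetical corpus standing in for live search results.
CORPUS = [
    Page("https://example.net/notes", "Notes on the overflow bug and its patch"),
    Page("https://example.com/advisory", "Vendor advisory for the parser overflow bug"),
    Page("https://example.com/advisory/", "Vendor advisory for the parser overflow bug"),
    Page("https://example.org/blog", "Ten tips about bug bounty programs"),
    Page("https://example.org/food", "Weeknight pasta recipes"),
]

RESULT_URLS = [p.url for p in run_pipeline("overflow bug advisory", CORPUS)]
print(RESULT_URLS)
```

Because each stage is an ordinary function, an agent (or a reviewer) can swap the scoring rule or the threshold for one task without touching the other stages.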

Inside the Architecture: Model, Sandbox, and Agentic Search SDK

Perplexity’s Search as Code stack has three main layers: the model as control plane, a restricted compute sandbox, and the Agentic Search SDK. The model writes Python that runs inside the sandbox, while the SDK exposes functions for retrieval, filtering, deduplication, and reranking. Instead of a static response template, each task can assemble its own pipeline of search steps. Generated scripts can show which pages were searched, which candidates were discarded, and which ranking rules shaped the final answer. That transparency comes with a trade-off: the generated code path becomes part of the system’s trust boundary, so teams must review it, especially for sensitive workloads. Because Python is the first supported runtime, the generated workflows stay familiar to most developers, but they now require code review, sandbox policy tuning, and logging to ensure the delegated search logic behaves as intended.
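One small piece of such a review can be automated. The sketch below uses Python's standard `ast` module to list imports in a generated script that fall outside an allow-list. The allow-list contents and the module name `agentic_search_sdk` are hypothetical placeholders, not Perplexity's actual package name or policy. A static check like this is only a first gate: it does not catch dynamic imports and is no substitute for a real sandbox.

```python
import ast

# Hypothetical allow-list; a real policy would be set by the sandbox owner.
ALLOWED_MODULES = {"agentic_search_sdk", "json", "re"}


def disallowed_imports(source, allowed=ALLOWED_MODULES):
    """Return sorted top-level module names imported outside the allow-list."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) are flagged under a fixed label.
            names = [node.module] if node.level == 0 else ["<relative>"]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed:
                found.add(root)
    return sorted(found)


# Example of a generated script that a reviewer would want flagged.
GENERATED = (
    "import os\n"
    "import json\n"
    "from subprocess import run\n"
    "from agentic_search_sdk import search\n"
)

print(disallowed_imports(GENERATED))
```

Logging the output of a check like this alongside each executed script gives reviewers a record of what the generated code tried to reach.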

From Pre-Built Search Calls to Agent-Generated Workflows

Traditional AI search workflows revolve around a tight loop: query, read the results, refine the query, then repeat until the model converges on an answer. Search as Code pushes more of that loop into generated Python, letting Perplexity agents plan multi-step retrieval strategies in advance. Multi-query features in the existing Search API, such as support for up to five queries per request, hint at how much manual tuning was already wrapped around “simple” search calls. Now, instead of wiring that logic by hand, developers can delegate it to an agent that writes the orchestration code. This shift brings search logic closer to how software engineers design pipelines: compose primitives, add conditionals, and track intermediate artifacts. It turns Search as Code into a bridge between generative reasoning and deterministic retrieval, which is essential for more autonomous research automation.
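As a small example of the hand-wired logic in question, the sketch below splits a list of query variants into batches that respect the five-queries-per-request limit mentioned above. The function name and the normalization (trimming whitespace, dropping blanks and exact duplicates) are illustrative assumptions, and no actual API call is made.

```python
MAX_QUERIES_PER_REQUEST = 5  # limit cited in the text above


def batch_queries(queries, batch_size=MAX_QUERIES_PER_REQUEST):
    """Trim, drop blanks and duplicates (order kept), then chunk into batches."""
    cleaned = (q.strip() for q in queries)
    unique = list(dict.fromkeys(q for q in cleaned if q))
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]


VARIANTS = [
    "parser overflow advisory",
    "parser overflow patch",
    "parser overflow advisory ",  # duplicate once trimmed
    "parser overflow vendor fix",
    "parser overflow affected versions",
    "parser overflow workaround",
    "parser overflow exploit status",
]

BATCHES = batch_queries(VARIANTS)
print([len(b) for b in BATCHES])
```

In a Search as Code setup, the agent would generate this kind of orchestration itself rather than relying on a developer to maintain it.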

Claims, Benchmarks, and the Validation Gap

Perplexity reports strong internal results for Search as Code on software vulnerability research. One quoted benchmark claims that a CVE vendor-advisory task “reached 100 percent accuracy while using 85.1 percent fewer tokens than its baseline.” That result comes from a run over 200 CVEs published between 2023 and 2025. Perplexity's comparison also places Search as Code ahead of several rivals on four of five benchmark rows, with a tie against OpenAI on Humanity’s Last Exam. However, these are company-run comparisons rather than independent evaluations. The broader context matters: Perplexity’s LiveBrowseComp notes that search-augmented agents can lose 25 to 40 points on fresh questions, while closed-book accuracy stays under 2 percent. Before teams rely on claimed token savings and accuracy gains, they should reproduce results against alternatives like OpenAI, Exa, Parallel, TinyFish, and Tavily under their own workloads and constraints.
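For teams running their own comparison, the two headline metrics are simple to compute from per-task records. The sketch below assumes each run is logged as a dict with a `correct` flag and a `tokens` count. That record shape is an assumption for illustration, and the sample numbers are invented placeholders, not Perplexity's data.

```python
def summarize(runs):
    """Return (accuracy, total_tokens) for a list of per-task records."""
    accuracy = sum(1 for r in runs if r["correct"]) / len(runs)
    total_tokens = sum(r["tokens"] for r in runs)
    return accuracy, total_tokens


def token_reduction_pct(baseline_tokens, candidate_tokens):
    """Percent fewer tokens used by the candidate relative to the baseline."""
    return 100.0 * (baseline_tokens - candidate_tokens) / baseline_tokens


# Invented sample records, for illustration only.
BASELINE_RUNS = [
    {"correct": True, "tokens": 1200},
    {"correct": False, "tokens": 800},
]
CANDIDATE_RUNS = [
    {"correct": True, "tokens": 300},
    {"correct": True, "tokens": 200},
]

BASE_ACC, BASE_TOKENS = summarize(BASELINE_RUNS)
CAND_ACC, CAND_TOKENS = summarize(CANDIDATE_RUNS)
REDUCTION = token_reduction_pct(BASE_TOKENS, CAND_TOKENS)
print(BASE_ACC, CAND_ACC, REDUCTION)
```

Running the same tasks through each provider and comparing these two numbers side by side is the minimum needed to check a vendor's claim against a team's own workload.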

What This Shift Means for Research Automation

Search as Code blurs the line between AI reasoning and search execution, giving Perplexity agents more autonomy to design and run their own retrieval logic. In environments like Perplexity Computer and the Perplexity Agent API, this autonomy ties into a wider system that routes work between local and cloud models and must enforce clear trust boundaries. Generated retrieval code can either reduce verification work, by encoding repeatable pipelines, or move that work into sandbox policy, debugging, and maintenance. For research teams, the practical question is not whether Search as Code sounds exciting, but whether it keeps citation quality high, controls token costs, and keeps retrieval predictable as tasks grow. As agentic products across the market race toward delegated workflows, the winners will likely be the ones that pair automated search workflows with transparent evidence trails and reliable safety guarantees.