AI Coding Agents in Enterprise Workflows

AI Coding Agents Move From Sidekick to Production System

AI coding agents are software systems that use large language models to generate code autonomously, run validation steps, and iterate on fixes inside real engineering workflows. In doing so, they shift automation from isolated suggestions to integrated production tooling. Enterprises are moving beyond chat-style coding assistants toward platforms that treat agents as first-class actors in build, test, and deployment pipelines. This changes how code reaches production: agents propose changes, execute tests, and refine results before humans review and merge them. The promise is faster maintenance, more consistent operations, and AI workflow automation that runs continuously rather than only within individual developers' sessions. At the same time, these systems widen the trust boundary, because generated code and orchestration logic become part of critical infrastructure. The central challenge is therefore not only how well models write code, but how reliably organizations can validate, monitor, and audit what those agents do at scale.

Dropbox Nova: A Platform for Orchestrated Engineering Agents

Dropbox’s Nova platform treats AI coding agents as reusable building blocks inside a shared engineering execution layer rather than as isolated tools. Nova runs agents in isolated cloud sessions tied to specific monorepo commits, with direct access to Bazel builds, CI pipelines, observability systems, and internal tooling. Within each session, agents follow a "propose, validate, iterate" loop: they generate code, run real tests and builds, inspect failures, and refine their changes until the results pass or limits are reached. Engineers can start these sessions through a web UI, a CLI, or APIs, and internal services can call Nova programmatically, which turns AI coding agents into components of wider AI workflow automation. A major design choice keeps branching and merging outside the agent, preserving deterministic release processes and clear audit trails. In practice, Nova acts as an enterprise AI platform that turns autonomous code generation into a process engineers can monitor and govern.
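
The "propose, validate, iterate" loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Nova's implementation: `propose_validate_iterate`, `ValidationResult`, and the toy `toy_propose`/`toy_validate` stand-ins are hypothetical names, and in a real system the proposer would be a model call and the validator a build plus test run. Note that the loop only returns a candidate patch; it never branches or merges, mirroring the design choice described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    passed: bool
    log: str  # build/test output fed back to the agent on failure


@dataclass
class SessionOutcome:
    patch: Optional[str]  # None if no attempt passed validation
    attempts: int
    history: List[str] = field(default_factory=list)


def propose_validate_iterate(
    propose: Callable[[Optional[str]], str],
    validate: Callable[[str], ValidationResult],
    max_attempts: int = 3,
) -> SessionOutcome:
    """Hypothetical agent loop: propose a patch, validate it, retry with feedback."""
    feedback: Optional[str] = None
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        patch = propose(feedback)   # stands in for a model call
        result = validate(patch)    # stands in for a build plus test run
        history.append(f"attempt {attempt}: {'pass' if result.passed else 'fail'}")
        if result.passed:
            # Hand the patch back for human review; branching and merging stay outside the loop.
            return SessionOutcome(patch, attempt, history)
        feedback = result.log       # failure output informs the next proposal
    return SessionOutcome(None, max_attempts, history)


# Toy stand-ins: the first proposal fails, the revised one passes.
def toy_propose(feedback: Optional[str]) -> str:
    return "fix-v2" if feedback else "fix-v1"


def toy_validate(patch: str) -> ValidationResult:
    ok = patch == "fix-v2"
    return ValidationResult(ok, "" if ok else f"{patch}: test_example failed")


outcome = propose_validate_iterate(toy_propose, toy_validate)
print(outcome.patch, outcome.attempts)  # fix-v2 2
```

Passing the failure log back into the next proposal is what distinguishes this loop from simply sampling several independent patches: each attempt is conditioned on concrete evidence from the previous one.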

How Companies Are Deploying AI Coding Agents to Write Production Code at Scale

From Feature Code to Maintenance: Nova’s Early Use Cases

While Nova can generate new feature code, Dropbox reports that its most effective uses are operational and maintenance workflows. One prominent example is Deflaker, a Nova-based system that analyzes logs from passing and failing tests, proposes fixes, validates them through repeated CI runs, and retries until it either stabilizes the test or reaches its retry limit. Nova also supports large-scale framework migrations and dependency upgrades, areas where earlier automation tools were brittle and hard to recover when they failed. By running these tasks inside Nova, teams share guardrails, observability, and review patterns instead of building one-off scripts. This illustrates how AI coding agents can shift engineering time away from repetitive clean-up and toward higher-level design work. It also shows that the surrounding platform (hermetic testing, contextual integrations, and deterministic workflows) matters as much as the raw quality of autonomous code generation.
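
The validation half of a Deflaker-style workflow, accepting a candidate fix only after repeated runs and stopping at a retry limit, might look like the following. This is a hedged sketch, not Dropbox's code: `deflake`, `is_stable`, and `make_simulated_test` are hypothetical, the log-analysis and fix-proposal steps are omitted, and each "CI run" is simulated by a deterministic pass/fail pattern so the example is reproducible.

```python
from itertools import cycle
from typing import Callable, List, Optional, Sequence, Tuple

RunOnce = Callable[[], bool]  # one execution of the test: True means it passed


def is_stable(run_test: RunOnce, runs: int = 20) -> bool:
    """Accept only if every repeated run passes (all() stops at the first failure)."""
    return all(run_test() for _ in range(runs))


def deflake(
    candidates: Sequence[Tuple[str, RunOnce]],
    runs: int = 20,
    max_retries: int = 3,
) -> Tuple[Optional[str], int]:
    """Try candidate fixes in order; return (accepted fix or None, attempts used)."""
    tried = 0
    for name, run_test in candidates[:max_retries]:
        tried += 1
        if is_stable(run_test, runs):
            return name, tried
    # Retry limit reached or candidates exhausted: hand the problem back to a human.
    return None, tried


def make_simulated_test(pattern: List[bool]) -> RunOnce:
    """Deterministic stand-in for a CI run that cycles through a fixed pass/fail pattern."""
    outcomes = cycle(pattern)
    return lambda: next(outcomes)


candidates = [
    ("patch-1", make_simulated_test([True, True, False])),  # still fails 1 run in 3
    ("patch-2", make_simulated_test([True])),               # passes every run
]
print(deflake(candidates))  # ('patch-2', 2)
```

Requiring every one of many runs to pass is deliberately strict: a single green CI run says little about a test that fails intermittently, which is why repeated runs are the natural acceptance criterion for a flaky-test fix.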

Perplexity’s Search as Code: Agents Writing Retrieval Workflows

Perplexity’s Search as Code applies AI coding agents to information retrieval rather than application logic, letting a model generate Python workflows that control how search runs. Instead of calling a fixed search endpoint in a loop, an agent writes code inside a restricted sandbox using an Agentic Search SDK. That code can compose steps for candidate retrieval, filtering, deduplication, and reranking, turning search flows into inspectable programs. Generated scripts show which pages were considered and how ranking decisions were made, but they also expand what teams must review, since selection logic now lives in model-written code. Perplexity reports that on its CVE vendor-advisory task the approach achieved "100 percent accuracy while using 85.1 percent fewer tokens than its baseline," though these results come from Perplexity's own benchmark and still require independent replication. The approach suggests how autonomous code generation can cut token use by moving repeated reasoning into executable workflows.
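
To make "search flows as inspectable programs" concrete, here is the kind of workflow a model might write, reduced to plain Python. It does not use Perplexity's Agentic Search SDK, whose API is not described here; `retrieve`, `filter_hosts`, `dedupe`, `rerank`, `search_workflow`, and the in-memory corpus on `.example` hosts are all hypothetical. Simple term overlap stands in for a real retriever and reranker. The `trace` dictionary records which pages survived each step, which is the property that makes such code reviewable.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Page:
    url: str
    title: str
    text: str


def terms(s: str) -> Set[str]:
    return set(s.lower().split())


def retrieve(corpus: List[Page], query: str, k: int = 10) -> List[Page]:
    """Candidate retrieval: rank by query-term overlap with page text."""
    q = terms(query)
    scored = [(len(q & terms(p.text)), p) for p in corpus]
    scored.sort(key=lambda sp: -sp[0])  # stable sort keeps corpus order on ties
    return [p for score, p in scored if score > 0][:k]


def filter_hosts(pages: List[Page], allowed: Set[str]) -> List[Page]:
    """Filtering: keep only pages from allowed hosts."""
    return [p for p in pages if urlsplit(p.url).hostname in allowed]


def dedupe(pages: List[Page]) -> List[Page]:
    """Deduplication: treat URLs differing only by a trailing slash as one page."""
    seen: Set[Tuple[str, str]] = set()
    out: List[Page] = []
    for p in pages:
        parts = urlsplit(p.url)
        key = (parts.hostname or "", parts.path.rstrip("/"))
        if key not in seen:
            seen.add(key)
            out.append(p)
    return out


def rerank(pages: List[Page], query: str) -> List[Page]:
    """Reranking: prefer pages whose titles match more query terms."""
    q = terms(query)
    return sorted(pages, key=lambda p: -len(q & terms(p.title)))


def search_workflow(corpus: List[Page], query: str, allowed: Set[str]):
    trace: Dict[str, List[str]] = {}
    pages = retrieve(corpus, query)
    trace["retrieved"] = [p.url for p in pages]
    pages = filter_hosts(pages, allowed)
    trace["filtered"] = [p.url for p in pages]
    pages = dedupe(pages)
    trace["deduplicated"] = [p.url for p in pages]
    pages = rerank(pages, query)
    trace["returned"] = [p.url for p in pages]
    return pages, trace


corpus = [
    Page("https://vendor.example/advisories/cve-0000-0001", "Vendor advisory CVE-0000-0001",
         "vendor advisory for cve-0000-0001 patch available"),
    Page("https://vendor.example/advisories/cve-0000-0001/", "Vendor advisory (mirror)",
         "vendor advisory for cve-0000-0001 patch available"),
    Page("https://forum.example/thread/42", "Discussion", "users discuss the advisory"),
    Page("https://vendor.example/blog/release", "Release notes", "new release with performance fixes"),
    Page("https://vendor.example/kb/patching", "Patch guide", "how to apply a vendor patch"),
]
results, trace = search_workflow(corpus, "vendor advisory patch", {"vendor.example"})
print([p.url for p in results])
```

Because each stage is an ordinary function, a reviewer can read exactly why the forum thread and the mirrored advisory were dropped, which is the review burden the paragraph above describes.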

Validation, Oversight, and the Emerging Role of Agent Managers

Both Nova and Search as Code highlight a shift in software work: engineers are spending less time typing code and more time defining workflows, constraints, and validation for AI coding agents. Benchmark claims alone are not enough for enterprise deployment, because real systems depend on how agents behave inside monorepos, CI pipelines, and production incident workflows. Teams need repeatable validation frameworks, clear audit logs, and ways to compare AI-driven workflows against alternatives. Perplexity, for instance, urges developers to test its architecture against options from OpenAI, Exa, Parallel, Google, TinyFish, and Tavily, underscoring the need for independent evaluation. As enterprise AI platforms mature, a new responsibility emerges: managing the agents themselves, which means choosing where they run, what they can change, how their outputs are reviewed, and how failures trigger human intervention. The focus of engineering work is gradually moving from writing code toward managing and overseeing agents.
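
The four management decisions listed above can be expressed as an explicit, reviewable policy. The following is a minimal sketch under assumptions: `AgentPolicy`, `route_change`, and the returned route names are hypothetical and not taken from either platform, the `environment` field is only recorded as metadata, and `PurePosixPath.is_relative_to` requires Python 3.9 or later.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AgentPolicy:
    environment: str                 # where the agent runs, e.g. "sandbox" (metadata only here)
    writable_roots: Tuple[str, ...]  # repository directories the agent may modify
    max_failed_attempts: int         # failures tolerated before a human takes over


def route_change(policy: AgentPolicy, changed_files: Iterable[str], failed_attempts: int) -> str:
    """Decide what happens to an agent's change set under a hypothetical policy."""
    if failed_attempts >= policy.max_failed_attempts:
        return "escalate_to_human"
    roots = [PurePosixPath(r) for r in policy.writable_roots]
    for name in changed_files:
        path = PurePosixPath(name)
        # Reject absolute paths and ".." segments so a change cannot lexically escape its roots.
        if path.is_absolute() or ".." in path.parts:
            return "reject_out_of_scope"
        if not any(path.is_relative_to(root) for root in roots):
            return "reject_out_of_scope"
    # Even in-scope changes go to a person; the agent never merges on its own.
    return "queue_for_human_review"


policy = AgentPolicy("sandbox", ("services/search", "tests"), max_failed_attempts=3)
print(route_change(policy, ["services/search/rank.py"], failed_attempts=0))  # queue_for_human_review
print(route_change(policy, ["infra/deploy.yaml"], failed_attempts=0))        # reject_out_of_scope
print(route_change(policy, ["tests/test_rank.py"], failed_attempts=3))       # escalate_to_human
```

Keeping such rules in data rather than in the agent's prompt means they can be versioned, audited, and enforced by the platform regardless of what code the model produces.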