What Real-World AI Coding Agents Actually Deliver in Large C++ Projects

From Skepticism to Daily Dependence on AI Coding Agents

When ClickHouse first experimented with AI coding agents on its large C++ codebase, the results were uneven. Early tools could handle JavaScript boilerplate and throwaway scripts but struggled to navigate complex production systems, which reinforced many engineers' belief that AI-assisted programming was more toy than tool. That perception shifted with the arrival of more capable models such as Claude Opus 4.5. Engineers started with over-specified, tightly scoped tasks: small C++ changes, targeted investigations of bugs surfaced in CI logs, and limited feature work. Each success broadened trust and the range of use cases until AI coding agents became part of everyday development work. By the time Opus 4.6 arrived, ClickHouse engineers were comfortable delegating non-trivial work. They had found that the right combination of models, tooling and workflow discipline could turn agents from occasional helpers into reliable partners embedded in the development lifecycle.

Where AI Agents Shine: Boilerplate, CI, and Test Reliability

ClickHouse’s most tangible productivity gains come from pointing AI coding agents at repetitive, precision-heavy tasks. For boilerplate and integration work, such as build-system tweaks, configuration edits across many files and infrastructure manifests, agents often make fewer mistakes than humans and never tire of tedious work. In version control, agents resolve merge conflicts so consistently that “agent does, human reviews” is now the norm, freeing reviewers to focus on architecture instead of syntax policing. The most dramatic win has been in fixing flaky tests and CI issues. ClickHouse runs tens of millions of tests daily across hundreds of commits and pull requests, and it never silences or retries failing tests, so every failure demands investigation. With AI-assisted test debugging, the team submitted hundreds of pull requests in a matter of weeks and cut findings from roughly 200 per day to just a handful per 10 million runs, a result that in their view justifies the entire AI investment.
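Because failures are never retried away, a nondeterministic test has to be turned into something an engineer or an agent can actually debug. Below is a minimal sketch of one common approach, assuming each test draws its randomness from an injected, seeded RNG: run the test under many seeds and record the seeds that fail, so each failure becomes a deterministic reproducer. This is an illustration, not ClickHouse's CI tooling; `find_failing_seeds` and `order_sensitive_test` are hypothetical names, and the "flaky" test simply assumes a row order that a shuffle does not guarantee.

```python
import random
from typing import Callable, List

# Hypothetical helpers for illustration; not ClickHouse's actual CI tooling.
# Assumption: tests take their randomness from an injected random.Random.


def find_failing_seeds(test: Callable[[random.Random], None], runs: int) -> List[int]:
    """Run `test` once per seed 0..runs-1 and return the seeds whose run failed."""
    failing: List[int] = []
    for seed in range(runs):
        rng = random.Random(seed)  # fresh RNG per run, so each failure is replayable
        try:
            test(rng)
        except AssertionError:
            failing.append(seed)
    return failing


def order_sensitive_test(rng: random.Random) -> None:
    """A deliberately flaky test: it assumes rows come back sorted,
    but the simulated parallel read returns them in a random order."""
    rows = [1, 2, 3]
    rng.shuffle(rows)  # stands in for nondeterministic result ordering
    assert rows == [1, 2, 3], f"unexpected row order {rows}"


if __name__ == "__main__":
    seeds = find_failing_seeds(order_sensitive_test, runs=1000)
    print(f"{len(seeds)} of 1000 runs failed")
    if seeds:
        # Any single failing seed reproduces the failure deterministically.
        print(f"replay with: order_sensitive_test(random.Random({seeds[0]}))")
```

The reruns here serve diagnosis, not turning a red build green: the list of failing seeds is a concrete artifact to hand to whoever investigates the root cause.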

Hidden Costs: Plausible Mistakes and Skill Amplification

The same qualities that make AI-assisted programming powerful also introduce subtle risks. ClickHouse engineers report that agents are excellent at reading logs, proposing hypotheses, and iterating quickly—yet they are equally adept at generating plausible but wrong explanations. This means outcomes depend heavily on the human in the loop. Experienced SREs can use agents to converge faster on the truth, while less seasoned developers may be led astray by confident but incorrect suggestions. The team frames agents as multipliers: strong engineers become sharper and more productive; weaker engineers can cause more damage, faster. One notable success story involved a hard concurrency bug that had resisted multiple human attempts but was eventually fixed with a one-line change proposed by an agent after extensive reasoning and testing. Even so, ClickHouse emphasizes that judgment, validation and robust CI guardrails are non-negotiable, because AI coding agents do not replace the need for deep understanding.

Designing Workflows Around AI-Assisted Programming

ClickHouse’s journey underscores that code generation tools deliver sustainable value only when they are woven carefully into existing workflows. The team groups AI-assisted programming into three levels: copy-pasting from a chat window, agents integrated into a CLI or IDE that read and modify the codebase, and more experimental autonomous setups running in isolated environments. Most of today's real productivity comes from the middle level, where agents run commands, edit files, build, test and occasionally commit under human supervision. Success depends on disciplined prompting and documentation: concise CLAUDE.md or AGENTS.md guides, explicit specifications of the files and functions involved, and a culture of always validating changes with tests, fuzzing and randomized checks. The recommended path is to start with boilerplate, merge conflicts and refactors, then graduate to more complex tasks as confidence grows. ClickHouse also keeps multiple model providers on hand to hedge against downtime, reflecting a pragmatic, tool-first mindset rather than blind enthusiasm.
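A randomized differential check is one cheap form of the validation described above: run a candidate implementation, for example one an agent proposed, against an obviously correct reference on many random inputs and report the first disagreement. The sketch below assumes a simple top-k function as the unit under test; `reference_top_k`, `heap_top_k`, `dedup_top_k` and `differential_check` are hypothetical names, and `dedup_top_k` is a deliberately plausible-but-wrong candidate that drops duplicate values.

```python
import heapq
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical example for illustration; not code from the ClickHouse codebase.
TopK = Callable[[List[int], int], List[int]]


def reference_top_k(xs: List[int], k: int) -> List[int]:
    """Obviously correct baseline: sort descending and take k elements."""
    return sorted(xs, reverse=True)[:k]


def heap_top_k(xs: List[int], k: int) -> List[int]:
    """Candidate implementation (e.g. agent-proposed) using a bounded heap."""
    return heapq.nlargest(k, xs)


def dedup_top_k(xs: List[int], k: int) -> List[int]:
    """Plausible but wrong candidate: silently drops duplicate values."""
    return sorted(set(xs), reverse=True)[:k]


def differential_check(
    candidate: TopK, trials: int = 500, seed: int = 0
) -> Optional[Tuple[List[int], int]]:
    """Return the first (xs, k) where candidate disagrees with the reference, or None."""
    rng = random.Random(seed)  # fixed seed keeps any counterexample reproducible
    for _ in range(trials):
        # Small value range (0..9) makes duplicates common.
        xs = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        k = rng.randint(0, 5)
        if candidate(xs, k) != reference_top_k(xs, k):
            return xs, k
    return None


if __name__ == "__main__":
    print("heap_top_k:", differential_check(heap_top_k))  # None: no disagreement found
    print("dedup_top_k:", differential_check(dedup_top_k))
```

Keeping the input domain small is deliberate: it makes duplicates, and therefore the bug in `dedup_top_k`, likely to surface within a few hundred trials, while a correct candidate produces no disagreement.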

Beyond Hype: Weighing Long-Term Value and Future Directions

ClickHouse’s experience adds nuance to ongoing debates among technologists about the long-term value of AI coding agents. Their results suggest that, when thoughtfully deployed, agents can create a widening productivity gap between teams that embrace them and those that do not. Yet this advantage is contingent on continued investment in testing infrastructure, prompt hygiene and developer education. The company is already pushing into more advanced use cases: triaging bug reports, automatically reverting bad changes, agent-driven testing of new features and continuous analysis of problematic workloads. Fully autonomous multi-agent loops, however, remain experimental, with tooling and reliability still maturing. For engineering leaders, the lesson is less about replacing developers and more about augmenting them. AI coding agents are best treated as powerful tools of thought that demand restraint, structure and critical oversight. Used that way, they deliver real gains without the hidden costs that come from over-reliance or under-supervision.