Two Conflicting Stories About AI Coding Agents
AI coding agents sit at the center of a growing divide in software engineering. On one side is ClickHouse, whose CTO describes a year of experiment-driven adoption that has turned agents into everyday tools across a demanding C++ codebase. On the other is George Hotz, who argues that the industry’s embrace of agentic coding will become “one of the most costly mistakes” in its history. Both perspectives are grounded in hands-on experience. ClickHouse reports tangible efficiency gains in routine engineering work, especially since more capable models arrived in late 2025. Hotz, after months of building with agents on real projects, concluded that he could consistently outperform them working manually. For enterprises deciding whether to standardize on AI coding agents, this clash of narratives crystallizes the central question: do immediate productivity boosts outweigh the risk of subtle, accumulating quality problems?
Inside ClickHouse’s Year of C++ Codebase Automation
ClickHouse breaks AI-assisted coding into three levels: simple chat-based copy-paste, integrated agents inside the CLI or IDE, and more autonomous multi-agent systems running in controlled environments. For most day-to-day work, engineers now rely on level‑2 agents that read the C++ codebase, run builds and tests, and edit files while humans supervise. The tipping point came with the more capable models that arrived in late 2025, after which agents moved beyond JavaScript snippets and one-off scripts to credible help with C++ features and CI bug investigations. Clear wins emerged: boilerplate configuration changes, merge conflict resolution, automated code review, and especially fixing flaky tests in a massive CI pipeline running tens of millions of tests per day. ClickHouse even deploys autonomous agents to open pull requests, and claims that the CI and test-stability use case alone justifies its investment in AI coding agents.

George Hotz’s Warning: Slop at Scale and the Organizational Trap
George Hotz does not dispute that AI coding agents can produce code quickly; his concern is what that code represents. He characterizes agents as sophisticated statistical mimics of programming, capable of generating output that looks correct while hiding subtle breakage that becomes harder to spot as models improve. After using agents on systems like tinygrad and real hardware reverse-engineering tasks, he found a consistent pattern: agents accelerate initial progress but stall at the polish and correctness stage, turning completion into a “slot machine” of retries. His critique is sharpest at the organizational level. High performers tend to spot agent-generated sloppiness, but weaker engineers now generate far more code with far less discernment. Combined with an explosion of AI-driven pull requests on major platforms, Hotz predicts a “golden era for buckets and buckets of slop” and a corresponding decline in truly high-quality software.
Productivity vs. Quality: Reconciling Metrics and Long-Term Risk
ClickHouse and Hotz are not describing different technologies so much as different optimization targets. ClickHouse measures success in software development productivity: fewer hours on repetitive tasks, faster CI stabilization, and better triage of enormous test volumes. In that frame, AI coding agents excel, particularly where problems are highly structured and well-specified. Hotz instead optimizes for deep correctness, maintainability, and architectural clarity. From that vantage point, agents appear dangerous precisely because they are so good at producing plausible code that organizations may not scrutinize deeply enough. The tension lies between short-term metrics (more pull requests, fewer visible flakes) and long-term concerns about subtle bugs, design erosion, and cultural overreliance on automated fixes. Both views imply that the real risk lies not in using agents at all, but in using them without clear boundaries and uncompromising standards for what gets merged.
How Enterprises Should Approach AI Agent Adoption Risks
For enterprises, the ClickHouse–Hotz contrast suggests a pragmatic middle path. AI coding agents appear most defensible when aimed at constrained, low-ambiguity tasks: boilerplate changes in a C++ codebase, repetitive configuration edits, merge conflict resolution, and mechanical CI cleanups where human reviewers still control final changes. Organizations should resist blanket mandates that every engineer “use AI,” which risk amplifying weak judgment and flooding codebases with barely-reviewed output. Instead, they can define explicit agent usage policies, require human ownership for architecture and critical paths, and treat agent contributions like those from junior engineers who need rigorous review. Metrics must go beyond raw throughput to include defect rates, rollback frequency, and long-term maintainability indicators. ClickHouse’s case study shows that disciplined adoption can pay off; Hotz’s critique is a reminder that without discipline, the biggest cost of AI coding agents may be invisible until it is very hard to undo.
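To make the point about metrics concrete, here is a minimal sketch of how a team might track defect and rollback rates alongside raw throughput, split by who authored a change. It is an illustration only: the `MergedChange` record, its fields, the `quality_summary` helper, and the sample data are all hypothetical and do not come from ClickHouse, Hotz, or any real tool.

```python
from dataclasses import dataclass


@dataclass
class MergedChange:
    """One merged change. All fields are hypothetical, for illustration."""
    author_kind: str      # "agent" or "human"
    caused_defect: bool   # a bug was later traced back to this change
    rolled_back: bool     # the change was reverted after merge


def quality_summary(changes, author_kind):
    """Return throughput, defect rate, and rollback rate for one author kind."""
    subset = [c for c in changes if c.author_kind == author_kind]
    total = len(subset)
    if total == 0:
        # No merged changes of this kind: report zero rates rather than divide by zero.
        return {"merged": 0, "defect_rate": 0.0, "rollback_rate": 0.0}
    return {
        "merged": total,
        "defect_rate": sum(c.caused_defect for c in subset) / total,
        "rollback_rate": sum(c.rolled_back for c in subset) / total,
    }


# Made-up sample data, not measurements from any real codebase.
changes = [
    MergedChange("agent", caused_defect=False, rolled_back=False),
    MergedChange("agent", caused_defect=True, rolled_back=True),
    MergedChange("agent", caused_defect=False, rolled_back=False),
    MergedChange("agent", caused_defect=True, rolled_back=False),
    MergedChange("human", caused_defect=False, rolled_back=False),
    MergedChange("human", caused_defect=True, rolled_back=False),
]

agent_stats = quality_summary(changes, "agent")
human_stats = quality_summary(changes, "human")
print("agent:", agent_stats)
print("human:", human_stats)
```

In this invented sample, agents merge twice as many changes as humans, but only the per-change rates show whether that extra volume is arriving at comparable quality. A real version would also need a defensible way to attribute defects to changes, which is the hard part and is not attempted here.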
