AI Coding Agents Deliver Real Productivity Gains, But Critics Warn of Hidden Costs

From Chat Prompts to Agentic Software Development

AI coding agents have quickly evolved from browser-based chat helpers into developer tools embedded in the editor and terminal. Early use meant copying snippets from a chat window into an editor: useful for exploration, but clumsy for real work. Newer approaches center on agentic software development, where tools plug directly into the CLI or IDE, read the codebase, run tests, and propose commits. ClickHouse describes this progression in three levels: basic copy-paste assistance, hands-on agents integrated into the development workflow, and fully autonomous multi-agent systems running in isolated environments. Most of the practical value today sits at the second level, where agents augment developers rather than working unsupervised. This middle ground reflects a broader industry shift away from isolated AI-assisted programming experiments and toward continuous, session-based tooling that understands project context, executes commands, and iterates on changes within real projects.

ClickHouse’s C++ Experience: Measurable Gains, Narrow Scope

ClickHouse’s engineering team offers one of the clearest case studies of AI coding agents at work in a large, complex codebase. For much of 2025, agents struggled to navigate the project’s extensive C++ code and were helpful mainly for JavaScript boilerplate or small Python scripts. That changed with the arrival of Claude Opus 4.5, which ClickHouse’s CTO reports made agents usable for daily work on the main C++ repository. The team started with tightly specified micro-tasks, moved on to debugging failures from CI logs, and eventually shipped small features, with results that were consistently better than expected. The successes were not universal: autonomous, long-running agent loops still produced dubious results, and some work remained more efficient to do by hand. Still, the pattern suggests that with careful human guidance and well-bounded tasks, AI coding agents can turn hype into concrete productivity gains on demanding systems software.

Reasonix and the Push for Cost-Efficient Long Sessions

As usage grows, the economics of AI coding agents have become a central concern. Reasonix, a newly launched DeepSeek-native terminal coding agent, tackles the problem directly with a cache-first architecture. Rather than paying to reprocess the same project context from scratch on every turn of a long shell session, Reasonix keeps that shared context stable across turns so that DeepSeek’s prefix caching can reuse it. According to the project, active users of frontier-model agents can spend USD 150 to 250 (approx. RM690 to RM1,150) per month, which makes cost optimization a serious design constraint. Reasonix positions itself as a terminal-first, MIT-licensed assistant that runs on macOS, Linux, and Windows and targets developers already comfortable with Node.js and local tooling. Its pitch is not bigger models but more efficient, workflow-aware AI-assisted programming: preserving context through extended terminal sessions without inflating the bill.
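
To make the caching argument concrete, here is a toy cost model in Python. It is a sketch under stated assumptions, not Reasonix's code and not DeepSeek's API: it assumes every request resends the whole conversation in the same order, that the provider bills the part it already saw in the previous request at a discounted cache-hit rate, and that prices and token counts are placeholders. The function `session_cost` and its parameters are hypothetical names chosen for illustration.

```python
# Toy cost model for prefix caching in a long agent session.
# Hypothetical assumptions (not Reasonix's implementation or DeepSeek's pricing):
#   * every request resends the full conversation, oldest content first;
#   * the provider bills whatever it saw in the previous request at a cheaper
#     cache-hit rate and the remainder at the normal (cache-miss) rate;
#   * prices are placeholders, quoted per million input tokens.


def session_cost(
    prefix_tokens: int,
    turn_tokens: list[int],
    miss_price: float,
    hit_price: float,
    stable_prefix: bool = True,
) -> tuple[int, int, float]:
    """Return (total input tokens, cache-hit tokens, input cost) for a session.

    turn_tokens[i] is the number of tokens appended before request i
    (user message, tool output, the model's previous reply, ...).
    With stable_prefix=False, something near the top of the prompt changes
    on every turn, so nothing is ever served from the cache.
    """
    context = prefix_tokens  # size of the prompt for the current request
    previous_request = 0     # input length of the previous request
    total = hits = 0
    cost = 0.0
    for new in turn_tokens:
        context += new
        hit = previous_request if stable_prefix else 0
        miss = context - hit
        total += context
        hits += hit
        cost += (miss * miss_price + hit * hit_price) / 1_000_000
        previous_request = context
    return total, hits, cost


if __name__ == "__main__":
    # Hypothetical session: 30k tokens of project context, 50 turns of ~2k tokens.
    turns = [2_000] * 50
    for stable in (True, False):
        total, hits, cost = session_cost(
            30_000, turns, miss_price=1.0, hit_price=0.1, stable_prefix=stable
        )
        print(f"stable_prefix={stable}: {total:,} input tokens, "
              f"{hits / total:.0%} cache hits, cost {cost:.2f} (placeholder units)")
```

The point the model captures is that prefix caching rewards a stable prompt head: anything that changes early in the prompt on every turn, such as a timestamp or a reordered file listing, breaks the match for everything after it, which is the `stable_prefix=False` case.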

George Hotz’s Warning: Sophisticated Mimicry and Organizational Risk

Not everyone is convinced that agentic coding is a net positive. George Hotz, known for early device jailbreaks and his work on AI tooling, argues that AI coding agents represent “one of the most costly mistakes in the field’s history.” After months of using agents to write parts of tinygrad and to reverse-engineer hardware, he concluded he could have done each task better and faster by hand. His critique hinges on the idea that agents are sophisticated statistical mimics, not programmers: their code often looks plausible while hiding subtle errors, which makes those errors harder to detect. Hotz also highlights an organizational trap. High performers tend to catch sloppy output and lower performers may not, yet agents amplify everyone’s throughput. The result, he warns, is a rapid accumulation of low-quality code and technical debt across large teams, because review capacity cannot realistically scale with AI-boosted output.
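
Hotz's review-capacity point is essentially queue arithmetic. The sketch below is a deliberately simple, hypothetical model; the function name `review_backlog` and all numbers are illustrative, not data from Hotz or any team. It assumes reviewers can clear a fixed number of changes per week at their usual depth of scrutiny, and whatever exceeds that carries over as an unreviewed backlog.

```python
def review_backlog(weeks: int, changes_per_week: int, review_capacity: int) -> list[int]:
    """Unreviewed changes left at the end of each week in a toy queue model.

    Hypothetical assumption: reviewers clear at most `review_capacity`
    changes per week at the same depth of scrutiny; the rest carries over.
    """
    backlog = 0
    history = []
    for _ in range(weeks):
        # Surplus output that reviewers could not absorb this week accumulates.
        backlog = max(0, backlog + changes_per_week - review_capacity)
        history.append(backlog)
    return history


# Illustrative team whose reviewers keep pace with 40 changes per week.
print(review_backlog(4, changes_per_week=40, review_capacity=40))  # [0, 0, 0, 0]
# Agents raise output by 1.5x while review capacity stays fixed.
print(review_backlog(4, changes_per_week=60, review_capacity=40))  # [20, 40, 60, 80]
```

Any sustained surplus grows the backlog linearly, so a team either lets unreviewed changes pile up or keeps pace by reviewing each change less carefully, which is the route by which, in Hotz's account, low-quality code accumulates.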

When AI Coding Agents Create Value—And When They Don’t

Taken together, these perspectives sketch a nuanced picture of AI coding agents. ClickHouse’s experience suggests that in hands-on workflows with bounded tasks, agents can meaningfully accelerate bug triage, routine refactors, and small feature work, especially with models capable of navigating a large codebase. Tools like Reasonix add an economic layer, aiming to make long-running, terminal-centric sessions more affordable through context caching. Yet Hotz’s critique underscores real risks: leaning on agents for complex architecture, subtle debugging, or poorly specified problems can produce hard-to-detect defects and structural bloat. The emerging best practice is selective adoption. Use agentic software development where tasks are well-scoped, tests are strong, and human review is rigorous. Avoid blanket mandates to “use AI,” and resist handing critical design or integration work to autonomous loops. In that middle ground, AI coding agents look less like a looming disaster and more like powerful, specialized tools.
