AI Agents Just Solved Decades-Old Math Problems, and Research May Never Look the Same

From Task Automation to Autonomous AI Research

AI agents are evolving from productivity tools into systems capable of genuine discovery. At one end, Google is rolling out consumer-facing agents like Gemini Spark, information agents that replace traditional alerts, and a Daily Brief that reads your inbox and calendar for you. These services are framed as digital helpers that quietly manage tasks in the background, but most remain locked behind Google’s Ultra subscription at USD 100 (approx. RM460) per month, a paywall that has raised questions about who actually benefits from the next wave of AI agents. At the other end of the spectrum, research-grade systems like Google DeepMind’s AlphaProof Nexus show what happens when agents stop merely organizing our work and start doing the work itself: tackling hard problems, testing ideas, and iterating without continuous human supervision. Taken together, the two ends of the spectrum signal a broader shift in which AI agents move into the core of how knowledge is generated, not just consumed.

How AlphaProof Nexus Cracked Decades-Old Erdős Problems

AlphaProof Nexus is a purpose-built AI discovery system for mathematical problem solving. It combines a powerful language model (Gemini 3.1 Pro) with Lean, a proof assistant and programming language for formal verification. Instead of outputting informal, natural-language arguments that human experts must painstakingly check, the agent produces fully formal proofs that Lean can check mechanically, step by step. If any step fails, the proof is rejected and the agent tries again, creating a tight feedback loop. On top of this core of language model plus Lean, the full-featured “Agent D” adds an evolutionary search system that maintains and ranks many proof sketches, plus AlphaProof, a reinforcement-learning theorem prover that acts as a subagent for tough subgoals. Using this architecture, the system autonomously solved 9 of 353 open problems from the Erdős catalog, including questions that had resisted humans for more than half a century. The result demonstrates that AI agents can now participate in front-line mathematical research rather than merely summarizing existing work.
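The propose, check, and retry loop described above can be illustrated with a minimal sketch. This is not DeepMind's implementation: the function names are hypothetical, the "proposer" stands in for the language model, and the "checker" stands in for Lean. To keep the example runnable without either, a toy problem is used in which a "proof" is simply an integer whose square is 1369.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SearchResult:
    proof: Optional[str]   # first candidate the checker accepted, if any
    attempts: int          # number of candidates that were checked
    rejected: list[str] = field(default_factory=list)


def prove_with_retries(
    propose: Callable[[list[str]], str],
    check: Callable[[str], bool],
    max_attempts: int,
) -> SearchResult:
    """Ask the proposer for candidates until the checker accepts one."""
    rejected: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(rejected)   # the proposer sees earlier failures
        if check(candidate):            # the checker is the only judge
            return SearchResult(candidate, attempt, rejected)
        rejected.append(candidate)      # feedback for the next proposal
    return SearchResult(None, max_attempts, rejected)


# Toy stand-ins (assumptions for illustration only).
def toy_propose(rejected: list[str]) -> str:
    # Hypothetical proposer: tries 35, 36, 37, ... in order.
    return str(35 + len(rejected))


def toy_check(candidate: str) -> bool:
    # Hypothetical checker: accepts only a witness n with n * n == 1369.
    n = int(candidate)
    return n * n == 1369


result = prove_with_retries(toy_propose, toy_check, max_attempts=10)
print(result.proof, result.attempts, result.rejected)
```

The point of the design is that the proposer is never trusted: a candidate counts only if the checker accepts it, so a wrong proposal costs one more attempt and nothing else.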

Cheap at Scale: The Economics of AI-Driven Problem Solving

What makes AlphaProof Nexus stand out is not only what it solved, but what it cost. DeepMind reports an inference cost of just a few hundred dollars per Erdős problem, and notes that even a simpler version of the agent (just the language model plus Lean, without evolutionary search or AlphaProof) also solved all 9 problems in a post-hoc analysis. The advanced agent still matters: on tougher challenges, it was several times more cost-efficient than the basic setup at comparable success rates. But the core lesson is that once the infrastructure and mathematical libraries exist, the marginal cost of attempting each new conjecture becomes surprisingly low. If research groups can spin up autonomous AI research agents for the price of a single traditional experiment or conference trip, academic workflows could be reorganized around large, parallel batches of AI-scouted ideas rather than a handful of human-led bets.
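To make the batch economics concrete, here is a rough back-of-the-envelope sketch. The 9 and 353 come from the figures above. The USD 300 per attempt is a placeholder for "a few hundred dollars", and the sketch assumes that cost applies to every problem attempted, which the reported figure does not state. The function name is hypothetical.

```python
def batch_economics(
    problems_attempted: int,
    problems_solved: int,
    cost_per_attempt_usd: float,
) -> dict[str, float]:
    """Total spend, solve rate, and spend per solved problem for one batch."""
    if problems_attempted <= 0:
        raise ValueError("problems_attempted must be positive")
    if not 0 <= problems_solved <= problems_attempted:
        raise ValueError("problems_solved must be between 0 and problems_attempted")

    total_cost = problems_attempted * cost_per_attempt_usd
    solve_rate = problems_solved / problems_attempted
    # With no solves, the cost per solved problem is unbounded.
    cost_per_solve = total_cost / problems_solved if problems_solved else float("inf")
    return {
        "total_cost_usd": total_cost,
        "solve_rate": solve_rate,
        "cost_per_solve_usd": cost_per_solve,
    }


# Illustrative inputs: 353 attempted and 9 solved are from the article;
# 300.0 is an assumed placeholder, not a reported number.
summary = batch_economics(353, 9, 300.0)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions, even a low solve rate leaves the total spend for the whole batch within reach of an ordinary research budget, which is the sense in which large parallel batches become practical.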

What This Means for Scientific Discovery Beyond Mathematics

Mathematics is a particularly friendly domain for AI agents: it is digital, self-verifying, and rewards incremental improvements. AlphaProof Nexus builds on earlier systems like AlphaEvolve, which used evolutionary search to crack long-standing Ramsey problems, and extends the approach into a rigorous, machine-checkable framework. While its current strengths cluster where Lean’s libraries are mature—combinatorics, number theory, convex optimization—the same architectural principles could extend to fields where formal verification or simulation feedback can stand in for human judgment: from algorithm design to parts of physics and engineering. Already, the agent has contributed to algebraic geometry, optimization, quantum optics, and graph theory. As AI discovery systems improve, they will likely function as co-researchers: rapidly testing conjectures, searching parameter spaces, and generating candidate proofs or designs that humans refine. The locus of creativity may shift from solitary insight toward human–agent collaborations operating across thousands of automated research threads.

The Access Gap: Who Gets to Use Powerful AI Agents?

The contrast between DeepMind’s research breakthroughs and Google’s consumer strategy raises a difficult question: who will actually wield these powerful AI agents? AlphaProof Nexus showcases the frontier of autonomous AI research, and systems with similar underlying capabilities are beginning to appear in commercial products. Yet many of Google’s most advanced agentic features, such as Gemini Spark and proactive information agents, sit behind a USD 100 (approx. RM460) per month paywall. That pricing and the limited rollout risk entrenching a divide between the institutions and individuals who can afford state-of-the-art AI agents and those who experience AI only as a smarter search bar. If agent-driven problem solving becomes central to competitive research and innovation, access will shape which labs, companies, and even students can meaningfully participate in it. The next challenge is not only building more capable AI discovery systems, but making them equitably available.
