AI Co-Scientists, Lab Automation and Wet Lab Gaps

AI co-scientists: powerful on paper, weak at the bench

AI co-scientists are software agents, often built on large language models, that aim to support or automate scientific work from hypothesis generation through experimental execution and data interpretation, in both digital and physical lab environments. For now, they are far more helpful on screens than at the bench. According to the Pistoia Alliance, 54% of life science professionals report AI value in regulatory submissions and reporting, but only 1% see value in the wet lab. Most life science teams use generative models for documents, not pipettes. Bench researchers complain that traditional electronic lab notebooks (ELNs) do little to bridge this gap, which is why many are turning to public AI tools on personal accounts to interpret data or draft protocols. The result is a fragmented landscape where wet lab AI remains experimental, untrusted, and mostly disconnected from real automation.

Diverging architectures: where to put the model and the lab data

Vendors are taking very different architectural paths to make AI co-scientists useful. Sapio Sciences started by embedding a natural-language chat box inside its electronic lab notebook, but its Elain agent now connects through Anthropic’s Model Context Protocol to act across email, ELN, and reporting tools. Benchling is wiring its AI Scientist architecture directly into experiment design, ordering, and data capture, while also exposing a Model Hub and GPU-backed model runs so teams can swap and specialize models. Meanwhile, platform players like Perceptic focus on the “connective tissue” that links pharma’s scattered AI tools with proprietary data, treating the model as one of many services. These divergent bets all grapple with the same problem: LLMs excel at text, but lab value depends on how tightly they are wired into structured experimental data, protocols, and instruments.

Benchling, Sapio, and Potato: different routes to grounded wet lab AI

Benchling’s view is that an AI co-scientist only matters if it can influence what happens in the lab, not just the notebook. President Ashu Singhal breaks experimentation into thirds: repetitive assays worth automating on workcells, work better sent to contract research organizations (CROs), and ad hoc experiments that should remain in human hands. Benchling’s response includes one-click ordering with CRO partners and deeper links between design, inventory, and automation, so wet lab AI can trigger physical work. Sapio’s Elain agent, by contrast, starts from the ELN and expands outward, using Anthropic’s Claude Cowork to query data and assemble reports or workflows on request. Emerging platforms like Ginkgo’s Cloud Lab and AI agents such as EstiMate take a third route: scientists describe experiments in plain language and the system prices and schedules runs on robotic fleets, turning AI prompts into concrete lab actions.

Why lab automation is the missing bridge for wet lab AI

The limiting factor for wet lab AI is not only model quality but also how well systems connect intent to physical execution. Singhal notes that even when labs have good hardware or strong CRO partners, teams still spend heavy manual effort pushing the right inputs to automation and cleaning outputs so data becomes useful. Lab automation platforms that standardize protocols, reagents, and data schemas can give AI co-scientists a reliable surface to act on: choosing parameters, scheduling runs, and routing results back into design cycles. In fully or partially autonomous facilities, like Ginkgo’s Cloud Lab or emerging lights-out labs, AI agents can move from drafting suggestions to initiating experiments. Without this bridge, co-scientists remain sophisticated advisors stuck on the sidelines, unable to affect the messy reality of plates, tips, and assays.
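To make the idea of a "reliable surface" concrete, here is a minimal Python sketch of a standardized protocol schema that checks AI-proposed parameters before a run is scheduled. Everything in it is an assumption for illustration: the `ParameterSpec` and `Protocol` classes, the `plate_wash` protocol, and its numeric ranges are hypothetical and do not come from any vendor named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """Allowed range for one protocol parameter (hypothetical schema)."""
    name: str
    unit: str
    minimum: float
    maximum: float


@dataclass(frozen=True)
class Protocol:
    """A standardized protocol: a name plus the parameters it accepts."""
    name: str
    parameters: tuple[ParameterSpec, ...]

    def validate(self, proposed: dict[str, float]) -> list[str]:
        """Return a list of problems; an empty list means the run is schedulable."""
        problems = []
        known = {spec.name: spec for spec in self.parameters}
        # Reject parameters the protocol does not define.
        for name in proposed:
            if name not in known:
                problems.append(f"unknown parameter: {name}")
        # Require every defined parameter, and check it against its range.
        for spec in self.parameters:
            if spec.name not in proposed:
                problems.append(f"missing parameter: {spec.name}")
                continue
            value = proposed[spec.name]
            if not spec.minimum <= value <= spec.maximum:
                problems.append(
                    f"{spec.name}={value} {spec.unit} outside "
                    f"[{spec.minimum}, {spec.maximum}]"
                )
        return problems


# Hypothetical protocol; the ranges are illustrative, not real assay limits.
PLATE_WASH = Protocol(
    name="plate_wash",
    parameters=(
        ParameterSpec("wash_volume", "uL", 100.0, 400.0),
        ParameterSpec("cycles", "count", 1.0, 6.0),
    ),
)

# One proposal that fits the schema and one that does not.
ok = PLATE_WASH.validate({"wash_volume": 300.0, "cycles": 3.0})
bad = PLATE_WASH.validate({"wash_volume": 900.0, "speed": 2.0})
print(ok)
print(bad)
```

The design point is that the agent never talks to hardware directly. It proposes values against a schema, and only proposals with no reported problems would be handed to a scheduler.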

Trust, grounding, and the path from pilots to scaled adoption

Trust is the main brake on scaling wet lab AI. Christian Baber of the Pistoia Alliance notes that pharma companies enforce a hard rule: “Nothing goes directly from a transformer model to an agency. It has to be looked at first.” That mindset extends to experimental decisions, where hallucinations can waste materials or compromise safety. Grounding AI co-scientists in real-world lab data, instrument logs, and structured protocols helps reduce this risk, but only if scientists can see and audit each step. Conversational interfaces, like Sapio’s voice-driven querying, make systems easier to use yet do not replace human review. As automation matures and architectures tie models closer to reliable lab data, life science teams may relax strict human-in-the-loop controls. Until then, the finding that only 1% see value in the wet lab is a reminder that wet lab AI must prove itself experiment by experiment.
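The "it has to be looked at first" rule can be sketched as a simple review gate with an audit trail. This is a minimal illustration under stated assumptions, not a description of any company's actual controls: the `Proposal` class, the `execute` function, the reviewer name, and the example records are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Proposal:
    """An AI-drafted action awaiting human review (hypothetical structure)."""
    summary: str
    sources: list[str]            # records the draft was grounded in
    status: str = "pending"       # pending -> approved | rejected
    audit_log: list[str] = field(default_factory=list)

    def _log(self, event: str) -> None:
        # Timestamp every event so reviewers can audit each step later.
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.audit_log.append(f"{stamp} {event}")

    def review(self, reviewer: str, approve: bool, note: str = "") -> None:
        """Record a human decision; only pending proposals can be reviewed."""
        if self.status != "pending":
            raise ValueError(f"proposal already {self.status}")
        self.status = "approved" if approve else "rejected"
        self._log(f"{self.status} by {reviewer}: {note}")


def execute(proposal: Proposal) -> str:
    """Refuse to act on anything a human has not explicitly approved."""
    if proposal.status != "approved":
        raise PermissionError("human approval required before execution")
    proposal._log("executed")
    return f"running: {proposal.summary}"


# Hypothetical draft: execution is blocked until a named person approves it.
draft = Proposal(
    "repeat dose-response at 8 concentrations",
    ["run-0412 plate reader log"],
)
try:
    execute(draft)
except PermissionError as err:
    blocked = str(err)

draft.review("j.doe", approve=True, note="ranges checked")
result = execute(draft)
```

Keeping the grounding sources and the decision log on the proposal itself is one way to let scientists see what the draft was based on and who signed off before anything ran.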