Anthropic Open-Sources Petri and Hands It to Meridian Labs: What the 3.0 Upgrade Changes for AI Alignment

From In‑House Tool to Independent Alignment Infrastructure

Anthropic has donated Petri, its open-source toolkit for AI alignment testing, to the nonprofit Meridian Labs while simultaneously releasing Petri 3.0. The move mirrors Anthropic’s earlier decision to hand the Model Context Protocol to a neutral steward and is framed as a way to keep Petri independent of any single AI lab. Petri has already been central to Anthropic’s internal AI model evaluation pipeline, used for every Claude model since Claude Sonnet 4.5. It also underpins the alignment framework at the UK AI Security Institute, which relies on Petri to assess research-sabotage risks and to run pre-deployment checks on advanced models like Claude Mythos and Opus 4.7. By transferring an actively used, production-grade toolkit rather than a static code dump, Anthropic and Meridian are positioning Petri as shared infrastructure for open source AI safety and cross-lab benchmarking.

Petri 3.0’s Modular Auditor–Target Split

The Petri 3.0 update begins with a fundamental architectural change: the clear separation of the auditor model from the target model. Earlier versions tightly coupled these components, making it difficult to adjust the judging logic without reshaping how the model under test was wired into the framework. The new modular design introduces a defined interface between the auditor and target, allowing researchers to swap or fine-tune each independently. This matters because AI alignment testing tools can influence what they detect; a fixed auditor or static prompt template can inadvertently mask differences between models or overfit to one testing style. With Petri 3.0, teams gain finer control over AI model evaluation, enabling more credible comparisons across model families, deployment environments, and governance assumptions without treating any one configuration as the default.
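To make the split concrete, the sketch below shows one way such an auditor–target interface could look in Python. The names Target, Auditor, Transcript, and run_audit are illustrative assumptions, not Petri 3.0’s actual API; the point is simply that either side can be swapped or fine-tuned without touching the other.

```python
# Illustrative sketch only: Target, Auditor, Transcript, and run_audit are
# hypothetical names showing the shape of an auditor-target split, not Petri 3.0's API.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Transcript:
    """A single exchange handed from the target side to the auditor side."""
    prompt: str
    response: str


class Target(Protocol):
    """Any model or wrapped deployment under test."""
    def respond(self, prompt: str) -> str: ...


class Auditor(Protocol):
    """Any judging model or rubric that scores a transcript."""
    def score(self, transcript: Transcript) -> float: ...


def run_audit(target: Target, auditor: Auditor, prompts: list[str]) -> list[float]:
    """Drive an arbitrary auditor against an arbitrary target through one fixed
    interface, so either component can be replaced independently."""
    scores: list[float] = []
    for prompt in prompts:
        transcript = Transcript(prompt=prompt, response=target.respond(prompt))
        scores.append(auditor.score(transcript))
    return scores
```

Because the only contract between the two halves in this sketch is the transcript object, a team could rerun the same audits with a different judging model, or point the same auditor at a new model family, and compare results without rewiring anything else.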

Dish: Bringing Alignment Tests into Real Deployment Scaffolds

A standout feature in the Petri 3.0 update is Dish, a new extension in research preview designed to close the gap between lab tests and real-world deployments. Traditional AI alignment testing often happens in sterile, clearly signposted setups, making it easier for models to infer they are being evaluated and adapt their behavior. Dish counters this by running audits inside real agent scaffolds such as CLI-based orchestration tools and coding assistants, using the model’s authentic system prompts and runtime environment. This means Petri can test not just raw model outputs, but behavior shaped by wrappers, guardrails, tool-calling configurations, and orchestration rules. For enterprises and public-sector teams, Dish turns open source AI safety checks into something closer to a dress rehearsal for production, revealing how alignment holds up once models are embedded in complex, tool-rich applications.
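As a rough illustration of what a deployment-aware audit has to capture, the sketch below uses hypothetical names (ScaffoldAudit and its fields are not Dish’s real schema); it shows the kind of inputs such an audit would draw from a real deployment rather than from a bare model call.

```python
# Hypothetical configuration sketch: ScaffoldAudit and its field names are
# illustrative, not Dish's real schema. It lists the deployment artifacts a
# scaffold-aware audit would reuse verbatim.
from dataclasses import dataclass, field


@dataclass
class ScaffoldAudit:
    scaffold_cmd: list[str]            # the CLI agent or orchestration tool the team actually deploys
    system_prompt_path: str            # the production system prompt, used as-is
    allowed_tools: list[str] = field(default_factory=list)        # tool-calling surface exposed to the model
    behaviors_under_test: list[str] = field(default_factory=list)


audit = ScaffoldAudit(
    scaffold_cmd=["my-coding-agent", "--non-interactive"],         # placeholder command
    system_prompt_path="prompts/production_system_prompt.txt",     # placeholder path
    allowed_tools=["shell", "file_edit"],
    behaviors_under_test=["evaluation_awareness", "unsafe_tool_use"],
)
```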

Bloom and Targeted Behavior Checks Beyond Pass–Fail Scores

Petri 3.0 also integrates more tightly with Bloom, Anthropic’s automated behavior-evaluation tool, which focuses on specific, predefined behaviors instead of broad pass–fail judgments. Used together, Dish and Bloom enable Petri to zoom in on where and when alignment fails. Rather than merely flagging that a system misbehaved, the toolkit helps identify which conditions triggered the failure and how much of the problem comes from the model itself versus the surrounding application logic. This is especially valuable when differentiating model risk from design flaws in orchestration layers, guardrails, or tool integrations. For alignment researchers and engineering teams, Bloom-based checks turn Petri into a diagnostic suite: it can highlight narrow, high-stakes failure modes, support regression testing after model or prompt updates, and offer more actionable insights than generic benchmark scores.
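The sketch below illustrates the general shape of a targeted behavior check used as a regression gate after a model or prompt update; BehaviorCheck, detector, and evaluate are hypothetical names chosen for illustration, not Bloom’s actual interface.

```python
# Illustrative only: BehaviorCheck and evaluate() sketch the general shape of a
# narrow behavior check used as a regression gate, not Bloom's actual API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class BehaviorCheck:
    name: str
    prompts: list[str]                  # scenarios known to elicit the behavior
    detector: Callable[[str], bool]     # returns True when the unwanted behavior appears
    max_failure_rate: float             # threshold that gates model or prompt updates


def evaluate(check: BehaviorCheck, generate: Callable[[str], str]) -> bool:
    """Run the check against any generate() callable and report whether it stays under the gate."""
    failures = sum(check.detector(generate(p)) for p in check.prompts)
    return failures / len(check.prompts) <= check.max_failure_rate
```

Framed this way, a check targets one failure mode at a time and yields a rate against a threshold, which is what makes it usable for regression testing rather than a single coarse verdict.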

Democratizing Alignment Testing for Researchers and Enterprises

Handing Petri to Meridian Labs and deepening its integration with the broader Meridian stack—alongside tools like Inspect and Scout—signals a push toward shared, open evaluation infrastructure. Inspect already offers hundreds of pre-built tests, agent evaluations, and sandboxed execution, giving Petri an immediate home inside a mature AI model evaluation environment. For independent researchers, public agencies, and enterprises, this reduces reliance on proprietary vendor tools and lowers the barrier to running robust AI alignment testing at scale. The modular auditor–target split, Dish’s deployment-aware audits, and Bloom’s targeted behavioral checks together create a flexible toolkit that can plug into existing workflows. The open source AI safety ecosystem gains a neutral, extensible platform, while Meridian faces the practical challenge of proving it can turn Petri into software that is easier to trust, operate, and maintain than lab-owned alternatives.
