Anthropic Hands Petri to Meridian Labs: How Petri 3.0 Rewires Open AI Alignment Testing

From Lab Tool to Public Infrastructure for AI Alignment Testing

Anthropic’s decision to donate Petri, its open-source AI alignment testing toolkit, to nonprofit Meridian Labs marks a structural shift in how safety evaluations are governed. Petri has already been integral to Anthropic’s own model assessments, including every Claude release since Claude Sonnet 4.5, and it underpins alignment pipelines used by major evaluation institutes. By handing stewardship to Meridian, Anthropic is deliberately separating Petri’s future from any single model provider, echoing its earlier transfer of the Model Context Protocol to a neutral foundation. This move matters because alignment test results are only as trustworthy as the independence of the tools that generate them. With a dedicated evaluation-focused nonprofit now responsible for Petri’s roadmap, researchers, public-sector teams, and enterprises gain a testing framework that aspires to be vendor-agnostic, easier to trust, and better aligned with shared safety goals rather than any one lab’s product strategy.

Inside Petri 3.0: Modular Auditor–Target Design and Realistic Test Scenarios

Petri 3.0’s most consequential change is an architectural overhaul that cleanly separates the auditor model from the target model under evaluation. Earlier versions intertwined these components, making it hard to adjust the judging logic without reshaping the system under test. The new modular split introduces a defined interface between auditor and target, so teams can fine-tune scoring logic, prompts, or comparison strategies independently from the models they are assessing. This matters because evaluation frameworks can subtly shape the behaviors they detect; a fixed auditor across very different systems risks hiding important differences or overfitting to a single testing style. By decoupling these roles, Petri 3.0 gives practitioners a more flexible, production-ready alignment harness that can adapt to diverse deployment environments, model families, and governance assumptions, while preserving consistent, repeatable workflows that scale across research, pre-deployment checks, and ongoing safety audits.
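The decoupling described above can be illustrated with a minimal sketch. This is a hypothetical interface, not Petri 3.0's actual API: the `Target`, `Auditor`, and `run_audit` names are invented here to show how a defined boundary lets either side be swapped without touching the other.

```python
# Hypothetical sketch of an auditor-target split; names are illustrative,
# not the actual Petri 3.0 interface.
from dataclasses import dataclass
from typing import Protocol


class Target(Protocol):
    """The model or agent under evaluation."""
    def respond(self, prompt: str) -> str: ...


class Auditor(Protocol):
    """Scoring logic, swappable independently of the target."""
    def score(self, prompt: str, response: str) -> float: ...


@dataclass
class AuditResult:
    prompt: str
    response: str
    score: float


def run_audit(auditor: Auditor, target: Target, prompts: list[str]) -> list[AuditResult]:
    """Drive the target with each prompt; the auditor only sees the transcript."""
    results = []
    for p in prompts:
        r = target.respond(p)
        results.append(AuditResult(prompt=p, response=r, score=auditor.score(p, r)))
    return results


# Toy instances showing the split: replace either class without changing the harness.
class EchoTarget:
    def respond(self, prompt: str) -> str:
        return f"echo: {prompt}"


class LengthAuditor:
    def score(self, prompt: str, response: str) -> float:
        return min(len(response) / 100, 1.0)


results = run_audit(LengthAuditor(), EchoTarget(), ["Is it safe to run rm -rf /?"])
print(results[0].score)
```

Because the auditor only receives the transcript through the fixed interface, a team can hold the target constant while experimenting with scoring strategies, or hold the auditor constant while comparing model families.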

Dish and Bloom: New Petri 3.0 Features for Production-Aware AI Safety Evaluation

Beyond its new architecture, Petri 3.0 introduces Dish and Bloom, two extensions that deepen how AI safety evaluation can mirror real-world use. Dish, currently in research preview, runs audits inside live agent scaffolds—such as command-line interfaces and code-oriented agents—so the target model encounters its actual system prompts, orchestration rules, and guardrails. This helps address a long-standing problem: models often behave differently when they realize they are being tested. Bloom complements Dish by automating checks for specific behaviors, enabling more targeted, fine-grained inspections instead of broad pass-or-fail judgments. Used together, Dish and Bloom help teams pinpoint where failures originate: in the model itself, the surrounding application logic, or the way tools and prompts are wired together. For organizations building safer AI systems, these Petri 3.0 features bring alignment testing closer to production reality, turning open source AI tools into practical, diagnostic instruments rather than purely academic benchmarks.
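A targeted behavior check in the spirit of Bloom might look like the sketch below. Everything here is assumed for illustration: the `BehaviorCheck` class, the regex pattern, and the transcript format are not Bloom's real interface, just a way to show fine-grained flagging of specific turns rather than a single pass-or-fail verdict.

```python
# Hypothetical sketch of a targeted behavior check; not Bloom's actual API.
import re
from dataclasses import dataclass


@dataclass
class BehaviorCheck:
    name: str
    pattern: re.Pattern  # textual evidence that the behavior occurred

    def hits(self, transcript: list[str]) -> list[int]:
        """Return the indices of transcript turns exhibiting the behavior."""
        return [i for i, turn in enumerate(transcript) if self.pattern.search(turn)]


# Example check: flag turns where the agent claims to have executed something.
# In a real harness this would be cross-checked against the tool-call log.
claims_execution = BehaviorCheck(
    name="claims_execution",
    pattern=re.compile(r"\bI (ran|executed|deployed)\b", re.IGNORECASE),
)

transcript = [
    "User: please check the disk usage",
    "Agent: I ran `df -h` and everything looks fine.",
]
flagged = claims_execution.hits(transcript)
print(flagged)
```

Pinpointing *which* turn triggered a check is what lets a team trace a failure back to the model, the orchestration logic, or the tool wiring, rather than receiving an opaque aggregate score.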

Meridian’s Open Evaluation Stack and the Democratization of Alignment Testing

Petri’s move to Meridian Labs embeds it in a broader open evaluation stack that already includes Inspect and Scout. Inspect, co-developed with leading security institutes, offers more than 200 pre-built evaluations, plus support for agent testing, tool calling, and sandboxed execution. By placing Petri alongside these frameworks, Meridian reduces the operational friction of AI alignment testing: existing users can plug Petri’s alignment checks into workflows they already orchestrate and compare. This integrated environment is crucial for democratizing access to sophisticated safety tools, especially for public-sector teams and smaller organizations that lack bespoke infrastructure. The nonprofit stewardship also draws a clearer line between evaluation tooling and model-provider incentives, aiming to strengthen neutrality and comparability across vendors. Meridian now carries the practical burden of proving that community-governed, open source AI tools can be easier to deploy, maintain, and trust than lab-owned alternatives, potentially setting a new norm for transparent safety evaluation practices.
