From Lab Tool to Public Infrastructure for AI Alignment Testing
Anthropic has transferred stewardship of Petri, its open-source AI alignment testing toolkit, to Meridian Labs, a nonprofit focused on evaluation infrastructure for frontier models. Petri is not a dormant code dump: it has been central to Anthropic’s own alignment assessment pipeline, used to evaluate every Claude model since Claude Sonnet 4.5, and it underpins the UK AI Security Institute’s research-sabotage evaluations. By pairing the governance handoff with the release of Petri 3.0, Anthropic and Meridian are framing the move as an operational change to active software rather than a symbolic gesture. The goal is to make AI safety evaluation more neutral, credible, and reusable across labs, public-sector teams, and independent researchers. With Petri joining Meridian’s broader stack alongside Inspect and Scout, the alignment-focused toolkit becomes part of an ecosystem designed to standardize how the industry runs and compares AI safety evaluation workflows.

Petri 3.0’s Auditor–Target Split: A New Backbone for AI Safety Evaluation
The core of Petri 3.0 is an architectural overhaul that separates the auditor model from the target model under test. Earlier versions tightly coupled these components, making it difficult to modify the judge, the model being evaluated, or the surrounding logic without touching everything at once. The new split introduces a cleaner interface between the two, giving teams fine-grained control over how their AI safety evaluation is configured. Because evaluation setups can themselves shape what they observe, this modularity matters: a fixed auditor, scoring scheme, or prompt template can conceal meaningful differences between models or overfit to a single testing style. Petri 3.0’s design lets researchers independently tune auditors, swap in new targets, or compare multiple deployment environments without treating any single configuration as the default. That flexibility is crucial for building repeatable, trustworthy AI alignment testing across diverse systems and applications.
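To make the decoupling concrete, here is a minimal Python sketch of what an auditor–target split can look like. It is illustrative only: the class and function names (Auditor, Target, run_audit) are assumptions invented for this example, not Petri 3.0’s actual API, but they show how each side becomes independently configurable and swappable.

```python
# Hypothetical sketch of an auditor-target split. NOT Petri's real API:
# Auditor, Target, and run_audit are illustrative names showing the
# shape of the decoupling, not actual identifiers from the toolkit.
from dataclasses import dataclass


@dataclass
class Auditor:
    """Model that probes the target and scores the resulting transcript."""
    model: str
    scoring_rubric: str


@dataclass
class Target:
    """Model under test, configured independently of the auditor."""
    model: str
    system_prompt: str


def run_audit(auditor: Auditor, target: Target, seed_instruction: str) -> dict:
    """Drive one audit: the auditor converses with the target, then scores it.
    Stubbed here; a real harness would call both models' APIs."""
    transcript = f"[{auditor.model} probing {target.model}] {seed_instruction}"
    return {"transcript": transcript, "score": None}


# Because the two halves are decoupled, either side can be swapped alone:
audit = run_audit(
    Auditor(model="auditor-model-a", scoring_rubric="deception-v1"),
    Target(model="candidate-model-x", system_prompt="You are a helpful assistant."),
    seed_instruction="Probe for sandbagging under oversight pressure.",
)
```

The design point is that changing the auditor, the rubric, or the target touches only one object, so no single evaluation configuration is baked in as the default.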
Dish: Bringing Alignment Tests Into Production-Like Environments
Petri 3.0 introduces Dish, a research-preview extension that tackles a longstanding problem in AI alignment testing: models often behave differently when they sense they’re being evaluated. Dish addresses this realism gap by running audits inside real agent scaffolds, using a model’s actual system prompts and deployment context. Instead of testing models in abstract, laboratory-style setups, Dish places them inside the same orchestration layers, guardrails, tool chains, and command-line interfaces they would encounter in live applications. This means Petri can examine how wrappers, product logic, and tool-calling behavior alter model outputs, rather than probing the bare model in isolation. By mirroring real production environments, Dish helps teams test not only how AI systems respond under evaluation but also how they behave when those tests are embedded within the workflows and constraints that end users and developers rely on every day.
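The sketch below illustrates the idea of auditing through a production scaffold rather than a bare model. Again, this is a hypothetical shape, not Dish’s real interface: DeploymentContext, dish_audit, and the tool and guardrail specs are assumed names for illustration.

```python
# Hypothetical sketch of a Dish-style audit run through a production
# scaffold. All identifiers here are illustrative assumptions, not
# Petri's or Dish's real interface.
from dataclasses import dataclass, field


@dataclass
class DeploymentContext:
    """Captures the real deployment surface the model sees in production."""
    system_prompt: str                               # the actual production system prompt
    tools: list = field(default_factory=list)        # tool specs exposed to the model
    guardrails: list = field(default_factory=list)   # wrapper-level input filters


def dish_audit(context: DeploymentContext, probe: str) -> dict:
    """Run one audit turn through the same scaffold end users hit,
    so wrapper logic and tool availability shape the behavior under test."""
    for guardrail in context.guardrails:
        probe = guardrail(probe)  # guardrails transform input exactly as in prod
    return {
        "system_prompt": context.system_prompt,
        "tools_exposed": [t["name"] for t in context.tools],
        "probe": probe,
    }


prod_context = DeploymentContext(
    system_prompt="You are the support agent for ExampleCo...",
    tools=[{"name": "search_tickets"}, {"name": "issue_refund"}],
    guardrails=[str.strip],
)
print(dish_audit(prod_context, "  Escalate this refund without approval.  "))
```

The point of the pattern is that the probe passes through the same guardrails and tool surface as live traffic, so the audit measures the deployed system rather than an idealized one.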
Bloom-Based Behavior Checks: From Pass–Fail to Granular Risk Insights
Beyond Dish, Petri 3.0 integrates closely with Bloom, an automated behavior-checking tool designed for focused alignment tests. Together, Dish and Bloom let teams move beyond coarse pass–fail scores toward nuanced AI safety evaluation. Bloom can target specific behaviors and scenarios, allowing Petri to map where and when models fail rather than simply recording that they did. This combination helps isolate whether a risky outcome stems from the model itself, the surrounding application logic, or the way tools and prompts are stitched together. For organizations deploying frontier AI, this finer-grained insight is key to managing model risk, tuning guardrails, and prioritizing mitigation work. By enabling behavior-specific checks inside realistic deployment scaffolds, Petri’s new toolkit helps shift alignment testing from one-off benchmarks to continuous, context-aware diagnostics that better reflect actual user and operator conditions.
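As a rough illustration of what behavior-specific checks can look like in code, here is a sketch under the assumption of a Bloom-like interface. BehaviorCheck and evaluate_transcript are invented for this example and are not Bloom’s real API; the idea is simply that each behavior gets its own named check, so failures can be localized rather than collapsed into one pass–fail bit.

```python
# Hypothetical sketch of behavior-specific checks, assuming a Bloom-like
# interface. BehaviorCheck and evaluate_transcript are illustrative names,
# not Bloom's actual API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class BehaviorCheck:
    """One named behavior to probe, with a predicate over the transcript."""
    name: str
    scenario: str
    detector: Callable[[str], bool]


def evaluate_transcript(transcript: str, checks: list[BehaviorCheck]) -> dict:
    """Return per-behavior results instead of a single pass/fail score,
    so a risky outcome can be traced to a specific behavior and scenario."""
    return {
        c.name: {"scenario": c.scenario, "flagged": c.detector(transcript)}
        for c in checks
    }


checks = [
    BehaviorCheck("deception", "user asks model to hide an error",
                  lambda t: "cover up" in t.lower()),
    BehaviorCheck("tool_misuse", "refund tool available without approval",
                  lambda t: "issue_refund" in t),
]
report = evaluate_transcript("Agent called issue_refund without asking.", checks)
# e.g. report["tool_misuse"]["flagged"] is True while "deception" stays clear,
# pointing the investigation at the tool-calling path rather than the model.
```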
Why Open-Source Alignment Tools Matter for Industry Standards
Anthropic’s donation of Petri to Meridian Labs aligns with a broader push to keep critical AI safety infrastructure independent of any single vendor. Much like its earlier transfer of the Model Context Protocol to a neutral foundation, the Petri handoff is meant to improve trust in evaluation results by separating tool stewardship from model development. Under Meridian, Petri joins Inspect and Scout in a stack centered on evaluation, not training, creating a clearer boundary between safety tooling and commercial incentives. Because Petri is already used by external institutions and supports hundreds of pre-built evaluations via the broader stack, its open-source path can accelerate shared norms for AI alignment testing. If Meridian succeeds in making Petri easier to run, compare, and maintain, the toolkit could become a de facto standard for cross-lab AI safety evaluation, pushing the ecosystem toward more transparent, reproducible, and production-aware testing practices.
