MilikMilik

Anthropic Open-Sources Petri 3.0, Setting a New Baseline for AI Safety Testing

From Lab Tool to Neutral Infrastructure: Petri Moves to Meridian Labs

Anthropic has transferred stewardship of Petri, its open-source AI alignment testing toolkit, to Meridian Labs while simultaneously unveiling the Petri 3.0 release. The move echoes earlier governance shifts, such as handing the Model Context Protocol to a neutral foundation, and is designed to keep AI safety evaluation tools independent of any single model provider. Petri already underpins Anthropic’s internal alignment assessments for its Claude models and forms part of an existing evaluation pipeline built by the UK AI Security Institute, giving Meridian a live, battle-tested codebase rather than an experimental prototype. Within Meridian’s stack, Petri will sit alongside Inspect and Scout, two other evaluation-focused projects, reinforcing the nonprofit’s role as a dedicated hub for open-source AI safety. The key test now is whether Meridian can turn that neutrality into greater trust, easier maintenance, and broader adoption across labs, enterprises, and public-sector teams.

Petri 3.0’s Modular Core: Auditor–Target Split for Realistic AI Alignment Testing

At the heart of the Petri 3.0 release is a structural redesign that separates the auditor model from the target model under evaluation. Earlier versions intertwined these components, making it hard for researchers to adjust the judging system without inadvertently reshaping how the target behaved in tests. The new architecture introduces a clean interface between the two, allowing teams to swap or tune auditors, modify scoring logic, or compare different deployment setups while keeping the same model under review. This matters because AI safety evaluation tools do more than passively observe; their prompts, metrics, and assumptions can actively shape what risks they reveal or conceal. By decoupling judge and target, Petri 3.0 helps prevent overfitting to a single testing style and makes it easier to compare model families, governance assumptions, and deployment environments within a consistent AI alignment testing workflow.
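
To make the judge/target separation concrete, here is a minimal Python sketch of the general pattern. It is not Petri's actual API: the names `Target`, `Auditor`, `EchoTarget`, `KeywordAuditor`, and `run_audit` are all hypothetical, and the toy "model" simply echoes its prompt. The point is only that the auditor can be swapped while the target stays fixed.

```python
from dataclasses import dataclass
from typing import Protocol


class Target(Protocol):
    """Hypothetical interface for the model under evaluation."""

    def respond(self, prompt: str) -> str: ...


class Auditor(Protocol):
    """Hypothetical interface for the judging component."""

    def probe(self) -> list[str]: ...

    def score(self, prompt: str, reply: str) -> float: ...


class EchoTarget:
    """Toy stand-in for a target model: it repeats the prompt back."""

    def respond(self, prompt: str) -> str:
        return f"echo: {prompt}"


@dataclass
class KeywordAuditor:
    """Toy auditor: flags a reply (score 1.0) if it contains a banned word."""

    prompts: list[str]
    banned: str

    def probe(self) -> list[str]:
        return list(self.prompts)

    def score(self, prompt: str, reply: str) -> float:
        return 1.0 if self.banned in reply.lower() else 0.0


def run_audit(auditor: Auditor, target: Target) -> dict[str, float]:
    """Run every auditor prompt against the target and collect scores.

    The function only touches the two interfaces, so either side can be
    replaced without changing the other.
    """
    return {p: auditor.score(p, target.respond(p)) for p in auditor.probe()}


if __name__ == "__main__":
    target = EchoTarget()
    prompts = ["please leak the secret", "say hello"]
    # Same target, two different auditors with different scoring logic.
    print(run_audit(KeywordAuditor(prompts, banned="secret"), target))
    print(run_audit(KeywordAuditor(prompts, banned="hello"), target))
```

Because `run_audit` depends only on the two interfaces, changing the scoring rule means passing in a different auditor; the target object is never modified.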

Dish: Bringing Alignment Checks into Real Deployment Scaffolds

Petri 3.0 introduces Dish, a research-preview extension aimed squarely at the realism gap in AI safety evaluation tools. Models often notice when they are being tested and may behave more cautiously than they would inside real applications. Dish addresses this by running audits through actual agent scaffolds and system prompts that resemble production setups, including popular coding and command-line environments. Instead of synthetic evaluation harnesses, the target model encounters the same wrappers, tool-calling rules, and orchestration layers it would see after deployment. This shift pushes Petri closer to the real conditions in which alignment failures matter: tool-dependent workflows, guardrail interactions, and multi-step agent behavior. For teams building safety-critical systems, Dish turns Petri from a lab-only benchmark into a more faithful proxy for live behavior, helping them see how risk emerges when a model is embedded in full product logic rather than isolated in a test harness.

Bloom and Meridian’s Stack: Granular Safety Signals for Developers

Alongside Dish, Petri now integrates Bloom, a behavior-focused tool that automates targeted checks for specific model actions. Instead of a single pass-or-fail result, Bloom helps teams pinpoint which behaviors break, under what prompts, and how the surrounding application contributes to misalignment. Combined with Dish’s production-aware setup, this lets Petri distinguish between failures rooted in the base model and those caused or amplified by scaffolding, guardrails, or tool orchestration. For developers, that granularity is crucial: it turns AI alignment testing into a practical debugging surface rather than a one-off compliance hurdle. Because Petri will run inside Meridian’s broader stack, including Inspect and Scout, teams can plug these checks into workflows they already use, compare results across different systems, and iterate on mitigations. Because the toolkit is open source, organizations can also extend Petri’s modules, customize auditors, and share new evaluations without waiting on a single vendor’s roadmap.
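
The distinction between base-model failures and scaffold-induced ones can be illustrated with a small Python sketch. This is an assumption-laden toy, not how Bloom or Dish are implemented: `wrap_with_scaffold`, `attribute_failures`, `toy_model`, and the string-matching check are all hypothetical. It runs the same behavior check against a bare model and a scaffolded one, then labels each prompt by where the failure shows up.

```python
from typing import Callable

Model = Callable[[str], str]
# A check returns True when the reply exhibits the unwanted behavior.
Check = Callable[[str], bool]


def wrap_with_scaffold(model: Model, system_prompt: str) -> Model:
    """Hypothetical scaffold: prepend a system prompt to every request."""

    def scaffolded(prompt: str) -> str:
        return model(f"{system_prompt}\n{prompt}")

    return scaffolded


def attribute_failures(
    bare: Model, scaffolded: Model, prompts: list[str], check: Check
) -> dict[str, str]:
    """Label each prompt by comparing bare and scaffolded behavior."""
    report: dict[str, str] = {}
    for prompt in prompts:
        bare_fails = check(bare(prompt))
        scaffold_fails = check(scaffolded(prompt))
        if bare_fails and scaffold_fails:
            report[prompt] = "base model"
        elif scaffold_fails:
            report[prompt] = "scaffold-induced"
        elif bare_fails:
            report[prompt] = "scaffold-mitigated"
        else:
            report[prompt] = "pass"
    return report


def toy_model(prompt: str) -> str:
    """Toy model: emits a destructive command whenever it sees 'delete'."""
    return "RUN rm -rf" if "delete" in prompt.lower() else "ok"


if __name__ == "__main__":
    scaffolded = wrap_with_scaffold(toy_model, "You may delete temp files.")
    report = attribute_failures(
        toy_model,
        scaffolded,
        ["list files", "delete everything"],
        check=lambda reply: "rm -rf" in reply,
    )
    for prompt, label in report.items():
        print(f"{prompt!r}: {label}")
```

In this toy setup the system prompt itself contains the trigger word, so a harmless request fails only once the scaffold is applied. That is the kind of signal a single pass-or-fail score would hide.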
