MilikMilik

Anthropic Hands Petri to Meridian Labs: How Petri 3.0 Rewrites AI Alignment Testing for Developers

Anthropic Hands Petri to Meridian Labs: How Petri 3.0 Rewrites AI Alignment Testing for Developers

From Lab-Owned Tool to Independent AI Alignment Testing Infrastructure

Anthropic has transferred stewardship of Petri, its open-source AI alignment testing toolkit, to nonprofit Meridian Labs while simultaneously releasing Petri 3.0. The move echoes Anthropic’s earlier decision to donate its Model Context Protocol, signaling a broader strategy to place key infrastructure outside any single vendor’s control. Petri has already underpinned Anthropic’s alignment assessments for every Claude model since Claude Sonnet 4.5 and has been adopted by the UK AI Security Institute for research-sabotage evaluations of frontier systems. By handing Petri to Meridian—an organization focused on evaluation tooling rather than model training—Anthropic aims to make AI safety evaluation results appear more neutral and credible. Developers now inherit an alignment testing tool with real-world usage, not a one-off code dump, and Meridian must prove it can maintain and evolve Petri in ways that prioritize openness, trust, and operational reliability for a wider community of practitioners.

Anthropic Hands Petri to Meridian Labs: How Petri 3.0 Rewrites AI Alignment Testing for Developers

Petri 3.0’s Modular Auditor–Target Split and Why It Matters

Petri 3.0’s most significant change is architectural: the auditor model and target model are now separated into distinct components that communicate through a defined interface. Previously, these elements were tightly coupled, making it difficult to adjust either the judging logic or the model under test without disrupting the entire workflow. The new design lets teams tune auditor behavior, scoring logic, and prompts independently from the systems they evaluate. This matters because alignment testing tools don’t just observe model behavior; they shape what they can see and measure. A fixed auditor can overfit to one testing style or deployment context, hiding meaningful differences between models. With the modular split, developers can swap in different auditors for different risk profiles, compare model families more fairly, and adapt Petri to diverse production environments without treating any one lab’s setup as the default blueprint.

Dish: Bringing Alignment Testing Closer to Real Deployment Conditions

Petri 3.0 introduces Dish, a new extension in research preview that tackles a long-standing problem in AI alignment testing: models often behave differently when they sense they are in a test harness. Dish runs audits inside real agent scaffolds such as CLI-based agents and code-oriented environments, so the target model sees its actual system prompts, orchestration rules, and tool wiring rather than synthetic test wrappers. This pushes AI safety evaluation closer to the messy realities of deployed systems, where guardrails, toolchains, and application logic can meaningfully alter behavior. By embedding tests in production-like contexts, Dish helps developers understand not just whether a model can fail, but how those failures interact with surrounding infrastructure. For teams deploying agents, copilots, or tool-calling systems, Dish offers a way to stress‑test alignment under the same conditions users will encounter, closing the gap between lab benchmarks and live incidents.

Bloom-Based Checks: Targeted Behavior Analysis Instead of Binary Pass–Fail

Alongside Dish, Petri 3.0 integrates more tightly with Bloom, its tool for automated behavioral evaluations focused on specific behaviors. Rather than returning a single coarse “safe or unsafe” verdict, Petri plus Bloom enables fine-grained inspection: which behaviors appear, under what prompts, and how often. Used together, Dish and Bloom help isolate whether a problematic outcome stems from the base model, the surrounding application logic, or the way tools and guardrails are orchestrated. Developers can, for example, distinguish between a model misalignment issue and a misconfigured prompt chain or tool policy. This targeted analysis is essential for modern AI safety evaluation, where risks often emerge from interactions between components rather than from the model alone. For smaller teams without extensive internal red-teaming resources, Bloom-based behavior checks turn Petri into a practical alignment testing tool that supports iterative debugging of both models and systems.

Democratizing AI Alignment Testing Through Open-Source Governance

Open-sourcing Petri and placing it under Meridian Labs’ stewardship lowers the barrier to AI alignment testing for developers, researchers, and public-sector teams. Petri joins Meridian’s broader evaluation stack alongside Inspect and Scout, meaning existing users can plug alignment testing tools into pipelines they already run without building new orchestration layers. Inspect alone offers more than 200 pre-built evaluations with support for agents, tool calling, and sandboxed execution, giving Petri an immediate operational context. This ecosystem approach reduces vendor lock-in and encourages diverse contributors to extend Petri for domain-specific use cases—from research-sabotage risk to enterprise compliance scenarios. However, Meridian now carries a dual burden: proving that nonprofit governance actually translates into easier deployment, better upkeep, and more transparent comparison across models. If successful, Petri open source development could become a cornerstone for shared, community-driven standards in AI safety evaluation rather than a lab-specific testing pipeline.

Comments
Say Something...
No comments yet. Be the first to share your thoughts!