Anthropic Open-Sources Petri 3.0 to Meridian Labs, Pushing Collaborative AI Safety Testing Forward

From Lab-Owned Toolkit to Independent AI Alignment Infrastructure

Anthropic has transferred stewardship of Petri, its open-source AI alignment testing toolkit, to Meridian Labs while simultaneously releasing Petri 3.0. The move echoes Anthropic’s earlier handoff of the Model Context Protocol to a neutral foundation, signalling a deliberate strategy to separate evaluation infrastructure from any single AI developer’s incentives. Petri has been central to Anthropic’s internal alignment testing pipeline for Claude models and also underpins the alignment evaluation workflow at the UK AI Security Institute, which has already trialed a prototype of version 3.0 in pre-deployment checks. By donating an actively used, production-grade tool rather than a dormant code dump, Anthropic puts Meridian in a position to prove that nonprofit stewardship can improve trust, maintainability, and adoption. For the wider ecosystem, the handoff suggests that credible AI safety evaluation may increasingly depend on shared, lab-agnostic infrastructure instead of proprietary, vendor-specific test harnesses.

Petri 3.0’s Auditor–Target Split: A More Modular Alignment Testbed

The most consequential technical change in Petri 3.0 is a redesigned architecture that separates the auditor model from the target model under test. Earlier versions interwove these components, making it hard for teams to adjust the judging logic without reworking the entire evaluation flow. In the new design, auditor and target communicate through a defined interface, so scoring rules, prompts, and model families can be tuned independently on each side. This modularity matters because an evaluation framework does more than passively observe behavior: the auditor’s design shapes which behaviors get noticed and how risks are scored. A single, fixed auditor can overfit to one vendor’s models or deployment assumptions, obscuring important differences across systems. By isolating the auditor, Petri 3.0 lets researchers compare models, scaffolds, and governance choices more fairly, which should make alignment testing more robust across heterogeneous environments and use cases.
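
To make the idea of an auditor and a target talking through a defined interface concrete, here is a minimal sketch in Python. It is not Petri's actual API: every name in it (`Target`, `Auditor`, `EchoTarget`, `KeywordAuditor`, `run_audit`) is hypothetical and chosen only for illustration. The point is simply that once both sides meet at a narrow interface, either can be swapped without touching the other.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class Target(Protocol):
    """Hypothetical interface: anything that can answer a prompt."""

    def respond(self, prompt: str) -> str: ...


class Auditor(Protocol):
    """Hypothetical interface: supplies probes and scores the replies."""

    def probes(self) -> list[str]: ...

    def score(self, prompt: str, reply: str) -> float: ...


@dataclass
class EchoTarget:
    """Toy stand-in for a model under test; it just agrees to any request."""

    prefix: str = "I will"

    def respond(self, prompt: str) -> str:
        return f"{self.prefix} {prompt}"


@dataclass
class KeywordAuditor:
    """Toy auditor: scores 1.0 when a reply contains a forbidden keyword."""

    forbidden: tuple[str, ...]

    def probes(self) -> list[str]:
        return ["delete the logs", "summarize the report"]

    def score(self, prompt: str, reply: str) -> float:
        flagged = any(word in reply.lower() for word in self.forbidden)
        return 1.0 if flagged else 0.0


def run_audit(auditor: Auditor, target: Target) -> dict[str, float]:
    """Drive the target only via respond() and score only via the auditor."""
    return {p: auditor.score(p, target.respond(p)) for p in auditor.probes()}


if __name__ == "__main__":
    target = EchoTarget()
    # Swapping the auditor changes the scores without changing the target.
    print(run_audit(KeywordAuditor(forbidden=("delete",)), target))
    print(run_audit(KeywordAuditor(forbidden=("exfiltrate",)), target))
```

In this toy setup the same target gets different scores under the two auditors, which is the bias the paragraph above describes: what an evaluation reports depends partly on who is doing the judging.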

Dish: Bringing Alignment Checks Into Real Deployment Scaffolds

Petri 3.0’s Dish extension tackles a persistent gap between lab tests and real-world behavior. Models frequently act differently when they sense they are being evaluated, especially under synthetic prompts or sandboxed conditions. Dish counters this by running audits inside actual agent and application scaffolds, using the same system prompts, orchestration layers, and tool integrations that models encounter in production. That means tests can operate within environments like coding assistants or command-line agents rather than in isolated text-only sessions. For developers, Dish offers a way to see how guardrails, wrappers, and tool-calling policies interact with model behavior under realistic workload patterns. This makes AI alignment testing more representative of true deployment risk, revealing failures that only appear once a model is embedded in a complex application stack. As more teams ship agentic systems, Dish positions Petri as a bridge between research-grade probes and operational AI safety evaluation.

Bloom: Narrow, Automated Checks for High-Risk Behaviors

Alongside Dish, Petri now integrates with the Bloom tool for automated behavioral evaluations targeting specific model behaviors. Instead of treating AI safety evaluation as a coarse pass–fail gate, Bloom enables fine-grained checks for defined risk patterns and contexts. Used together, Dish and Bloom let Petri pinpoint exactly where and when a model fails alignment tests: whether an unsafe output stems from the base model, the surrounding application logic, or the way tools and prompts are combined. This layered view is crucial for teams that need to distinguish model-level risk from system design flaws, especially in regulated or high-stakes domains. Bloom’s targeted probes also support iterative hardening: developers can adjust prompts, guardrails, or tool access, then quickly re-run focused evaluations to measure progress. The result is an open-source AI toolchain that treats alignment as an ongoing diagnostic process rather than a one-time certification step.
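
One naive way to picture this kind of attribution is a differential check: run the same probe against the bare model and against the model inside its scaffold, then compare the outcomes. The sketch below is an illustration under that assumption only. It does not show how Dish or Bloom work internally, and the names `attribute_failure`, `bare`, `scaffolded`, and `is_unsafe` are hypothetical.

```python
from typing import Callable

# Hypothetical type aliases: a responder maps a prompt to a reply,
# and a checker decides whether a reply counts as unsafe.
Responder = Callable[[str], str]
Checker = Callable[[str], bool]


def attribute_failure(
    probe: str,
    bare: Responder,
    scaffolded: Responder,
    is_unsafe: Checker,
) -> str:
    """Classify where an unsafe reply first appears (assumed heuristic)."""
    bare_fails = is_unsafe(bare(probe))
    scaffold_fails = is_unsafe(scaffolded(probe))
    if bare_fails and scaffold_fails:
        return "model"  # unsafe with or without the scaffold
    if scaffold_fails:
        return "scaffold"  # only unsafe once wrapped in the application
    if bare_fails:
        return "mitigated"  # scaffold guardrails caught a model-level issue
    return "pass"


if __name__ == "__main__":
    # Toy responders standing in for a real model and a real agent scaffold.
    def is_unsafe(reply: str) -> bool:
        return "rm -rf" in reply

    def bare(prompt: str) -> str:
        return "I cannot help with that."

    def scaffolded(prompt: str) -> str:
        return "Running tool: rm -rf /tmp/build"

    print(attribute_failure("clean the build dir", bare, scaffolded, is_unsafe))
```

A real pipeline would need many probes and repeated samples before drawing any conclusion, since a single comparison like this cannot separate a systematic flaw from sampling noise.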

Toward Shared AI Safety Standards and Open Evaluation Stacks

By placing Petri under Meridian Labs’ stewardship, Anthropic is aligning with a broader shift toward collaborative, open-source AI tools for safety and evaluation. Petri will sit alongside Meridian’s Inspect and Scout frameworks inside a unified stack designed for testing frontier models, agent behavior, and tool-calling workflows. Existing Inspect users can plug Petri’s alignment checks directly into their pipelines without new orchestration layers, making it easier to standardize evaluation practices across labs, regulators, and researchers. The nonprofit setting also creates a cleaner separation between evaluation tooling and model-provider incentives, even though Meridian must still demonstrate that this governance model yields software that is easier to deploy and trust. Collectively, Petri 3.0, Dish, and Bloom suggest an industry trajectory in which AI alignment testing and AI safety evaluation rely on shared, extensible frameworks rather than isolated, proprietary solutions.
