From AI Experiments to Production: Red Hat’s AgentOps Vision
With Red Hat AI 3.4, Red Hat is sharpening its focus on the difficult leap from proof-of-concept models to reliable, production-grade AI systems. The release introduces an AgentOps framework that treats intelligent agents as first-class, operational workloads rather than experimental side projects. Red Hat positions its AI strategy around four pillars: efficient inference, tight integration with enterprise data, accelerated agent deployment across hybrid infrastructure, and a unified AI platform that can run any model in any agent on any hardware or cloud. In practice, AgentOps brings integrated tracing, observability, evaluations, and lifecycle management for agents, regardless of the underlying agent framework. By adding identity, governance and performance controls, Red Hat aims to turn the emerging “agentic era” into something enterprises can monitor, secure and scale, moving Red Hat AI production from isolated lab environments into the same operational discipline as other business-critical applications.

Model-as-a-Service: A Governed Backbone for Enterprise AI Deployment
At the core of Red Hat AI 3.4 is a Model-as-a-Service (MaaS) layer designed to simplify how enterprises consume and govern models across hybrid cloud AI environments. MaaS exposes pre-trained AI and machine learning models as shared, on-demand resources via API endpoints, while providing a single controlled interface for developers. That same interface allows administrators to track usage and enforce policies, turning model access into a managed service rather than a sprawl of one-off deployments. RHAI 3.4 builds on high-performance distributed inference with vLLM and the llm-d engine, adding features like request prioritisation so latency-sensitive, interactive traffic can share endpoints with background workloads without performance collapse. Speculative decoding support is aimed at improving response times and reducing cost per interaction. Together, these capabilities form an inference substrate that can support everything from simple chatbots to resource-hungry agents within a consistent enterprise AI deployment model.
Metal-to-Agent Infrastructure and Hybrid Cloud AI Operations
Red Hat describes its “metal-to-agent” capabilities as the connective tissue between bare-metal hardware and the autonomous agents running on top of it. The idea is to manage the full stack: hardware, Kubernetes, models, and agents, with consistent security and observability controls across on-premises data centres and public clouds. SPIFFE/SPIRE-based cryptographic identity management replaces static keys with short-lived tokens so that agentic actions are tied to verifiable identities and least-privilege access. An evaluation hub provides a unified control plane for testing models, AI applications and agents, helping teams standardise evaluation instead of relying on fragmented tools. Integrated prompt management turns prompts into governed data assets stored in a central registry. Automated adversarial scanning using technology from Chatterbox Labs, including the Garak vulnerability scanner and Nvidia NeMo Guardrails at runtime, helps protect against jailbreaks, prompt injection and bias. These layers are intended to make production-grade AgentOps feasible at enterprise scale.
Managed Inference Service on IBM Cloud Expands Access
IBM is extending Red Hat’s AI capabilities into its cloud portfolio with Red Hat AI Inference on IBM Cloud, a fully managed inference service targeting production workloads. Built on the Red Hat AI inference stack and powered by vLLM, the service is designed for low-latency, high-throughput model serving in real-time and agentic AI scenarios. Customers get OpenAI-compatible APIs, integration with IBM Cloud identity and access management, audit logging, privacy controls and reliability features, without having to handle GPU provisioning or runtime infrastructure. The offering provides a curated model catalogue that currently includes options such as Granite 4.0 H Small, Mistral-Small-3.2-24B-Instruct, Llama 3.3 70B Instruct, GPT-OSS-120B and Nemotron-3-Nano-30B-FP8, with support for additional open and custom models planned. For organisations seeking a managed inference service, this pairing of Red Hat AI production tooling with IBM Cloud aims to deliver model-as-a-service capabilities without building the inference layer in-house.
OpenShift Virtualization: Unifying VMs and AI Workloads on One Platform
Alongside managed AI inference, IBM is offering the Red Hat OpenShift Virtualization Service on IBM Cloud, which complements AgentOps by consolidating traditional and AI workloads on a single Kubernetes-based platform. Running on IBM Cloud VPC Bare Metal, the service provides automated lifecycle management and migration tooling, including the Migration Toolkit for Virtualization, to help enterprises reassess and modernise their virtualisation platforms. By enabling virtual machines and containers to run side by side on OpenShift, organisations can host legacy applications, new microservices and AI inference endpoints within the same operational model. This convergence is particularly relevant for hybrid cloud AI, where data sources and line-of-business applications often still run in VMs. Combined with Red Hat AI 3.4’s Model-as-a-Service and AgentOps framework, OpenShift Virtualization gives enterprises a path to integrate AI agents directly into existing VM-based environments while preserving governance, security and consistency across the stack.
