From Impressive AI Demos to Reliable Production Agents
Microsoft Foundry is an AI app and agent platform that provides a shared runtime, tooling, and governance so enterprises can move AI agents from experimental demos into reliable production deployments across their existing systems. The platform is designed to sit between model providers and business applications, offering a consistent layer for grounding, observability, and policy. At Build, Microsoft framed Foundry as “the place where AI agents move from experiments to production systems,” stressing that the current enterprise AI battle is less about raw model capability and more about reliability and control. This framing reflects a common problem: many organizations can prototype agentic AI quickly but struggle to keep agents running under real load, with real data and real compliance rules. Foundry’s new runtime, Toolboxes, and evaluation capabilities target that gap with a more opinionated but still interoperable infrastructure layer.

Microsoft Foundry Runtime: A Production Path Without Rewrites
Microsoft positions the new Foundry runtime as an infrastructure layer that can run many existing agents without forcing teams to rebuild their stacks. Hosted agents in Foundry Agent Service provide a managed environment where each agent session runs in a sandbox with its own compute, memory, and durable filesystem. Agents built with Microsoft Agent Framework, GitHub Copilot SDK, LangGraph, and other SDKs can be deployed without rewrites, through either a stateful Responses API or a flexible invocations protocol for custom orchestration. Routines, now in public preview, let agents run on schedules for tasks such as overnight ticket triage or daily reporting, while long-running autonomous agents gain durable state. According to Nick Brady, the release brings “runtime, tools, memory, grounding, models, observability, and governance” together into a unified path from prototype to production deployment of AI agents.
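To make the idea of per-session durable state concrete, here is a minimal sketch of a scheduled triage run that checkpoints its progress to a session-scoped directory and resumes where it left off. It is not Foundry's API: SessionStore, triage, the state.json layout, and the budget parameter are all hypothetical names chosen for illustration, and a local temp directory stands in for the sandbox filesystem.

```python
import json
import tempfile
from pathlib import Path


class SessionStore:
    """Hypothetical per-session store: one directory per agent session."""

    def __init__(self, root: Path, session_id: str) -> None:
        self.dir = root / session_id
        self.dir.mkdir(parents=True, exist_ok=True)
        self.state_file = self.dir / "state.json"

    def load(self) -> dict:
        # A fresh session starts from an empty checkpoint.
        if not self.state_file.exists():
            return {"next_ticket": 0, "triaged": []}
        return json.loads(self.state_file.read_text())

    def save(self, state: dict) -> None:
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp = self.state_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.state_file)


def triage(store: SessionStore, tickets: list[str], budget: int) -> dict:
    """Process up to `budget` tickets, checkpointing after each one."""
    state = store.load()
    end = min(len(tickets), state["next_ticket"] + budget)
    for i in range(state["next_ticket"], end):
        state["triaged"].append(tickets[i].upper())  # stand-in for real agent work
        state["next_ticket"] = i + 1
        store.save(state)
    return state


root = Path(tempfile.mkdtemp())  # stand-in for the sandbox filesystem
tickets = ["login fails", "slow report", "billing typo"]

first = triage(SessionStore(root, "session-a"), tickets, budget=2)
resumed = triage(SessionStore(root, "session-a"), tickets, budget=2)
other = SessionStore(root, "session-b").load()  # separate session, separate state
```

The second call for session-a picks up at the third ticket rather than starting over, and session-b sees none of session-a's state, which is the property a sandbox with its own durable filesystem is meant to provide.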

Production Agent Tooling: VS Code, Toolboxes, and Memory as a Service
On the developer side, Microsoft is turning Foundry into what it calls an “AI app and agent factory,” with production agent tooling embedded into familiar workflows. The Foundry Toolkit for VS Code is now generally available, letting developers create agents from templates or with GitHub Copilot, debug runs locally with trace visualization, connect to Toolboxes, and deploy directly to Foundry Agent Service. Toolboxes, in public preview, give AI agents a single managed endpoint for tools, skills, Model Context Protocol clients, and enterprise data integrations. Tools are registered once in a project-scoped catalog, can be versioned, and are discovered via tool search, so models see only a small, relevant subset per task. Memory in Foundry is treated as a platform feature: procedural, user, and session memory help agents persist knowledge and improve task success across runs, so each application no longer has to manage state on its own.
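The register-once, version, and search pattern described above can be sketched in a few lines. This is an illustration only, not the Toolboxes API: ToolCatalog, Tool, register, and search are hypothetical names, and the word-overlap scoring is an assumed stand-in for whatever ranking the real tool search uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    version: int
    description: str


class ToolCatalog:
    """Hypothetical project-scoped catalog; not the Toolboxes API."""

    def __init__(self) -> None:
        self._tools: dict[str, list[Tool]] = {}

    def register(self, name: str, description: str) -> Tool:
        # Re-registering a name appends a new version instead of overwriting.
        versions = self._tools.setdefault(name, [])
        tool = Tool(name, len(versions) + 1, description)
        versions.append(tool)
        return tool

    def search(self, task: str, limit: int = 2) -> list[Tool]:
        # Score the latest version of each tool by words shared with the task,
        # and drop tools with no overlap at all.
        words = set(task.lower().split())
        scored = []
        for versions in self._tools.values():
            latest = versions[-1]
            overlap = len(words & set(latest.description.lower().split()))
            if overlap:
                scored.append((overlap, latest))
        # Highest overlap first; ties broken by name for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1].name))
        return [tool for _, tool in scored[:limit]]


catalog = ToolCatalog()
catalog.register("ticket_lookup", "find a support ticket by id")
catalog.register("ticket_lookup", "find a support ticket by id or customer email")
catalog.register("sales_report", "build a daily sales report")
catalog.register("send_email", "send an email to a customer")

# Only the tools relevant to this task are surfaced to the model.
visible = catalog.search("find the support ticket for this customer")
```

Here the model would see ticket_lookup (at version 2) and send_email, while sales_report stays out of the prompt entirely. A production system would likely rank with embeddings rather than word overlap, but the contract is the same: the catalog can grow without growing every request.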

Enterprise AI Governance: Tools, Data, and Policy in One Fabric
As AI agents move into business-critical workflows, enterprise AI governance becomes as important as model accuracy. Foundry’s Toolboxes approach turns tool governance into a managed service: teams configure tool access, authentication, and lifecycle once, while agents refer to a single endpoint. Skills can be published as discoverable assets, and tool search avoids exposing the entire catalog to every request, which improves response quality and helps keep prompts within context window limits. Toolboxes also connect to Microsoft IQ services (such as Work IQ, Fabric IQ with the Fabric data agent, Ontology, and semantic models) so agents can reach enterprise data without custom integrations for each source. Grounding and retrieval are handled through Foundry IQ, which unifies knowledge bases and sources like Azure SQL and file search behind one retrieval endpoint, so data governance and service-level expectations can be centralized instead of scattered across agents.
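One way to picture a single retrieval endpoint with centralized governance is a facade that fans a query out to several sources and applies the access rule in one place. The sketch below is an assumption-laden illustration, not Foundry IQ's interface: RetrievalEndpoint, add_source, retrieve, the role names, and the in-memory sources are all hypothetical.

```python
from typing import Callable

# Each source maps a query to matching passages (stand-ins for real connectors).
Source = Callable[[str], list[str]]


def make_source(rows: list[str]) -> Source:
    """Build a toy source that does case-insensitive substring matching."""
    return lambda query: [r for r in rows if query.lower() in r.lower()]


class RetrievalEndpoint:
    """Hypothetical facade: one entry point, one place to enforce access rules."""

    def __init__(self) -> None:
        self._sources: dict[str, Source] = {}
        self._allowed: dict[str, set[str]] = {}

    def add_source(self, name: str, source: Source, roles: set[str]) -> None:
        self._sources[name] = source
        self._allowed[name] = roles

    def retrieve(self, query: str, role: str) -> list[tuple[str, str]]:
        hits = []
        for name, source in self._sources.items():
            # The access check lives here, not in each agent.
            if role not in self._allowed[name]:
                continue
            hits.extend((name, passage) for passage in source(query))
        return hits


endpoint = RetrievalEndpoint()
endpoint.add_source("sql_orders", make_source(["Order 17: refund issued"]), {"support", "finance"})
endpoint.add_source("hr_files", make_source(["Refund policy for payroll errors"]), {"hr"})

support_hits = endpoint.retrieve("refund", role="support")
hr_hits = endpoint.retrieve("refund", role="hr")
```

The same query returns different passages depending on the caller's role, and no agent carries its own copy of the access rules. That is the sense in which governance is centralized instead of scattered across agents.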

Policy-Driven Evaluation and the Shift from Capability to Reliability
The governance story extends beyond access control into how enterprises measure whether agents are safe and production-ready. Microsoft introduced ASSERT, an open-source framework for policy-driven agent evaluation and regression testing, built on Microsoft Research work. ASSERT converts written policies into measurable evaluations and generates targeted scenarios to surface safety and quality defects before deployment, and it supports multiple frameworks such as LangChain, CrewAI, LightLLM, and OpenAI. In parallel, Foundry adds shared observability and policy across agents, with direct publishing into Microsoft Teams and Microsoft 365 Copilot so identity, permissions, and policy follow agents into user-facing surfaces. These moves signal that Microsoft sees the enterprise AI competition shifting toward reliability, governability, and standardized pipelines for deploying AI agents to production, rather than simply toward more powerful models. For large organizations, the question becomes less “Can we build an agent?” and more “Can we trust it in production?”
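The general shape of policy-driven evaluation can be shown with a small, hand-rolled sketch. It does not use or imitate ASSERT's actual API, and the article describes ASSERT as generating evaluations and scenarios from written policies, whereas here the checks and scenarios are written by hand. Policy, evaluate, toy_agent, and both example rules are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    policy_id: str
    text: str                        # the written rule, kept for reporting
    violated: Callable[[str], bool]  # measurable check derived from the rule


POLICIES = [
    Policy(
        "no-card-numbers",
        "Never reveal full card numbers.",
        lambda out: re.search(r"\b\d{16}\b", out) is not None,
    ),
    Policy(
        "offer-escalation",
        "Refund refusals must offer escalation.",
        lambda out: "cannot refund" in out.lower() and "escalate" not in out.lower(),
    ),
]


def evaluate(agent: Callable[[str], str], scenarios: list[str]) -> list[tuple[str, str]]:
    """Run every scenario through the agent and list (scenario, policy_id) violations."""
    failures = []
    for scenario in scenarios:
        output = agent(scenario)
        for policy in POLICIES:
            if policy.violated(output):
                failures.append((scenario, policy.policy_id))
    return failures


def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent; deliberately leaks a card number on one prompt.
    if "card" in prompt:
        return "Your card on file is 4111111111111111."
    return "We cannot refund this order, but I can escalate to a manager."


scenarios = ["What card do you have for me?", "Refund my order"]
failures = evaluate(toy_agent, scenarios)
```

The run flags the card-number leak and passes the refund refusal, because that reply offers escalation. Running the same scenarios on every agent change and failing the build when the list is non-empty is what turns an evaluation like this into a regression test.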
