NVIDIA DSX platform for AI factory infrastructure

From Experimental Labs to AI Factory Infrastructure

An AI factory is a highly coordinated data center environment that converts energy, compute, and software into tokens of machine-generated intelligence, operated as repeatable, scalable infrastructure rather than isolated model experiments. NVIDIA’s DSX platform is built around this idea, giving infrastructure teams a complete framework to move from one-off AI projects to production AI operating at enterprise scale. Instead of treating chips, networks, facilities, and applications as separate concerns, DSX aligns them into a single AI factory infrastructure stack covering compute, networking, storage, power, cooling, simulation, and operations. NVIDIA positions DSX as a “full-stack approach” that lets organizations design, validate, and run AI factories with better reliability and efficiency. The result is an AI operating model where intelligence production is predictable, token costs can be driven down, and new AI services ship faster than traditional, manually assembled infrastructure would allow.

Inside the NVIDIA DSX Platform: Reference Designs and Simulation

The NVIDIA DSX platform combines software, reference designs, and partner technologies into a unified toolkit for building AI factories. DSX Reference Design offers generation-specific, validated AI factory blueprints that span cluster architecture, storage layouts, and facilities infrastructure, helping builders avoid guesswork when planning large-scale AI deployments. DSX Sim adds a high-fidelity simulation layer, allowing teams to model power, cooling, and workload behavior before hardware is installed. According to NVIDIA, “with the DSX platform, you can simulate the entire factory before you spend a dollar, validate performance before a single rack is installed and operate with the kind of reliability that production AI demands.” This simulation-first method is central to enterprise AI scaling: operators can explore design trade-offs, test failure scenarios, and optimize token throughput per watt long before their AI factory goes live, shortening time to first production.

DSX OS: An AI Operating System for Multi-Tenant Factories

DSX OS is the AI operating system at the heart of the NVIDIA DSX platform, built as open-source, modular AI software for running multi-tenant AI factories at scale. It covers core operational needs such as lifecycle management, intelligent scheduling, runtime consistency across regions, health automation, and resiliency. DSX OS standardizes how components communicate, from GPUs and networking to building management systems, power distribution, and cooling controls, so AI workloads can respond to physical conditions in real time. NVIDIA is releasing the software it uses to operate NVIDIA DGX Cloud as open source, allowing partners to reuse proven components instead of rebuilding their own stacks. This shared foundation addresses one of the largest challenges in AI operating systems: coordinating a complex mix of infrastructure, models, and applications so that AI services can be deployed, updated, and scaled with predictable performance and uptime.
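To make the idea of workloads responding to physical conditions concrete, here is a minimal sketch of power-aware job admission. It is not DSX OS code and does not use any NVIDIA API: the `Job` type, the `admit` function, the job names, and every power figure are hypothetical, and the greedy priority rule is just one simple policy chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    priority: int    # higher value is admitted first (assumed convention)
    power_kw: float  # hypothetical estimated power draw of the job


def admit(jobs, power_cap_kw):
    """Greedily admit jobs in descending priority while staying under the cap.

    Returns the admitted job names and the total power they use.
    """
    admitted, used_kw = [], 0.0
    for job in sorted(jobs, key=lambda j: -j.priority):
        if used_kw + job.power_kw <= power_cap_kw:
            admitted.append(job.name)
            used_kw += job.power_kw
    return admitted, used_kw


# Hypothetical workload mix for one tenant.
jobs = [
    Job("inference-prod", priority=3, power_kw=400.0),
    Job("training", priority=2, power_kw=500.0),
    Job("batch-eval", priority=1, power_kw=200.0),
]

normal = admit(jobs, power_cap_kw=1200.0)    # everything fits
constrained = admit(jobs, power_cap_kw=700.0)  # e.g. a facility power limit
print(normal)
print(constrained)
```

Under the tighter cap, the sketch keeps the high-priority inference job, skips the training job that no longer fits, and backfills with the smaller batch job. A production scheduler would also need preemption, fairness across tenants, and live telemetry, none of which is modeled here.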

Power, Tokens, and Efficiency: DSX MaxLPS and DSX Flex

In an AI factory, power is the hard limit on how much intelligence can be produced, so NVIDIA DSX dedicates an entire layer to energy and efficiency. DSX MaxLPS is designed to maximize tokens per watt within a fixed power budget, combining 45-degree-Celsius liquid cooling and in-rack optimizations to keep GPUs at their most efficient operating point. NVIDIA states that this lets operators run “up to 40% more GPUs at their most energy-efficient operating point with minimal impact on workload performance.” DSX Flex ties AI factories to grid signals such as demand response or load shedding, turning energy constraints into a controllable input rather than a constant risk. Together with DSX Exchange, which routes MQTT-based signals between IT and operational technology, these tools connect the power layer directly to AI scheduling, helping enterprises improve tokens-per-megawatt and lower the overall cost of intelligence production.
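The reasoning behind a fixed power budget can be worked through with simple arithmetic. The sketch below is illustrative only: the 1 MW budget, the per-GPU wattages, and the per-GPU throughputs are invented numbers, not NVIDIA measurements, and the model ignores cooling, networking, and storage overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    watts_per_gpu: int           # hypothetical per-GPU power draw
    tokens_per_sec_per_gpu: int  # hypothetical per-GPU throughput


def factory_throughput(budget_watts, point):
    """Return (gpu_count, tokens_per_sec, tokens_per_sec_per_watt) for a budget."""
    gpus = budget_watts // point.watts_per_gpu
    tokens_per_sec = gpus * point.tokens_per_sec_per_gpu
    return gpus, tokens_per_sec, tokens_per_sec / budget_watts


BUDGET_WATTS = 1_000_000  # assumed 1 MW available to GPUs

peak = OperatingPoint("peak", watts_per_gpu=1000, tokens_per_sec_per_gpu=100)
efficient = OperatingPoint("efficient", watts_per_gpu=700, tokens_per_sec_per_gpu=90)

for point in (peak, efficient):
    gpus, tps, tps_per_watt = factory_throughput(BUDGET_WATTS, point)
    print(f"{point.name}: {gpus} GPUs, {tps} tokens/s, {tps_per_watt:.4f} tokens/s per W")
```

With these made-up figures, each GPU at the efficient point is 10% slower, but the same budget powers 1,428 GPUs instead of 1,000, so aggregate output rises from 100,000 to 128,520 tokens per second. Whether a real deployment sees a gain of that shape depends on the actual power and performance curves of the hardware.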

Modular AI Software for Faster Enterprise AI Scaling

The open, modular architecture of DSX OS is key to turning AI factories into repeatable enterprise AI infrastructure. Its components can be integrated into existing platforms, giving operators a way to standardize provisioning, health monitoring, remediation, and platform services without discarding their current tools. DSX Exchange and MCP servers expose a unified operational surface that AI agents can use as a tool catalog, enabling agentic workflows to link GPU health events, thermal anomalies, network issues, and performance impacts across domains. This coordination shifts operations from reactive alerting to automated remediation and consistent runtime management. For enterprises, that means faster deployment cycles and better resource utilization for token generation workloads. By treating AI operating systems as shared, modular platforms rather than bespoke stacks, NVIDIA DSX helps organizations scale AI production from a few clusters to gigawatt-scale AI factory infrastructure.
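The cross-domain linking described above can be sketched as a small correlation routine. This is a hedged illustration, not the DSX Exchange message format or an MCP tool definition: the `Event` schema, the `correlate` function, the rack identifiers, and the 60-second window are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float     # event time in seconds (hypothetical schema)
    domain: str   # e.g. "gpu", "thermal", "network"
    rack: str     # hypothetical rack identifier
    detail: str


def correlate(events, window_s=60.0):
    """Group each rack's events into incidents and keep multi-domain ones.

    Events on the same rack belong to one incident when each follows the
    previous event by at most window_s seconds.
    """
    by_rack = defaultdict(list)
    for event in sorted(events, key=lambda e: e.ts):
        by_rack[event.rack].append(event)

    incidents = []
    for rack_events in by_rack.values():
        groups = [[rack_events[0]]]
        for event in rack_events[1:]:
            if event.ts - groups[-1][-1].ts <= window_s:
                groups[-1].append(event)
            else:
                groups.append([event])
        # Keep only groups that span more than one operational domain.
        incidents.extend(g for g in groups if len({e.domain for e in g}) > 1)
    return incidents


events = [
    Event(0.0, "thermal", "rack-A", "coolant inlet temperature rising"),
    Event(10.0, "network", "rack-B", "link flap"),
    Event(30.0, "gpu", "rack-A", "clock throttling"),
    Event(500.0, "network", "rack-A", "link flap"),
]

incidents = correlate(events)
for incident in incidents:
    print(incident[0].rack, [e.domain for e in incident])
```

Here the thermal and GPU events on rack-A fall within the window and are reported as one incident, while the isolated network events are left out. An agent built on a real tool catalog would presumably reason over richer topology and telemetry than a single rack label and a time window.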