What NVIDIA DSX Is and Why AI Factories Need It
NVIDIA DSX is a full-stack AI factory platform that combines software, reference designs, simulation, and partner systems into one coordinated framework for designing, building, and operating large-scale AI infrastructure. Instead of treating chips, servers, facilities, and operations as separate projects, DSX aligns them under a common architecture so infrastructure builders can move from planning to production faster. AI factories, meaning data centers whose “output” is AI tokens, must keep up with rising demand while staying within strict power and reliability limits. Point solutions for cooling, scheduling, or monitoring often leave gaps and create conflicting priorities. DSX addresses this by treating energy, chips, infrastructure, models, and applications as a five-layer stack that can be designed and tuned together. With this unified approach, operators can focus on increasing tokens per watt and reducing cost per token rather than stitching together disconnected tools.
Inside the NVIDIA DSX Platform: From Reference Designs to Simulation
The NVIDIA DSX platform gives infrastructure teams a shared blueprint for AI factory infrastructure from the earliest design stages. DSX Reference Design offers generation-specific, validated architectures that span compute, networking, storage, cluster layouts, and facilities such as power and cooling. This helps standardize how new AI factories are built and expanded. DSX Sim adds a high-fidelity simulation layer, so teams can model, validate, and optimize infrastructure decisions before any racks are installed. According to NVIDIA, “with the DSX platform, you can simulate the entire factory before you spend a dollar, validate performance before a single rack is installed and operate with the kind of reliability that production AI demands.” By simulating grid behavior, thermal dynamics, and workload performance, DSX reduces guesswork and shortens time to first production, all while aligning chips, systems, and facilities under one plan.

DSX OS: The Open, Modular Software Layer
At the heart of the NVIDIA DSX platform is DSX OS, an open-source, modular software stack built specifically for AI factory operations at scale. DSX OS focuses on lifecycle management, intelligent scheduling, runtime consistency, health automation, resiliency, multi-tenant operations, and platform services. These components are co-designed so data center hardware, facilities, and AI services can operate as one system instead of as isolated domains. Because DSX OS is released as open source and derived from the software NVIDIA uses to run NVIDIA DGX Cloud, ecosystem partners can reuse proven building blocks instead of writing everything themselves, which can cut months of custom work. DSX OS also standardizes communication across compute, networking, power, and cooling systems, making critical facility signals visible to AI infrastructure software. This shared software layer is what turns AI factory infrastructure into a manageable, programmable platform instead of a patchwork of tools.
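To make the idea of facility signals being visible to infrastructure software more concrete, here is a minimal Python sketch. It does not use any DSX OS API. The message fields (`source`, `kind`, `severity`), the `FacilitySignal` type, and the linear power-cap policy are all hypothetical, chosen only to illustrate how a normalized facility event could be turned into a scheduling decision.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FacilitySignal:
    """Hypothetical normalized facility event (not a DSX OS schema)."""
    source: str      # e.g. "grid" or "cooling" (illustrative labels)
    kind: str        # e.g. "demand_response" or "coolant_temp_high"
    severity: float  # 0.0 (informational) to 1.0 (critical)


def parse_signal(payload: str) -> FacilitySignal:
    """Parse a JSON payload and clamp severity into the range [0.0, 1.0]."""
    data = json.loads(payload)
    severity = min(1.0, max(0.0, float(data["severity"])))
    return FacilitySignal(str(data["source"]), str(data["kind"]), severity)


def power_cap_fraction(signal: FacilitySignal, floor: float = 0.6) -> float:
    """Assumed policy: scale the per-GPU power cap linearly from 1.0
    (severity 0) down to `floor` (severity 1)."""
    return 1.0 - signal.severity * (1.0 - floor)


if __name__ == "__main__":
    msg = '{"source": "grid", "kind": "demand_response", "severity": 0.5}'
    signal = parse_signal(msg)
    print(signal.kind, round(power_cap_fraction(signal), 3))
```

The point of the sketch is the shape of the flow: one shared event format means the scheduler does not need a separate integration for each power or cooling vendor.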
Maximizing Tokens per Watt: DSX MaxLPS, Flex, and Exchange
Power is often the hard limit for AI factory infrastructure, so the NVIDIA DSX platform includes components focused on energy-aware operations. DSX MaxLPS is a suite of technologies that aims to maximize token performance per megawatt within a fixed power budget. It combines 45-degree-Celsius liquid cooling with in-rack optimizations to improve performance per watt, allowing operators to run up to 40% more GPUs at their most energy-efficient point with minimal workload impact. DSX Flex connects AI factories to power-grid services, so workloads can respond to events such as load shedding, demand response, and pricing changes. DSX Exchange provides an MQTT-based IT/OT hub that links facility-level signals like grid events and thermal data to the software stack. Together, these tools help operators turn available power into higher AI output and make energy a first-class part of the AI factory strategy.
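The reasoning behind running more GPUs at a more efficient operating point can be shown with simple arithmetic. The Python sketch below is illustrative only: the 1 MW budget, the per-GPU power draws, and the per-GPU token rates are made-up numbers, not NVIDIA measurements, and it ignores cooling and networking overhead. It assumes that capping GPU power reduces per-GPU throughput by less than it reduces power draw, which is what makes the trade worthwhile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical GPU operating point (all values are illustrative)."""
    name: str
    gpu_power_w: int             # power drawn per GPU, in watts
    tokens_per_s_per_gpu: float  # throughput per GPU at this power level


def fleet_output(budget_w: int, point: OperatingPoint) -> dict:
    """Return GPU count, total tokens/s, and tokens per joule for a fixed
    power budget, ignoring cooling and networking overhead."""
    gpus = budget_w // point.gpu_power_w
    tokens_per_s = gpus * point.tokens_per_s_per_gpu
    used_w = gpus * point.gpu_power_w
    return {
        "gpus": gpus,
        "tokens_per_s": tokens_per_s,
        # tokens per second per watt is the same quantity as tokens per joule
        "tokens_per_joule": tokens_per_s / used_w,
    }


if __name__ == "__main__":
    budget_w = 1_000_000  # assumed 1 MW budget for GPUs only
    max_power = OperatingPoint("max power", 1000, 100.0)
    capped = OperatingPoint("efficient point", 700, 85.0)
    for point in (max_power, capped):
        print(point.name, fleet_output(budget_w, point))
```

With these assumed numbers, the capped configuration fits 1428 GPUs instead of 1000 in the same budget and produces more tokens per second in total, even though each individual GPU is slower.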
From Point Solutions to Full-Stack AI Factory Infrastructure
Traditional AI infrastructure often evolves as a collection of point solutions: one tool for cluster management, another for cooling, separate systems for power monitoring, and multiple vendor consoles. This fragmentation makes it difficult to coordinate upgrades, respond to faults, or scale to gigawatt-level AI factory infrastructure. The NVIDIA DSX platform takes a full-stack approach instead. It brings DSX OS, DSX Reference Design, DSX Sim, MaxLPS, Flex, and Exchange into a single architecture that covers compute, networking, storage, facilities, power, cooling, controls, simulation, and operations. DSX OS then adds health automation, fleet-wide visibility, and automated remediation to reduce reliance on reactive alerts. The result is a platform where chips, systems, facilities, and AI services can be tuned together for faster time to revenue, higher efficiency, and better reliability, rather than managed as separate islands of technology.
