What GPU cloud infrastructure is solving for enterprises
GPU cloud infrastructure is a class of computing platforms that connects large pools of graphics processing units (GPUs) to distributed data sources, allowing enterprises to run AI training and inference without being constrained by where their data physically resides. For most organizations, the core problem has been that data lives in scattered storage systems while GPUs sit in distant clusters, forcing teams to copy, stage, and synchronize information before workloads can start. One recent analysis cited by Qumulo puts average enterprise GPU utilization at about 5%, meaning expensive accelerated hardware sits idle most of the time, in part because data arrives late. New enterprise data fabric approaches, combined with distributed GPU access, aim to change this by presenting a single, consistent data view to GPUs across clouds and regions, so compute can be scheduled where capacity is available rather than where storage happens to be deployed.
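To see why staging delays depress utilization, consider a deliberately simplified model. This is an assumption for illustration, not Qumulo's methodology: a job holds its GPUs while its dataset is staged, and the GPUs only do useful work once the data has landed. Utilization is then compute time divided by total allocated time. The function name and hour figures below are hypothetical.

```python
def gpu_utilization(staging_hours: float, compute_hours: float) -> float:
    """Fraction of allocated GPU time spent computing, in a toy model where
    GPUs sit idle (but allocated) while the dataset is being staged."""
    total = staging_hours + compute_hours
    if total <= 0:
        raise ValueError("total allocated time must be positive")
    return compute_hours / total


# Hypothetical job: 20 hours of staging before 4 hours of training.
print(f"{gpu_utilization(20, 4):.0%}")   # 17%
# The same job if data is reachable in place and staging is near zero.
print(f"{gpu_utilization(0.1, 4):.0%}")  # 98%
```

The real-world 5% figure reflects many more factors than staging alone (idle gaps between jobs, oversized allocations, I/O stalls during training), so this model isolates only the staging component.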
Qumulo’s AI data fabric and the end of staging delays
Qumulo’s Cloud AI Accelerator centers on an AI data fabric that links Cloud Native Qumulo, Qumulo Cloud Data Fabric, and Qumulo NeuralCache across on-premises, edge, and multi-cloud environments. Instead of replicating datasets into every GPU cluster, the platform connects enterprise data to GPUs in real time, so workloads can begin without lengthy staging cycles or consistency trade-offs. Qumulo calls this “GPU liquidity”: the ability to run jobs wherever GPU capacity is open, across regions and clouds, while keeping a unified data footprint. The company argues that eliminating data-gravity and staging delays is key to lifting enterprise GPU utilization above its current average of roughly 5%. Enterprises can connect Qumulo systems directly to services such as Microsoft AI Foundry, Amazon Bedrock, and Google Vertex AI without copying data, avoid isolated storage silos, and cut idle compute costs by skipping bulk loads into GPU-attached flash.
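A toy illustration of the "GPU liquidity" idea follows; it is not Qumulo's implementation. `Region`, `place_job`, the region names, and the staging time are all hypothetical, and remote data access is treated as free, which real systems only approximate. The sketch compares how long a job waits to start when only regions holding a local copy of the data can run it immediately, versus when every region sees the same data.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    free_gpus: int
    has_local_copy: bool  # whether the dataset is already stored in this region


def place_job(regions, gpus_needed, staging_hours, shared_data_view):
    """Pick a region for a job and return (region name, hours before start).

    Toy model: with a shared data view, any region with enough free GPUs can
    start immediately; without one, regions lacking a local copy of the data
    must first wait `staging_hours` for it to be copied in.
    """
    candidates = []
    for r in regions:
        if r.free_gpus < gpus_needed:
            continue  # not enough open capacity in this region
        wait = 0.0 if (shared_data_view or r.has_local_copy) else staging_hours
        candidates.append((wait, r.name))
    if not candidates:
        return None  # no region can host the job right now
    wait, name = min(candidates)  # shortest wait wins; ties break by name
    return name, wait


regions = [
    Region("us-east", free_gpus=0, has_local_copy=True),    # data here, GPUs busy
    Region("eu-west", free_gpus=64, has_local_copy=False),  # GPUs free, no data
    Region("ap-south", free_gpus=16, has_local_copy=False),
]
print(place_job(regions, 32, 12.0, shared_data_view=False))  # ('eu-west', 12.0)
print(place_job(regions, 32, 12.0, shared_data_view=True))   # ('eu-west', 0.0)
```

In both cases the job lands in the same region; the difference is whether it waits for a copy first, which is the staging cost the fabric approach aims to remove.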
SoftBank’s Infrinia-powered GPU cloud and sovereign compute
SoftBank’s AI Data Center GPU Cloud adds another layer to this shift by combining high-density NVIDIA GB200 NVL72 systems with its Infrinia AI Cloud OS. Infrinia AI Cloud OS provides Kubernetes as a Service for multi-tenant environments and Inference as a Service for large language model APIs, so teams can orchestrate AI workloads from a single control plane. The hardware is designed for memory-intensive training and complex inference: each NVL72 rack links 72 Blackwell GPUs into a single NVLink domain, so much of the GPU-to-GPU traffic that would otherwise cross the network between servers stays on NVLink. SoftBank positions this as sovereign GPU cloud infrastructure for enterprises that need their AI workloads to remain within a defined jurisdiction while still benefiting from scalable, distributed GPU access. Training and inference share one GPU pool, while automation handles scaling, load balancing, and recovery, allowing organizations to concentrate on models and data pipelines instead of building and operating custom stacks.
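Infrinia's own interfaces are not documented in this article, so the sketch below assumes only that its Kubernetes as a Service exposes the standard Kubernetes API. It builds a generic Pod manifest that requests GPUs through the `nvidia.com/gpu` extended resource advertised by the NVIDIA Kubernetes device plugin, with one namespace per tenant as a common multi-tenancy convention. The function name, tenant, job name, and image are hypothetical.

```python
import json


def gpu_training_pod(tenant: str, job: str, image: str, gpus: int) -> dict:
    """Build a generic Kubernetes Pod manifest that requests whole GPUs.

    Assumes a standard Kubernetes API with the NVIDIA device plugin installed;
    the tenant-per-namespace layout is a convention, not an Infrinia feature.
    """
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": job, "namespace": tenant},
        "spec": {
            "restartPolicy": "Never",  # batch-style training job, no restarts
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    # For extended resources such as GPUs, Kubernetes uses the
                    # limit as the request, and GPUs are never overcommitted.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = gpu_training_pod(
    "team-a", "llm-finetune", "registry.example.com/trainer:latest", 4
)
print(json.dumps(manifest, indent=2))  # e.g. pipe into `kubectl apply -f -`
```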
Direct GPU access and the future of AI data management
Both Qumulo’s AI data fabric and SoftBank’s Infrinia AI Cloud OS point toward an architecture where GPUs access enterprise data directly, across clouds and regions, without replication overhead. This turns AI data management from a logistics problem into a scheduling problem: instead of moving terabytes of data to wherever GPUs are idle, enterprises route workloads to available capacity while presenting a consistent data view. Direct paths between storage and compute remove the traditional copy-and-stage bottleneck between data and GPUs, shortening the time before training jobs can start and enabling continuous inference pipelines. As enterprises push more AI into production, these platforms show how GPU cloud infrastructure and enterprise data fabric technology can work together to keep utilization high and latency low while avoiding fragmented storage silos. The result is a more flexible model for distributed GPU access, in which data stays authoritative and GPUs stay busy.
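The cost of the logistics approach is easy to estimate with back-of-the-envelope arithmetic. This sketch assumes a sustained link at the stated rate with no protocol overhead unless `efficiency` is lowered; the 100 TB dataset and the link speeds are hypothetical examples, not figures from Qumulo or SoftBank.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to copy `dataset_tb` terabytes (10**12 bytes) over a link of
    `link_gbps` gigabits per second, running at `efficiency` of line rate."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = dataset_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)   # bits / (bits per second)
    return seconds / 3600


# Hypothetical 100 TB training corpus over two link speeds.
for gbps in (10, 100):
    print(f"{gbps:>3} Gb/s: {transfer_hours(100, gbps):.1f} h")
```

At full line rate this comes to about 22.2 hours at 10 Gb/s and 2.2 hours at 100 Gb/s, before any re-synchronization when the source data changes.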
