Enterprise GPU cloud infrastructure & data fabric

What Enterprise GPU Cloud Platforms Are Solving

Enterprise GPU cloud platforms for AI are integrated infrastructure and data fabric solutions that connect distributed corporate datasets directly to GPU pools across clouds and regions. They remove the need for bulk replication or staging, so AI workloads can start as soon as GPUs are available, which improves utilization, cuts idle time, and simplifies multi-cloud operations. That matters because the typical enterprise still leaves most of its GPU capacity unused. According to a recent analysis cited by Qumulo, average enterprise GPU utilization sits around 5%, meaning accelerated compute is idle about 95% of the time while data is being copied or rearranged. At the same time, AI projects are spreading across multiple clouds and locations, making traditional copy-and-sync storage models too slow. The new wave of enterprise data fabric and GPU cloud infrastructure is aimed directly at this mismatch between data-hungry GPUs and slow-moving data.
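The utilization figure can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes, as a simplification, that all idle time comes from data staging, and the hour counts are hypothetical values chosen to reproduce the roughly 5% figure, not measurements from any vendor.

```python
def gpu_utilization(compute_hours: float, staging_hours: float) -> float:
    """Return the fraction of wall-clock time the GPUs spend computing.

    Simplifying assumption: every hour not spent computing is spent
    waiting for data to be copied or rearranged.
    """
    total_hours = compute_hours + staging_hours
    if total_hours <= 0:
        raise ValueError("total hours must be positive")
    return compute_hours / total_hours


# Hypothetical numbers: 5 hours of compute for every 95 hours of staging.
staged = gpu_utilization(compute_hours=5, staging_hours=95)

# Same compute, with staging cut to 5 hours (again a made-up value).
reduced = gpu_utilization(compute_hours=5, staging_hours=5)

print(f"with staging:    {staged:.0%}")
print(f"reduced staging: {reduced:.0%}")
```

Under these assumed numbers, shrinking the staging phase raises utilization without adding a single GPU, which is the lever these platforms aim to pull.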

Qumulo’s AI Data Fabric and the End of Data Gravity

Qumulo’s Cloud AI Accelerator is an enterprise data fabric designed to link existing datasets to GPU resources without copying them first. It connects Cloud Native Qumulo, Qumulo Cloud Data Fabric, and Qumulo NeuralCache into a single layer that spans on-premises, edge, and multi-cloud environments. Instead of moving petabytes of files to wherever GPUs are free, it presents data in real time to GPU farms across regions, clouds, and hybrid deployments. This approach is meant to remove the “data gravity” that has forced enterprises to stage and replicate data before training or inference can begin. Qumulo also stresses GPU efficiency: according to the company, the platform eliminates the heavy load phase into GPU-attached flash, removes weeks-long staging delays, and does away with isolated storage silos. The result, Qumulo says, is higher GPU utilization and what it calls “GPU liquidity” across any cloud or availability zone.

Direct Multi-Cloud GPU Access Without Replication

The most important shift in these platforms is direct GPU access to enterprise data without replication. Qumulo’s Cloud AI Accelerator connects on-premises or cloud-native Qumulo systems directly to managed AI services such as Microsoft AI Foundry, AWS Bedrock, and Google Vertex AI. Enterprises do not need to copy data into each provider’s storage, and they avoid maintaining separate file systems wherever GPUs are hosted. This design turns multi-cloud GPU access from a logistics challenge into a scheduling decision. Workloads can be placed wherever capacity appears, across regions and providers, while reading from a consistent data fabric. For AI teams, this means less time waiting for data pipelines and more time running training jobs and inference at scale. For infrastructure teams, it reduces complexity: one enterprise data fabric instead of many replicated storage islands per cloud or region.
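To illustrate what "a scheduling decision" could look like once data location drops out of the picture, here is a minimal sketch. It is not any vendor's scheduler or API: the GpuPool type, the place_job function, the pool entries, and the prices are all hypothetical, and the policy (cheapest pool with enough free GPUs) is just one plausible choice.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GpuPool:
    """A hypothetical GPU pool; all fields are illustrative."""
    provider: str
    region: str
    free_gpus: int
    hourly_cost_per_gpu: float


def place_job(pools: Sequence[GpuPool], gpus_needed: int) -> Optional[GpuPool]:
    """Pick the cheapest pool that has enough free GPUs.

    There is no data-locality term: the sketch assumes every pool can
    read the same data fabric, so only capacity and cost matter.
    """
    candidates = [p for p in pools if p.free_gpus >= gpus_needed]
    if not candidates:
        return None  # no pool can host the job right now
    return min(candidates, key=lambda p: p.hourly_cost_per_gpu)


# Made-up pools and prices, for illustration only.
pools = [
    GpuPool("cloud-a", "us-east", free_gpus=4, hourly_cost_per_gpu=2.10),
    GpuPool("cloud-b", "eu-west", free_gpus=16, hourly_cost_per_gpu=2.60),
    GpuPool("cloud-c", "ap-south", free_gpus=32, hourly_cost_per_gpu=2.40),
]

choice = place_job(pools, gpus_needed=8)
print(choice)
```

In a copy-and-sync model, the same function would also need to weigh where the dataset currently lives and how long a transfer would take; removing that term is what reduces placement to a question of capacity and price.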

SoftBank’s GPU Cloud and Sovereign AI Compute

SoftBank’s AI Data Center GPU Cloud, powered by its Infrinia AI Cloud OS, targets the compute side of the same problem by offering a dense, centrally managed GPU pool. Infrinia AI Cloud OS provides Kubernetes as a Service for multi-tenant environments and an Inference as a Service layer for large language model APIs, an approach SoftBank positions as lowering total cost of ownership compared with custom-built stacks. Underneath, NVIDIA GB200 NVL72 systems deliver high-bandwidth, memory-rich GPU cloud infrastructure suitable for both training and complex inference. SoftBank presents this as a sovereign, secure platform for organizations that must keep training data and inference workloads within a defined jurisdiction. Longer term, the company plans to tie this GPU cloud to its Telco AI Cloud vision, integrating AI data centers with AI-RAN infrastructure to bring AI compute closer to the network edge and reduce latency for production workloads.

Why Multi-Cloud GPU Liquidity Will Define Enterprise AI

Taken together, these developments point to a new architectural pattern: a shared enterprise data fabric on one side and globally available GPU pools on the other, linked without bulk data movement. Qumulo targets the data plane, exposing a single file fabric to any GPU farm in any cloud with the aim of raising GPU utilization well beyond the roughly 5% baseline. SoftBank targets the compute plane, offering a high-density, centrally orchestrated GPU cloud with built-in AI services and a roadmap toward distributed, low-latency deployments. For enterprises, the prize is AI compute optimization: matching workloads to the best available GPUs across clouds and regions while reading consistent data without staging delays. As AI deployments spread, multi-cloud GPU access is likely to move from a nice-to-have to a core requirement, and platforms that can connect data to compute in real time will stand out.