Why AI Compute Infrastructure Can’t Rely on GPUs Alone
The AI boom is colliding with a hard reality: traditional GPU-based data centres are expensive to scale, power-hungry and increasingly constrained. As organisations move from training a few frontier models to deploying AI everywhere, inference workloads surge and strain existing infrastructure. Data centres hit power and cooling limits, while enterprises in sectors like healthcare and manufacturing also need low latency and data privacy, pushing them toward private and edge deployments rather than distant hyperscale clouds. Industrial players such as IA are betting that many companies will prefer on‑premise or site‑adjacent AI compute, delivered through small, modular data centres that can be expanded over time instead of huge, monolithic builds. At the same time, demand for edge AI chips is accelerating as billions of IoT devices and physical systems require local intelligence. Together, these pressures are forcing the AI compute stack to evolve beyond standard GPUs toward more specialised, efficient and distributed approaches.

Networked AI Architectures: Tenstorrent’s Scale-Out Approach
One response to GPU bottlenecks is to rethink how compute, memory and networking are stitched together. Tenstorrent’s Galaxy Blackhole platform illustrates this shift with a native networked AI architecture that unifies these elements into a single system optimised for real-world workloads. Instead of bolting together separate accelerators, Galaxy is designed as general-purpose AI infrastructure that can deliver leading performance across video generation and large language models, in both the prefill (prompt-processing) and decode (token-generation) phases of inference. In collaboration with Prodia, a Tenstorrent Galaxy supercluster reportedly generates 720p, 81‑frame AI video around 10 times faster than leading GPU systems, and its Blitz Mode targets premium, latency‑sensitive LLM use cases such as long‑context reasoning and agentic workflows. This kind of scale‑out, high‑bandwidth networked AI architecture points toward an infrastructure future where performance gains come as much from system design and interconnects as from any single chip’s raw speed.
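
To make the prefill/decode distinction concrete, the toy Python sketch below (illustrative only, with made-up matrices rather than anything from Tenstorrent’s stack) shows why the two phases stress hardware differently: prefill is one large, compute-bound pass over the whole prompt, while decode emits tokens one at a time and is dominated by data movement, which is why interconnects and system design matter so much at scale.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 100
W = rng.standard_normal((d_model, d_model)) * 0.1  # stand-in for transformer weights
E = rng.standard_normal((vocab, d_model)) * 0.1    # toy embedding / output matrix

def prefill(prompt_ids):
    # Process all prompt tokens in one large matmul: compute-bound and
    # parallel across the sequence -- the phase big systems excel at.
    x = E[prompt_ids]            # (prompt_len, d_model)
    return x @ W                 # toy stand-in for cached activations

def decode_step(cache, last_id):
    # Generate one token: small matmuls over cached state, so real
    # hardware spends most of its time moving data, not computing.
    x = E[last_id] @ W                       # (d_model,)
    logits = (cache.mean(axis=0) + x) @ E.T  # (vocab,)
    return int(np.argmax(logits))

prompt = [1, 5, 7, 2]
cache = prefill(prompt)          # one parallel pass over the prompt
token = prompt[-1]
for _ in range(5):               # decode loop: inherently sequential
    token = decode_step(cache, token)
    print(token, end=" ")
```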

Optical AI Servers: Lumai’s Photonic Leap for Inference
Beyond silicon, optical computing is emerging as a radical way to cut power and latency for AI. The first product in Lumai’s Iris line of inference servers, Iris Nova, is billed as the world’s first optical computing system to run billion‑parameter LLMs in real time. Instead of relying solely on electronic circuits, Lumai accelerates inference workloads using light, enabling higher execution efficiency and up to 90% lower energy consumption than conventional GPU-based architectures. Built as an optical AI server that fits into standard PCIe-card-based systems, Iris Nova targets hyperscalers, emerging “neo‑clouds”, enterprises and research labs that are hitting energy and scalability limits in their existing AI compute infrastructure. As AI shifts into an “inference era” where deployment volume dominates, photonic approaches like this could become crucial for sustaining growth without overwhelming data‑centre power budgets, especially for latency‑sensitive applications such as conversational assistants, real‑time analytics and interactive video experiences.
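
The underlying idea is that the dense matrix multiplications that dominate LLM inference can be carried out by light passing through programmable optics rather than by transistors. Lumai has not published a software interface, so the Python sketch below is purely hypothetical: a simulated “optical” unit takes over the two large matmuls of a transformer-style feed-forward block, while control flow and the nonlinearity stay electronic.

```python
import numpy as np

class OpticalMatMul:
    """Simulated optical matrix multiply unit (hypothetical stand-in).

    In a real photonic core the matrix is encoded in an optical element
    and the input vector in light intensities, so the product is computed
    as light propagates -- the source of the claimed energy savings.
    """
    def __init__(self, weights):
        self.weights = weights        # would be programmed into the optics

    def __call__(self, x):
        return x @ self.weights       # simulated electronically here

def feed_forward(x, optical_up, optical_down):
    # One transformer-style MLP block with its two GEMMs offloaded;
    # the ReLU nonlinearity stays in the electronic domain.
    h = np.maximum(optical_up(x), 0.0)
    return optical_down(h)

rng = np.random.default_rng(0)
d, d_ff = 64, 256
up = OpticalMatMul(rng.standard_normal((d, d_ff)) * 0.1)
down = OpticalMatMul(rng.standard_normal((d_ff, d)) * 0.1)
print(feed_forward(rng.standard_normal(d), up, down).shape)  # (64,)
```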

Decentralised AI Clouds and Edge AI Chips: Pushing Compute Outward
Alongside new hardware, the AI ecosystem is experimenting with new ways of organising and commercialising compute. Solidus AI Tech’s rebrand to AITECH Cloud Network (ACN) and its migration from BNB Chain to Ethereum reflect a broader move toward decentralised AI cloud infrastructure. ACN is positioning itself as a unified enterprise‑grade stack with three layers: distributed high‑performance GPU compute for training and inference, orchestration for autonomous AI agents, and an on‑chain economic layer for payments and coordination. At the physical edge, companies like Advantech see edge AI as a key growth driver, rolling out platforms that deliver at least 100 trillion operations per second (100 TOPS) for industrial, healthcare and robotics use cases that need real‑time response and data privacy. Market research also points to rapid growth in edge AI chips as billions of IoT devices and autonomous systems demand local processing, complementing centralised and decentralised cloud resources.
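
To put a figure like 100 TOPS in perspective, a rough back-of-envelope calculation helps; the utilisation and per-inference workload below are illustrative assumptions, not Advantech figures.

```python
# Back-of-envelope: what "100 TOPS" could mean for real-time edge inference.
# All workload numbers here are illustrative assumptions, not vendor figures.
peak_tops = 100            # claimed peak throughput, trillions of ops/second
utilisation = 0.30         # assumed sustained fraction of peak in practice
ops_per_inference = 10e9   # assume ~10 GOPs per pass for a mid-size vision model

sustained_ops = peak_tops * 1e12 * utilisation
print(f"~{sustained_ops / ops_per_inference:,.0f} inferences per second")
# -> ~3,000 inferences/sec: comfortable headroom for multiple camera or
#    sensor streams, the kind of real-time margin edge use cases require.
```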

Modular Data Centres and the Road Ahead for AI Compute
Industrial players are redesigning where AI runs, not just how fast it runs. IA, for example, is building an AI compute infrastructure strategy around modular data centres roughly the size of a refrigerator, which can be installed quickly on factory floors, in public-sector facilities or at defence sites. These units can be expanded by adding modules and racks, and are tied into a broader Kneocube initiative that spans NPUs, GPUs, cloud operating systems and AI models, aiming to deliver a full‑stack solution from hardware to services. Over the next few years, such modular and edge‑centric designs, combined with optical AI servers, networked AI architectures and decentralised AI clouds, could ease GPU shortages, reduce power consumption and intensify competition among AI cloud providers. For non‑technical observers, the simple message is that AI’s progress will depend as much on innovative compute infrastructure as on clever new algorithms and models.
