What AI inference startups are and why their valuations are exploding
AI inference startups run already-trained machine learning models at scale. They provide the infrastructure that lets enterprises send data to these models and receive predictions or generated content efficiently, reliably, and at predictable cost across diverse production workloads. Once seen as a thin layer between model creators and application builders, these AI infrastructure companies are now at the center of the boom in enterprise AI deployment. Demand for large language model inference has grown fast as coding assistants, internal knowledge agents, and AI productivity tools move from pilots into daily workflows. That demand is translating into revenue momentum strong enough to push several inference specialists toward decacorn territory, not through slow, multi-year venture cycles but through rapid funding rounds that follow sharp revenue increases and expanding customer bases across multiple sectors.
From side-eye to decacorn: Baseten, Fireworks and the revenue reset
Investor doubts about inference margins have given way to a focus on growth, as Baseten, Fireworks AI and Modal post eye-catching revenue numbers. Fireworks AI CEO Lin Qiao said on X that the company has surpassed USD 800 million (approx. RM3,680 million) in annualized revenue, up from USD 250 million (approx. RM1,150 million) in late October last year. Modal said it has crossed USD 300 million (approx. RM1,380 million) in annual recurring revenue (ARR), while Baseten’s ARR reportedly jumped to USD 600 million (approx. RM2,760 million) from USD 200 million (approx. RM920 million) at the start of the quarter. Baseten is raising up to USD 1 billion (approx. RM4,600 million) at a USD 11 billion (approx. RM50,600 million) valuation, while Fireworks AI is in talks at a valuation of around USD 15 billion (approx. RM69,000 million). According to Menlo Ventures partner Deedy Das, many of these companies are now growing by multiples from a revenue base of more than USD 100 million (approx. RM460 million).
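To make the arithmetic behind these figures explicit, the short Python sketch below computes each company's revenue growth multiple and the ratio of its reported valuation to its latest annualized revenue. It uses only the numbers quoted above, in USD millions. The function and variable names are illustrative, and the valuations are reported or under discussion rather than closed rounds. The two growth multiples also cover different time windows, so they are not directly comparable.

```python
# Figures in USD millions, taken from the reported numbers in this article.
# They are unaudited, and the valuations are reported or still in talks.
ARR = {
    "Fireworks AI": {"before": 250, "after": 800},  # since late October last year
    "Baseten": {"before": 200, "after": 600},       # since the start of the quarter
}
VALUATION = {"Fireworks AI": 15_000, "Baseten": 11_000}


def growth_multiple(before: float, after: float) -> float:
    """How many times larger the later revenue figure is than the earlier one."""
    return after / before


def valuation_to_arr(valuation: float, arr: float) -> float:
    """Reported valuation divided by the latest annualized revenue."""
    return valuation / arr


def summarize() -> dict:
    """Return both ratios for each company, rounded to two decimals."""
    rows = {}
    for name, arr in ARR.items():
        rows[name] = {
            "growth_multiple": round(growth_multiple(arr["before"], arr["after"]), 2),
            "valuation_to_arr": round(valuation_to_arr(VALUATION[name], arr["after"]), 2),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(f"{name}: revenue up {row['growth_multiple']}x, "
              f"valuation at {row['valuation_to_arr']}x annualized revenue")
```

On these reported figures, both companies roughly tripled revenue and are being valued at a little over 18 times annualized revenue. These ratios say nothing about margins.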
Specialized AI infrastructure companies win as enterprises scale deployment
The surge in valuations reflects a broader shift toward specialized AI infrastructure companies as enterprises optimize the cost of running large language models in production. Inference workloads recur and scale with usage, unlike one-off training runs, which makes them attractive both to enterprises that care about predictable spend and to startups that benefit from usage-based revenue. Coding assistants are an early proof point, with Fireworks AI relying on Cursor as a major customer, and similar patterns are emerging in customer support, analytics, and internal knowledge use cases. Enterprises want flexibility in model choice, fine-tuning options, and tools that adapt to their data governance rules. That pushes them toward AI inference startups that offer model customization tools, evaluation frameworks, and APIs across many models rather than single-model platforms. It also lets those providers reach decacorn valuations on the back of actual enterprise AI deployment rather than speculative future use.
Competing with generalist AI labs and hyperscalers on value, not hype
As inference specialists climb toward decacorn status, investors are starting to compare their valuations to generalist AI labs such as Anthropic and OpenAI. Anthropic’s reported USD 65 billion (approx. RM299,000 million) Series H lifted its valuation significantly, underscoring how much capital concentrates in frontier model labs. Yet inference-focused players differentiate themselves by chasing revenue tied to production workloads, not model training milestones. They compete with labs and hyperscalers for GPU access, but they also position themselves as neutral infrastructure for multi-model enterprise AI deployment. Companies like Together AI, which combines inference with a broader AI-native cloud stack, and Fal, which serves over 1,000 image, video, audio, 3D, and world models through an inference engine, show that specialization can span both breadth of models and depth of tooling. Competitive pricing pressure remains strong, but so does customer appetite for flexible, model-agnostic inference platforms.
Margin questions, VC allocation, and what comes next for AI inference
The rapid rise of AI inference startups raises hard questions about margins and long-term durability. Most leading providers lease GPU capacity instead of owning the chip stack, unlike some neoclouds that pair infrastructure ownership with inference services. That keeps their cost of goods sold high, and they must also compete for compute with major labs that secure large allocations for their own use. One skeptical investor notes that “VCs are just doing a revenue multiple and are assuming the margin doesn’t matter,” highlighting the risk that current decacorn-level valuations may not match eventual profitability. For venture capital, the signal is clear: AI infrastructure companies that sit close to real workloads are attracting substantial capital, potentially crowding out more speculative bets. If these inference leaders can convert fast growth into sustainable margins, they may define the next phase of the AI startup ecosystem; if not, consolidation and price wars could follow.
