What the New Siri AI Architecture Really Is
The new Siri AI architecture is a hybrid system that combines Apple’s on-device intelligence, private cloud infrastructure, and selective use of partner models to deliver a personal assistant deeply integrated into the operating system. This design moves Siri from a scripted voice interface to a multi-model personal AI that understands context across apps, data, and tasks, while still protecting user privacy through strict separation between local processing and cloud requests. Following the Siri announcements at WWDC 2026, Craig Federighi and his team held a technical briefing to counter the idea that Apple had replaced Siri with a thin layer on top of Google Gemini. According to 9to5Mac’s report on the session, Apple Intelligence is built as part of the OS itself, not as a standalone chatbot app, and it routes every request through Apple’s own orchestration and privacy controls first.

Federighi’s Clarification: Siri Is Not Google Gemini
Craig Federighi was direct about the limits of Apple’s Gemini partnership. He explained that Apple does not ship the Gemini app, does not run Google’s client code, and does not use the models or deployment stack that Google offers to its own customers. He also stressed that Google Search is not the base of the system, saying, “The amount of the Google Assistant we use is none.” The aim is to distinguish Apple’s Siri AI architecture from generic chatbot integrations that forward almost everything to a third-party service. Instead, Gemini appears in the pipeline as one of several sources for advanced large-model capabilities, and only behind Apple’s own layers of control. Siri’s new behavior (understanding device context, content, and actions) is driven first by Apple’s architecture, with external models reserved for complex, high-reasoning cases that exceed what on-device or Apple-run cloud models can handle.

On-Device AI Processing and the System Orchestrator
At the core of the WWDC 2026 Siri changes is Apple’s emphasis on on-device AI processing. A new “System Orchestrator” sits inside the operating system and decides how to satisfy each request using local models, app knowledge, and personal data stored on the device. This orchestrator treats Siri less as a voice front end and more as a system-wide assistant layer that can act across apps, notifications, and documents while keeping as much processing local as possible. When requests grow more demanding, the orchestrator may call Apple’s Private Cloud Compute, which Apple describes as an extension of the iPhone’s privacy promise into the cloud. Even then, requests move to Apple-controlled environments rather than directly to a partner API. This design lets Apple keep tight control of context, permissions, and data minimization while still gaining the benefits of larger models when the task demands it.
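The local-first routing and data minimization described above can be sketched in a few lines of Python. This is only an illustration, not Apple’s implementation: the `Request` fields, the `Tier` names, the `route` function, and the complexity threshold are all hypothetical, since Apple has not published the orchestrator’s interface.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Tuple


class Tier(IntEnum):
    """Execution tiers, ordered from most local to most remote (hypothetical names)."""
    ON_DEVICE = 0
    PRIVATE_CLOUD = 1


@dataclass
class Request:
    text: str
    complexity: int  # hypothetical 0-10 difficulty estimate
    context: Dict[str, object] = field(default_factory=dict)  # personal data held on device
    needed_keys: Tuple[str, ...] = ()  # context fields the task actually requires


# Assumed cut-off for what local models can handle; not an Apple figure.
ON_DEVICE_MAX_COMPLEXITY = 4


def route(request: Request) -> Tuple[Tier, Dict[str, object]]:
    """Return the tier to use and the context that tier is allowed to see."""
    if request.complexity <= ON_DEVICE_MAX_COMPLEXITY:
        # Local processing: the full context never leaves the device.
        return Tier.ON_DEVICE, dict(request.context)
    # Data minimization: forward only the fields the task needs.
    minimal = {k: request.context[k] for k in request.needed_keys if k in request.context}
    return Tier.PRIVATE_CLOUD, minimal
```

In this sketch a simple request stays on the device with its full context, while a harder one is escalated with only the fields it declares in `needed_keys`, mirroring the article’s point that escalation to the cloud does not mean sending everything.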

Apple Foundation Models and the Role of Gemini
Amar Subramanya, Apple’s VP of AI, detailed how the company has built a “family of third-generation Apple Foundation Models” (AFM) that span both on-device and cloud deployments. These models are custom-built for Apple Silicon, trained on proprietary data, and then refined using outputs from Gemini frontier models. In practice, this means Gemini helps shape and improve Apple’s models, but Apple runs its own stack rather than handing Siri over to Google. For the most demanding reasoning tasks, Apple has created AFM Cloud Pro, powered in part by Nvidia GPUs hosted inside Google’s cloud while still wrapped in Apple’s Private Cloud Compute guarantees. The hybrid AI model is designed to match each query to the smallest and closest model that can answer it, balancing latency, privacy, and capability instead of defaulting everything to a remote, third-party chatbot endpoint.
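The “smallest and closest model that can answer it” rule can be illustrated with a short Python sketch. The catalog below is an assumption made for illustration: only an on-device tier and AFM Cloud Pro are named in the briefing, while the middle `afm-cloud` entry, the capability scores, the `distance` values, and the `pick_model` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str        # illustrative label, not an official identifier
    capability: int  # assumed maximum task difficulty the model can handle
    distance: int    # 0 = on device; larger values = further from the user


# Hypothetical catalog; scores and the middle tier are assumptions.
CATALOG = (
    Model("afm-on-device", capability=4, distance=0),
    Model("afm-cloud", capability=7, distance=1),
    Model("afm-cloud-pro", capability=10, distance=2),
)


def pick_model(difficulty: int, catalog: Sequence[Model] = CATALOG) -> Optional[Model]:
    """Pick the closest, then smallest, model whose capability covers the task."""
    candidates = [m for m in catalog if m.capability >= difficulty]
    # Prefer proximity first (privacy, latency), then the smaller model.
    return min(candidates, key=lambda m: (m.distance, m.capability), default=None)
```

Sorting on `(distance, capability)` encodes the stated priority: a query only moves outward when no closer model is capable enough, and `None` signals that nothing in the catalog can handle it.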






