On-device AI models in Microsoft Edge

What on-device AI in Microsoft Edge means for the web

On-device AI in Microsoft Edge means running small, optimized language models and task-specific models directly in the browser, so web pages can act as coding and writing assistants without depending on cloud services or external APIs. Microsoft’s Edge browser AI story started with the Prompt and Writing Assistance APIs, powered by the Phi-4-mini model, which brought a 3.8B-parameter language model into local browser contexts for text understanding and reasoning. That early work set the pattern: ship a capable model, then expose it through JavaScript APIs so developers can build browser-based coding assistants, writing helpers, and content tools that run locally. With new local language APIs, on-device speech recognition, and the Aion-1.0-Instruct model, Edge is shifting more AI work onto the user’s PC, reducing latency while keeping sensitive prompts and outputs on the device.
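A page can check which of these JavaScript entry points a browser exposes before it offers any AI features. The TypeScript sketch below assumes the global names used in current experimental builds and the Chromium explainers: `LanguageModel` for the Prompt API; `Writer`, `Rewriter`, and `Summarizer` for the Writing Assistance APIs; and `LanguageDetector` and `Translator` for the local language APIs. These names may change while the APIs are experimental, and `detectBuiltInAI` is a hypothetical helper, not part of any Edge API.

```typescript
// Global names used by the experimental built-in AI APIs
// (assumed from current Edge/Chromium builds; subject to change).
const BUILT_IN_AI_APIS = [
  "LanguageModel", // Prompt API
  "Writer", // Writing Assistance APIs
  "Rewriter",
  "Summarizer",
  "LanguageDetector", // local language APIs
  "Translator",
] as const;

type BuiltInAIName = (typeof BUILT_IN_AI_APIS)[number];
type BuiltInAISupport = Record<BuiltInAIName, boolean>;

// Hypothetical helper: reports which entry points exist on the given scope.
// Pass `globalThis` in a page or extension, or a stub object in tests.
function detectBuiltInAI(scope: object = globalThis): BuiltInAISupport {
  const support = {} as BuiltInAISupport;
  for (const name of BUILT_IN_AI_APIS) {
    support[name] = name in scope;
  }
  return support;
}

// Example: decide whether to show local AI features at all.
const support = detectBuiltInAI();
const hasLocalAI = Object.values(support).some(Boolean);
console.log(hasLocalAI ? "Local AI APIs detected" : "No local AI APIs; using a fallback");
```

A global being present only means the API is exposed. Whether a model can actually run on the device is a separate question, answered by each API's `availability()` method.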

Microsoft Edge On-Device AI Turns the Browser into a Local Coding and Writing Assistant

From Phi-4-mini to Aion: bringing AI to more CPUs and GPUs

The Phi-4-mini model gave Edge a strong text engine, but its hardware requirements limited which devices Microsoft could ship it to. Aion-1.0-Instruct is Microsoft’s answer to that constraint. According to Microsoft’s Edge team, the new model is “smaller, faster, and more efficient” than Phi-4-mini while remaining effective for a wide range of web use cases. Crucially, Aion supports less capable GPUs and can also run with CPU inference, so more PCs can host an on-device AI experience. In Edge Canary and Dev builds, Aion sits behind the Prompt and Writing Assistance APIs, where websites and extensions can use it as an experimental local assistant that may need to be downloaded before first use. This makes Edge a test bed for how compact language models behave across real hardware, including how browser-based coding assistants and writing tools degrade gracefully when a local model is unavailable.
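Because Aion may need to be downloaded, and may not run at all on some hardware, a page should check availability before creating a session and then fall back cleanly. This is a minimal sketch that assumes the Prompt API shape from current Chromium and Edge documentation: `LanguageModel.availability()` resolves to `"unavailable"`, `"downloadable"`, `"downloading"`, or `"available"`, and `LanguageModel.create()` accepts a `monitor` callback that receives `downloadprogress` events. The typings are hand-written because the API is not yet in TypeScript's DOM library. `planFor` and `getLocalSession` are hypothetical helper names.

```typescript
// Availability states assumed for LanguageModel.availability().
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Minimal hand-written typings for the parts of the Prompt API used here.
interface DownloadMonitor {
  addEventListener(type: "downloadprogress", listener: (e: { loaded: number }) => void): void;
}
interface PromptSession {
  prompt(input: string): Promise<string>;
}
interface LanguageModelStatic {
  availability(): Promise<Availability>;
  create(options?: { monitor?: (m: DownloadMonitor) => void }): Promise<PromptSession>;
}

type Plan = "use-local" | "needs-download" | "fallback";

// Map each availability state to what the page should do.
function planFor(state: Availability): Plan {
  switch (state) {
    case "available":
      return "use-local";
    case "downloadable":
    case "downloading":
      return "needs-download"; // create() starts or joins the model download
    default:
      return "fallback"; // e.g. hardware that cannot run the model
  }
}

// Returns a local session, or null so the caller can degrade gracefully.
async function getLocalSession(
  lm: LanguageModelStatic | undefined,
  onProgress: (fraction: number) => void = () => {},
): Promise<PromptSession | null> {
  if (!lm) return null; // API not exposed in this browser or channel
  if (planFor(await lm.availability()) === "fallback") return null;
  return lm.create({
    monitor(m) {
      // `loaded` is assumed to be a fraction from 0 to 1, as in current Chromium docs.
      m.addEventListener("downloadprogress", (e) => onProgress(e.loaded));
    },
  });
}

// Usage in a page or extension. Outside the browser, LanguageModel is
// undefined, so this logs the fallback message.
const lmGlobal = (globalThis as unknown as { LanguageModel?: LanguageModelStatic }).LanguageModel;
getLocalSession(lmGlobal, (f) => console.log(`Model download: ${Math.round(f * 100)}%`))
  .then(async (session) => (session ? session.prompt("Suggest a clearer name for `tmp2`.") : null))
  .then((reply) => console.log(reply ?? "Local model unavailable; using the non-AI path."));
```

Returning `null` instead of throwing lets the caller hide the feature or switch to a non-AI path. In current Chromium builds, a `create()` call that starts a download may require a user gesture, so real code would likely call `getLocalSession` from a click handler.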

Local language APIs: translation and detection without the cloud

Beyond general language models, Microsoft Edge now includes local language APIs for language detection and translation. The Language Detector and Translator APIs in Edge 148 use on-device, task-specific models to identify the language of a text and to translate between more than 145 languages directly in the browser. Developers call these features from JavaScript and get low-latency translation with no external network call, which improves privacy and removes per-request translation costs. For browser-based writing assistants, this enables instant language-aware suggestions, automatic detection of the user’s input language, and inline translation that keeps working when the network is unreliable. The same APIs can underpin multilingual coding helpers that label comments, translate documentation snippets, or localize interface copy, all running on the user’s machine rather than in a remote data center.
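The detect-then-translate flow maps onto two small calls. This sketch assumes the API shapes in the current explainers: `LanguageDetector.create()` returns an object whose `detect()` resolves to candidates with `detectedLanguage` and `confidence`, and `Translator.availability()` and `Translator.create()` take `{ sourceLanguage, targetLanguage }`. The typings are hand-written. `pickLanguage`, `translateIfNeeded`, and the 0.5 confidence threshold are illustrative choices, not part of the APIs.

```typescript
// Minimal hand-written typings (experimental APIs; shapes may change).
interface DetectionResult {
  detectedLanguage: string; // BCP 47 code, or "und" if undetermined
  confidence: number; // 0 to 1
}
interface LanguageDetectorStatic {
  create(): Promise<{ detect(text: string): Promise<DetectionResult[]> }>;
}
interface LanguagePair {
  sourceLanguage: string;
  targetLanguage: string;
}
interface TranslatorStatic {
  availability(pair: LanguagePair): Promise<string>;
  create(pair: LanguagePair): Promise<{ translate(text: string): Promise<string> }>;
}

// Hypothetical helper: most likely language, or null if the detector is unsure.
function pickLanguage(results: DetectionResult[], minConfidence = 0.5): string | null {
  let best: DetectionResult | null = null;
  for (const r of results) {
    if (best === null || r.confidence > best.confidence) best = r;
  }
  if (best === null || best.detectedLanguage === "und" || best.confidence < minConfidence) {
    return null;
  }
  return best.detectedLanguage;
}

interface TranslationOutcome {
  text: string;
  sourceLanguage: string | null;
  translated: boolean;
}

// Detect the input language locally, then translate only when needed.
async function translateIfNeeded(
  text: string,
  targetLanguage: string,
  detectorApi: LanguageDetectorStatic,
  translatorApi: TranslatorStatic,
): Promise<TranslationOutcome> {
  const detector = await detectorApi.create();
  const sourceLanguage = pickLanguage(await detector.detect(text));
  // Nothing to do if detection is inconclusive or the text is already in the target language.
  if (sourceLanguage === null || sourceLanguage === targetLanguage) {
    return { text, sourceLanguage, translated: false };
  }
  const pair = { sourceLanguage, targetLanguage };
  if ((await translatorApi.availability(pair)) === "unavailable") {
    return { text, sourceLanguage, translated: false }; // pair not supported on this device
  }
  const translator = await translatorApi.create(pair); // may download a language pack first
  return { text: await translator.translate(text), sourceLanguage, translated: true };
}

// Usage in a page or extension, only when both APIs are exposed.
const g = globalThis as unknown as { LanguageDetector?: LanguageDetectorStatic; Translator?: TranslatorStatic };
if (g.LanguageDetector && g.Translator) {
  translateIfNeeded("Guten Morgen", "en", g.LanguageDetector, g.Translator).then((r) => console.log(r));
}
```

Comparing tags with `===` is deliberately simple. A production version would normalize tags such as `en-US` and `en` before deciding that no translation is needed.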

Seven new MAI models and the rise of browser-based coding assistants

While Edge focuses on on-device AI models, Microsoft’s broader MAI family adds new models for reasoning, coding, image, voice, and transcription tasks. The seven new MAI models include MAI-Thinking-1, a mid-sized reasoning model with a 256K context window, and MAI-Code-1-Flash, which is rolling out to GitHub Copilot in VS Code for fast code suggestions. Alongside MAI-Image and the voice systems, these models reinforce Microsoft’s push to use its own AI stack across its products. For web developers, this ecosystem matters because the company shipping MAI-Code-1-Flash to Copilot is also shipping Aion in Edge’s Prompt API. The result is a continuum: heavier MAI models in the cloud for complex jobs, and lighter on-device models in the browser for quick, private tasks such as in-page refactors, doc comments, and context-aware writing suggestions.
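One way a web app could act on this continuum is a small routing policy that keeps quick or sensitive tasks on the device and sends large, reasoning-heavy jobs to a cloud model. The sketch below is hypothetical. The task kinds, the `localTokenBudget` value, and the four-characters-per-token estimate are illustrative assumptions, not figures for Aion or any MAI model. The function only decides where a task should run and does not call any service.

```typescript
// Hypothetical routing policy: task kinds and thresholds are illustrative only.
type Route = "local" | "cloud" | "blocked";

interface AITask {
  kind: "refactor" | "doc-comment" | "rewrite" | "deep-reasoning";
  input: string;
  sensitive: boolean; // e.g. proprietary source code or personal notes
}

interface RoutingPolicy {
  localModelReady: boolean; // e.g. availability() reported "available"
  localTokenBudget: number; // assumed context budget for the on-device model
}

// Rough heuristic of about 4 characters per token; real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function routeTask(task: AITask, policy: RoutingPolicy): Route {
  const fitsLocally = policy.localModelReady && estimateTokens(task.input) <= policy.localTokenBudget;
  if (task.sensitive) {
    // Privacy first: sensitive input is never routed to the cloud.
    return fitsLocally ? "local" : "blocked";
  }
  if (task.kind === "deep-reasoning" || !fitsLocally) return "cloud";
  return "local"; // quick, small task: keep it on the device for low latency
}
```

Returning `"blocked"` for sensitive input that cannot run locally, instead of silently falling back to the cloud, keeps the privacy guarantee explicit. The UI can then ask the user before sending anything off the device.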

Practical use cases: building privacy-first AI web apps

The combination of Aion, Phi-4-mini, local language APIs, and experimental on-device speech recognition enables a new category of browser-based coding assistants and writing tools that respect user privacy. A web IDE can use the Prompt API to offer local code completions, refactoring hints, and comment generation without sending source code to a server. A note-taking app can call the Writing Assistance APIs to rewrite text, and the Language Detector and Translator APIs to detect the input language and translate content offline once the models are downloaded. Extensions can use the Language Detector and Translator for real-time page translation, while on-device speech recognition through the Web Speech API, still experimental, points toward voice-controlled editors that stay local. Developers still need to handle model availability and downloads, but when a model is present, on-device processing reduces latency, removes the cloud dependency, and gives users more control over how their data flows through AI features.
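For the web IDE case, generating a doc comment with the Prompt API takes one session and one prompt. This minimal sketch assumes that `LanguageModel.create()` accepts an `initialPrompts` array with a `system` message and that sessions expose `prompt()` and `destroy()`, as in the current Prompt API explainer. The system prompt wording and the helpers `buildDocCommentPrompt`, `extractDocComment`, and `suggestDocComment` are hypothetical.

```typescript
// Minimal hand-written typings for the parts of the Prompt API used here.
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface PromptSession {
  prompt(input: string): Promise<string>;
  destroy(): void;
}
interface LanguageModelStatic {
  create(options: { initialPrompts: PromptMessage[] }): Promise<PromptSession>;
}

// Hypothetical system prompt; tune the wording for the model you target.
const SYSTEM_PROMPT = "You write concise JSDoc comments. Reply with only the comment block, no code.";

function buildDocCommentPrompt(code: string, language: string): string {
  return `Write a doc comment for this ${language} function:\n\n${code}`;
}

// Keep only the first /** ... */ block from the reply; return null if there is none,
// so the editor can skip the suggestion instead of inserting arbitrary model text.
function extractDocComment(reply: string): string | null {
  const start = reply.indexOf("/**");
  const end = reply.indexOf("*/", start + 3);
  if (start === -1 || end === -1) return null;
  return reply.slice(start, end + 2);
}

async function suggestDocComment(
  lm: LanguageModelStatic,
  code: string,
  language = "TypeScript",
): Promise<string | null> {
  const session = await lm.create({ initialPrompts: [{ role: "system", content: SYSTEM_PROMPT }] });
  try {
    return extractDocComment(await session.prompt(buildDocCommentPrompt(code, language)));
  } finally {
    session.destroy(); // release the session's resources
  }
}

// Usage in a page or extension, only when the Prompt API is exposed.
const lm = (globalThis as unknown as { LanguageModel?: LanguageModelStatic }).LanguageModel;
if (lm) {
  suggestDocComment(lm, "function clamp(x: number, lo: number, hi: number) { return Math.min(hi, Math.max(lo, x)); }")
    .then((comment) => console.log(comment ?? "No usable suggestion."));
}
```

Validating the reply before inserting it matters because a small local model may wrap the comment in extra prose. Discarding anything that is not a `/** ... */` block keeps the editor from pasting stray text into the user's code.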