Microsoft MAI models in Foundry for developers

What Microsoft’s Expanded MAI Family Means for Developers

Microsoft’s new MAI models are a family of in-house AI systems for reasoning, coding, image generation, voice synthesis, and transcription that are now available to developers through Foundry in private preview, giving them direct access to multi‑modal tools under one managed platform. Instead of depending only on third‑party providers, teams can compare Microsoft MAI models side by side with OpenAI, Google, Anthropic, and specialist tools in the same deployment workflow. The flagship MAI-Thinking-1 reasoning model anchors this push, supported by MAI-Image-2.5 for image tasks, MAI-Transcribe-1.5 for speech-to-text, and MAI-Voice-2 for multilingual text-to-speech. Together, they support AI model access developers need for applications that mix long-context reasoning, code generation, image editing, and voice transcription tools. Microsoft is positioning this as a step toward wider availability in its MAI Playground and across products like Copilot, Teams, and Azure Speech.

MAI-Thinking-1: Long-Context Reasoning and Code for Foundry

MAI-Thinking-1 is Microsoft’s flagship reasoning model in the Foundry rollout, built with a sparse Mixture-of-Experts design that routes tasks through selected subnetworks rather than activating the whole model for every request. It lists 35 billion active parameters with roughly 1 trillion total parameters and a 256K-token context window, giving it room to handle long documents and large codebases in a single prompt. The model supports function calling, developer instructions, and compatibility with the Chat Completions API, lowering integration work for teams already familiar with that pattern. Kyle Daigle at GitHub presents MAI-Thinking-1 as suitable for complex chained tasks, long-context reasoning, and code generation. Microsoft says the model was trained on commercially licensed data without distillation from third-party models, a stance aimed at enterprises sensitive to intellectual property and data provenance.

MAI-Image-2.5 and Image Benchmarks Against Nano Banana

MAI-Image-2.5 brings Microsoft’s image efforts into direct competition with Google’s Nano Banana line and OpenAI’s gpt-image-2. On the Arena benchmark, the model landed third, behind those two systems, but Microsoft highlights scenarios where its image generation benchmark performance exceeds Google’s Nano Banana 2 on specific tasks. The image model ships in two variants: a high-quality MAI-Image-2.5 and a faster MAI-Image-2.5e, similar to the earlier MAI-Image-2 split between a quality and a speed-optimized option. MAI-Image-2.5 can accept image uploads, enabling both generation and precise editing workflows that matter for professional and enterprise users. According to Microsoft AI CEO Mustafa Suleyman, MAI-Image-2.5 “gives you precise editing with incredible control and consistency,” while the Flash-style variant is tuned for “super-efficient production workloads.”

Microsoft’s New MAI Models Reshape Developer Access to AI

MAI-Transcribe-1.5 and MAI-Voice-2: Speech and Voice Upgrades

On the audio side, Microsoft is extending its speech portfolio with MAI-Transcribe-1.5 and MAI-Voice-2. MAI-Transcribe-1.5 builds on the speech-to-text model released in April, which already claimed the lowest word error rate across 25 languages, giving developers stronger voice transcription tools for multi-language applications. MAI-Voice-2 is a multilingual successor to MAI-Voice-1, adding German, Australian and US English, Spanish, French, Hindi, Indonesian, Italian, Japanese, Korean, Dutch, Portuguese, Turkish, Vietnamese, and Chinese. It widens the emotional palette with tones such as angry, confused, and embarrassed, and early samples suggest it can whisper, opening more natural-sounding experiences across Copilot, Teams, and Azure Speech. These Microsoft MAI models bring text-to-speech and transcription into the same Foundry-managed environment as reasoning and image tools, making it easier for developers to build end-to-end audio pipelines.

Weight Tuning, Foundry Integration, and the Road to Build

Beyond the models themselves, Microsoft is using Foundry to change how AI model access developers work with its stack. Foundry remains the platform for discovering, deploying, and governing AI systems, but now Microsoft-owned MAI models sit in the same decision path as OpenAI, Anthropic, and Google options. Microsoft AI CEO Mustafa Suleyman says, “For the first time developers will be able to tune the weights of the model themselves,” signaling deeper control than prompt engineering or retrieval alone. Weight tuning allows enterprises to adapt MAI-Thinking-1 and related models to their domains while keeping governance centralised. The seven-model family, including code and transcription models, is in limited Foundry access now and is being prepared for a broader unveiling at Build 2026. This timing underlines Microsoft’s strategy to expand first-party AI availability while reducing dependence on external partners.