From Cloud Assistants to Self-Hosted LLMs
Developers are moving away from cloud-only coding assistants toward open source coding models and local AI development tools that run on infrastructure they control. As AI becomes central to software engineering workflows, this shift reduces dependency on proprietary APIs, strengthens privacy, and gives teams more predictable costs and greater technical autonomy. It is fueled by frustration with vendor lock-in, rate limits, and opaque pricing from frontier AI providers. Teams building continuous agentic systems and high-volume coding assistants need more reliable performance and tighter integration with internal tooling than generic cloud models often provide. The options now span lean, specialized coding models that teams can host themselves and larger, general-purpose systems delivered through flexible managed platforms. Together, they mark a transition from AI as a managed service to AI as a core part of the development stack, owned and operated alongside source control, CI pipelines, and observability tools.
Mellum2: JetBrains Bets on Open, Infrastructure-Level Coding AI
JetBrains’ Mellum2 embodies this shift: it is an open-source, 12B-parameter coding model designed to run entirely on infrastructure that engineering teams own. Released with open weights under the Apache 2.0 license, Mellum2 is tuned for routing, retrieval pipelines, and sub-agent tasks rather than for competing with frontier chat models. JetBrains describes it as a “focal model” aimed at high-frequency software engineering workloads, offered in base, instruct, and “thinking” variants tailored to different levels of reasoning. Its Mixture-of-Experts (MoE) design activates only 2.5B of its 12B parameters per token, so its cost and latency resemble those of a smaller dense model; even so, the thinking variant scores 78.4% on the EvalPlus function-level code benchmark. Because it can run fully on-premises, Mellum2 covers deployments that Claude Code and cloud-tied IDE copilots cannot, giving enterprises a self-hosted LLM option that aligns with strict data sovereignty and audit requirements.
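The idea behind “active parameters” can be sketched with a generic top-k MoE layer. This is a minimal illustration, not Mellum2’s actual architecture: the expert count, the value of k, the tiny linear “experts,” and the names `TopKMoE` and `forward` are hypothetical choices made for the example. A router scores every expert for each token, but only the k highest-scoring experts run, so compute per token grows with k rather than with the total number of experts. (By the figures above, 2.5B active out of 12B total is roughly 21% of the weights per token.)

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, seed):
    """Hypothetical expert: a random dim x dim linear map."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


class TopKMoE:
    """Generic top-k Mixture-of-Experts layer (illustrative, not Mellum2's)."""

    def __init__(self, dim, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.experts = [make_expert(dim, seed + i + 1) for i in range(num_experts)]
        # One router row per expert; its dot product with x is that expert's score.
        self.router = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_experts)]
        # Counts how often each expert actually runs.
        self.calls = [0] * num_experts

    def forward(self, x):
        scores = [sum(r * xi for r, xi in zip(row, x)) for row in self.router]
        # Keep only the k highest-scoring experts for this token.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
        # Renormalize the selected scores into mixing weights.
        gates = softmax([scores[i] for i in top])
        out = [0.0] * len(x)
        for gate, i in zip(gates, top):
            self.calls[i] += 1  # only the selected experts do any work
            y = self.experts[i](x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, top


if __name__ == "__main__":
    moe = TopKMoE(dim=8, num_experts=8, k=2)
    rng = random.Random(1)
    for _ in range(100):
        token = [rng.gauss(0.0, 1.0) for _ in range(8)]
        moe.forward(token)
    # 100 tokens x 2 active experts = 200 expert calls,
    # versus 800 if every expert ran on every token.
    print(sum(moe.calls))  # prints 200
```

Real MoE models also have shared components (attention, embeddings) that run for every token, which is why an active-parameter count is more than k experts’ worth of weights; the sketch isolates only the routing step.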
MAI-Thinking-1: Microsoft Adds Another Major Player to the Stack
Microsoft’s MAI-Thinking-1 takes a different route to the same destination: more control over the AI coding stack, but at hyperscaler scale. Introduced at Microsoft Build, MAI-Thinking-1 is, according to Microsoft, the company’s first large language model to match the performance of Anthropic’s Claude Opus 4.6, and it anchors a strategy to build frontier AI that does not depend on OpenAI. It arrives alongside MAI-Image-2.5, MAI-Voice-2, MAI-Transcribe-1.5, and MAI-Code-1-Flash, a model aimed at “vibe coding.” The models are available to developers through Foundry and already power Copilot, Bing, PowerPoint, and Azure Speech. According to Microsoft AI CEO Mustafa Suleyman, the new models “feel like this is a new era of AI that you control on your terms.” For developers, this adds another strong vendor option and signals a broader move toward multi-model flexibility across applications.

Control, Cost Predictability, and Data Sovereignty Take Center Stage
Both Mellum2 and MAI-Thinking-1 point to a market where developers want more than convenience from managed AI services. Open source coding models and local AI development tools reduce exposure to sudden API changes, outages, and policy shifts from a single provider. They also help address latency concerns by running closer to where code is written and executed, and they give legal and security teams clearer answers about where data lives and how it is processed. Enterprises can choose Mellum2’s self-hosted path for maximum control, or tap Microsoft’s growing in-house model portfolio to diversify beyond a single frontier partner while staying inside a managed cloud. Microsoft’s leadership frames this as moving from “consuming a frontier model to fully participating at the frontier.” End users, in turn, are likely to favor ecosystems that keep model choice open rather than narrowing it.
What Mellum2, Claude, and MAI-Thinking-1 Mean for Developers
The Mellum2 vs Claude comparison is less about raw benchmarks than about deployment philosophy: Mellum2 is built to be self-hosted, specialized, and owned, while Claude Code remains tied to Anthropic’s cloud. MAI-Thinking-1 adds yet another option and shows that even major platform providers want independence from external frontier models. For engineering leaders, this expanding menu shifts procurement conversations from “which API” to “which stack do we want to control.” Teams can pair local AI development tools for latency-sensitive, confidential workloads with larger managed models for broad reasoning. Over time, frameworks that make it easy to route tasks among Mellum2, MAI-Thinking-1, and other models, whether self-hosted or managed, will likely become standard. The winners will be tools that respect developers’ need for control, cost clarity, and data sovereignty without sacrificing the speed and quality gains that made cloud coding assistants popular in the first place.
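A blended setup like this can be sketched as a small routing policy. This is a minimal illustration under stated assumptions, not a feature of any framework or product named here: the `Backend` labels, the `Task` fields, the 300 ms cutoff, and the `route` function are all hypothetical. Confidential work stays on the self-hosted model, an outage at the managed provider falls back to the local model, and only non-confidential tasks that need broad reasoning and can tolerate latency go to a managed model.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    # Hypothetical labels; a real deployment would map these to actual endpoints.
    LOCAL_SPECIALIZED = "local-specialized"  # e.g. a self-hosted Mellum2 deployment
    MANAGED_FRONTIER = "managed-frontier"    # e.g. a large model behind a cloud API


@dataclass(frozen=True)
class Task:
    kind: str                  # free-form label, e.g. "completion"
    confidential: bool         # True if code or data must not leave owned infrastructure
    latency_budget_ms: int     # how long the caller is willing to wait
    needs_broad_reasoning: bool = False


# Hypothetical cutoff: below this budget, a remote round trip is assumed too slow.
LATENCY_CUTOFF_MS = 300


def route(task: Task, managed_available: bool = True) -> Backend:
    """Pick a backend for one task under a simple, hypothetical policy."""
    # Data sovereignty first: confidential work never leaves owned infrastructure.
    if task.confidential:
        return Backend.LOCAL_SPECIALIZED
    # Resilience: if the managed provider is down or rate-limited, degrade locally.
    if not managed_available:
        return Backend.LOCAL_SPECIALIZED
    # Only broad-reasoning tasks with room in their latency budget go to the cloud.
    if task.needs_broad_reasoning and task.latency_budget_ms >= LATENCY_CUTOFF_MS:
        return Backend.MANAGED_FRONTIER
    # Everything else (high-frequency, latency-sensitive work) stays local.
    return Backend.LOCAL_SPECIALIZED


if __name__ == "__main__":
    print(route(Task("completion", confidential=False, latency_budget_ms=80)).value)
    # prints "local-specialized"
    print(route(Task("design_review", confidential=False, latency_budget_ms=5000,
                     needs_broad_reasoning=True)).value)
    # prints "managed-frontier"
```

A production router would also weigh per-token cost and observed quality, but even this sketch shows the design choice the article describes: a policy the team owns, rather than a single vendor, decides where each request goes.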
