What Is Google AI Edge Gallery and Why Run Gemma Locally on Mac?
Google AI Edge Gallery for macOS is a first-party app that lets you download and run Gemma large language models entirely on your Mac. Inference happens offline and privately, with low-latency text generation and no data sent to external servers. With this release, Mac users can run Gemma locally through an interface built by Google rather than through third-party tools. The app provides a curated collection of Gemma models tuned for on-device use, including instruction-tuned variants for chat and coding. Because everything runs on your Mac’s hardware, response speed depends on local performance, not cloud congestion. This makes Google AI Edge Gallery a focused alternative to flexible platforms like Ollama or LM Studio: it trades breadth of models for an optimized, Google-controlled local deployment that prioritizes privacy and reliability.

Check Your Mac and Install Google AI Edge Gallery
Before you run Gemma on your Mac, confirm your hardware is ready. Google says the flagship Gemma-4-12B-it model is designed to run “directly” on laptops with at least 16GB of VRAM or unified memory, which includes all modern Apple silicon Macs except the MacBook Neo. If your Mac meets that bar, go to Google’s AI Edge Gallery website and download the macOS version, now offered as a direct download. Install it like any other app by dragging it into Applications, then open it from Launchpad or Spotlight. On first launch, you may need to approve the app in System Settings if macOS flags it as downloaded from the web. Once open, the gallery becomes your hub for downloading Gemma variants, managing the models stored on your Mac, and configuring basic settings.
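
If you would rather check the 16GB figure from a script than from About This Mac, the following is a minimal sketch, not part of Google's tooling. It assumes the standard macOS command `sysctl -n hw.memsize`, which reports installed memory in bytes, and it treats "16GB" as 16 GiB, which is how Macs sold as 16GB report their memory. The function names and the `REQUIRED_GIB` constant are illustrative.

```python
import platform
import subprocess

GIB = 1024 ** 3
REQUIRED_GIB = 16  # threshold the article quotes for Gemma-4-12B-it (assumed to mean GiB)


def meets_memory_requirement(mem_bytes: int, required_gib: int = REQUIRED_GIB) -> bool:
    """Return True if mem_bytes is at least required_gib GiB."""
    return mem_bytes >= required_gib * GIB


def read_mac_memory_bytes() -> int:
    """Read installed physical memory on macOS via `sysctl -n hw.memsize`."""
    result = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    if platform.system() != "Darwin":
        # hw.memsize is a macOS sysctl key; do not attempt it elsewhere.
        print("This check is written for macOS only.")
    else:
        mem = read_mac_memory_bytes()
        verdict = "meets" if meets_memory_requirement(mem) else "is below"
        print(f"{mem / GIB:.0f} GiB installed: {verdict} the {REQUIRED_GIB}GB bar")
```

The pure comparison is kept separate from the `sysctl` call so the threshold logic can be reused or tested on any machine. Passing this check only confirms the memory figure; it says nothing about how fast a given model will run.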

Download Gemma Models Optimized for Offline AI Inference
With AI Edge Gallery installed, you can choose from a curated list of Gemma models aimed at local deployment on consumer hardware. According to AppleInsider, the app supports Gemma-4-12B-it, Gemma-4-E2B-it, Gemma-4-E4B-it, Gemma-3n-E2B-it, and Gemma-3n-E4B-it. These instruction-tuned models are suitable for chat-style interaction, coding help, and general writing. Gemma 4 12B stands out: it is a 12-billion-parameter multimodal model that handles text, vision, and audio while delivering performance Google compares to that of a larger 26-billion-parameter mixture-of-experts model. From within the gallery, select the model you need and start the download. Once downloaded, the model stays on your Mac, so you can run it without an internet connection and without sending any prompts or documents to cloud servers.

Run Private AI Inference: Text, Coding, and Multimodal Tasks
After downloading at least one model, you can start private AI inference directly from AI Edge Gallery. Use Gemma-4-12B-it or one of the smaller instruction-tuned options for conversation, drafting emails, summarizing documents stored locally, or exploring code. Because the model runs on-device, latency is typically lower than with cloud APIs, and your prompts, files, and outputs stay on your Mac. Gemma 4 12B’s multimodal design means it can work with text, vision, and audio inputs when the app supports them, enabling richer local workflows than text-only models allow. Compared with cloud tools like ChatGPT, Claude, or the online version of Gemini, this setup reduces data transmission and removes reliance on a stable connection. Compared with Ollama and LM Studio, you lose access to thousands of community models but gain a focused environment built around Google’s own Gemma family.

Bonus: Use AI Edge Eloquent for On-Device Dictation and Editing
To round out your offline AI setup, install Google’s AI Edge Eloquent app alongside AI Edge Gallery. Eloquent is a separate, free on-device dictation and editing tool that works across all Mac apps and launches via a keyboard shortcut. It listens to your speech, transcribes it, removes filler words, and cleans up the text before inserting it into the active app. You can choose writing styles and define custom vocabulary for names or domain-specific jargon, which makes it useful for notes, emails, and documents. All processing runs locally, so spoken content never leaves your computer. Together, AI Edge Gallery for running Gemma models and AI Edge Eloquent for dictation give you a cohesive offline AI workflow on the Mac, pairing local model inference with fast, privacy-preserving input and editing.






