MilikMilik

Run AI Agents Directly on Arduino Without Cloud Dependencies or API Keys

Why Local Agentic AI on Microcontrollers Matters

Most “AI on hardware” demos quietly rely on cloud language models: the microcontroller forwards prompts over the internet, a remote LLM produces the answer, and the board simply follows instructions. QClaw inverts this pattern by running the whole workflow locally on the Arduino Uno Q. No API keys, subscriptions, or network connection are required. The Uno Q becomes an offline AI assistant that can understand instructions, orchestrate tools, and reflash its own microcontroller firmware. Instead of treating the device as a dumb endpoint, you deploy an embedded agent that reasons, decides, and acts entirely on-board. For IoT builders and embedded developers, this enables offline-first prototypes, privacy-preserving designs, and systems that keep working when the network fails. The result is an Arduino AI agent you fully control, from model weights to flashing behavior, with no cloud dependencies.

Inside QClaw: How the Uno Q Hosts Its Own AI Agent

QClaw is an embedded agentic AI stack that lives entirely on the Arduino Uno Q. The board’s split-silicon design is the key: an MPU and an MCU share the same PCB, with GPIO lines from the MPU wired to the MCU’s SWD pins. Using the linuxgpiod driver, the MPU can hold the MCU in reset and reprogram its flash directly, with no USB cable, external probe, or second computer. On the MPU side, QClaw runs a llama-server backend hosting a Qwen3.5 0.8B Q4_0 model with an 8K context window. Within the board’s 4 GB of RAM, the model is kept locked in memory (mlock) and the KV cache is stored as q8_0, and generation reaches around 8 tokens per second. Above the model sits an agent loop with an eight-tool interface and a fifteen-skill pre-router. This architecture lets the assistant plan actions, select tools, and manage files while staying within the board’s tight resource budget.
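To make the inference settings above concrete, here is a minimal sketch of how such a llama-server invocation could be assembled. It is not QClaw's actual launch script: the model path, host, and port are assumptions made for illustration, and only the context size, mlock, and q8_0 KV cache settings come from the description above.

```python
from pathlib import Path


def llama_server_args(model_path: Path, host: str = "127.0.0.1", port: int = 8080) -> list[str]:
    """Build an argv list for llama-server with the settings described in the article."""
    return [
        "llama-server",
        "--model", str(model_path),
        "--ctx-size", "8192",       # 8K context window
        "--mlock",                  # keep the model weights locked in RAM
        "--cache-type-k", "q8_0",   # quantized KV cache, keys
        "--cache-type-v", "q8_0",   # quantized KV cache, values
        "--host", host,             # assumption: listen on loopback only
        "--port", str(port),        # assumption: port chosen for illustration
    ]


if __name__ == "__main__":
    # Hypothetical model path; the real file name depends on what you downloaded.
    print(" ".join(llama_server_args(Path("models/model.gguf"))))
```

Depending on the llama.cpp build, a quantized value cache may additionally require flash attention to be enabled, so treat the flag set as a starting point rather than a verified configuration.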

From Prompt to Firmware: The Fully Local Agentic Workflow

QClaw’s value is its complete, cloud-free loop: generate, compile, flash, and observe, all on the Arduino Uno Q. In the agentic runtime, a Go gateway sits in front of the model and runs the multi-iteration loop, the pre-router, and the eight-tool dispatcher. The agent navigates its workspace with read_file, write_file, and list_dir, then calls the arduino tool to build firmware. That tool invokes arduino-cli compile with the arduino:zephyr:unoq FQBN and exports a .elf-zsk.bin binary. QClaw then hands that binary to OpenOCD, which flashes the MCU from the MPU over the GPIO-based SWD bridge; once the file is on disk, uploads take less than a second. No SSH, no network, and no remote OpenOCD tunneling are involved. The result is a local Arduino AI agent that can take a natural-language request, such as scrolling text on an LED matrix, and autonomously deploy the corresponding sketch.
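The compile-then-flash step can be sketched roughly as follows. This is an illustration, not QClaw's Go implementation. The FQBN and the .elf-zsk.bin suffix come from the text above, and `arduino-cli compile --output-dir` and OpenOCD's `program` command are real. The OpenOCD config file and the flash address are assumptions the caller must supply, because the article does not state them.

```python
import subprocess
from pathlib import Path

FQBN = "arduino:zephyr:unoq"  # board identifier named in the article


def compile_cmd(sketch_dir: Path, build_dir: Path) -> list[str]:
    """arduino-cli invocation that writes build artifacts into build_dir."""
    return ["arduino-cli", "compile", "--fqbn", FQBN,
            "--output-dir", str(build_dir), str(sketch_dir)]


def flash_cmd(openocd_cfg: Path, binary: Path, flash_addr: int) -> list[str]:
    """OpenOCD invocation using its `program` command.

    openocd_cfg and flash_addr are assumptions supplied by the caller.
    The binary path must not contain spaces in this simple sketch.
    """
    program = f"program {binary} {flash_addr:#010x} verify reset exit"
    return ["openocd", "-f", str(openocd_cfg), "-c", program]


def build_and_flash(sketch_dir: Path, build_dir: Path, openocd_cfg: Path,
                    flash_addr: int, run=subprocess.run) -> Path:
    """Compile the sketch, locate the exported binary, and flash it."""
    run(compile_cmd(sketch_dir, build_dir), check=True)
    binaries = sorted(build_dir.glob("*.elf-zsk.bin"))
    if not binaries:
        raise FileNotFoundError(f"no .elf-zsk.bin found in {build_dir}")
    run(flash_cmd(openocd_cfg, binaries[0], flash_addr), check=True)
    return binaries[0]
```

Passing `run` as a parameter keeps the sketch testable without hardware: a stub can record the two commands instead of executing them.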

Pre-Routing, Skills, and Tools: Making a 0.8B Model Feel Smart

Running embedded agentic AI on a small 0.8B model requires careful prompt engineering. QClaw’s pre-router is a lightweight, skill-based system rather than full retrieval-augmented generation. It uses 23 keyword regex rules to select from 15 skills, covering topics such as blink, servo control, LED matrix handling, Uno Q pin tables, the dual-chip workflow, Linux Wi-Fi, camera, and Modulino sensors. When a user message arrives, the pre-router inlines the relevant SKILL.md content and any files it references directly into the system prompt before the LLM call. The model therefore never has to call read_file just to reach canonical knowledge, which saves an extra LLM iteration that would be prohibitively slow on this hardware. The tool layer is tightly constrained as well: there is no general exec or shell tool, and each tool validates its inputs against allow-lists. Together, these choices keep the offline assistant practical and safe while still enabling context-aware behavior on constrained hardware.
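A keyword pre-router of this kind can be sketched in a few lines. The rules, skill names, and directory layout below are hypothetical, since the article gives the counts (23 rules, 15 skills) but not the rules themselves. The sketch inlines only SKILL.md and leaves out the referenced files.

```python
import re
from pathlib import Path

# Hypothetical rules for illustration; QClaw's real rule set is not shown in the article.
RULES = [
    (re.compile(r"\bblink\b", re.I), "blink"),
    (re.compile(r"\bservo\b", re.I), "servo"),
    (re.compile(r"\b(led\s*matrix|scroll(ing)?\s+text)\b", re.I), "led_matrix"),
    (re.compile(r"\bwi-?fi\b", re.I), "linux_wifi"),
]


def route(message: str) -> list[str]:
    """Return the matching skill names in rule order, without duplicates."""
    skills: list[str] = []
    for pattern, skill in RULES:
        if skill not in skills and pattern.search(message):
            skills.append(skill)
    return skills


def build_system_prompt(base_prompt: str, message: str, skills_dir: Path) -> str:
    """Append each matched skill's SKILL.md to the base system prompt."""
    parts = [base_prompt]
    for skill in route(message):
        skill_file = skills_dir / skill / "SKILL.md"  # assumed layout: <skills_dir>/<skill>/SKILL.md
        if skill_file.is_file():
            text = skill_file.read_text(encoding="utf-8")
            parts.append(f"## Skill: {skill}\n{text}")
    return "\n\n".join(parts)
```

Because routing is plain regex matching, it costs effectively nothing compared with a model call, which is the point of doing it before the LLM sees the message.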

Getting Started and Future Possibilities for Offline-First IoT

Setting up QClaw is straightforward if you are comfortable with Linux development. You clone the Uno-QClaw repository, initialize its submodules, and download the llama.cpp-based inference engine. Next, you fetch the Qwen3.5 0.8B Q4_0 GGUF model (around 490 MB) into a local models directory. A single make qclaw-install step builds the Go gateway, installs arduino-cli and the arduino:zephyr core, and prepares a workspace with the system prompt and skill tree. From there, you choose between two runtimes: make qclaw-agentic for the full agent loop with compile, upload, camera, sysfs_led, network, and i2cdetect tools, or make qclaw-direct for a lighter, faster Q&A interface without tool calls. QClaw shows that a board in this class can host a complete generate-compile-flash workflow offline. It points toward offline-first IoT designs in which an embedded Arduino AI agent provides resilience, privacy, and autonomy without ever touching the cloud.
