Milik

How AI Agents Are Moving Beyond Browsers to Automate Apps and Terminals

From Browser Bots to Full Computer-Use AI Agents

Computer-use AI agents are AI systems that understand a user's goal and carry it out step by step across screens, apps, browsers, and terminals. Rather than returning only a text answer, they turn natural-language instructions into real operations such as clicks, text input, code execution, and cloud console changes. First-generation agents focused on browser automation: filling in forms or scraping pages. The new wave goes further. These agents see the interface, decide which window or field matters, and loop through actions until the task is complete. This shift underpins app automation AI for enterprise workflows, where tools must cope with logins, pop-ups, and changing layouts. By adding AI screen interaction, vendors aim to build agents that can reliably operate desktop apps, developer tools, and cloud dashboards, not only web tabs. That evolution is moving agents from sidecar assistants to the core of enterprise AI automation strategies.
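
The see-decide-act loop described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the "screen" is a toy in-memory login form rather than real screenshots, and a hand-written rule stands in for the model's decision step. FakeScreen, decide, and run_agent are hypothetical names, not part of any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str         # "type", "click", or "done"
    target: str = ""  # hypothetical UI element identifier
    text: str = ""    # text to enter for "type" actions


@dataclass
class FakeScreen:
    """Toy stand-in for a real screen: a login form with two fields and a submit button."""
    fields: dict = field(default_factory=lambda: {"username": "", "password": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would capture and parse a screenshot here;
        # this toy returns a structured snapshot instead.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        if action.kind == "type" and action.target in self.fields:
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            # Submission only succeeds once every field is filled in.
            self.submitted = all(self.fields.values())


def decide(goal: dict, observation: dict) -> Action:
    """Hypothetical rule standing in for the model: fill mismatched fields, then submit."""
    if observation["submitted"]:
        return Action("done")
    for name, value in observation["fields"].items():
        if value != goal[name]:
            return Action("type", name, goal[name])
    return Action("click", "submit")


def run_agent(goal: dict, screen: FakeScreen, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until the policy reports the task is done."""
    trace = []
    for _ in range(max_steps):
        observation = screen.observe()  # re-observing after each step checks the result
        action = decide(goal, observation)
        trace.append(action)
        if action.kind == "done":
            break
        screen.execute(action)
    return trace


screen = FakeScreen()
trace = run_agent({"username": "ana", "password": "s3cret"}, screen)
print([a.kind for a in trace])  # ['type', 'type', 'click', 'done']
```

Re-observing the screen on every iteration is what lets the loop notice whether an action actually took effect, and the max_steps cap keeps a confused policy from looping forever.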

Qwen3.7-Plus: Alibaba Pushes Into Screen, App, and Terminal Automation

Alibaba’s Qwen3.7-Plus is pitched as a multimodal “computer-use” model built for screen and coding automation rather than for language tasks alone. It adds vision input, screenshot perception, browser automation, app operation, and screen navigation, so an agent can read an interface, choose the next click or keystroke, and check the result across app and cloud workflows. According to Alibaba’s Tongyi Qianwen team, a hybrid agent using Qwen3.7-Plus generated more than 10,000 lines of code across more than 1,000 agent calls while building a vocabulary app over eleven hours. Alibaba also claims the model can recreate the native macOS Stocks app, from parsing the UI to writing SwiftUI code, wiring up an API, compiling, and running ten functional tests. Reported benchmark scores of 79.0 on ScreenSpot Pro and 70.3 on Terminal-Bench highlight its focus on AI screen interaction and terminal tasks for enterprise AI automation.

WeChat’s In-App Agent Shows the Power of Embedded Ecosystems

While Qwen illustrates deep computer control, Tencent’s planned WeChat AI agent shows how app automation AI is moving into consumer ecosystems. The prototype would not stop at chat responses: it aims to complete tasks inside WeChat, using mini programs for payments, ordering, shopping, travel, and local services. Users would open a dedicated chat window by swiping right from the WeChat home screen, then issue commands such as asking the agent to find a cafe that matches their taste and budget and to order drinks through the right service. With about 1.4 billion active users, WeChat gives Tencent a massive test bed for AI screen interaction inside a single super-app. The company still has to address reliability, permissions, and compute costs, and to decide when users must confirm an action, but the direction is clear: AI agents are becoming native operators inside everyday apps instead of separate tools.

From External Tools to Built-In Enterprise AI Automation

Taken together, Qwen3.7-Plus and the WeChat agent point to a broader shift: computer-use AI agents are turning into embedded operators within productivity and commerce platforms. Instead of scripting a browser from the outside, enterprises will plug agents directly into IDEs, cloud consoles, messaging apps, and payment workflows. That makes multi-environment reach essential. Qwen3.7-Plus is designed to span browser sessions, desktop apps, and terminals in one loop, while WeChat’s approach shows how agents can coordinate mini programs with minimal user friction. For enterprise AI automation, this means richer end-to-end workflows: an agent could update code, deploy to a cloud service, adjust resource settings in a web console, and confirm success inside a chat thread. The main open questions are governance and trust: where to draw permission boundaries so that these agents feel like dependable colleagues instead of opaque background processes.
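
One way to picture a permission boundary is a policy that tags each agent action with its environment and operation, then allows it, pauses for user confirmation, or denies it. The sketch below is a hypothetical illustration, not how Alibaba or Tencent implement permissions: the environment labels, the POLICY table, and the PAYMENT_LIMIT value are all assumptions chosen to mirror the workflow described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause and ask the user before acting
    DENY = "deny"


@dataclass(frozen=True)
class AgentAction:
    environment: str     # hypothetical labels: "ide", "cloud_console", "chat", "payments"
    operation: str       # hypothetical labels: "edit", "deploy", "resize", "post", "pay"
    amount: float = 0.0  # monetary value, if the action moves money


# Hypothetical policy table: (environment, operation) -> decision.
POLICY = {
    ("ide", "edit"): Decision.ALLOW,
    ("cloud_console", "deploy"): Decision.CONFIRM,
    ("cloud_console", "resize"): Decision.CONFIRM,
    ("chat", "post"): Decision.ALLOW,
    ("payments", "pay"): Decision.CONFIRM,
}

PAYMENT_LIMIT = 100.0  # hypothetical ceiling above which payments are refused outright


def check(action: AgentAction) -> Decision:
    """Return the permission decision for one action; unlisted actions are denied."""
    if action.environment == "payments" and action.amount > PAYMENT_LIMIT:
        return Decision.DENY
    return POLICY.get((action.environment, action.operation), Decision.DENY)


def run_workflow(actions: Iterable[AgentAction],
                 confirm: Callable[[AgentAction], bool]) -> list:
    """Walk through actions in order, asking `confirm` for gated steps; stop at the first refusal."""
    log = []
    for action in actions:
        decision = check(action)
        if decision is Decision.CONFIRM and not confirm(action):
            decision = Decision.DENY
        if decision is Decision.DENY:
            log.append((action.operation, "blocked"))
            break
        log.append((action.operation, "done"))  # a real agent would perform the step here
    return log


def approve_all(action: AgentAction) -> bool:
    return True


workflow = [
    AgentAction("ide", "edit"),
    AgentAction("cloud_console", "deploy"),
    AgentAction("cloud_console", "resize"),
    AgentAction("chat", "post"),
]
print(run_workflow(workflow, approve_all))
# [('edit', 'done'), ('deploy', 'done'), ('resize', 'done'), ('post', 'done')]
```

With a confirm callback that refuses everything, the same workflow stops at ('deploy', 'blocked'): the code edit runs freely, but the deployment waits for a human. Falling back to DENY for any unlisted pair is a deliberate default-deny choice, so a new capability stays blocked until someone adds it to the policy.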
