From Turn-Based Chatbots to Continuous Real-Time AI Interaction
Thinking Machines has introduced a new class of interaction models designed to move beyond traditional turn-based chatbots toward continuous, real-time AI interaction. Instead of waiting for users to finish typing or speaking, these multimodal AI models process audio, video, and text inputs at the same time, which lets them handle the natural overlaps of conversation. The system is built to work natively across modalities, so the AI can listen while it talks, track what it sees on video, and manage the live dialog flow without external orchestration layers. This shift addresses what the company describes as a bandwidth bottleneck between humans and AI, where slow, sequential exchanges limit how much context, intent, and nuance can be shared. By treating interaction as a continuous stream rather than a series of discrete turns, Thinking Machines aims to make voice AI models feel less like tools that answer queries and more like partners that actively participate in ongoing work.
Inside the Multi-Stream Architecture Delivering Sub-Second Latency
At the core of Thinking Machines’ platform is a multi-stream, micro-turn architecture that processes interaction in slices of roughly 200 milliseconds. A time-aware interaction model manages the live conversation, handling speech, timing, and dialog management, while an asynchronous background model takes on heavier reasoning and tool calls. Results from the background process are woven back into the ongoing exchange, so the system can think and talk at the same time. The flagship TML-Interaction-Small model delivers an average response latency of about 0.4 seconds, fast enough to support a human-like conversational rhythm. Early benchmarks cited by the company indicate that the model not only reacts quickly but also outperforms comparable systems on intelligence and interaction quality. Together, the architecture and latency gains underline the company’s ambition to set a new bar for real-time AI interaction in settings where every fraction of a second matters.
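The split between a fast interaction loop and a slower background worker can be illustrated with a small sketch. The article does not describe how Thinking Machines implements this, so everything below is an assumption made for illustration: the function names, the "ask:" prefix used to trigger background work, and the queue used to hand results back are all hypothetical, and plain Python asyncio stands in for the actual models.

```python
import asyncio
from typing import List

SLICE_SECONDS = 0.2  # ~200 ms micro-turn, the figure quoted in the article


async def background_model(request: str, results: asyncio.Queue, delay: float) -> None:
    """Hypothetical slow path: stands in for heavier reasoning or a tool call."""
    await asyncio.sleep(delay)  # simulated work spanning several micro-turns
    await results.put(f"result for {request!r}")


async def interaction_loop(chunks: List[str], slice_seconds: float = SLICE_SECONDS) -> List[str]:
    """Hypothetical fast path: handles one input chunk per micro-turn."""
    results: asyncio.Queue = asyncio.Queue()
    transcript: List[str] = []
    pending = []  # keep references so background tasks stay alive

    for chunk in chunks:
        # 1. React to the newest slice of input right away.
        if chunk.startswith("ask:"):
            # Hand heavy work to the background model without blocking.
            task = asyncio.create_task(
                background_model(chunk[4:], results, delay=2.5 * slice_seconds)
            )
            pending.append(task)
            transcript.append("ack: working on it")
        else:
            transcript.append(f"heard: {chunk}")

        # 2. Weave in any background results that finished in the meantime.
        while not results.empty():
            transcript.append(results.get_nowait())

        # 3. Wait for the next micro-turn.
        await asyncio.sleep(slice_seconds)

    # Once the input ends, drain any work that is still in flight.
    await asyncio.gather(*pending)
    while not results.empty():
        transcript.append(results.get_nowait())
    return transcript


def run_demo(slice_seconds: float = 0.02) -> List[str]:
    """Run the loop on a made-up input stream, using short slices for speed."""
    chunks = ["hello", "ask:weather", "so anyway", "as I said", "one more", "bye"]
    return asyncio.run(interaction_loop(chunks, slice_seconds))


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In this toy version the loop keeps acknowledging new input while the background task runs, and the result is appended to the transcript at whichever micro-turn it becomes available. A real system would stream audio and video frames rather than strings, but the coordination pattern is the point here.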
Mira Murati’s Pivot From OpenAI Drama to Technical Execution
The launch of these interaction models also marks a strategic moment for Thinking Machines and its founder, former OpenAI CTO Mira Murati. After helping lead the development of ChatGPT and briefly serving as interim CEO during OpenAI’s leadership turmoil, Murati left in 2024 and founded Thinking Machines Lab in early 2025. The startup has since attracted intense investor interest, acquisition overtures, and fierce competition for talent. Reports that a major tech company tried to buy the firm and subsequently hired away several founding members underscore the stakes around its technology. Murati’s response, which included bringing in figures like PyTorch co-creator Soumith Chintala, signals a focus on deep technical capability. With this research preview, the narrative around the company is shifting from executive departures to concrete multimodal AI models that promise to redefine how people collaborate with machines.
Enterprise Use Cases for Always-On, Multimodal Voice AI Models
For enterprises, the appeal of Thinking Machines’ approach lies in natural, low-latency responses across several modalities at once. The demos highlight practical scenarios: an AI that counts exercise reps from video while talking, translates speech in real time, or notices posture changes during a meeting, all without pausing the conversation. Such capabilities could reshape customer support, where an assistant simultaneously listens to a caller, reads on-screen data, and reacts to visual cues, as well as operations, where an AI monitors live camera feeds while coordinating tasks with human staff. By treating audio, video, and text as a unified interaction stream, these models aim to reduce friction between human workflows and AI tools. Although access is currently limited to research partners, the roadmap toward broader availability positions Thinking Machines as a serious contender for enterprises seeking real-time AI interaction that feels closer to collaborating with a colleague than querying a system.
