Mira Murati's First Product: AI That Listens While Talking

Thinking Machines Lab, the AI company founded by former OpenAI CTO Mira Murati, announced its first public product on May 11, 2026: Interaction Models, a new model class that processes input and generates a response at the same time. The flagship, TML-Interaction-Small, replies in 0.40 seconds — roughly human conversational speed, and what the lab says is significantly faster than comparable models from OpenAI and Google. An Interaction Model can interrupt and be interrupted mid-sentence, making a session behave like a phone call rather than a text thread. The product enters a limited research preview in the coming months, with a wider release later in 2026.

Disclosure: ThePlanetTools.ai has no affiliate, commercial, or referral relationship with Thinking Machines Lab, OpenAI, Google, ElevenLabs, or Cartesia. Nothing on this page is sponsored. This is an independent strategic read, and the framing below is my opinion, scoped to what I can verify from the announcement as of May 16, 2026.

What Thinking Machines Actually Announced

On May 11, 2026, Thinking Machines Lab published the first concrete product from a company that, until now, had been better known for its founder and its funding than for anything it shipped. Mira Murati left OpenAI as its chief technology officer in 2024. Thinking Machines Lab was formed the following year and raised a reported $2 billion seed round — one of the largest seed rounds in the history of the technology industry, raised before a single public product existed. In my read, that gap between capital and output is the single most important piece of context for understanding why this specific launch matters and why it looks the way it does.

The announcement introduces a category the lab calls Interaction Models, with a first model named TML-Interaction-Small. The core claim is mechanical, not vibes-based: today's conversational AI runs a turn-based loop. You talk, it listens. It finishes listening, then it thinks, then it responds, then you listen. Interaction Models collapse that loop. The model processes the incoming stream and generates its own output stream simultaneously — what the field calls full duplex. The practical consequence is that the model can be interrupted mid-answer and can itself interrupt, the way two people on a phone call talk over each other, back-channel with "mhm," and course-correct before a sentence finishes.

The headline number is latency. TML-Interaction-Small responds in 0.40 seconds. That figure sits inside the window humans use in natural conversation, and the lab states it is meaningfully faster than comparable models from OpenAI and Google. The announcement does not publish competitor latency figures, name the specific rival models, disclose pricing, or document an API. Availability is staged: a limited research preview in the next few months, then a wider release later in 2026.

The One-Sentence Version

Thinking Machines Lab's first product is a speed-and-architecture bet: an AI you can talk over, that talks back at human latency, shipped as a research preview rather than a finished API. Everything strategic about this launch follows from those three facts.

What Is Confirmed Versus What Is Framing

I want to be precise about the evidence boundary, because the gap between confirmed and inferred is where most coverage of well-funded labs goes wrong. Confirmed: the product names, the 0.40-second latency, the full-duplex mechanic, the interrupt/be-interrupted behavior, the phone-call analogy, the staged preview timeline, and the relative-speed claim against OpenAI and Google. Framing, including mine: what this means for the voice AI market, whether 0.40 seconds is the right battleground, and whether a research preview is a confident move or a hedged one. I will flag where I cross that line.

Why "Listening While Talking" Is an Architecture Story, Not a Feature

It is tempting to file this under "faster voice assistant." That undersells what is being claimed. The turn-based loop is not an implementation detail of current systems — it is baked into how most of them are built. A transcription stage waits for an endpoint, a language model generates a full response, and a synthesis stage speaks it. Each stage is sequential, and latency is the sum of all of them plus the silence the system waits through to decide you are done talking.

An Interaction Model, as described, removes the sequence. Generation begins while input is still arriving, which is why interruption stops being an awkward edge case and becomes a native behavior. In my read, this is the single most defensible part of the announcement, because it is a structural claim that is either true or false rather than a marketing adjective. If the architecture genuinely runs input and output concurrently, the interruption behavior is a consequence, not a bolt-on. That is a cleaner story than most launches in this space tell.

The Pipeline Tax Most Voice Stacks Pay

Anyone who has wired together a production voice agent knows the pipeline tax. You chain a speech-to-text provider, a language model, and a text-to-speech provider, and you spend weeks shaving milliseconds off endpointing, streaming, and turn detection because the perceived "snappiness" of the agent is dominated by glue, not by the model's raw intelligence. Tools such as Cartesia and ElevenLabs have built real businesses partly by making one slice of that pipeline — synthesis — fast and natural. Interaction Models, if they deliver, attack the existence of the pipeline itself rather than one stage of it.

Why Interruption Is the Real Test

Latency gets the headline, but interruption is the harder problem and, in my view, the better proof point. Humans interrupt constantly: to confirm, to redirect, to stop a wrong answer before it wastes ten seconds. A system that handles being interrupted gracefully — stopping cleanly, retaining what it had committed to, and incorporating the correction — feels categorically different from one that has to be talked at in complete turns. The phone-call analogy in the announcement is doing real work here. A phone call is not a fast text chain. It is a different interaction model, which is presumably why the lab named the category after the interaction rather than after the speed.

The 0.40-Second Number, Read Carefully

0.40 seconds is a strong figure. In natural human conversation, the gap between speakers is frequently around 200 milliseconds, and once response latency creeps past roughly half a second, a voice exchange starts to feel like a walkie-talkie instead of a conversation. A model that lands at 0.40 seconds is in the zone where the interaction can feel alive rather than transactional. So the number is well chosen as a target, and as a thing to put in a launch.

Here is where I scope the claim. The announcement does not publish the measurement methodology, the input length, the audio conditions, or the competitor models it is faster than. "Significantly faster than comparable models from OpenAI and Google" is a relative claim without a benchmark table behind it in the public material. None of that makes the figure wrong — it makes it a research-preview figure rather than a settled benchmark. In my read, treating it as the former is the honest posture until the preview is in independent hands.

What Would Make 0.40 Seconds Real

The number becomes durable when it survives three things: real network conditions instead of a lab loopback, conversational input instead of a single clean clipped utterance, and the interruption case, where latency is hardest because the model must abandon a partial generation and re-plan. If TML-Interaction-Small holds near 0.40 seconds through an interruption on a normal connection, that is a genuinely differentiated result. If the number is a best-case figure on idealized input, it is still good — just not yet the category-defining claim the framing implies.

Latency Is Necessary, Not Sufficient

I have watched voice AI launches lead with a latency number for two years now. Latency is necessary and it is not sufficient. The fastest model that interrupts badly, loses context on barge-in, or speaks over the user at the wrong moment will feel worse than a slightly slower model with excellent turn behavior. The lab seems to understand this, which is why the category is named for interaction rather than speed. The risk is that the market reads "0.40 seconds" and stops there.

How This Compares to OpenAI's Realtime Approach and Google

The obvious comparison is OpenAI's realtime, speech-to-speech direction, which collapses the transcription and synthesis stages into a single model and is the closest public analog to what Thinking Machines is describing. Our breakdown of gpt-realtime covers that lineage in depth. Google has pushed in a similar direction with low-latency multimodal live models. The shared idea across all three is the same: kill the pipeline, get the model close to the audio, and chase human latency.

Where Thinking Machines is positioning differently, in my read, is rhetorical and architectural at once. By naming the category Interaction Models and foregrounding interruptibility rather than raw speed-to-first-token, the lab is implicitly arguing that the incumbents optimized the wrong primary metric. OpenAI's realtime story is "fast speech-to-speech." Thinking Machines' story is "a model built around the structure of a conversation, where speed is one property among several." That is a positioning claim, and it is a defensible one. It is also, for now, only a claim — the incumbents ship APIs at scale today, and a research preview does not.

The Substitution Question

For a team building a voice agent right now, the practical question is substitution. gpt-realtime is available, documented, billable, and in production. Cartesia and ElevenLabs are available, with mature SDKs and the voice quality businesses already ship on — our ElevenLabs vs Cartesia comparison walks through where each wins. TML-Interaction-Small is, today, a preview you cannot buy. So the comparison is not "which do I deploy." It is "which architecture is the market converging on," and on that question Thinking Machines has planted a flag rather than won a fight.

The Reasoning Model in the Room

One adjacent point worth scoping. Conversational latency and reasoning depth pull against each other. A model that answers in 0.40 seconds is, by construction, not doing extended deliberation in that window. For agentic or tool-using workloads, teams routinely reach for a slower, more deliberate model — the kind of capability our coverage of Claude gets into. The likely real-world pattern is a fast interaction layer in front of a slower reasoning layer, not one model doing both. The announcement does not address this, and in my read it is the most interesting unanswered design question.

The Strategic Read: Why This Product, Why Now

This is the section where I am explicitly giving an opinion, scoped to my reading of the public facts. Treat it as analysis, not reporting.

A lab that raised roughly $2 billion in seed funding before shipping anything carries a specific kind of pressure: the first public product has to justify the narrative. In my read, Interaction Models is a well-chosen first move precisely because it is a category claim rather than a feature. Shipping "a faster voice API" would have invited a direct, losing benchmark fight against incumbents who already operate at scale. Shipping "a new model class defined by how it interacts" reframes the conversation onto ground the lab gets to define. That is sharp positioning, and I think it is intentional.

The Research-Preview Decision

The most strategically interesting choice is the staged rollout: research preview first, wider release later in 2026, no public API or pricing at announcement. There are two readings, and I will name both rather than pick the flattering one. The confident reading: the lab is releasing on its own timeline, controlling the narrative, and avoiding a premature scale fight. The cautious reading: a research preview is also what you ship when the architecture works in controlled conditions but is not yet ready to take arbitrary production traffic. The public facts are consistent with either. In my read, the truth is probably a deliberate blend — the lab gets narrative control and buys time, and both are rational given the funding pressure.

The Timing Against the Market

The timing is not random. Voice AI has been one of the loudest investment stories of 2026 — capital is moving fast into companies that own a conversational surface, as our coverage of Vapi's rise to a $500M valuation documents. Launching a category claim into a market that is actively repricing voice AI is good timing for narrative, regardless of whether the product is production-ready. In my read, the announcement is partly a product and partly a position-staking exercise, and there is nothing wrong with that — it is what a well-funded lab with a strong founder should do. It just means readers should weigh the framing accordingly.

The Murati Premium and Its Risk

Mira Murati's track record as OpenAI's CTO is the reason this announcement gets covered at the volume it does, and the reason the $2 billion existed before the product. That is an asset. It is also a liability of expectation: a founder premium raises the bar for what counts as a successful first ship. In my read, Interaction Models clears the bar for "interesting and credible" and has not yet had the chance to clear the bar for "proven at scale." Those are different bars, and conflating them is the mistake I would most expect the broader coverage to make.

The $2 Billion Context You Cannot Ignore

I want to isolate one fact because the entire launch reads differently once you hold it in view: the roughly $2 billion seed round was raised before any public product existed. That is not a normal startup posture. It is a bet on a founder and a thesis, made at a scale that compresses the timeline between "interesting research" and "must justify the valuation." In my read, every visible choice in this announcement — the category name, the latency headline, the staged preview — is consistent with a lab that knows its first ship is being graded against an unusually high prior.

That context cuts both ways, and I want to be fair about it. The upside: enormous capital buys the freedom to define a category instead of competing on someone else's benchmark, and Interaction Models is exactly the kind of move that freedom enables. The downside: the same capital creates expectation gravity, where "credible and interesting" gets read by the market as "proven," and the lab has an incentive not to correct that reading too loudly. None of this is a criticism of the work — it is strategic context that any honest read of the announcement has to price in.

What This Means for Builders Right Now

If you are shipping a voice product this quarter, nothing about your stack changes today. TML-Interaction-Small is not purchasable. The actionable takeaway is directional: the market is converging on full-duplex, interruptible, human-latency interaction as the target architecture, and Thinking Machines has now put a strong, well-articulated stake in that ground alongside OpenAI and Google.

The practical move is to design for that future without betting on an unreleased model. Keep your voice layer modular so the interaction model can be swapped when the category matures. Treat interruption handling as a first-class requirement in your own UX now, because the bar for "feels like a conversation" is being reset upward whether or not you adopt any specific vendor. And watch the preview: the moment to re-evaluate is when independent hands hold TML-Interaction-Small and the 0.40-second number is tested on real connections with real interruptions.

For Teams Already on a Voice Stack

If you are on ElevenLabs, Cartesia, or gpt-realtime, this announcement is a reason to audit your turn-taking and barge-in behavior, not a reason to migrate. The incumbents are not standing still, and the realistic outcome is that interruptible, low-latency interaction becomes table stakes across all of them within a year. The differentiator then moves up the stack — to reasoning, memory, and reliability — which is exactly where a model like Claude earns its place in a layered design.

What Would Prove Me Wrong

I am scoping my strategic read deliberately, so here is the falsification list — the specific outcomes that would show my interpretation was wrong:

If the preview ships fast and open. If Thinking Machines moves from research preview to a documented, billable API within a quarter, my "research preview is partly a hedge" reading is wrong, and the confident-timeline interpretation wins.
If 0.40 seconds holds under independent testing on real networks, with conversational input, through interruptions — then the relative-speed claim is not just framing, and the architecture story is as strong as the announcement implies.
If incumbents respond defensively. If OpenAI or Google rapidly reframe their realtime offerings around interruptibility rather than speed-to-first-token, that would confirm Thinking Machines successfully redefined the metric — a stronger outcome than I am currently crediting.
If builders adopt at preview. If serious teams build production prototypes on the preview despite no pricing or SLA, that signals the architecture is differentiated enough to overcome the maturity gap I am emphasizing.

Until those resolve, my position is narrow: the architecture claim is the credible core, the latency number is a strong research-preview figure rather than a settled benchmark, and the category framing is sharp positioning by a lab under funding pressure to make a first product land. All three can be true at once.

The Bottom Line

Thinking Machines Lab's first public product is a confident category claim wrapped in a cautious release. Interaction Models reframe voice AI from "fast turn-taking" to "a model built around the structure of a conversation," and TML-Interaction-Small's 0.40-second, interruptible behavior is a credible expression of that idea. The architecture story is the strongest part and is testable. The latency number is strong but is a research-preview figure, not a benchmark. The staged rollout is rational under the funding pressure a $2 billion pre-product seed creates, and it buys narrative control and time at once.

In my read, this is exactly the right first move for this lab: stake out the category, avoid a premature benchmark war, and let the preview do the proving. Whether it becomes the default architecture of conversational AI depends on the next two quarters — on whether 0.40 seconds survives contact with real users, and on whether the preview becomes a product. Both questions are answerable, and both are the right ones to watch.

Disclosure (repeated): ThePlanetTools.ai has no affiliate, commercial, or referral relationship with Thinking Machines Lab, OpenAI, Google, ElevenLabs, or Cartesia. This page is an independent, unsponsored strategic analysis. The strategic interpretation is my opinion, scoped to publicly verifiable facts as of May 16, 2026; the falsification list above states exactly what would change it.

Interaction Models concept — full duplex listening while talking — Interaction Models process input and generate output simultaneously — a phone call, not a text thread

TML-Interaction-Small 0.40 second response latency architecture — TML-Interaction-Small responds in 0.40 seconds — inside the human conversational window

Thinking Machines vs OpenAI Realtime and Google low-latency voice — Thinking Machines positions on interruptibility; OpenAI and Google ship realtime APIs at scale today

Mira Murati Thinking Machines Lab $2B seed strategic timing — A $2B pre-product seed creates the pressure that shaped this first launch

Frequently Asked Questions

What is an Interaction Model from Thinking Machines Lab?

An Interaction Model is a new model class announced by Thinking Machines Lab on May 11, 2026 that processes incoming input and generates its response at the same time, instead of using a turn-based listen-then-respond loop. Because generation runs concurrently with input — what the field calls full duplex — the model can be interrupted mid-answer and can interrupt, making a session behave like a phone call rather than a text thread. The first model in the class is TML-Interaction-Small.

How fast is TML-Interaction-Small?

TML-Interaction-Small responds in 0.40 seconds, which sits inside the window humans use in natural conversation. Thinking Machines Lab states this is significantly faster than comparable models from OpenAI and Google. The announcement does not publish a benchmark methodology, competitor latency figures, or the specific rival models, so the figure should be read as a strong research-preview number rather than an independently settled benchmark as of May 16, 2026.

Who founded Thinking Machines Lab and what is the $2 billion seed?

Thinking Machines Lab was founded by Mira Murati, the former chief technology officer of OpenAI, who left OpenAI in 2024. The company was formed in 2025 and raised a reported $2 billion seed round — one of the largest seed rounds in technology history — before shipping any public product. Interaction Models, announced May 11, 2026, is its first public product.

How is this different from OpenAI's Realtime API?

OpenAI's realtime, speech-to-speech direction is the closest public analog: it collapses transcription and synthesis into a single model to chase low latency. Thinking Machines positions differently by naming the category Interaction Models and foregrounding interruptibility rather than raw speed-to-first-token, implicitly arguing incumbents optimized the wrong primary metric. The key practical difference today is maturity: OpenAI's realtime offering ships as a documented, billable API at scale, while TML-Interaction-Small is a research preview you cannot yet buy.

Can I use TML-Interaction-Small today?

No. As of the May 11, 2026 announcement, TML-Interaction-Small is not publicly available. It enters a limited research preview in the coming months, with a wider release planned for later in 2026. There is no public API, pricing, or SLA disclosed. Teams shipping voice products this quarter should treat it as directional, not deployable.

Why does "listening while talking" matter for voice AI?

Most voice stacks chain speech-to-text, a language model, and text-to-speech sequentially, and perceived latency is dominated by that pipeline plus the silence the system waits through to detect you have stopped talking. An Interaction Model removes the sequence by generating while input arrives, which makes interruption a native behavior instead of an awkward edge case. It attacks the existence of the pipeline rather than optimizing one stage of it.

Is 0.40 seconds actually fast for a conversation?

Yes, as a target. In natural human conversation the gap between speakers is often around 200 milliseconds, and once response latency passes roughly half a second a voice exchange starts to feel like a walkie-talkie. A model landing at 0.40 seconds is in the zone where interaction can feel alive. The caveat is that the announced figure lacks a published methodology, so its durability depends on holding near 0.40 seconds on real networks, with conversational input, and through interruptions.

How does this affect ElevenLabs and Cartesia?

ElevenLabs and Cartesia built real businesses partly by making the synthesis slice of the voice pipeline fast and natural, and both ship mature SDKs in production today. Interaction Models target the existence of the pipeline rather than one stage, but TML-Interaction-Small is not purchasable, so there is no migration pressure now. The realistic outcome is that interruptible, low-latency interaction becomes table stakes across all voice vendors within a year, pushing differentiation up the stack toward reasoning and reliability. See our ElevenLabs vs Cartesia comparison for where each currently wins.

Does a fast interaction model replace a reasoning model like Claude?

No, and the two pull against each other. A model answering in 0.40 seconds is not doing extended deliberation in that window. For agentic or tool-using workloads teams reach for a slower, more deliberate model such as Claude. The likely real-world pattern is a fast interaction layer in front of a slower reasoning layer, not one model doing both. The announcement does not address this layering, which is the most interesting unanswered design question.

Why is Thinking Machines launching a research preview instead of an API?

There are two defensible readings. The confident one: the lab is releasing on its own timeline, controlling the narrative, and avoiding a premature benchmark war against incumbents who already operate at scale. The cautious one: a research preview is also what you ship when an architecture works in controlled conditions but is not yet ready for arbitrary production traffic. The public facts are consistent with both, and the likely truth is a deliberate blend given the funding pressure a $2 billion pre-product seed creates.

What is the single biggest thing to watch next?

Whether the preview becomes a real product and whether 0.40 seconds survives independent testing on real network conditions with conversational input and interruptions. The moment to re-evaluate is when independent hands hold TML-Interaction-Small. If incumbents like OpenAI or Google rapidly reframe their realtime offerings around interruptibility rather than speed-to-first-token, that would confirm Thinking Machines successfully redefined the metric.

Mira Murati's First Product Is Here: Thinking Machines Lab Unveils Interaction Models — AI That Listens While It Talks (May 2026)