What is Gemini Omni Flash? Google's new generative video model, launched at Google I/O 2026 on May 19. It is the first member of the Gemini Omni family, available today inside the Gemini app, Google Flow, YouTube Shorts and the YouTube Create App. It takes images, audio, video and text as input and produces video, with persistent character consistency, physics-aware motion, and natural-language editing where each instruction builds on the last.
Google waited until the I/O keynote on May 19 to make it official, but the pre-keynote leak community had already named the family correctly. The launch is narrower than the leak suggested: Omni Flash ships today, image and audio outputs are on the roadmap, the developer API is due "in the coming weeks" with no pricing attached, and there are no head-to-head benchmarks against Veo 3.1, Sora 2 or Runway Gen-4.5. The strategic angle still matters: Google just put a conversational video editor in front of the Gemini app's paying base and YouTube's billion-plus Shorts surface in a single keynote slot. That is a distribution story, not just a model story.
What Google actually shipped on May 19
According to Google's official launch post by Koray Kavukcuoglu, CTO of Google DeepMind, Gemini Omni Flash is the first member of the Gemini Omni family — explicitly framed as a flash-tier model with the heavier siblings still implied but not named. The inputs that work today: images, audio, video and text. The output that works today: video. Google stated plainly that image and audio outputs will follow "in time" but are not in this release.
Audio input has an asterisk too. Google said only voice references are supported for audio at launch, with "other types of audio inputs" to roll out later. That is not nothing — voice references unlock character voice cloning use-cases for storyboarding and dubbing — but it is not the full audio conditioning pipeline some demos in the leak phase implied.
Where it lives on day one is the more interesting question: the Gemini app for Google AI Plus, Pro and Ultra subscribers globally; Google Flow for all tiers globally; YouTube Shorts at no cost for users; and the YouTube Create App at no cost. Four surfaces, two paid, two free. The API for developers and enterprise customers is due "in the coming weeks." Pricing was not communicated for any non-YouTube-Shorts surface.
The three differentiators that actually matter
Strip out the keynote choreography and Google's pitch reduces to three claims, all quoted from the launch post: "your characters stay consistent, the physics hold up and the scene remembers what came before." That sentence is the model card in spirit. Each phrase is a direct shot at a known failure mode of the current generation of video models.
Character consistency across edits
This has been the persistent failure mode of conversational video tools since Pika's first chat interface. Re-prompt the same character with new wardrobe or a new scene and the face drifts. Re-prompt with a new action and the height shifts. Runway Gen-4.5 has invested heavily in consistency via reference images, and Sora 2 ships its own consistency tooling, but neither has fully solved cross-edit identity. Google's claim is that Omni Flash holds identity across natural-language edits within a session — not just within a single render, but across multiple iterative instructions. If true, that is meaningful.
Physics-aware motion
Google specifically called out "an improved intuitive understanding of forces like gravity, kinetic energy and fluid dynamics." This targets the failure mode where a poured drink falls sideways, a thrown object floats, or a person walks as if the floor were ice. Sora 2 and Veo 3.1 have improved here over Sora 1 and Veo 3, but anyone running production tests knows physics drifts at clip boundaries. Google claims Omni Flash is better; benchmarks will arrive when third parties test it, not on May 20.
Natural-language editing that "builds on the last"
This is the differentiator that breaks from Veo's text-to-video paradigm. In Google Flow, you can already chain renders, but each prompt is treated mostly independently. Omni Flash treats the conversation as state: instruction n+1 is interpreted in the context of what instruction n produced. "Now make him turn left" only works if the model remembers who "him" is and where the camera was. That is a different product shape from cinematic single-shot generation.
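The difference between independent prompts and conversation-as-state can be sketched in a few lines of Python. This is a toy illustration only, not Google's API or the model's actual mechanism: `Scene`, `apply_instruction`, `run_stateless` and `run_stateful` are hypothetical names, and the keyword matching stands in for whatever the model really does to resolve references.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Toy stand-in for a rendered clip: only the facts a later edit may refer to."""
    subject: str = ""
    facing: str = "camera"
    history: list = field(default_factory=list)


def apply_instruction(scene: Scene, instruction: str) -> Scene:
    """Return a new Scene reflecting one instruction (hypothetical, rule-based)."""
    words = instruction.lower().split()
    subject, facing = scene.subject, scene.facing
    if "man" in words:
        subject = "man"
    # A pronoun can only be resolved against state left by an earlier instruction.
    if "him" in words and not subject:
        raise ValueError("'him' has no referent: no prior state to resolve it against")
    if "left" in words:
        facing = "left"
    return Scene(subject, facing, scene.history + [instruction])


def run_stateless(instructions):
    """Each prompt starts from an empty scene, like independent renders."""
    return [apply_instruction(Scene(), text) for text in instructions]


def run_stateful(instructions):
    """Each prompt edits the scene that the previous prompt produced."""
    scene = Scene()
    for text in instructions:
        scene = apply_instruction(scene, text)
    return scene
```

With the prompts "A man walks into a cafe" followed by "Now make him turn left", `run_stateful` ends with the man facing left, while `run_stateless` raises on the second prompt because nothing tells it who "him" is.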
Why the leak mattered, and where it was wrong
As we covered when the leak surfaced on May 15, the pre-I/O Gemini Omni leak got the name right and the unified video play right. What it overshot was the scope on day one. Several leaked demos implied broad multimodal output — text-to-image, text-to-audio, video editing — flowing through a single Omni-branded surface in the Gemini app.
What Google actually shipped is the video output member of the family, branded Omni Flash. Image and audio outputs are explicitly future tense. That is a meaningful gap between leak hype and launch reality. It is also a clear signal of how Google is sequencing the family: video first (because that is where the I/O headline pressure was, and where YouTube distribution is a moat), image and audio later (because Nano Banana Pro and the Gemini app's existing audio tools already cover those surfaces).
The strategic read on the gap: Google did not want to ship a worse image model than Nano Banana Pro under a new brand. So they sequenced. Video is where the competitive heat is and where they can lead a keynote.
Competitive positioning — vs Sora 2, Runway Gen-4.5, Veo 3.1 and Kling 3 Omni
Google is shipping Omni Flash into a video-model market that already has four serious competitors. The question is not whether Omni Flash is better in absolute quality — that requires benchmarks that do not yet exist — but where it fits.
Versus Sora 2 (OpenAI): Sora 2 has been the cinematic single-shot leader since its release, with strong physics and long-shot coherence. Sora 2 lives inside ChatGPT for paid users and via API. Omni Flash's stated edge is iterative natural-language editing, not single-shot cinematic quality. Different jobs, different default tools, at least until benchmarks ship.
Versus Runway Gen-4.5: Runway has spent the past two years building the professional video editing pipeline around its models. Gen-4.5 ships with motion brush, references, multi-shot tools. Omni Flash arrives inside the Gemini app and YouTube — surfaces where Runway is not present. Runway's defensive moat is the professional editor; Omni Flash's offensive bet is everyone-with-a-Gemini-subscription.
Versus Veo 3.1 (and Veo 3.1 Fast, Veo 3.1 Lite): Veo is still the dedicated cinematic text-to-video product. Google did not announce Veo deprecation. The strategic reading is that Veo stays as the high-end generation engine and Omni Flash becomes the conversational, multimodal entry point inside Gemini and YouTube. Both can coexist. The longer-term question is whether developers consolidate on Omni once the API lands and Veo becomes the cinematic mode inside Omni.
Versus Kling 3 Omni: Naming collision aside (both are called "Omni"), Kling 3 Omni from Kuaishou has been the strongest open-distribution alternative outside the US frontier labs, particularly strong on character motion. What it lacks is YouTube-scale distribution, which Omni Flash uses as its flywheel.
What it means for creators and developers
For YouTube Shorts creators: as of May 20, Omni Flash is the free generative video tool inside Shorts. That is the single largest creator surface in the world getting a frontier model wired in. The question is not whether creators will use it (they will) but how YouTube polices model-generated content at the recommender level. Google did not address that on stage.
For Gemini app subscribers: AI Plus, Pro and Ultra tiers all get access. If you are paying for any of those, you have it today. The Pro and Ultra positioning becomes more defensible because video generation is now a tier-included feature, not a separate purchase.
For developers and enterprises: the API is due "in the coming weeks," with no public pricing and no API documentation page live as of May 20. If you operate production video pipelines today using Veo via Vertex AI, Sora via OpenAI's API, or Runway via its Gen-4.5 API, your stack does not change this week. Pencil in a June or July 2026 evaluation window. Do not commit budget to Omni Flash before pricing lands and a real model card is published. The pattern with Google video models has been that the in-app experience ships first and the API ships once the model is stable enough to be cheap. That is fine; just plan around it.
For everyone evaluating the I/O announcement as a market signal: Google just bundled a frontier video model into the YouTube creator stack. That bundling is the I/O story. The model quality story is unwritten because benchmarks do not exist yet.
What would prove this bullish take wrong
Three signals would deflate the Omni Flash narrative over the next sixty days:
- The API slips past July. "Coming weeks" that becomes "coming quarter" would suggest the model is harder to productionize at scale than the demo suggested, and would let OpenAI and Runway ship counter-launches uncontested.
- Third-party benchmarks land and Omni Flash trails Sora 2 on physics. Google's claim of improved intuitive physics is the most measurable of the three differentiators. If independent tests put Sora 2 ahead, the keynote framing was marketing.
- Character consistency breaks on edge cases creators care about. If the model loses identity on dark scenes, profile views, or non-Western faces — the failure modes that haunted earlier models — the "characters stay consistent" claim collapses into a demo trick.
None of these are unlikely. All three would change how the market reads I/O 2026.
Bottom line
Gemini Omni Flash launched on May 19, 2026, as the video-output member of the Gemini Omni family. Today it lives in the Gemini app for paying subscribers, in Google Flow for all tiers, and in YouTube Shorts and the YouTube Create App at no cost. The developer API is promised "in the coming weeks" with no pricing. The differentiators that matter, per Google's launch post, are character consistency across edits, physics-aware motion, and natural-language editing that builds on prior instructions. The leak was right on the name and the unified video play; it overshot on scope. The competitive battlefield against Sora 2, Veo 3.1, Runway Gen-4.5 and Kling 3 Omni gets decided once benchmarks and the API ship, not on May 20.
Source: Google's official launch post by Koray Kavukcuoglu, CTO of Google DeepMind, on the Google Keyword blog (May 19, 2026).
Frequently Asked Questions
What is Gemini Omni Flash?
Gemini Omni Flash is Google's new generative video model, launched at Google I/O on May 19, 2026, as the first member of the Gemini Omni family. It accepts images, audio, video and text as input and produces video as output, with persistent character consistency, physics-aware motion, and natural-language video editing where each instruction builds on the previous one. Image and audio outputs are planned but not available at launch.
Where can I use Gemini Omni Flash right now?
As of May 19, 2026, Gemini Omni Flash is available on four surfaces: the Gemini app (for Google AI Plus, Pro and Ultra subscribers globally), Google Flow (all tiers globally), YouTube Shorts (free for users), and the YouTube Create App (free). The model is not yet exposed to developers through an API.
When does the Gemini Omni Flash API launch?
Google said the API for developers and enterprise customers is coming "in the coming weeks" after the May 19, 2026 launch. No exact date and no pricing were communicated at I/O. If you build production video pipelines today, plan around a likely June or July 2026 window and do not budget for it before official pricing lands.
Is Gemini Omni Flash the same as Veo 3.1?
No. Veo 3.1 is Google DeepMind's existing dedicated text-to-video model. Gemini Omni Flash is positioned differently: it is a multimodal model in the Gemini family that ingests images, audio, video and text, and treats video editing as a conversation where each instruction builds on the previous one. Veo focuses on cinematic generation; Omni Flash focuses on iterative, conversational editing with persistent characters and scene memory.
How does Gemini Omni Flash compare to Sora 2 and Runway Gen-4.5?
Sora 2 (OpenAI) and Runway Gen-4.5 both lead on cinematic single-shot generation and effects pipelines. Gemini Omni Flash's stated differentiators are character consistency across edits, physics-aware motion (gravity, kinetic energy, fluid dynamics), and natural-language editing that remembers prior instructions. Google has not published head-to-head benchmarks, so direct quality comparison is editorial speculation until third-party tests land.
Does Gemini Omni Flash work with audio?
Partially. Audio is a supported input modality, but Google said only voice references are supported for audio at launch on May 19, 2026, with "other types of audio inputs" to follow. Audio output (model-generated audio synced to video) is on the roadmap but not in the May 19 release.
How does Gemini Omni Flash handle character consistency?
In Google's launch post, Koray Kavukcuoglu wrote that "your characters stay consistent, the physics hold up and the scene remembers what came before." Practically, the claim is that Omni Flash maintains the same character identity across multiple natural-language edits within a session. Identity drift was a persistent failure mode of earlier video models like Pika and earlier Kling versions, where re-prompting often changed the character's face or outfit.
What did Google announce vs. what did it actually ship on May 19?
Google shipped: Omni Flash in the Gemini app (AI Plus / Pro / Ultra), Google Flow (all tiers), YouTube Shorts (free), and the YouTube Create App (free). Google announced but did not ship: API access for developers and enterprise ("coming weeks"), audio inputs beyond voice references, and image and audio output modalities. Pricing for non-Shorts surfaces was not disclosed.
Is Gemini Omni Flash free to use?
It is free inside YouTube Shorts and the YouTube Create App. In the Gemini app, access is gated to Google AI Plus, Pro and Ultra subscribers globally. In Google Flow, all tiers globally get access. API pricing for developers and enterprise has not been disclosed as of May 20, 2026.
Did the pre-I/O Gemini Omni leak match what Google actually launched?
Mostly yes on the name and the unified video play, but the launch was narrower than the leak community expected. The leaked Gemini app demos suggested broad multimodal generation; the official launch ships only the video-output member (Omni Flash) of the family. Image and audio outputs are still future tense. We covered the leak in detail when it surfaced on May 15.
Does Gemini Omni Flash replace Veo for Google's video roadmap?
Google did not say Veo is being deprecated. Veo 3.1, Veo 3.1 Fast and Veo 3.1 Lite remain published products as of May 20, 2026. The strategic read is that Veo continues to anchor high-end cinematic generation while Omni Flash becomes the conversational, multimodal entry point inside the Gemini app and YouTube. Both can coexist; the longer-term question is whether developers will eventually consolidate on Omni once the API lands.
What's the catch with Gemini Omni Flash at launch?
Three real constraints: no API access yet (so you cannot build production workflows on it as of May 20), no published benchmarks against Sora 2, Veo 3.1 or Runway Gen-4.5, and no public pricing for non-Shorts surfaces. The product is live in the Gemini app and YouTube, but the developer and enterprise story is still a promise. Wait for the API drop before re-architecting any pipeline.



