
ByteDance's June Multimodal Blitz: One Day, Four Frontier Models

At its FORCE conference on June 23, 2026, ByteDance shipped four AI models in one day: Doubao Seed 2.1, Seedance 2.5 video, Seedream 5.0 Pro, and Doubao Audio 1.0. Here is what is real, what is beta, and what to watch.

Author: Anthony M.
12 min read · Verified June 26, 2026 · Tested hands-on
Figure: ByteDance used its FORCE conference on June 23, 2026 to ship four AI models the same day, spanning text, video, image, and audio.

On June 23, 2026, ByteDance announced four frontier AI models in a single day at its Volcano Engine FORCE conference in Beijing: Doubao Seed 2.1 (a large language model in Pro and Turbo tiers), Seedance 2.5 (video), Seedream 5.0 Pro (image), and Doubao Audio 1.0 (audio generation). Only the Doubao Seed 2.1 language model is available through the Volcano Engine API today, with the Pro tier priced at 6 yuan per million input tokens and 30 yuan per million output tokens (roughly $0.88 and $4.42). Seedance 2.5 is in enterprise beta with a public launch targeted for early July 2026, Seedream 5.0 Pro is "coming soon," and Doubao Audio 1.0 is invite-only. ByteDance claims Doubao Seed 2.1 is competitive with GPT-5.5, Claude Opus 4.x, and Gemini 3.1 Pro, but those are vendor claims with no independent benchmarks yet.

That is the rare kind of product day that resets how you think about a company. Most labs ship one frontier model and build a quarter of marketing around it. ByteDance shipped a model for every major modality at once — and wrapped the whole thing in a single self-reported number that, if accurate, says more about its position than any benchmark: 180 trillion tokens processed per day across the Volcano Engine platform, up more than tenfold year over year.

The catch — and it is the catch that matters for anyone planning to actually use these models — is that "announced" and "shipped" are not the same thing here. Three of the four models are not generally available. Below, we separate what you can build on today from what is a roadmap promise, decode the GPT-5.5 comparison ByteDance is leaning on, and untangle a resolution claim that several Western outlets got wrong.

What ByteDance Announced at FORCE

FORCE (火山引擎原动力大会, "Volcano Engine Primal Power Conference") is ByteDance's developer and enterprise event for Volcano Engine, the company's cloud and model-as-a-service arm. The summer 2026 edition, held in Beijing on June 23, was used to unveil a full multimodal stack in one keynote. Here is the lineup and, critically, the availability status of each.

The four models at a glance

| Model | Modality | Availability (as of launch) | Headline claim |
|---|---|---|---|
| Doubao Seed 2.1 (Pro / Turbo) | Text / LLM | Live now (Volcano Engine API) | "Coding and Agent era" model; vendor-claimed parity with GPT-5.5, Opus 4.x, Gemini 3.1 Pro |
| Seedance 2.5 | Video | Enterprise beta; public GA targeted early July 2026 | Native 30-second clips, up to 50 multimodal references |
| Seedream 5.0 Pro | Image | "Coming soon" (not yet GA) | Interactive editing, multi-layer separation, native multilingual text |
| Doubao Audio 1.0 | Audio | Invite-only beta | Up to ~2 minutes, multi-character dialogue, music and ambient sound |

If you take one thing from that table, take this: only one of the four is something a developer can call today. The video, image, and audio models were demonstrated and detailed, but they are not in general availability. That distinction shapes everything that follows, because the most interesting models in the announcement — the media generators — are also the ones you cannot use yet.

Doubao Seed 2.1: The Only Model You Can Use Today

Doubao Seed 2.1 is the flagship of the announcement and the only model shipped to general availability. ByteDance positions it explicitly as a model "designed for the Coding and Agent era," and has split it into two tiers. Doubao Seed 2.1 Pro is the "deep thinking" flagship aimed at complex coding, long-horizon agent tasks, and multimodal understanding. Doubao Seed 2.1 Turbo is the lower-latency, lower-cost variant that ByteDance says delivers performance "comparable to Pro" at roughly half the price for large-scale production.

Pricing: aggressive, and clearly the point

ByteDance's pricing is where the strategy is least ambiguous. Doubao Seed 2.1 Pro is priced at 6 yuan per million input tokens and 30 yuan per million output tokens, with cached input as low as 1.2 yuan. At roughly 6.79 yuan to the dollar on launch day, that converts to about $0.88 per million input tokens and $4.42 per million output tokens (our conversion — ByteDance did not publish a USD price). Turbo lands at roughly half those figures.

| Tier | Input (per M tokens) | Output (per M tokens) | Cached input (per M tokens) |
|---|---|---|---|
| Doubao Seed 2.1 Pro | ¥6 (~$0.88) | ¥30 (~$4.42) | ¥1.2 (~$0.18) |
| Doubao Seed 2.1 Turbo | ~¥3 (~$0.44, est.) | ~¥15 (~$2.21, est.) | (not listed) |
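The dollar figures in the table are our own conversion, so it is worth showing the arithmetic. The sketch below assumes the approximate launch-day rate of 6.79 yuan to the dollar quoted above and treats Turbo as exactly half of Pro, which is an estimate rather than a published price. The names `CNY_PER_USD`, `PRO_PRICES_CNY`, and `to_usd` are illustrative only.

```python
# Illustrative conversion of the published yuan list prices to US dollars.
# Assumption: ~6.79 CNY per USD, the approximate launch-day rate.
CNY_PER_USD = 6.79

# Doubao Seed 2.1 Pro list prices, in yuan per million tokens.
PRO_PRICES_CNY = {"input": 6.0, "output": 30.0, "cached_input": 1.2}


def to_usd(cny: float, rate: float = CNY_PER_USD) -> float:
    """Convert a yuan amount to dollars, rounded to the cent."""
    return round(cny / rate, 2)


pro_usd = {name: to_usd(price) for name, price in PRO_PRICES_CNY.items()}

# Assumption: Turbo is "roughly half" of Pro; exactly half is used here.
turbo_usd = {name: to_usd(PRO_PRICES_CNY[name] / 2) for name in ("input", "output")}

print(pro_usd)    # {'input': 0.88, 'output': 4.42, 'cached_input': 0.18}
print(turbo_usd)  # {'input': 0.44, 'output': 2.21}
```

Because the exchange rate moves daily, the dollar figures should be read as a snapshot, not a price ByteDance has committed to.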

For context, that output price is a fraction of Western frontier pricing. ByteDance went further and claimed that, in coding and agent workloads with caching, the blended cost can fall to around 1.96 yuan per million tokens, and that overall usage cost is roughly 80 percent below Claude Opus 4.x. Treat the 80 percent figure as a vendor claim that depends heavily on cache-hit assumptions — but the headline list price is real and cross-confirmed, and it is low.
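To see why a blended figure like this hinges on cache-hit assumptions, here is a minimal sketch of a blended-cost calculation using the Pro list prices. ByteDance did not publish the workload mix behind its 1.96 yuan number, so the cache-hit rate and output share below are hypothetical inputs chosen for illustration, and `blended_cost_cny` is our own helper, not a ByteDance formula.

```python
def blended_cost_cny(
    cache_hit_rate: float,
    output_share: float,
    input_price: float = 6.0,    # yuan per million uncached input tokens (Pro)
    cached_price: float = 1.2,   # yuan per million cached input tokens (Pro)
    output_price: float = 30.0,  # yuan per million output tokens (Pro)
) -> float:
    """Blended yuan cost per million total tokens.

    cache_hit_rate: fraction of input tokens served from cache (0..1).
    output_share:   fraction of all tokens that are output tokens (0..1).
    Both are hypothetical workload parameters, not published figures.
    """
    if not (0.0 <= cache_hit_rate <= 1.0 and 0.0 <= output_share <= 1.0):
        raise ValueError("rates and shares must be between 0 and 1")
    input_share = 1.0 - output_share
    # Average price of an input token, weighted by how often the cache hits.
    avg_input_price = cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * input_price
    return input_share * avg_input_price + output_share * output_price


# No caching, 20% of tokens are output: close to list-price economics.
print(f"{blended_cost_cny(0.0, 0.20):.2f}")   # 10.80

# Heavy caching and input-dominated traffic, as in long agent loops.
print(f"{blended_cost_cny(0.95, 0.02):.2f}")  # 2.01
```

Under these made-up inputs, the blended cost only approaches 2 yuan when nearly all input is cached and output is a small slice of traffic, which is why the vendor figure should not be assumed for workloads with a different shape.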

The GPT-5.5 comparison: read the fine print

ByteDance's most quotable claim is that Doubao Seed 2.1 is competitive with the Western frontier. Depending on which slide and which outlet you read, the named targets include GPT-5.5, Claude Opus 4.x, and Gemini 3.1 Pro. Specific benchmarks cited by ByteDance include matching Opus on Terminal Bench, approaching GPT-5.5 on a SWE-style coding benchmark, and topping OpenAI's own GDPval agent evaluation.

Here is the part that matters: every one of those numbers is a first-party claim. The model launched the same day, so no independent third-party benchmark exists yet. Chinese coverage was not even internally consistent on the comparison — some reports cited Opus 4.6, others Opus 4.7 — which is exactly the kind of detail that gets fuzzy when a vendor is doing the measuring. We have seen this movie before with Chinese frontier launches: the in-house numbers look spectacular on day one, and the real picture only emerges once labs like the SWE-bench maintainers, Artificial Analysis, or independent researchers run the model themselves. Until that happens, "competitive with GPT-5.5" is a marketing position, not a measured fact.

Figure: Doubao Seed 2.1 Pro lists at 6 yuan input and 30 yuan output per million tokens, a fraction of Western frontier pricing and the clearest signal of ByteDance's strategy.

The number behind the number: 180 trillion tokens a day

The statistic ByteDance most wanted to leave in the room was not a benchmark at all. It was distribution. Volcano Engine reported that Doubao models now process more than 180 trillion tokens per day across the platform's enterprise customers — a figure the company says has grown more than tenfold in the past year. ByteDance also cited IDC data putting it at a 49.5 percent share of China's public-cloud model-as-a-service market, ranking first domestically.
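For a sense of scale, the daily figure can be turned into a per-second rate with simple arithmetic. The sketch below takes the self-reported numbers at face value: it treats "more than 180 trillion" as exactly 180 trillion and "more than tenfold" as exactly tenfold, so the outputs are rough orders of magnitude, not measurements.

```python
# Back-of-envelope scale check on ByteDance's self-reported throughput.
# Assumptions: exactly 180 trillion tokens/day and exactly 10x yearly growth.
TOKENS_PER_DAY = 180e12
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
GROWTH_FACTOR = 10

tokens_per_second = TOKENS_PER_DAY / SECONDS_PER_DAY
implied_year_ago_per_day = TOKENS_PER_DAY / GROWTH_FACTOR

print(f"{tokens_per_second / 1e9:.2f} billion tokens per second")           # 2.08 billion tokens per second
print(f"{implied_year_ago_per_day / 1e12:.0f} trillion tokens per day a year ago")  # 18 trillion tokens per day a year ago
```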

The 180-trillion-token throughput is self-reported by ByteDance and not independently audited, so weigh it accordingly; the 49.5 percent market-share figure is IDC's, not ByteDance's own count. But even discounted, the two figures explain why ByteDance can price Doubao Seed 2.1 the way it does. When you are already moving that much inference volume, you are optimizing for share and lock-in, not per-token margin. Doubao is also the most-used consumer AI app in China, with a user base reported in the range of 330 to 345 million monthly actives earlier in 2026. Notably, though, the app lost around six million monthly users in May after introducing paid subscription tiers, a reminder that even dominant Chinese AI products hit the same monetization friction everyone else does.

Seedance 2.5: The Headline Act That Has Not Shipped

Seedance 2.5 is the model that generated the most excitement and, predictably, the most confusion. It is ByteDance's next-generation video model, and on paper it pushes two genuinely notable boundaries. But it is not generally available — and one of its most-repeated specs is being attributed to the wrong model.

What is confirmed

Three of Seedance 2.5's headline gains are clear and cross-confirmed in ByteDance's own materials:

  • Native 30-second clips. Seedance 2.5 generates a single 30-second video in one pass, doubled from the prior generation's 15 seconds, with scene changes and tempo shifts handled inside the model rather than by stitching shorter clips. ByteDance billed this as a world-first for single-shot native duration.
  • Up to 50 multimodal references. The model accepts as many as 50 reference inputs — images, video, audio, 3D white models, and style references — up from 12 in the previous version. That is a substantial jump in how much creative direction you can feed a single generation.
  • New editing controls. ByteDance added 3D white-model import, local replacement, and 3D preview, pushing Seedance toward a more controllable, production-oriented workflow rather than a pure prompt-to-clip toy.

The 4K trap: that spec belongs to Seedance 2.0

Several Western outlets reported that Seedance 2.5 generates "native 4K." That is a misattribution worth correcting carefully, because it cuts to how these announcements get garbled in translation. At the same FORCE keynote, ByteDance gave its existing model — Seedance 2.0 — a native 4K, 10-bit upgrade. The primary Chinese coverage is explicit: the 4K capability is listed under Seedance 2.0, while Seedance 2.5's three headline upgrades are duration, references, and editing. ByteDance did not publish a confirmed native-resolution figure for Seedance 2.5 itself.

This matters more than a pedantic spec note, because ByteDance has a track record of resolution headlines that did not match deliverables — an earlier Seedance generation was marketed around a "2K" figure that, in practice, often delivered closer to 720p. So when a 4K claim floats around a model that has not shipped and whose own maker did not state a resolution for it, the responsible read is: the 4K upgrade is real, but it is Seedance 2.0's, and Seedance 2.5's effective output resolution is something to verify at general availability, not assume now.

Figure: Seedance 2.5's confirmed gains: native 30-second single-shot clips and up to 50 multimodal reference inputs, up from 12.

The audio question

You will also see claims that Seedance 2.5 co-generates synchronized audio with the video. Be careful here too. Western outlets described audio being processed "in the same latent space" as the visuals for native sound sync, but the primary Chinese coverage attributes audio to a separate new model — Doubao Audio 1.0 — and does not list co-generated audio among Seedance 2.5's features. The likely source of the confusion is that both were announced the same day. Until ByteDance's official Seedance 2.5 documentation is live, treat native audio-in-video as unconfirmed rather than a shipped feature.

Where it would land

Assuming the early-July general availability holds, Seedance 2.5 would slot into an unusually crowded video field. Its natural comparison set is Google Veo 3.1, OpenAI's Sora 2, Kling 3.0 Omni, Runway Gen-4.5, and MiniMax Hailuo 2.3. ByteDance's pitch leans on two structural advantages: the 30-second native duration and the 50-input reference ceiling, which the company contrasted against Veo's much smaller reference capacity. It also claimed roughly 20 percent better prompt adherence — again, a vendor number with no independent test behind it. The more durable advantage is distribution: ByteDance can push Seedance straight into CapCut, the video editor with a reported 400 million monthly active users, plus its Jimeng (internationally, Dreamina) creation app. That generation-to-editing-to-distribution pipeline is something none of its Western rivals can match at the same scale.

Seedream 5.0 Pro and Doubao Audio 1.0

The other two models rounded out the multimodal sweep, though both are even further from your hands than Seedance.

Seedream 5.0 Pro is the new "Pro" tier of ByteDance's image model, sitting above the Seedream 5.0 and 5.0 Lite versions that shipped earlier in 2026. ByteDance highlighted four upgrades: interactive precise editing, multi-layer separation, high-density information expression, and native multilingual text generation — the last of which directly targets a weakness most image models still have with rendering accurate text. Release status is "coming soon," with no resolution, pricing, or benchmark figures confirmed for the Pro tier specifically. When it does ship, it will compete with FLUX 2, Midjourney, Ideogram 4.0, and OpenAI's GPT Image line.

Doubao Audio 1.0 (豆包音频生成模型1.0) is ByteDance's audio generation model, and it is the most experimental release of the four. It is in invite-only testing on Volcano Engine's Ark platform, and ByteDance describes it as capable of producing up to roughly two minutes of audio per generation, with multi-character dialogue, background music, and environmental sound effects. No pricing has been disclosed. Audio generation is one of the few modalities where no single player has run away with the lead, so ByteDance entering with a multi-character, film-grade pitch is worth tracking — even if it is the model least ready for production today.

Why a Four-Model Day Matters

Strip away the individual specs and the strategic message is loud. ByteDance is making a claim that very few companies can credibly make: that it has a frontier-competitive model in every major modality, built in-house, and a distribution surface — Doubao, CapCut, Jimeng, Douyin — to put them in front of hundreds of millions of users without paying anyone for reach.

That is a different bet than the one OpenAI, Google, and Anthropic are making in the West. The Western frontier labs increasingly compete on raw capability and enterprise trust, priced at a premium. ByteDance is competing on breadth, price, and distribution. Doubao Seed 2.1 is not priced to be the best model in the world; it is priced to be the most economical one that is good enough, plugged into the largest funnels in the Chinese-speaking internet. The 180-trillion-tokens-a-day figure, self-reported as it is, is the proof point for that strategy: volume first, margin later.

For everyone outside China, the more important question is reach. Seedance and Seedream feed CapCut and Dreamina, products with large international footprints, which means ByteDance's media models touch Western creators in a way its language model does not. If Seedance 2.5 ships in early July and lands anywhere near its demos, it will not need to win a benchmark to matter — it will already be one click away inside an editor that hundreds of millions of people open every week.

What to Watch Next

A four-model announcement is a statement of ambition. The follow-through is what counts, and there are four specific things worth tracking over the coming weeks.

  • Does Seedance 2.5 actually ship in early July? The whole story changes if the headline model slips. Watch the Jimeng/Dreamina and Volcano Engine API channels for a public release, and watch which regions get it first — early reports suggest a possible China-first rollout.
  • The first independent benchmarks for Doubao Seed 2.1. Once SWE-bench maintainers, Artificial Analysis, or independent researchers run it, the "competitive with GPT-5.5" claim becomes testable. That is the moment the marketing meets reality.
  • Seedance 2.5's real output resolution at GA. Given the 2K-versus-720p precedent and the fact that ByteDance never stated a native resolution for 2.5, this is the spec to confirm rather than assume when the model goes public.
  • Whether the price holds. Aggressive launch pricing is easy; sustaining it at 180-trillion-token scale is the real test of ByteDance's economics. If Doubao Seed 2.1's pricing sticks, it puts genuine pressure on every frontier lab's per-token economics in the markets where ByteDance competes.

ByteDance did not just ship four models on June 23. It made an argument — that the future of applied AI is multimodal, cheap, and distributed through products people already use, not sold as premium APIs. Three of those four models still have to actually arrive to prove the point. The next few weeks, starting with whether Seedance 2.5 makes its early-July date, will tell us whether the argument holds.

Frequently Asked Questions

What is ByteDance’s FORCE conference?

FORCE (Volcano Engine Primal Power Conference) is ByteDance’s developer and enterprise event for Volcano Engine, its cloud and model-as-a-service platform. At the summer 2026 edition in Beijing on June 23, ByteDance announced four AI models in one day: Doubao Seed 2.1 (text), Seedance 2.5 (video), Seedream 5.0 Pro (image), and Doubao Audio 1.0 (audio).

What is Doubao Seed 2.1?

Doubao Seed 2.1 is ByteDance’s flagship large language model, announced at FORCE 2026 and positioned for what the company calls the Coding and Agent era. It comes in a Pro tier (deep reasoning, complex coding and agents) and a lower-cost Turbo tier. It is the only model from the FORCE announcement available through the Volcano Engine API today, priced at 6 yuan per million input tokens and 30 yuan per million output tokens, roughly 0.88 and 4.42 US dollars.

When is Seedance 2.5 available?

Seedance 2.5 is not generally available yet. As of its June 23, 2026 announcement it is in enterprise beta, with a public launch targeted for early July 2026, possibly rolling out in China first. ByteDance’s confirmed gains for it are native 30-second clips and up to 50 multimodal reference inputs; the native 4K upgrade shown at the same event applies to the existing Seedance 2.0 model, not to Seedance 2.5.

Is Doubao Seed 2.1 really comparable to GPT-5.5?

That is a ByteDance claim, not an independently verified fact. ByteDance says Doubao Seed 2.1 is competitive with GPT-5.5, Claude Opus 4.x, and Gemini 3.1 Pro, citing benchmarks like Terminal Bench and OpenAI’s GDPval. But the model launched the same day, so no third-party benchmark exists yet, and Chinese sources were not even consistent on which Opus version it was compared against. Treat the comparison as a marketing position until independent testing is published.

How much does Doubao Seed 2.1 cost?

Doubao Seed 2.1 Pro is priced at 6 yuan per million input tokens and 30 yuan per million output tokens, with cached input as low as 1.2 yuan, which converts to roughly 0.88, 4.42, and 0.18 US dollars respectively. The Turbo tier costs about half that. ByteDance also claims a blended cost near 1.96 yuan per million tokens in cached coding and agent workloads, and roughly 80 percent below Claude Opus 4.x, though those figures depend on cache assumptions.

What are Seedream 5.0 Pro and Doubao Audio 1.0?

Seedream 5.0 Pro is ByteDance’s new top-tier image model, adding interactive editing, multi-layer separation, and native multilingual text generation; it is listed as coming soon with no pricing confirmed. Doubao Audio 1.0 is an audio generation model in invite-only beta that ByteDance says can produce up to about two minutes of audio with multi-character dialogue, background music, and ambient sound. Both were announced at FORCE on June 23, 2026 but are not generally available.
