Skip to content
news13 min read

NVIDIA Nemotron 3 Ultra: The Best US Open-Weights Model — But China Still Leads (June 2026)

Nemotron 3 Ultra is NVIDIA’s largest open-weights model (≈550B MoE, 55B active), announced June 1 at Computex. It scores 48 on the Artificial Analysis Index — the top US open model — but China’s Kimi K2.6 leads at 54. Weights ship June 4.

Author
Anthony M.
13 min readVerified June 3, 2026Tested hands-on
NVIDIA Nemotron 3 Ultra — the best US open-weights model, Index 48
NVIDIA Nemotron 3 Ultra — announced at Computex, the leading US open-weights model

Nemotron 3 Ultra is NVIDIA’s largest open-weights language model, announced June 1, 2026 at Jensen Huang’s Computex keynote. It is a Mixture-of-Experts model with roughly 550 billion total parameters and 55 billion active at 90% sparsity, and it scores 48 on the Artificial Analysis Intelligence Index — the highest of any US open-weights model. But it is not the global leader: China’s Kimi K2.6 scores 54, a 6-point gap that the evaluators call meaningful. Weights are not downloadable yet; they are expected to ship June 4.

What NVIDIA Announced

On June 1, 2026, during Jensen Huang’s Computex keynote, NVIDIA announced Nemotron 3 Ultra — the new flagship of its open-weights model family and, by the numbers, the most capable open model an American lab has put forward. Independent evaluation firm Artificial Analysis, a tier-one benchmark provider and official NVIDIA partner, published its first read the same day.

The headline number is an Artificial Analysis Intelligence Index score of 48. That composite metric blends reasoning, knowledge, math, and coding evaluations into a single figure, and 48 is enough to put Nemotron 3 Ultra at the front of the US open-weights pack. The catch — and it is a significant one — is that the open-weights frontier is not American. China’s Kimi K2.6, from Moonshot AI, scores 54 and ranks around fourth among all models in the world, open or closed.

So the honest framing is this: Nemotron 3 Ultra is the best open-weights model the United States has shipped, and it is still behind China. Both things are true at once, and the gap is not a rounding error.

The Key Specs

Nemotron 3 Ultra is a Mixture-of-Experts (MoE) architecture with approximately 550 billion total parameters and 55 billion active per forward pass, which works out to about 90% sparsity. In plain terms, the model is enormous, but only a fraction of it fires for any given token, which keeps inference cheaper than a dense model of the same size.

  • Total parameters: approximately 550 billion (MoE).
  • Active parameters: approximately 55 billion per token.
  • Sparsity: roughly 90%.
  • Precision: weights released in BF16, with NVFP4 quantization planned for faster, lower-cost inference.
  • Throughput: over 300 tokens per second on a pre-release DeepInfra endpoint.
  • Artificial Analysis Intelligence Index: 48.

This is the largest model in the Nemotron 3 line and the largest recent US open-weights release. NVIDIA also claims roughly 5x faster inference and 30% lower cost than comparable open-weights alternatives — vendor figures we are reporting as claims, not as independently verified results.

US open-weights leads but China stays ahead — Nemotron 3 Ultra vs Kimi K2.6
The honest picture: the US now leads open-weights, but China’s frontier still sits higher

Why It Is the US Open-Weights Leader

To understand why a score of 48 matters, you have to look at what it is beating. Among American open-weights models, Nemotron 3 Ultra now sits clearly on top:

  • Nemotron 3 Ultra: 48.
  • Google Gemma 4 31B: 39.
  • NVIDIA Nemotron 3 Super: 36.
  • OpenAI gpt-oss-120b: 33.

A 9-point jump over Google’s Gemma 4 is substantial on a composite index, and the gap over OpenAI’s gpt-oss-120b is wider still. NVIDIA has effectively reset the ceiling for what an American open model can do. For developers and enterprises that need a downloadable model they can run themselves, Nemotron 3 Ultra is now the most capable Western option on the table.

It also continues NVIDIA’s steady push into being a model company, not just a chip company. The broader Nemotron 3 family — the smaller Nemotron Nano and the Nano Omni release from April 28 — has been around for a while and has already passed 50 million downloads. Ultra is the new top-tier addition, not a rebrand of those earlier models.

Open Weights, and Why That Word Matters

"Open weights" is not the same as fully open source, but it is the part that matters most for practical use. It means the trained model parameters can be downloaded, inspected, fine-tuned, and run on your own hardware without routing data through a vendor’s API. For governments, banks, defense contractors, and any organization with data-residency requirements, that property is often non-negotiable.

This is the same dynamic that has driven interest in models like Cohere’s Command A on the enterprise side and the open-weight coding frontier represented by MiniMax M3. The difference Nemotron 3 Ultra brings is raw measured intelligence at the top of the US field — a frontier-adjacent score in a format you can actually self-host.

Nemotron 3 Ultra Mixture-of-Experts architecture — 550B total, 55B active
Inside the MoE: 90% sparsity means only a fraction of the 550B parameters fire per token

Inside the Mixture-of-Experts Design

The 550B-total, 55B-active split is the headline architectural choice, and it is the reason a model this large can be served at over 300 tokens per second. In a dense 550-billion-parameter model, every parameter participates in computing every token, which is brutally expensive. In an MoE model, a router sends each token to a small subset of "experts," so only about 55 billion parameters do work at any moment.

At 90% sparsity, roughly nine out of ten parameters sit idle for any given token. That is how NVIDIA gets a model with frontier-scale capacity to run at a throughput that pre-release testers measured above 300 tokens per second on DeepInfra — well ahead of the 50 to 100 tokens per second that Artificial Analysis notes is typical for comparable Chinese peer models on similar endpoints.

The planned NVFP4 quantization is the second lever. Lower-precision weights shrink the memory footprint and let more of the model fit on fewer GPUs, which is where NVIDIA’s claimed 30% cost reduction comes from. The weights ship first in BF16; NVFP4 is described as planned rather than available at launch, so treat the cost and speed multipliers as forward-looking until independent benchmarks land.

The Part NVIDIA Did Not Lead With: China Is Still Ahead

Here is where the marketing and the measurement diverge. Nemotron 3 Ultra is the best US open-weights model. It is not the best open-weights model. That title still belongs to China.

Kimi K2.6, from Moonshot AI, scores 54 on the same Artificial Analysis Intelligence Index that gives Nemotron 3 Ultra a 48. Six points on a composite benchmark is not noise — Artificial Analysis explicitly describes the gap as meaningful, and Kimi K2.6 sits around fourth among all models worldwide, competing with closed frontier systems. The broader Chinese open-weights ecosystem has been relentless, from DeepSeek V4 to Alibaba’s Qwen 3.6, both of which have repeatedly set the open-weights pace this year.

That context is why we are not running the "best open model in the world" headline that a less careful outlet might. The accurate story is narrower and more useful: the United States now has a credible, top-tier open-weights model, but the frontier of open weights remains Chinese for now. Pretending otherwise would make the article less citable, not more impressive.

What This Means for US AI Sovereignty

For most of the past two years, the open-weights leaderboard has been a Chinese story, with American labs either keeping their best models closed or, in Meta’s case, stepping back from the open frontier entirely. Nemotron 3 Ultra changes the shape of that conversation. It gives US-aligned developers, agencies, and regulated industries a downloadable model that is genuinely competitive, rather than a generation behind.

That is a real shift. Open weights matter for sovereignty precisely because they remove the dependency on a foreign or proprietary API. An organization that needs to run inference air-gapped, fine-tune on sensitive data, or guarantee that prompts never leave its infrastructure now has a serious American option. The 6-point gap to Kimi K2.6 still exists, but the strategic picture is no longer "China owns open weights, full stop."

Nemotron 3 Ultra weights ship June 4 across HuggingFace, ModelScope, OpenRouter, NIM
Announced June 1, weights expected June 4 — distributed across HuggingFace, ModelScope, OpenRouter and NIM

Availability: Announced Now, Weights Expected June 4

This is the detail that is easy to get wrong, so it is worth being precise. As of June 1–2, Nemotron 3 Ultra is announced but not released. Artificial Analysis is explicit that it has not yet been able to publish full benchmarks because the model is not generally available, and the only live access is a pre-release inference endpoint on DeepInfra.

Reporting from Decrypt points to a ship date of June 4, 2026, at which point the weights are expected to become downloadable. NVIDIA has announced distribution across several channels:

  • HuggingFace — the standard hub for open-weights downloads.
  • ModelScope — broadening reach, including in Asian markets.
  • OpenRouter — hosted API access through a multi-model gateway.
  • build.nvidia.com (NIM) — NVIDIA’s own NIM microservice catalog for managed deployment.

If you are planning to evaluate the model, the practical takeaway is to wait for June 4. Anyone telling you the weights are downloadable right now is ahead of the facts. Hosted access through OpenRouter and DeepInfra is the only way to touch the model before then, and pricing for those endpoints will be usage-based.

How It Compares to the Open-Weights Field

Stacked against the open-weights models that have defined 2026, Nemotron 3 Ultra lands as the strongest US entry but not the outright champion. Against Gemma 4 31B (39), Nemotron 3 Super (36), and gpt-oss-120b (33), it is comfortably ahead. Against Kimi K2.6 (54), it trails by six points. That places it in a specific and useful bracket: the model to reach for if you want maximum measured intelligence from an American open-weights release, while acknowledging the global ceiling is currently higher.

The throughput story strengthens its case for production use. Over 300 tokens per second on a pre-release endpoint, versus the 50 to 100 tokens per second typical of comparable Chinese peers on similar hardware, suggests Nemotron 3 Ultra is engineered for serving at scale, not just topping a leaderboard. For latency-sensitive agentic workloads, raw speed at a high intelligence tier is a genuine differentiator — the same calculus that has made fast inference a selling point across the model market.

It is worth keeping the comparison honest in both directions. Index scores compress a lot of nuance into a single number, and a 48 does not mean Nemotron 3 Ultra is worse at every task than a 54. Composite benchmarks weight reasoning, math, knowledge, and coding together; a model can lead on one axis and trail on another while landing lower overall. The reason we still lean on the index is that it is the most consistent apples-to-apples signal available on launch day, and Artificial Analysis applies it identically across US and Chinese models. When full per-category benchmarks arrive at release, the picture will get more granular — and that granularity is exactly what serious adopters should wait for before committing a production stack to any single model.

What It Means for Developers and Enterprises

For the people who actually build with these models, the announcement raises a concrete question: when should you reach for Nemotron 3 Ultra over the alternatives? The short answer is when you need maximum measured intelligence in a self-hostable, US-origin package, and you have the GPU budget to run a roughly 550-billion-parameter MoE model.

That immediately rules in a specific set of buyers. Sovereign and regulated deployments — public-sector agencies, financial institutions, healthcare systems, defense-adjacent contractors — are the natural fit, because the open-weights format lets them run inference inside their own perimeter. For those buyers, the fact that Kimi K2.6 scores six points higher is often beside the point: a Chinese open-weights model may be off the table for procurement or compliance reasons regardless of its benchmark. In that framing, Nemotron 3 Ultra is not competing with Kimi K2.6 at all; it is the best option in a field that has been quietly narrow for a long time.

The cost picture is where caution is warranted. A model this size is a multi-GPU commitment, and the headline efficiency numbers — 5x faster, 30% cheaper — are NVIDIA’s own, tied to NVFP4 quantization that is planned rather than shipped. Teams sizing infrastructure should budget against the BF16 reality available at launch and treat the quantized economics as an upside to verify, not a baseline to assume. The smarter move for most teams is to prototype against the hosted OpenRouter or DeepInfra endpoints first, measure quality on their own tasks, and only then decide whether self-hosting the full weights is worth the hardware.

There is also a portfolio argument. Nemotron 3 Ultra does not have to be an all-or-nothing choice. Many production stacks route by task — a fast small model for simple calls, a frontier model for hard reasoning. Ultra slots cleanly into the high-intelligence, self-hosted tier of that kind of router, alongside or instead of a hosted frontier API, depending on data-sensitivity and latency needs.

NVIDIA the Model Company, Not Just the Chip Company

Strip away the leaderboard and there is a quieter strategic story here. Jensen Huang chose his Computex keynote — historically a hardware stage — to announce a model, not a GPU. That is deliberate. NVIDIA has spent years as the indispensable supplier to every other lab; with the Nemotron line, and now with Ultra at the top of the US open-weights field, it is staking a claim as a model maker in its own right.

The 50-million-download milestone for the broader Nemotron 3 family is the proof point that this is not a vanity project. Developers are already pulling these models in volume. Ultra extends that footprint upward into frontier-adjacent territory, and it does so in the open-weights format that maximizes distribution — HuggingFace, ModelScope, OpenRouter, and NVIDIA’s own NIM catalog all at once. Releasing open weights is also a smart way to drive demand for the hardware those weights run best on, which keeps the strategy coherent with the core chip business rather than cannibalizing it.

For the rest of the industry, an NVIDIA that ships competitive open models is a complicated development. It pressures other US labs that have kept their best work closed, and it gives the open-weights category a deep-pocketed American champion at exactly the moment the narrative had tilted toward China. Whether that champion can close the last six points to Kimi K2.6 is the question the next release cycle will answer.

Why It Matters

Nemotron 3 Ultra is significant for three reasons, even before a single weight is downloadable. First, it re-establishes the United States as a serious participant in open weights after a stretch where the category looked increasingly Chinese. Second, it deepens NVIDIA’s transformation from the company selling the shovels into a company also mining the gold — a frontier-adjacent open model is a different kind of asset than a GPU. Third, it sharpens the geopolitical framing of AI: the open-weights race is now genuinely US versus China at the top, not a one-sided story.

What it is not is a victory lap. A 6-point gap to Kimi K2.6 is the headline NVIDIA chose not to put on stage, and it is the number that keeps this launch honest. The right reading is measured optimism: a real step forward for American open AI, shipped into a world where China still holds the lead.

What to Watch After June 4

The first thing to watch is the gap between vendor claims and independent measurement. NVIDIA’s 5x-faster, 30%-cheaper figures and the NVFP4 cost story need third-party verification once the weights are public; Artificial Analysis has already said it will publish full benchmarks at release. The second is adoption — whether enterprises and governments actually deploy Nemotron 3 Ultra for sovereign workloads, or whether the 6-point intelligence gap to Kimi K2.6 sends them toward Chinese open weights anyway. The third is the next Chinese response, because if 2026 has shown anything, it is that the open-weights frontier does not stay still for long.

Frequently Asked Questions

What is NVIDIA Nemotron 3 Ultra?

Nemotron 3 Ultra is NVIDIA’s largest open-weights language model, announced June 1, 2026 at Jensen Huang’s Computex keynote. It is a Mixture-of-Experts (MoE) model with approximately 550 billion total parameters and 55 billion active parameters at 90% sparsity. It scores 48 on the Artificial Analysis Intelligence Index, making it the most intelligent US open-weights model to date.

How good is Nemotron 3 Ultra compared to other open-weights models?

On the Artificial Analysis Intelligence Index, Nemotron 3 Ultra scores 48, ahead of Google’s Gemma 4 31B (39), NVIDIA’s own Nemotron 3 Super (36), and OpenAI’s gpt-oss-120b (33). That makes it the leading US open-weights model. However, China’s Kimi K2.6 scores 54 — a 6-point lead that Artificial Analysis calls meaningful.

Is Nemotron 3 Ultra better than Kimi K2.6?

No. Kimi K2.6, from Chinese lab Moonshot AI, scores 54 on the Artificial Analysis Intelligence Index versus 48 for Nemotron 3 Ultra. Kimi K2.6 ranks roughly fourth among all models worldwide, open or closed. Nemotron 3 Ultra is the strongest American open-weights model, but the open-weights frontier overall still belongs to Chinese labs.

When will Nemotron 3 Ultra weights be available to download?

The weights are not downloadable as of June 1–2. Artificial Analysis describes the model as announced but not yet released. Reporting from Decrypt points to a ship date of June 4, 2026. Until then, only a pre-release inference endpoint on DeepInfra is accessible.

Where will Nemotron 3 Ultra be distributed?

NVIDIA has announced distribution across HuggingFace, ModelScope, OpenRouter, and build.nvidia.com (its NIM microservice catalog). The model is released under open weights in BF16 format, with NVFP4 quantization planned for faster, lower-cost inference.

How fast is Nemotron 3 Ultra?

On a pre-release DeepInfra endpoint, Nemotron 3 Ultra runs at over 300 tokens per second. NVIDIA claims roughly 5x faster inference and 30% lower cost than comparable open-weights alternatives, helped by the model’s 90% sparsity and planned NVFP4 quantization.

How is Nemotron 3 Ultra different from the rest of the Nemotron 3 family?

The broader Nemotron 3 family — including Nemotron Nano and the Nano Omni release from April 28 — has been available for some time and has surpassed 50 million downloads. Nemotron 3 Ultra is the brand-new, top-tier release announced June 1. It is the largest Nemotron 3 and the largest recent US open-weights release.

What does the Artificial Analysis Intelligence Index measure?

The Artificial Analysis Intelligence Index is a composite benchmark from Artificial Analysis, a tier-one independent evaluation firm and an official NVIDIA partner. It aggregates performance across reasoning, knowledge, math, and coding evaluations into a single number, letting models be ranked on a common scale. Nemotron 3 Ultra scores 48 on this index.

Why does Nemotron 3 Ultra matter for US AI sovereignty?

Nemotron 3 Ultra is the strongest open-weights model an American lab has shipped, narrowing but not closing the gap with China. Open weights can be downloaded, fine-tuned, and run on-premises without sending data to a vendor, which matters for governments and regulated industries. The US now has a credible open frontier model, even though Kimi K2.6 still leads at 54.

Can Nemotron 3 Ultra run on a single GPU?

Unlikely in BF16. With roughly 550 billion total parameters, the model is built for multi-GPU data-center deployment, not single-card setups. NVIDIA’s planned NVFP4 quantization is designed to cut the memory and cost footprint, but Nemotron 3 Ultra targets the high end of open-weights, not laptops or consumer GPUs.

Is Nemotron 3 Ultra free to use?

The weights are released under open terms, so once they ship on June 4 you can download and self-host them at no licensing cost, subject to NVIDIA’s license. Running them still requires substantial GPU compute. Hosted access through providers such as OpenRouter, DeepInfra, and build.nvidia.com will carry usage-based pricing.

Related Articles

Was this review helpful?
Anthony M. — Founder & Lead Reviewer
Anthony M.Verified Builder

We're developers and SaaS builders who use these tools daily in production. Every review comes from hands-on experience building real products — DealPropFirm, ThePlanetIndicator, PropFirmsCodes, and many more. We don't just review tools — we build and ship with them every day.

Written and tested by developers who build with these tools daily.