
Why OpenAI Built Its Own Chip — Inside the Inference-Cost War

OpenAI's Jalapeño is its first custom chip: an inference accelerator built with Broadcom, unveiled June 24, 2026. Inference only; pre-training stays on NVIDIA. Unveiled, not shipping. Here's why it matters.

Author
Anthony M.
13 min read · Verified June 26, 2026 · Tested hands-on
OpenAI unveiled Jalapeño, its first in-house inference chip, on June 24, 2026 — built with Broadcom, not yet shipping.

OpenAI's Jalapeño is the company's first custom AI chip — an inference accelerator co-designed with Broadcom and unveiled on June 24, 2026. It is built specifically to run large language models in production (inference), not to train them; OpenAI says pre-training stays on NVIDIA hardware for now. Jalapeño is announced but not shipping: engineering samples are running workloads in the lab, with initial deployment targeted for the end of 2026. OpenAI claims "significantly better performance-per-watt" than current state-of-the-art chips — a vendor claim that has not been independently verified.

The handoff was staged like a milestone. Broadcom CEO Hock Tan and President Charlie Kawwas physically delivered the first chip to OpenAI CEO Sam Altman and President Greg Brockman. The symbolism was the point: the company that defined the modern AI boom by buying NVIDIA GPUs at scale now wants to design the silicon its models run on. This is the clearest signal yet of a structural shift across the industry — the slow, deliberate de-NVIDIA-ization of the labs that built the boom in the first place.

What OpenAI Actually Announced

On June 24, 2026, OpenAI and Broadcom unveiled Jalapeño, described by OpenAI as its first "Intelligence Processor" and the first AI accelerator in a multi-generation compute platform the two companies are building together. It is a custom ASIC — an application-specific integrated circuit — architected around OpenAI's own view of how LLM inference should work, rather than a general-purpose GPU adapted to the task.

Three facts matter more than the marketing, and each needs to be stated precisely:

  • It is for inference, not training. Jalapeño is designed to run already-trained models in response to user requests — answering a ChatGPT prompt, generating code, serving an API call. As TechCrunch reported, "it's likely that more performance-intensive tasks like pre-training will still rely on Nvidia hardware." OpenAI is not trying to replace NVIDIA across the board; it is carving out the single largest, most repetitive slice of its compute bill.
  • It is unveiled, not shipped. There is no firm ship date. Engineering samples are reportedly running machine-learning workloads in the lab at production target frequency and power, including OpenAI's own GPT-5.3-Codex-Spark model, but the chip is in testing, not production. Initial deployment is targeted for the end of 2026, with real volume expected in 2027.
  • The headline performance number is a claim, not a benchmark. OpenAI says early results show "significantly better performance-per-watt than current state-of-the-art alternatives." That figure comes solely from OpenAI's own testing. No independent benchmark exists, and a technical report has only been promised. Treat it as a vendor claim until third parties can measure it.

The partnership behind Jalapeño was first announced in October 2025, when OpenAI and Broadcom committed to deploying roughly 10 gigawatts of OpenAI-designed accelerators. That commitment now extends through 2029, with Microsoft named as a deployment partner for the resulting capacity. Broadcom provides the silicon implementation, networking, and connectivity (its Tomahawk switching is part of the stack); Celestica handles boards, racks, and system integration. OpenAI's contribution is the chip architecture itself.

One detail OpenAI leaned into hard: speed. The company said the process from design to tape-out took just nine months, which it describes as the fastest development cycle it knows of for a high-performance ASIC, and that its own models helped accelerate parts of the design work. Whether or not that is a record, it is a genuinely aggressive timeline for custom silicon, where three-to-four-year cycles are normal.

The split that defines the strategy: training stays on NVIDIA, inference moves in-house.

Inference vs. Pre-Training: Why the Distinction Is the Whole Story

To understand why OpenAI built Jalapeño for inference and not training, you have to understand that they are two completely different computing problems with different economics.

Pre-training is the one-time, capital-intensive process of teaching a model. It runs once per model generation, demands enormous clusters of tightly interconnected chips doing dense matrix math for weeks or months, and rewards raw flexibility — you do not always know in advance what the next architecture will need. This is NVIDIA's fortress. Its GPUs, paired with the CUDA software stack and NVLink interconnect, are the most flexible high-performance training hardware on the market, and OpenAI has every reason to keep training there.

Inference is the opposite. It runs constantly — every prompt, every API call, every Codex completion, billions of times a day, forever. It is far more predictable: you know exactly which models you are serving and exactly how they behave. And critically, at OpenAI's scale, inference is now the dominant and ever-growing share of the compute bill. A model is trained once but served for its entire life.

That predictability is what makes inference the perfect target for custom silicon. When you know the workload precisely, you can strip out everything a general-purpose GPU carries for flexibility's sake and optimize ruthlessly for one thing: tokens per watt per dollar. Brockman framed exactly this in his comments to the press: "We have a deep understanding of the workload. We've really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what's possible?"

That quote is the entire thesis of custom inference silicon. You do not beat NVIDIA at being NVIDIA. You beat NVIDIA's economics on the one workload you understand better than anyone on Earth, because you invented the models running on it.
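Performance-per-watt, the metric OpenAI chose to lead with, reduces to simple arithmetic: sustained tokens per second divided by power draw gives tokens per joule, and when power is the binding constraint, total fleet throughput scales directly with it. The sketch below is hypothetical: the `Accelerator` class and every throughput and power figure are invented placeholders, not Jalapeño or NVIDIA specifications, which have not been published.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical accelerator profile; the numbers used below are placeholders."""
    name: str
    tokens_per_second: float  # sustained inference throughput
    watts: float              # power draw while serving


def tokens_per_joule(chip: Accelerator) -> float:
    # Performance-per-watt: (tokens/s) / (J/s) = tokens per joule.
    return chip.tokens_per_second / chip.watts


def fleet_throughput(chip: Accelerator, power_budget_watts: float) -> float:
    # If power is the limit, you can run budget / watts chips,
    # so fleet throughput = budget * tokens per joule.
    return power_budget_watts * tokens_per_joule(chip)


if __name__ == "__main__":
    gpu = Accelerator("general-purpose GPU (hypothetical)", tokens_per_second=10_000, watts=1_000)
    asic = Accelerator("inference ASIC (hypothetical)", tokens_per_second=9_000, watts=600)

    budget = 100e6  # a hypothetical 100 MW of serving power
    for chip in (gpu, asic):
        print(f"{chip.name}: {tokens_per_joule(chip):.1f} tokens/J, "
              f"{fleet_throughput(chip, budget):,.0f} tokens/s at 100 MW")
```

In this made-up example the ASIC is slower per chip but delivers 50% more tokens from the same power budget, which is why a performance-per-watt claim matters more than a per-chip speed claim once power, not chip count, is the constraint.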

Why It Matters: The Inference-Cost War

The reason every major lab is suddenly designing chips is not pride. It is gross margin.

For the last three years, the AI industry has run on a simple arrangement: labs raise enormous sums, then hand a large fraction of them to NVIDIA for GPUs. NVIDIA's gross margins have sat in the 70-to-75% range, meaning roughly three of every four dollars of NVIDIA revenue is gross profit rather than the cost of making the hardware. For a company serving inference at OpenAI's scale, that markup is the single biggest controllable cost in the business. Every percentage point of performance-per-watt clawed back at the chip level is multiplied across billions of daily requests.

This is the "inference-cost war," and it has three fronts:

  • Cost per token. The price of serving an LLM response. Custom silicon optimized for a known model can, in principle, drive this down meaningfully — which is exactly the lever OpenAI is pulling.
  • Energy per token. Inference at gigawatt scale is constrained by power, not just chips. "Performance-per-watt" is the metric OpenAI chose to lead with precisely because power is the binding constraint on how much intelligence you can actually deliver.
  • Supply independence. Designing your own accelerator reduces dependence on a single vendor's allocation and pricing power — though, as we have covered before, it does not remove dependence on TSMC, which still manufactures nearly all of this silicon.
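The first two fronts can be written as a back-of-envelope model: amortize the accelerator's purchase price over its service life, add the electricity it draws, and divide by the tokens it produces. This is a deliberately simplified sketch that assumes full utilization and ignores networking, host servers, memory, and staff. Every input (unit cost, lifetime, throughput, electricity price, and the PUE overhead factor) is a hypothetical placeholder, not a published figure for Jalapeño or any NVIDIA part. The hypothetical `price_from_margin` helper shows how a supplier's gross margin flows into the hardware price a lab pays.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def price_from_margin(unit_cost: float, gross_margin: float) -> float:
    # Gross margin = (price - cost) / price, so price = cost / (1 - margin).
    return unit_cost / (1.0 - gross_margin)


@dataclass
class ServingSetup:
    hardware_price: float       # USD per accelerator (hypothetical)
    service_life_years: float   # straight-line amortization period (assumed)
    watts: float                # power per accelerator while serving
    tokens_per_second: float    # sustained throughput, assumed 100% utilization
    usd_per_kwh: float          # electricity price (hypothetical)
    pue: float = 1.2            # facility overhead multiplier (assumed)


def energy_per_token_joules(s: ServingSetup) -> float:
    # Facility-level energy per token, including cooling and overhead via PUE.
    return s.watts * s.pue / s.tokens_per_second


def cost_per_million_tokens(s: ServingSetup) -> float:
    tokens_per_year = s.tokens_per_second * SECONDS_PER_YEAR
    hardware_per_year = s.hardware_price / s.service_life_years
    kwh_per_year = s.watts * s.pue * SECONDS_PER_YEAR / 3.6e6  # joules -> kWh
    energy_per_year = kwh_per_year * s.usd_per_kwh
    return (hardware_per_year + energy_per_year) / tokens_per_year * 1e6


if __name__ == "__main__":
    # Same hypothetical chip, bought at a 75% vs. a 25% vendor gross margin.
    for margin in (0.75, 0.25):
        setup = ServingSetup(
            hardware_price=price_from_margin(unit_cost=8_000, gross_margin=margin),
            service_life_years=4,
            watts=1_000,
            tokens_per_second=10_000,
            usd_per_kwh=0.08,
        )
        print(f"vendor margin {margin:.0%}: "
              f"${cost_per_million_tokens(setup):.4f} per 1M tokens, "
              f"{energy_per_token_joules(setup):.2f} J/token")
```

With these toy inputs, amortized hardware outweighs electricity by roughly ten to one, so the vendor's margin moves cost per token far more than the power bill does; energy per token is unchanged by the margin and only improves with better performance-per-watt.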

Owning the inference chip also lets OpenAI co-design hardware and models together. If you control both the accelerator and the model architecture, you can shape one around the other — quantization schemes, memory layouts, attention patterns — in ways no external GPU vendor can match. That is the deeper prize, and it is why this is framed not as a one-off product but as a "multi-generation platform."

The fight has moved from "who has the most GPUs" to "who serves intelligence cheapest per watt."

The De-NVIDIA-ization of the Labs

Jalapeño does not stand alone. It is the latest move in a coordinated, industry-wide effort to route around NVIDIA's near-monopoly on AI accelerators — a trend we have tracked across our coverage of NVIDIA's own moves and the broader silicon supply crunch.

The pattern is now unmistakable. Every hyperscaler and frontier lab with enough inference volume to justify the engineering cost is building, buying, or backing custom silicon:

  • Google has run its own TPUs (Tensor Processing Units) for inference and training for years — the most mature in-house AI silicon program in the industry.
  • Amazon built Trainium and Inferentia, its training and inference accelerators, to lower the cost of AI on AWS and reduce reliance on bought-in GPUs.
  • Microsoft has its Maia accelerators — and is also a named deployment partner for OpenAI's Jalapeño-class capacity, hedging on both sides of the table.
  • Anthropic runs heavily on Amazon's Trainium and has been shopping for inference silicon, including stakes in UK chip startups, as we reported in our piece on Fractile's Series B.
  • Mistral has publicly said it is exploring its own chips, even while admitting it still runs on NVIDIA today. It is the same "training stays, inference moves" hedge, as we covered in our piece on CEO Arthur Mensch's comments.

The strategic logic is identical in every case: keep the flexibility of NVIDIA for the unpredictable work, and build custom silicon for the predictable, high-volume work where you can win on cost. OpenAI is simply the largest and most visible player to do it.

The Other Front: Cracking the CUDA Moat

Custom chips solve only half of NVIDIA's advantage. The other half is software — specifically CUDA, the programming layer that has locked a generation of AI developers into NVIDIA hardware. A new accelerator is worthless if nobody can easily run their models on it.

That is why a second, less-noticed announcement on the very same day — June 24, 2026 — may matter just as much as Jalapeño. Qualcomm agreed to acquire Modular for about $3.9 billion in an all-stock deal (CNBC, June 24). Modular builds a vendor-neutral software layer that lets AI models run across different hardware — NVIDIA, AMD, and others — without rewriting code for each chip. In other words, it is explicitly designed to break the CUDA lock-in that keeps developers tethered to NVIDIA. The deal is expected to close in the second half of 2026, subject to regulatory approval.

Read together, the two announcements describe a pincer movement. Jalapeño attacks NVIDIA's hardware margins on inference. The Qualcomm-Modular deal attacks the software moat that makes NVIDIA hardware sticky in the first place. Neither dethrones NVIDIA on its own — but for the first time, the industry is moving against both pillars of the monopoly at once.

One caution on the surrounding chatter: reports have circulated about Qualcomm also pursuing other AI-hardware acquisitions. Those remain reported, not confirmed, and should not be treated as done deals. The Modular acquisition is the confirmed one.

How Jalapeño Compares to NVIDIA — and Why It Doesn't Replace It

The single most important thing to get right about this story is what Jalapeño is not. It is not an NVIDIA killer, and OpenAI never claimed it was.

| Dimension | NVIDIA GPUs (e.g. Blackwell / Rubin) | OpenAI Jalapeño |
|---|---|---|
| Primary job | Training + inference (general-purpose) | Inference only (LLM-optimized) |
| Flexibility | Very high: any model, any architecture | Narrow: tuned to OpenAI's own models |
| Availability | Shipping at scale today | Unveiled; engineering samples only, no firm ship date |
| Software | CUDA ecosystem, broad developer lock-in | Internal stack, co-designed with OpenAI models |
| Performance claim | Independently benchmarked for years | "Significantly better perf-per-watt": OpenAI's own claim, unverified |
| Who it serves | The entire market | OpenAI's own inference fleet (plus Microsoft as deployment partner) |

NVIDIA still wins where it has always won: training, flexibility, and a software ecosystem no one else can match. What changes is the inference layer — the largest and fastest-growing slice of OpenAI's compute. If Jalapeño delivers anything close to its claimed efficiency, OpenAI keeps more of its own margin and gains leverage in every future NVIDIA negotiation, even on the chips it keeps buying. That leverage may be worth as much as the silicon itself.

It is also worth remembering the ceiling that applies to everyone in this story. As TSMC's own leadership has warned, advanced-node and packaging capacity is reportedly sold out into 2027. Designing your own chip does not move you to the front of the manufacturing queue. Jalapeño, Trainium, Maia, and NVIDIA's Rubin all depend on the same handful of TSMC fabrication and advanced-packaging lines. Custom silicon changes who captures the margin; it does not, by itself, create more wafers.

Our Take

We read Jalapeño less as a product launch and more as a declaration of intent, and the staging makes that intent hard to miss: a physical chip handoff, on the same day Qualcomm announced a deal aimed squarely at NVIDIA's software moat.

The strategically honest reading is this: OpenAI has reached the scale where the math on custom inference silicon finally works. Below a certain volume, the engineering cost of designing your own chip dwarfs any savings. Above it, every watt you save compounds into real money across billions of daily requests. By unveiling Jalapeño now — and explicitly scoping it to inference while leaving training on NVIDIA — OpenAI is signaling that it has crossed that threshold and intends to own its cost structure from the model down to the silicon.
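The volume threshold in that argument is a break-even calculation: a one-time design cost (often called non-recurring engineering, or NRE) is recovered only once per-chip savings across the deployed fleet add up to it. The function below is a hypothetical sketch; OpenAI has disclosed neither Jalapeño's design cost nor its per-chip savings, so the NRE and savings figures in the example are invented placeholders.

```python
import math


def break_even_units(nre_cost: float, savings_per_unit: float) -> int:
    """Chips that must be deployed before per-unit savings repay the design cost.

    savings_per_unit is the hypothetical lifetime saving per deployed chip
    (hardware plus energy) versus buying equivalent GPU capacity.
    """
    if savings_per_unit <= 0:
        raise ValueError("custom silicon never pays back without a per-unit saving")
    return math.ceil(nre_cost / savings_per_unit)


if __name__ == "__main__":
    nre = 500e6  # hypothetical: $500M to design and tape out
    for saving in (2_000, 10_000, 25_000):  # hypothetical lifetime saving per chip
        print(f"${saving:,} saved per chip -> break even at "
              f"{break_even_units(nre, saving):,} chips")
```

The shape of the result is the point, not the numbers: a lab that deploys accelerators by the hundreds of thousands clears thresholds like these, while a smaller operator with the same design cost may never reach them.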

What would prove this skeptical framing wrong is simple and measurable: an independent, third-party benchmark confirming the "significantly better performance-per-watt" claim, followed by Jalapeño actually deploying at scale in 2027 rather than slipping. Custom-silicon roadmaps are famous for slipping. Until samples become shipping hardware and the claim survives outside testing, the right posture is interested but unconvinced. The strategy is sound; the execution is unproven.

The bigger picture is the one to hold onto. For the first time, the same companies that built the AI boom on NVIDIA's chips are systematically working to reduce their dependence on them — on the hardware front with custom accelerators, and on the software front with CUDA alternatives. NVIDIA is not in trouble. But the era in which it was the only road into frontier AI is quietly ending, and Jalapeño is the loudest proof yet.

What's Next

The milestones to watch are concrete. First: a technical report from OpenAI with real numbers behind the performance-per-watt claim, and whether independent reviewers can reproduce them. Second: whether initial deployment actually lands by the end of 2026 as targeted, or slips into 2027 (and how far). Third: the regulatory path for Qualcomm's Modular acquisition, whose expected close in the second half of 2026 will help determine how quickly a credible CUDA alternative reaches developers at scale. Fourth: NVIDIA's response, whether in pricing, roadmap, or otherwise, to a customer that is now also, partially, a competitor.

We will update this piece as the technical report and deployment details land. For now, the headline is clean: OpenAI has unveiled its first inference chip, it is real but not yet shipping, the standout efficiency number is OpenAI's own unverified claim, and the deeper story is an industry quietly building its way out of a single vendor's grip.

Editorial note: This is analysis and commentary, not sponsored content. ThePlanetTools.ai has no commercial relationship with OpenAI, Broadcom, NVIDIA, or Qualcomm. Performance figures attributed to OpenAI are the company's own claims and have not been independently verified as of publication.

Frequently Asked Questions

What is OpenAI's Jalapeño chip?

Jalapeño is OpenAI's first custom AI chip, unveiled on June 24, 2026 and co-designed with Broadcom. It is an inference accelerator — an application-specific chip built to run large language models in production, such as answering ChatGPT prompts and serving API requests. OpenAI calls it its first "Intelligence Processor" and the first in a multi-generation compute platform built with Broadcom. It is announced but not yet shipping; engineering samples are running in the lab, with initial deployment targeted for the end of 2026.

What is the difference between inference and pre-training, and why does it matter here?

Pre-training is the one-time, compute-heavy process of teaching a model — it runs once per model generation and rewards flexible, high-performance hardware, which is NVIDIA's strength. Inference is running the already-trained model in response to user requests; it happens constantly and is highly predictable. Jalapeño targets inference precisely because the workload is predictable and dominant in cost, making it ideal for custom silicon optimized for tokens per watt. OpenAI has said pre-training will continue to rely on NVIDIA hardware.

When will Jalapeño be available?

There is no firm ship date. As of the June 24, 2026 announcement, Jalapeño is in testing — engineering samples are reportedly running machine-learning workloads in the lab at production target frequency and power, including OpenAI's GPT-5.3-Codex-Spark model. OpenAI and Broadcom are targeting initial deployment by the end of 2026, with meaningful volume expected in 2027. It is unveiled, not shipping.

Does Jalapeño replace NVIDIA for OpenAI?

No. Jalapeño is for inference only. OpenAI has indicated that pre-training — the most performance-intensive work — will continue to run on NVIDIA hardware. The chip is designed to cut the cost and energy of serving models at scale, not to replace NVIDIA across the board. NVIDIA still leads on training flexibility and its CUDA software ecosystem, and OpenAI is expected to keep buying NVIDIA GPUs even as Jalapeño deploys.

Why did OpenAI partner with Broadcom to build the chip?

Designing a high-performance chip requires silicon implementation, networking, and connectivity expertise that OpenAI does not have in-house. Broadcom is one of the few companies that builds custom ASICs and the networking silicon (including its Tomahawk switches) to connect them at data-center scale. Under the partnership first announced in October 2025, OpenAI designs the accelerator architecture, Broadcom handles the silicon and networking, and Celestica builds the boards, racks, and systems. The companies are committing to roughly 10 gigawatts of OpenAI-designed accelerators through 2029.

Has the performance-per-watt claim been verified?

No. OpenAI says early results show "significantly better performance-per-watt than current state-of-the-art alternatives," but that figure comes solely from OpenAI's own testing. No independent benchmark exists yet, and a technical report has only been promised. Until third parties can measure the chip, the efficiency claim should be treated as a vendor claim rather than an established fact.

What does the Qualcomm-Modular deal have to do with this?

On the same day Jalapeño was unveiled, Qualcomm agreed to acquire Modular for about $3.9 billion in an all-stock deal (CNBC, June 24, 2026). Modular builds vendor-neutral software that lets AI models run across different chips without rewriting code, directly targeting NVIDIA's CUDA software lock-in. Together, the two announcements attack NVIDIA on both fronts — Jalapeño on hardware margins, Modular on the software moat. The deal is expected to close in the second half of 2026, subject to regulatory approval.

Which other AI labs are building their own chips?

Most large players with enough inference volume are. Google has run its own TPUs for years, Amazon built Trainium and Inferentia, and Microsoft has its Maia accelerators (Microsoft is also a named deployment partner for OpenAI's Jalapeño-class capacity). Anthropic runs heavily on Amazon's Trainium and has backed UK inference-chip startups such as Fractile, and Mistral has said it is exploring its own silicon while still running on NVIDIA. The shared logic: keep NVIDIA for flexible training, build custom chips for predictable, high-volume inference.


Anthony M. — Founder & Lead Reviewer

We're developers and SaaS builders who use these tools daily in production. Every review comes from hands-on experience building real products — DealPropFirm, ThePlanetIndicator, PropFirmsCodes, and many more. We don't just review tools — we build and ship with them every day.
