Google Runs an LLM On a Chip With No Cloud

The Coralboard is a developer edge-AI board, unveiled by Synaptics and Google Research at Google I/O 2026, that runs Google's Gemma 3 270M language model directly on-device with hardware acceleration and no cloud connection required. Announced in a Synaptics press release dated May 19, 2026 and shown at the Gemma pavilion in Mountain View, the board pairs a Synaptics Astra SL2619 SoC (part of the SL2610 family) with a 1 TOPS Coral NPU and 2GB of DDR4 memory, and it is the clearest signal yet that a genuinely open large language model can run on a fingertip-sized accelerator instead of a data center.

The headline is deceptively simple: a small board runs a small Gemma model locally. The implications are not simple at all. For most of the generative AI era, "running a model" has meant sending text to a remote server, waiting for a response, and trusting that the data, the latency, and the bill all behave. The Coralboard collapses that loop. The model lives on the silicon in your hand, the inference happens there, and nothing about your prompt leaves the device. That is a different shape of AI — one where privacy is a default rather than a promise, and where the network is optional rather than load-bearing.

This article breaks down exactly what was announced, what the hardware can and cannot do, why a 270-million-parameter model is the interesting size rather than a disappointing one, and what the Coralboard tells us about where edge AI is heading in 2026. We did not test the board ourselves — units shipped only to Google I/O 2026 attendees in a limited edition — so this is an analysis of the official specifications and the live demo, not a hands-on review.

What Synaptics and Google Research actually announced

According to the official Synaptics press release, co-announced with Google Research and dated May 19, 2026, the two companies used Google I/O 2026 to showcase "immersive edge AI experiences powered by the Coralboard." The board was on display at the Gemma pavilion at the Shoreline Amphitheatre venue in Mountain View, and the central claim is unambiguous: it runs Google's Gemma 3 270M model on-device, with hardware acceleration, and with no cloud connection required.

That last clause is the entire story. Plenty of boards can call a cloud API. The Coralboard does not need to. The Gemma 3 270M weights sit on the device, the inference path runs through a dedicated neural accelerator, and the round-trip to a server simply does not exist. The press release frames the partnership as a collaboration between Synaptics — a company with a long history in human-interface and embedded silicon — and Google Research, the team responsible for both the Gemma model family and the Coral NPU architecture underneath the board.

Two facts about the rollout matter for anyone tempted to buy one today. First, the units shown at Google I/O 2026 were a limited edition handed out to event attendees only. Second, general availability and pricing are to be announced later in 2026. There is no public price, no order page, and no confirmed ship date for the broader developer community as of this writing. Anyone quoting you a Coralboard price right now is guessing.

The Coralboard hardware, decoded

The Coralboard is built around the Synaptics Astra SL2619 SoC, which is part of the broader SL2610 family. The application processing comes from two Arm Cortex-A55 cores running at 2GHz, paired with a single Arm Cortex-M52 core at 200MHz that handles low-power and real-time duties. That A55-plus-M52 split is a common embedded pattern: the big cores carry the operating system and heavier workloads, while the tiny M-class core stays awake for always-on sensing without draining power.

The part that makes the board interesting for AI is the accelerator. The Coralboard carries a 1 TOPS Coral NPU built on what Synaptics calls the Torq NPU subsystem. One trillion operations per second is modest next to a desktop GPU, but it is purpose-built and energy-frugal, and crucially it is designed to handle both convolutional neural networks and transformer workloads. That dual competence is the unlock. Older edge accelerators were tuned almost exclusively for the convolutional vision models of the 2010s; a transformer-capable NPU is what lets a board like this run a language model such as Gemma 3 270M rather than only image classifiers.

Memory rounds out the picture. The board ships with 2GB of DDR4 RAM, with a 1GB variant offered as an option. Two gigabytes is comfortably enough headroom for a sub-billion-parameter model in a quantized format plus the runtime, an operating system, and an application — though it is the kind of budget that rewards careful memory management rather than the casual abundance developers enjoy on a laptop.

Exploded view of an edge AI board showing dual application cores, a real-time core, a 1 TOPS NPU and 2GB DDR4 memory — The Coralboard pairs two Cortex-A55 cores, a low-power Cortex-M52 core, a 1 TOPS Coral NPU and 2GB DDR4 — enough to keep Gemma 3 270M entirely on-device.

It helps to set expectations honestly. This is not a board for hosting a 70-billion-parameter model, and it is not trying to be. It is a board for running a small, capable model with low latency and zero network dependency. The right mental comparison is not a cloud GPU instance but a Raspberry Pi with an accelerator that actually understands transformers — a developer platform for building on-device intelligence rather than a server replacement.

The Coral NPU: a RISC-V accelerator co-designed by Google

The accelerator at the heart of the Coralboard did not appear at I/O 2026 out of nowhere. The Coral NPU itself was announced on October 15, 2025, as a RISC-V-based neural processing architecture co-designed by Google Research and Google DeepMind. RISC-V is an open instruction-set architecture, and choosing it for an NPU is a deliberate bet on an open, extensible foundation rather than a proprietary, locked design.

The co-design between a hardware research group and a model research group is the detail worth lingering on. When the same organization shapes both the silicon architecture and the models that will run on it, the two can be tuned for each other. Gemma is Google's open model family; the Coral NPU is Google's open accelerator architecture; and the Coralboard is the first widely shown product where they meet in a single developer board, with Synaptics providing the SoC integration and the surrounding system.

For developers, an open RISC-V base matters for a practical reason beyond ideology: it lowers the odds of being trapped behind a single vendor's closed toolchain. The instruction set can be inspected, extended, and implemented by multiple silicon partners, which is exactly the kind of foundation that lets an ecosystem grow around a board rather than around one company's roadmap.

An open RISC-V neural accelerator die handling both transformer and vision workloads — The Coral NPU is a RISC-V-based accelerator co-designed by Google Research and DeepMind — built to handle transformers, not just convolutional vision models.

Torq: the open-source MLIR toolchain that makes it work

Hardware is only half of an edge AI platform; the software path from a trained model to running silicon is the other half, and historically it has been where edge AI projects go to die. Synaptics addresses this with Torq, an open-source toolchain built on MLIR — the Multi-Level Intermediate Representation compiler infrastructure that has become the connective tissue of modern machine-learning compilation.

The headline for developers is breadth of input. Torq supports models authored in PyTorch, JAX, and LiteRT, which covers the overwhelming majority of how models are actually built and exported today. Rather than forcing teams to rewrite a model in a bespoke format, the toolchain ingests mainstream representations and lowers them, through MLIR, toward the Coral NPU's instruction set. That is the difference between a board a researcher demos once and a board a product team can actually ship on.

MLIR matters here precisely because it is not a single-purpose converter. It is a layered compiler framework designed to represent computation at multiple levels of abstraction and progressively optimize and specialize it for a target. Building the Torq toolchain on MLIR rather than a one-off proprietary compiler is the same open-foundation philosophy that drove the RISC-V choice for the NPU — and it is the kind of decision that determines whether a platform attracts a community or quietly fades.

The Torq MLIR toolchain compiling PyTorch, JAX and LiteRT models down to an edge NPU — The open-source Torq toolchain, built on MLIR, ingests PyTorch, JAX and LiteRT models and lowers them to the Coral NPU — no bespoke rewrite required.

The Jellectronica demo: jellyfish, YOLOv8, and generative music

The most memorable thing at the Gemma pavilion was not a benchmark chart. It was a piece called "Jellectronica." A YOLOv8 object-detection model ran on the Coralboard and tracked live jellyfish from a feed connected to the Monterey Bay Aquarium, and the positions and movements of those jellyfish drove a generative-music performance in real time. The drifting of the animals became the score.

It is easy to file an art demo under "fun but irrelevant," and that would be a mistake. The demo is doing real technical work in disguise. Running YOLOv8 — a transformer-and-convolution object detector — on the board in real time, with low enough latency to drive a live performance, is a concrete proof that the 1 TOPS Coral NPU can handle a genuine streaming vision workload, not just a static classification. It is the kind of task where any cloud round-trip would introduce latency that the human ear would catch instantly. On-device inference is what makes the responsiveness possible.

The choice of demo also signals the intended audience. This is a board for makers, researchers, installation artists, and product engineers who want intelligence embedded in a physical thing that reacts to the world around it — a camera, a sensor, a microphone — without a server in the loop. Jellyfish driving music is whimsical; the underlying capability is not.

Split comparison of on-device edge inference versus a cloud round-trip showing latency and privacy differences — On-device inference removes the network round-trip entirely: no latency tax, no data leaving the device, no dependency on connectivity.

Why a 270M model is the right size, not a small one

The instinct, after a year of headlines about trillion-parameter frontier models, is to read "270 million parameters" as tiny to the point of being a toy. That instinct is wrong for this use case, and understanding why is the key to understanding the whole product.

Gemma 3 270M is one of the smallest members of Google's open Gemma 3 family, and small is the entire point on an edge device. A model that fits in 2GB of memory, runs on a 1 TOPS accelerator, and produces useful output with low latency is worth more on a board like this than a brilliant model that cannot fit at all. The frontier of edge AI is not the largest model; it is the most capable model that fits the constraints. A 270M model can handle classification, structured extraction, intent parsing, on-device assistants for narrow domains, and a surprising amount of practical language work when it is the right tool for a scoped job.

There is also a quieter strategic point. Google's open Gemma family — which we covered when Gemma 4 landed under Apache 2.0 — gives the company a model lineup that spans from frontier-scale weights down to a 270M variant that runs on edge silicon. The Coralboard is the bottom rung of that ladder made tangible. It is the proof that "open model" and "runs on a tiny accelerator with no cloud" are now the same sentence, not two separate ambitions. Open weights cut both ways, of course — the same openness that lets Gemma run on a board also lets tools strip guardrails off open models — but for an edge developer the trade is overwhelmingly worth it.

Privacy and the case for the local LLM

The phrase doing the heavy lifting in the announcement is "no cloud connection required." For a class of applications, that is not a convenience — it is the requirement. A medical device, an industrial sensor, a consumer product handling personal audio or video, a system deployed in a region with strict data-residency rules: in all of these, the safest data is the data that never leaves the device. On-device inference makes that the architecture rather than a feature toggle.

This is the same thesis that has been building across the industry. We have argued that local LLMs disrupt cloud pricing because the marginal cost of an on-device inference is effectively zero once the hardware is bought, and that Apple's portable MLX work on M5 silicon points the same direction from the consumer-laptop end. The Coralboard pushes the thesis down to a board you can embed in a product, where the economics and the privacy story compound: no per-query cloud bill, no data egress, no dependency on a network that may be slow, metered, or absent.

It is worth being precise about the limits. On-device privacy protects the inference; it does not magically make a small model as capable as a frontier one, and it does not remove the responsibility to design the application well. But for the large and growing set of tasks where a scoped model is sufficient, the local LLM turns privacy from a feature you market into a property you no longer have to think about.

Can you really run an LLM on an edge board?

This is the question the Coralboard exists to answer, so it deserves a direct response: yes, with caveats that are really just honesty about scope. You can run a small, quantized language model such as Gemma 3 270M on an edge board with a transformer-capable NPU, and you can do it with low latency and no cloud. What you cannot do is run a 70-billion-parameter model on 2GB of memory and a 1 TOPS accelerator, and nobody serious is claiming otherwise.

The reason it works now, when it did not a few years ago, is a convergence of three things. First, the models got smaller and better — a 270M model in 2026 is far more useful than a 270M model in 2022. Second, the accelerators learned transformers — the Coral NPU handles attention-style workloads, not just convolutions. Third, the toolchains matured — an MLIR-based path like Torq can take a PyTorch or JAX model and actually compile it down to the silicon without a research team's worth of manual effort. Remove any one of those and the board does not work. Together, they make a local LLM on a fingertip-sized board a real product instead of a conference slide.

The practical takeaway for developers evaluating edge AI: stop asking "can the board run a big model" and start asking "what is the smallest model that solves my actual problem, and can it fit here." That reframing is the one the Coralboard rewards, and it is the one that most edge projects should have been using all along.

Where the Coralboard sits in the edge AI landscape

The Coralboard does not enter an empty field. The Raspberry Pi ecosystem, with and without add-on accelerators, has been the default maker platform for years, and Google's own earlier Coral Edge TPU products carved out a niche for on-device vision. What is new is the explicit transformer focus and the tight pairing with an open model family.

The earlier Coral Edge TPU devices were built for the convolutional era; they accelerated vision models beautifully but were never designed with language models in mind. A Raspberry Pi, meanwhile, is a wonderfully general computer but without a dedicated transformer accelerator it leans on its CPU for inference, which limits how large or fast a model it can run. The Coralboard's distinguishing move is to put a transformer-capable NPU, an open RISC-V architecture, an MLIR toolchain, and a first-party open model on the same board and treat language inference as a first-class workload rather than a stretch goal.

Against that backdrop, the board reads less like a single product launch and more like a statement of direction. It says the next phase of edge AI is not just smarter cameras; it is small language and multimodal models embedded in physical devices, running locally, on open foundations. Whether the Coralboard specifically becomes the platform of record is a question for general availability and pricing — both still pending — but the category it represents is clearly where Google Research and Synaptics are placing their chips.

The Google I/O 2026 context

The Coralboard did not arrive in isolation. It was one thread in a Google I/O 2026 that leaned heavily on on-device and efficient AI, alongside the cloud-scale launches like the new Gemini 3.5 Flash model that grabbed most of the keynote oxygen. Reading the two together is instructive: at the top of the stack, Google is pushing frontier and agentic capability in the cloud; at the bottom, it is pushing genuinely open models onto silicon you can hold.

That barbell strategy — frontier in the cloud, open and tiny at the edge, with the Gemma family bridging the two — is the most coherent way to read Google's 2026 positioning. The Coralboard is the edge anchor of that strategy made physical. It is the answer to the question of what the smallest, most private, most local end of Google's AI ladder actually looks like when you can pick it up.

What we still do not know

Honesty requires a clear list of open questions. We do not know the price — Synaptics says pricing will be announced later in 2026, and we will not invent a number. We do not know the general-availability date beyond "later in 2026." We do not have independent, third-party benchmarks for Gemma 3 270M inference on the 1 TOPS Coral NPU; the live Jellectronica demo is strong qualitative evidence that real-time vision works, but tokens-per-second figures for language inference on this exact board are not yet public. And we do not yet know how broad the third-party silicon ecosystem around the Coral NPU and Torq will become, which is the real test of whether the open foundations pay off.

None of those unknowns undercut the significance of what was shown. A widely visible, well-supported developer board that runs an open Google model on-device with no cloud is a milestone regardless of the eventual sticker price. But they are the right questions to keep asking as the board moves from a Google I/O giveaway to a product anyone can buy.

The takeaway

The Coralboard is the most concrete evidence yet that the local-LLM era has reached the edge. A Synaptics Astra SL2619 SoC, a 1 TOPS Coral NPU built on an open RISC-V architecture, 2GB of DDR4, an MLIR-based Torq toolchain that ingests PyTorch, JAX, and LiteRT models, and Google's open Gemma 3 270M model running on-device with no cloud — that is a complete, coherent edge AI platform, not a tech demo dressed up as one. The Jellectronica jellyfish performance proved the real-time chops; the open foundations suggest staying power; and the "no cloud connection required" claim is the part that will matter for years.

For developers and product teams, the message is simple. The question is no longer whether you can run a model on a small board. It is which small model solves your problem, and whether you want it living on the device — private, low-latency, and network-optional — instead of in someone else's data center. The Coralboard is Google Research and Synaptics betting that, for a fast-growing slice of applications, the answer is on the device. On the evidence shown at Google I/O 2026, it is a hard bet to argue with — once we see the price.

Frequently asked questions

What is the Coralboard?

The Coralboard is a developer edge-AI board unveiled by Synaptics and Google Research at Google I/O 2026, in a Synaptics press release dated May 19, 2026. It runs Google's Gemma 3 270M language model directly on-device with hardware acceleration and no cloud connection required. It pairs a Synaptics Astra SL2619 SoC (part of the SL2610 family) with a 1 TOPS Coral NPU and 2GB of DDR4 memory.

Can the Coralboard really run a large language model without the cloud?

Yes, for the right size of model. The Coralboard runs Google's Gemma 3 270M on-device with hardware acceleration and no cloud connection required, which is the headline claim in the official Synaptics announcement. It cannot run a 70-billion-parameter model on 2GB of memory and a 1 TOPS accelerator, and nobody is claiming it can. The point is that a small, capable model now runs locally with low latency and zero network dependency.

What hardware is inside the Coralboard?

The board is built around the Synaptics Astra SL2619 SoC (part of the SL2610 family): two Arm Cortex-A55 cores at 2GHz plus one Arm Cortex-M52 core at 200MHz. The AI accelerator is a 1 TOPS Coral NPU built on the Synaptics Torq NPU subsystem, which handles both convolutional neural networks and transformer workloads. It ships with 2GB of DDR4 RAM, with a 1GB variant offered as an option.

What is the Coral NPU?

The Coral NPU is a RISC-V-based neural processing architecture co-designed by Google Research and Google DeepMind. It was announced on October 15, 2025, ahead of the Coralboard. RISC-V is an open instruction-set architecture, so choosing it for the NPU is a deliberate bet on an open, extensible foundation rather than a proprietary, locked design. The Coral NPU is what lets the board accelerate transformer workloads such as Gemma 3 270M, not just older vision models.

Why does the Coralboard use a 270-million-parameter model instead of a larger one?

Because small is the point on an edge device. Gemma 3 270M is one of the smallest members of Google's open Gemma 3 family, and a model that fits in 2GB of memory, runs on a 1 TOPS accelerator, and produces useful output with low latency is worth more on this board than a brilliant model that cannot fit at all. The frontier of edge AI is the most capable model that fits the constraints, not the largest model overall.

What was the Jellectronica demo at Google I/O 2026?

Jellectronica was the Coralboard demo at the Gemma pavilion. A YOLOv8 object-detection model ran on the board and tracked live jellyfish from a feed connected to the Monterey Bay Aquarium, and the animals' movements drove a generative-music performance in real time. It is a concrete proof that the 1 TOPS Coral NPU can handle a genuine streaming vision workload with low enough latency to drive a live performance, where any cloud round-trip would add audible delay.

What is the Torq toolchain?

Torq is Synaptics' open-source software toolchain for the Coral NPU, built on MLIR (Multi-Level Intermediate Representation), the layered compiler infrastructure used across modern machine learning. Torq supports models authored in PyTorch, JAX, and LiteRT, so developers can compile mainstream model formats down to the Coral NPU without rewriting them in a bespoke format. Building it on MLIR rather than a proprietary one-off compiler mirrors the open-foundation choice behind the RISC-V NPU.

How is the Coralboard different from a Raspberry Pi or a Coral Edge TPU?

A Raspberry Pi is a wonderfully general computer, but without a dedicated transformer accelerator it leans on its CPU for inference, which limits model size and speed. The earlier Coral Edge TPU devices were built for the convolutional vision era and were never designed for language models. The Coralboard's distinguishing move is to combine a transformer-capable NPU, an open RISC-V architecture, an MLIR toolchain, and a first-party open model on one board, treating language inference as a first-class workload.

How much does the Coralboard cost and when can I buy one?

There is no public price yet. The units shown at Google I/O 2026 were a limited edition handed out to event attendees only, and Synaptics says general availability and pricing will be announced later in 2026. As of this writing there is no order page and no confirmed broad ship date, so any specific price you see quoted is a guess rather than an official figure.

Why does on-device inference matter for privacy?

Because the safest data is the data that never leaves the device. When inference runs on the Coralboard with no cloud connection, the prompt and the input data — audio, video, sensor readings — stay on the silicon. For medical devices, industrial sensors, consumer products handling personal data, or deployments under strict data-residency rules, on-device inference makes privacy the architecture rather than a feature toggle, and it removes any per-query cloud bill and data egress.

What does LiteRT support mean for developers?

LiteRT is the runtime format for lightweight, on-device models, and Torq's support for it — alongside PyTorch and JAX — means developers can bring models from the most common authoring and deployment paths and target the Coral NPU without a rewrite. In practice it lowers the barrier between a model trained on a workstation and a model running on the board, which is the difference between a one-off research demo and something a product team can ship.

Is the Coralboard a server replacement for AI workloads?

No. The Coralboard is a developer platform for on-device intelligence, not a replacement for cloud GPUs running large models. The right comparison is a Raspberry Pi with an accelerator that understands transformers, not a data-center instance. Its value is low-latency, private, network-optional inference for small models like Gemma 3 270M — the growing set of tasks where a scoped model is sufficient — rather than hosting frontier-scale models.

Google Runs Gemma 3 270M On-Device on the Coralboard — No Cloud Required