Cohere Command A: 111B Open-Weights, Runs on Just 2 GPUs

Cohere Command A is a 111-billion-parameter open-weights enterprise language model with a 256K-token context window and multilingual coverage across 23 languages. Its headline claim: it runs on just two GPUs (A100 or H100) while matching or beating GPT-4o and DeepSeek-V3 on agentic enterprise tasks, with roughly 150% higher throughput than Cohere's previous Command R+ 08-2024. Cohere has since expanded it into a family — Command A Reasoning and Command A Vision — positioning the Canadian-American lab as the enterprise-deployment specialist most people forget when they list OpenAI, Anthropic, and Google.

The Lab Everyone Forgets

Ask most people to name the AI labs that matter in 2026 and you will hear the same trio: OpenAI, Anthropic, Google DeepMind. Maybe DeepSeek if they follow the open-weights scene. Cohere rarely makes that list. That is a strategic oversight, not a quality verdict.

Cohere has never tried to win the consumer chatbot race or the frontier-benchmark leaderboard. It built something narrower and, for a specific buyer, more valuable: an enterprise model family engineered around deployment economics, data residency, and multilingual reach. Command A is the clearest expression of that thesis. It is not the smartest model in the world. It is arguably the most deployable serious model for a regulated enterprise that needs to run inference inside its own perimeter.

We have been tracking the enterprise-AI deployment problem closely — the gap between what frontier APIs can do and what a bank, hospital, or defence contractor can actually put into production behind a firewall. Command A is built squarely for that gap. Cohere describes it as its most performant model to date, optimized for tool use, agents, retrieval-augmented generation (RAG), and multilingual workloads.

The original Command A (model ID command-a-03-2025) shipped in 2025; the family has since grown with Command A Reasoning and Command A Vision. We are treating this as an analysis of the full Command A family as it stands in 2026, not a same-day launch report — the announcements have been staggered, and we will date each component honestly below.

What Command A Actually Is

Command A is a dense 111B-parameter model with a 256K-token context window and a maximum output of 8K tokens. It supports 23 languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Chinese, Arabic, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. Cohere positions it for enterprise agents, RAG with citations, and tool use.

The number that matters most for the buyer Cohere is chasing is not a benchmark score — it is "two GPUs." Cohere states Command A requires only two GPUs (A100s or H100s) to run. For context, frontier-class open-weights models routinely demand 8-GPU nodes or multi-node clusters for serving at acceptable latency. Halving or quartering the hardware footprint is the difference between "we can self-host this" and "we will just call an API and hope compliance signs off."

Cohere also reports roughly 150% higher throughput than its predecessor, Command R+ 08-2024. Independent reporting from VentureBeat framed the efficiency story in tokens-per-second terms, citing figures on the order of 73 tokens per second versus GPT-4o around 38 tokens per second on long-context workloads. We could not re-verify that exact throughput pair from Cohere's primary documentation, so we report it as a third-party figure attributed to VentureBeat rather than a measurement of our own. The directional claim — materially faster than a comparable proprietary model at a fraction of the serving hardware — is consistent across Cohere's own materials.

The Deployment Math Is the Product

Here is the strategic read. OpenAI, Anthropic, and Google compete on capability and API convenience. Cohere competes on a different axis entirely: where the weights run and who controls them. Command A is offered through Cohere's API but also through private deployments and inside a hyperscaler VPC. The model being efficient enough to run on two GPUs is what makes the private-deployment story credible rather than aspirational.

For a European bank locked out of certain U.S. frontier APIs by procurement policy, or a defence contractor that cannot send prompts off-premises, "good enough on two GPUs you control" beats "excellent on an API you legally cannot use." That is the whole pitch, and it is a real market.

Open Weights, Enterprise Terms

Command A's weights are openly available, which matters for the self-host crowd. This is not the same as a permissive Apache-2.0 release in the mold of Google Gemma 4 or the Llama-style ecosystem — it is open-weights with enterprise licensing and a clear commercial path through Cohere. For the enterprise buyer, that combination is often preferable to a fully permissive license, because it comes with a vendor of record, support, and indemnification conversations attached.

Two-GPU footprint — the deployment economics that define Command A

Two GPUs: The Economics Nobody Talks About

The AI conversation is obsessed with capability and almost silent on serving cost. For enterprises running models in production, serving economics dominate the total cost of ownership far more than a few points on a benchmark.

Why Two GPUs Changes the Buy Decision

Consider a mid-size enterprise standing up an internal agent platform. With a model that needs an 8-GPU H100 node per replica, every unit of scale is a node. With a model that serves acceptably on two GPUs, the same budget buys roughly four times the concurrency, or the same concurrency at roughly a quarter of the hardware capital and power draw. Multiply that across redundancy, regions, and dev/staging/prod environments and the gap compounds into the millions.

This is the unglamorous reason Command A matters. It is not winning a leaderboard. It is winning the spreadsheet that a VP of Infrastructure actually signs.

The Sovereign Angle

There is a second-order effect. A model small enough to self-host on two GPUs is a model a government or a regulated industry can run inside a sovereign perimeter without negotiating a hyperscaler exception. Cohere's broader strategic positioning — including its work on European sovereign AI — leans directly on this. We covered the structural shift toward sovereign deployment in our analysis of the Cohere and Aleph Alpha merger forming Europe's sovereign AI champion, and Command A is the model layer that makes that thesis operational rather than political.

The same dynamic shows up elsewhere in the market. Mistral has been building cybersecurity-grade models for European banks locked out of U.S. frontier APIs, a pattern we examined in our piece on Mistral's sovereign cybersecurity model for EU banks. Read together, these are not isolated stories — they are the enterprise market re-pricing what "good enough, but mine" is worth.

Command A versus GPT-4o and DeepSeek-V3 — agentic enterprise tasks

Benchmarks: On Par or Better Than GPT-4o

Cohere's headline benchmark claim is specific: Command A is on par with or better than GPT-4o and DeepSeek-V3 across agentic enterprise tasks, with significantly greater efficiency. That phrasing is doing precise work. Cohere is not claiming frontier supremacy over the latest reasoning models. It is claiming parity with two strong, widely deployed models on the tasks enterprises actually run — business reasoning, STEM, coding, tool use, and RAG — at a fraction of the serving cost.

Why GPT-4o Is the Right Comparison

It is worth noting that GPT-4o itself was retired from OpenAI's lineup in 2026, a transition we documented in our report on GPT-4o's official retirement and what it means for users and developers. That makes Command A's comparison point a moving target — but it also clarifies the strategy. Cohere is not benchmarking against the absolute frontier. It is benchmarking against the workhorse class of models that enterprises standardized on, and arguing it can replace them inside your own infrastructure.

For teams evaluating the broader proprietary frontier, the relevant reference points today are models like GPT-5.5 and Claude Opus 4.7. Command A does not claim to beat those on raw capability — and our head-to-head on Claude Opus 4.7 versus GPT-5.5 makes clear how high that frontier bar now sits. Command A's argument is orthogonal: it does not need to win the frontier to win the enterprise deployment.

Where DeepSeek-V3 Fits

DeepSeek-V3 is the open-weights efficiency benchmark everyone reaches for, and the lineage has moved fast — we tracked the next generation in our coverage of DeepSeek V4 and its 1M-context architecture. Cohere claiming parity with DeepSeek-V3 on agentic enterprise tasks while running on two GPUs and shipping with enterprise support and a Western vendor of record is a deliberately different value proposition: comparable capability, but with the procurement, residency, and support story that regulated buyers require.

The Honest Caveat

Benchmark parity claims from a model vendor should always be read as directional, not gospel. "On par or better across agentic enterprise tasks" is a curated framing. What we can say with confidence: Command A is a credible, production-grade model in the GPT-4o/DeepSeek-V3 capability class, and its differentiator is not the ceiling of its intelligence but the floor of its deployment cost. For the enterprise buyer, that trade is frequently the right one.

23 languages — Command A's multilingual reach for global enterprises

23 Languages: The Global Enterprise Bet

Command A's 23-language support is not a checkbox feature — it is core to who Cohere is selling to. The languages span Western Europe (French, Spanish, Italian, German, Portuguese, Dutch), Eastern Europe (Polish, Czech, Ukrainian, Romanian, Russian), the Middle East (Arabic, Hebrew, Persian), and major Asian markets (Japanese, Korean, Chinese, Vietnamese, Indonesian, Hindi), plus Turkish and Greek.

Why Multilingual Is a Moat for the Enterprise Buyer

A global bank or manufacturer does not operate in English. It operates in the languages of its customers and regulators across dozens of jurisdictions. A model that handles a customer-service agent in Japanese, a compliance summarizer in German, and a contract-analysis RAG pipeline in Arabic — all from one deployable artifact — removes a procurement nightmare. The alternative is stitching together per-language models or paying frontier-API rates with data leaving the building.

This is where Cohere's positioning is internally consistent. Multilingual breadth plus two-GPU deployability plus open weights plus enterprise licensing is a coherent package aimed at exactly one buyer: the multinational that needs serious AI inside its own walls, in many languages, without an OpenAI or Anthropic procurement exception.

The Limits of 23

Twenty-three languages is broad but not universal. Low-resource languages, many African and South Asian languages, and numerous regional dialects are outside the supported set. For a buyer whose footprint is concentrated in those markets, Command A's multilingual story is thinner than the headline suggests. The honest framing: this is built for the OECD-plus-major-Asia enterprise, not for genuinely global long-tail language coverage.

The Command A family — base, Reasoning, and Vision in the enterprise stack

The Command A Family: Reasoning and Vision

Cohere has not left Command A as a single model. It expanded the line into a family, which tells you the strategy is durable, not a one-off release.

Command A Reasoning

Command A Reasoning (model ID command-a-reasoning-08-2025, introduced in an August 2025 announcement) is Cohere's first reasoning model — designed to "think" before producing output. It keeps the 111B parameter count and 256K context window, supports the same 23 languages, and raises the maximum output to 32K tokens for longer reasoning traces. Its deployment footprint is larger than base Command A: Cohere optimizes it for 4x H100 GPUs for production workloads, with 4x A100 GPUs viable for non-production use. The positioning is "enterprise-grade control for AI agents" — reasoning capability with the same controllability and deployment story enterprises bought the base model for.

Command A Vision

Command A Vision (model ID command-a-vision-07-2025, announced July 31, 2025) is Cohere's first multimodal model. It carries a 128K-token context window, accepts up to 20 images per request, and is built for enterprise visual workloads: document analysis, chart and graph interpretation, OCR, in-image table extraction, and multilingual image processing. Two limitations are explicit and worth noting for builders: tool use is not supported with Command A Vision, and it accepts images as input but does not generate them. The use case is enterprise document intelligence, not creative image generation.

What the Family Signals

A base model, a reasoning model, and a vision model — all sharing the same deployment philosophy and the same 23-language enterprise spine — is a product line, not an experiment. It signals that Cohere intends to be the default for the regulated enterprise that wants a coherent, self-hostable family rather than a grab-bag of frontier APIs from different vendors with different data-handling terms.

How Command A Compares to the Open-Weights Field

The open-weights enterprise tier is suddenly crowded. The differentiator is no longer "is there a capable open model" — there are several — but "which one fits a regulated enterprise's deployment, support, and residency requirements."

The Llama-style permissive ecosystem optimizes for the broadest possible community and tinkerer adoption. Google's Gemma 4 under Apache 2.0 optimizes for frictionless commercial reuse. DeepSeek optimizes for raw efficiency and capability per dollar — and its trajectory toward DeepSeek V4 with hybrid attention and 1M context keeps raising the efficiency bar. Mistral Large 3 optimizes for European frontier credibility with open-weights options.

Cohere's Command A optimizes for none of those exactly. It optimizes for the enterprise procurement and deployment process itself: two-GPU serving, 23-language coverage, RAG-and-tool-use focus, private deployment, and a Western vendor of record. That is a narrower target than "best open model," and for the buyer it is built for, narrower is the point.

Who Should Actually Care About Command A

Be honest about fit. Command A is not the model for a startup chasing the absolute frontier of reasoning or for a hobbyist who wants the most permissive license possible. The clearest fits:

Regulated enterprises — banks, insurers, healthcare, that need inference inside their own perimeter for compliance reasons.
Multinationals — that need consistent quality across many languages from one deployable artifact.
Public sector and defence — that cannot send prompts to a U.S. hyperscaler API as a matter of policy.
Infrastructure-cost-sensitive teams — running agents at scale where two-GPU serving versus eight-GPU serving is a budget-defining difference.
Teams that want a vendor of record — not just weights on a hub, but support, licensing, and indemnification conversations.

If your priority is the smartest possible model regardless of where it runs, the proprietary frontier still wins. Command A's argument is that for a very large class of enterprise workloads, "smartest" is not the binding constraint — "deployable, multilingual, and mine" is.

The Strategic Read: Why the Forgotten Lab Is Well-Positioned

The market narrative fixates on the frontier race because the frontier race is the exciting story. But the largest pool of enterprise AI spend over the next few years is not "give me the smartest model." It is "give me a production-grade model I can deploy, govern, and afford inside my own environment, in my customers' languages, with a vendor I can hold accountable."

That is precisely the market Cohere built Command A for, and it is a market the frontier-obsessed trio is structurally less optimized to serve. OpenAI, Anthropic, and Google are API-first companies with deployment as an accommodation. Cohere is a deployment-first company with the API as the convenient front door. For the regulated enterprise buyer, that difference in posture is not a footnote — it is the entire decision.

None of this makes Cohere a frontier lab, and we are not arguing it is. The strategic point is narrower and more durable: in the part of the enterprise market defined by deployment, residency, and multilingual reach rather than peak capability, the lab most people forget is one of the best-positioned vendors in the industry. Command A — and its Reasoning and Vision siblings — is the proof of that thesis, not a marketing slogan around it.

Our Take

Command A is the most interesting model almost nobody is talking about, and the reason it gets overlooked is also the reason it matters. It does not win the headline benchmark. It wins the deployment spreadsheet. For the enterprise buyer drowning in the gap between frontier API capability and what they can legally and economically run in production, a 111B open-weights model that serves on two GPUs, covers 23 languages, and ships with a Western vendor of record is not a compromise — it is the actual answer to the question they are being asked.

Our honest verdict: do not file Cohere under "also-ran behind the trio." File it under "the enterprise-deployment specialist whose entire product line is engineered for the workloads where the frontier labs are structurally weakest." On capability, Command A is in the GPT-4o/DeepSeek-V3 class, not the absolute frontier — and Cohere does not claim otherwise. On deployability for a regulated, multilingual enterprise, it is one of the strongest options on the market. Those are two different competitions, and Cohere is quietly winning the one with the bigger budget. We will be watching whether the Reasoning and Vision additions deepen that moat — and noting, for our own roadmap, that Command A's public model IDs and pricing make it a strong candidate for a full standalone tool review.

Frequently Asked Questions

What is Cohere Command A?

Cohere Command A is a 111-billion-parameter open-weights enterprise language model with a 256K-token context window and support for 23 languages. It is optimized for tool use, agents, retrieval-augmented generation (RAG), and multilingual workloads, and Cohere states it requires only two GPUs (A100 or H100) to run — making private, self-hosted deployment economically realistic for enterprises.

How many GPUs does Command A need?

Cohere states that the base Command A requires only two GPUs — A100s or H100s — to run. This is a deliberate efficiency target: many comparable frontier-class models need 8-GPU nodes or multi-node clusters. Command A Reasoning has a larger footprint, optimized for 4x H100 GPUs for production workloads (4x A100 GPUs for non-production use).

Is Command A better than GPT-4o?

Cohere states Command A is on par with or better than GPT-4o and DeepSeek-V3 across agentic enterprise tasks, with significantly greater efficiency. That is a parity-plus-efficiency claim on enterprise workloads (business reasoning, STEM, coding, tool use, RAG), not a claim of frontier supremacy over the latest reasoning models. GPT-4o was itself retired from OpenAI's lineup in 2026, so the practical takeaway is that Command A targets the deployable workhorse class rather than the absolute frontier.

How fast is Command A compared to GPT-4o?

Cohere reports roughly 150% higher throughput than its predecessor, Command R+ 08-2024. Third-party reporting from VentureBeat framed the efficiency advantage in tokens-per-second terms, citing figures on the order of 73 tokens per second for Command A versus around 38 tokens per second for GPT-4o on long-context workloads. We report the tokens-per-second pair as a VentureBeat-attributed figure, not a measurement of our own; the directional efficiency claim is consistent across Cohere's primary materials.

What languages does Command A support?

Command A supports 23 languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Chinese, Arabic, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. The coverage is built for OECD-plus-major-Asia multinational enterprises; low-resource and many African and South Asian languages are outside the supported set.

Is Command A open-weights?

Yes. Command A's weights are openly available, which enables self-hosting. This is open-weights with enterprise licensing and a clear commercial path through Cohere, rather than a fully permissive Apache-2.0-style release like Google Gemma 4. For regulated enterprise buyers, the vendor-of-record model is often preferable to a fully permissive license because it comes with support and indemnification conversations attached.

What is Command A Reasoning?

Command A Reasoning (model ID command-a-reasoning-08-2025, introduced in an August 2025 announcement) is Cohere's first reasoning model, designed to "think" before generating output. It keeps the 111B parameters and 256K context window, supports the same 23 languages, raises maximum output to 32K tokens, and is optimized for 4x H100 GPUs in production. It is positioned as enterprise-grade control for AI agents.

What is Command A Vision?

Command A Vision (model ID command-a-vision-07-2025, announced July 31, 2025) is Cohere's first multimodal model. It has a 128K-token context window, accepts up to 20 images per request, and targets enterprise visual workloads: document analysis, chart and graph interpretation, OCR, and in-image table extraction. Two limitations matter for builders: tool use is not supported, and it accepts images as input but does not generate them.

How does Command A compare to DeepSeek-V3 and Mistral?

Cohere claims parity with DeepSeek-V3 on agentic enterprise tasks while running on two GPUs with enterprise support and a Western vendor of record. Versus Mistral Large 3, Command A trades some frontier ambition for a tighter focus on deployment economics, 23-language coverage, and regulated-industry procurement fit. The differentiator across all of them is not raw capability but which model best matches an enterprise's deployment, residency, and support requirements.

Who should use Command A?

Command A fits regulated enterprises (banks, insurers, healthcare) that need inference inside their own perimeter, multinationals needing consistent multilingual quality from one artifact, public sector and defence organizations that cannot use U.S. hyperscaler APIs by policy, and infrastructure-cost-sensitive teams where two-GPU serving versus eight-GPU serving is budget-defining. Teams chasing the absolute smartest model regardless of deployment should still look to the proprietary frontier.

Why is Cohere overlooked compared to OpenAI, Anthropic, and Google?

Cohere never competed in the consumer chatbot or frontier-benchmark race that drives most coverage. It built an enterprise-deployment specialist product line — efficient serving, multilingual reach, private deployment, RAG and tool-use focus. That makes it strategically well-positioned for the largest pool of enterprise AI spend (deployable, governable, affordable in-house models) even though it is structurally underexposed in capability-race headlines.

Can Command A be deployed privately or on-premises?

Yes. Command A is available through Cohere's API but also through private deployments and inside a hyperscaler VPC. Its two-GPU serving efficiency is precisely what makes the private and sovereign-deployment story credible rather than aspirational — it is the technical fact that turns "self-host a serious model" from impractical into a budget line a VP of Infrastructure will sign.

Cohere Command A: 111B Open-Weights, 256K Context, Runs on 2 GPUs