MiniMax M3: Open-Weight Coding Frontier, 1M Context, MSA

MiniMax M3 is a large language model launched by Chinese AI company MiniMax on June 1, 2026, positioned as the first model to combine frontier-level coding, a context window of up to one million tokens, and native multimodal input in a single system, built on a new architecture called MiniMax Sparse Attention (MSA). MiniMax reports that MSA decodes about 15.6x faster and prefills about 9.7x faster than its previous M2 generation at a one-million-token context. At launch, M3 is reachable only through the MiniMax API and the MiniMax Agent app; the open weights and a technical report are promised within roughly ten days, so the weights are announced but not yet downloadable. All headline benchmark numbers are self-reported by MiniMax and await independent verification.

Editorial note: This is an analytical explainer by Anthony Martinez (CEO & Founder, ThePlanetTools.ai). ThePlanetTools.ai has no affiliation with MiniMax and earns nothing from this piece. There are no affiliate links here. We have not independently tested MiniMax M3; this article explains what MiniMax has announced, separates vendor claims from confirmed facts, and flags what remains unverified.

On June 1, 2026, MiniMax did something that the AI industry has spent two years insisting was a trade-off: it announced a single model that claims to be good at frontier coding, handle a one-million-token context window, and accept text, image, and video natively — all at once. The pitch, straight from MiniMax’s launch blog, is that M3 is the first and only open-weight model to fold all three of those capabilities into one system. That is a bold sentence, and most of this article is about which parts of it are confirmed, which are promised, and which are simply marketing until somebody outside MiniMax can check.

The short version: the model is real and you can call it today, the efficiency architecture is genuinely interesting, the benchmark numbers are all vendor self-reported, the open weights are promised rather than shipped, and the pricing depends entirely on which source you read. If you only remember one thing, remember that "open-weight" in MiniMax’s headline is a promise for roughly ten days from now, not a download link you have today.

What MiniMax M3 actually is

MiniMax M3 is the latest flagship from MiniMax, a Shanghai-based AI company that has been shipping increasingly capable models. It is a distinct release from the earlier M2 and from M2.7, the self-evolving model that optimized itself across more than a hundred autonomous rounds. M3 is not an incremental point release; MiniMax is presenting it as a new architectural generation.

The headline framing from MiniMax is that M3 is "the first and only open-weight model" to combine three things that usually live in separate products:

Frontier coding — performance on software-engineering and agentic-coding benchmarks that MiniMax says rivals or beats leading closed models.
One-million-token context — a window of up to 1M tokens, with a minimum guaranteed 512K.
Native multimodality — text, image, and video as input, text as output, plus the ability to operate a desktop computer.

That combination is the entire story. Western labs have shipped strong coding models, long-context models, and multimodal models, but the marketing claim here is that you no longer have to choose. Whether that holds up under independent testing is the open question, and MiniMax has not made it easy to check yet, because the two things that would let outsiders verify — the open weights and the technical report — are both still pending at launch.

The "open-weight" claim needs a giant asterisk

This is the part that deserves the most care, because it is the part most likely to be repeated incorrectly. MiniMax’s headline says "open-weight." At launch, M3 is available through exactly two channels: the MiniMax API and the MiniMax Agent product. You cannot download the weights. You cannot self-host. The model is, for the moment, functionally a closed API product with an open-weight promise attached.

MiniMax has said the open weights and a full technical report will arrive within roughly ten days of launch. If that happens, the "open-weight" framing becomes accurate and the model joins the genuinely downloadable tier alongside releases like DeepSeek V4. Until then, the correct way to describe M3 is "open weights announced, not yet available." I am being pedantic about this on purpose, because the gap between "open weight" and "open weight promised" is exactly the gap that AI marketing loves to blur, and readers deserve the precise version.

There is also a track record question. Plenty of labs have promised open weights "soon" and then delayed, narrowed the license, or quietly shipped a smaller variant than the one benchmarked. None of that has happened here — it is day one — but it is the reason the promise is not the same as delivery. The model is worth taking seriously; the open-weight label is worth holding to its own deadline.

MSA: the efficiency engine behind the claims

The MiniMax Sparse Attention architecture decoding much faster and prefilling much faster than the previous generation at one million tokens of context — MiniMax says its MSA architecture decodes about 15.6x faster and prefills about 9.7x faster than M2 at a one-million-token context.

The architecture MiniMax is leaning on is called MSA — MiniMax Sparse Attention. The idea behind any sparse-attention design is straightforward even if the implementation is not: instead of having every token attend to every other token (the dense-attention default that makes long context brutally expensive), the model attends to a carefully selected subset. Done well, this slashes the compute and memory cost that normally grows quadratically with input length.

MiniMax reports that, at a one-million-token context, MSA delivers roughly 15.6x faster decoding and roughly 9.7x faster prefill compared to its previous M2 architecture. Those two phases matter for different reasons. Prefill is the up-front cost of ingesting your prompt — the bottleneck when you feed a model an entire codebase or a long video transcript. Decode is the per-token generation cost — what determines how fast the model streams its answer. A model that improves both at the extreme end of context length is attacking the exact economics that make million-token windows painful to run.

This is the same broad research direction that other Chinese labs have been chasing hard. We recently covered an Alibaba and Nanjing University method that converted full-attention Qwen3 into sparse attention and reported a 9.36x prefill speedup at one million tokens. The pattern is consistent: labs working under compute constraints are pouring energy into making long context cheap, because efficiency is a lever they can pull without access to the largest training clusters. MSA is MiniMax’s entry in that race.

The caveat is unavoidable: MiniMax has not yet published the technical report. So the 15.6x and 9.7x figures are real claims from the vendor, but the mechanism — how MSA selects which tokens to attend to, what it trades away, how accuracy holds at the tail — is not something anyone outside MiniMax can inspect yet. When the report lands, that is the document to read closely.

The benchmarks are strong — and entirely self-reported

Vendor-reported coding and agentic benchmark scores for MiniMax M3 compared against frontier closed models, flagged as self-reported and awaiting independent verification — All of M3’s headline benchmark numbers are vendor self-reported. The community plans to re-test them independently.

Here is the full set of numbers MiniMax published on its launch blog. Every one of these is vendor self-reported. I am listing them because they are the actual claims, not because they are confirmed:

SWE-Bench Pro: 59.0% — the headline coding number, on a harder variant of the standard software-engineering benchmark.
Terminal-Bench 2.1: 66.0% — agentic terminal tasks. Note the version: 2.1, not 2.0 (more on that below).
MCP Atlas: 74.2% — tool-use and Model Context Protocol task performance.
SWE-fficiency: 34.8% — efficiency-aware software engineering.
KernelBench Hard: 28.8% — low-level GPU kernel generation, a genuinely hard category.
OSWorld-Verified: 70.06% — computer-use / desktop-operating tasks.
Video-MME: 84.6 — multimodal video understanding, evaluated across 512 frames.
BrowseComp: 83.5 — web-browsing agentic tasks, which MiniMax cites against 79.3 for Claude Opus 4.7.

Taken at face value, that is a frontier-class scorecard, especially the coding and agentic numbers. But "taken at face value" is doing heavy lifting. Vendor benchmarks are run by the vendor, on the vendor’s harness, with the vendor’s prompts and the vendor’s choice of which competitor scores to cite. None of that makes the numbers fake; it makes them unverified. The community has already signaled it plans to re-run M3 on neutral, independent harnesses, and that is the result that will actually matter.

The Terminal-Bench version trap

One detail is easy to miss and easy to abuse: MiniMax reports Terminal-Bench 2.1, scoring 66.0%. Other models you may see quoted are scored on Terminal-Bench 2.0. These are not the same benchmark, and a score on 2.1 is not directly comparable to a score on 2.0. Benchmark versions change task sets, scoring, and difficulty; comparing across versions is apples to oranges.

This is exactly the kind of thing that gets flattened in a launch-day headline — "M3 scores 66 on Terminal-Bench, beating model X at 62" — when the two numbers are on different versions. If you see M3’s Terminal-Bench figure cited against another model, check that both are on 2.1 before you believe the comparison. I am flagging it because it is a real, legitimate-looking trap, not a knock on MiniMax for using the newer version.

The claim against GPT-5.5, Gemini 3.1 Pro, and Opus 4.7

MiniMax’s sharpest marketing line is that M3 surpasses GPT-5.5 and Gemini 3.1 Pro on coding (pointing at SWE-Bench Pro) and approaches or slightly exceeds Claude Opus 4.7 on some agentic measures, citing the BrowseComp 83.5 versus 79.3 figure. Read that as a vendor claim, full stop. MiniMax reports it; nobody independent has confirmed it.

It is worth putting in context. GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro are the three closed frontier models that have defined the top of the coding leaderboard for most of 2026. For a model that promises open weights to even credibly claim parity with that tier — on the vendor’s own numbers — is a meaningful escalation, regardless of whether the exact ranking survives independent testing. The history of the past two years is that Chinese open-weight efforts keep narrowing the gap faster than the closed labs expected. The honest read is: plausible direction, unproven specifics. If independent harnesses confirm even rough parity at a fraction of the price, that is the actual story; if they don’t, M3 is still a strong model that overclaimed at launch. Either way, wait for the re-tests.

Native multimodal and computer control

The third leg of M3’s pitch is that it is natively multimodal rather than a text model with vision bolted on. MiniMax says M3 takes text, image, and video as input and produces text as output, and it backs the video claim with a Video-MME score of 84.6 evaluated across 512 frames — a non-trivial number of frames, which suggests genuine long-video reasoning rather than single-frame captioning.

On top of that, MiniMax says M3 can operate a desktop computer, citing the OSWorld-Verified score of 70.06%. Computer-use is the capability that turns a chatbot into an agent that can actually click, type, and navigate software on your behalf. Folding that into the same model as the coding and long-context capabilities is the whole "one model, not three" argument. Whether the integration is as seamless in practice as the benchmark table implies is, again, something only hands-on independent testing will settle.

Pricing: a real divergence, not a single number

A frontier-class coding model offered at a fraction of the cost of leading Western closed models, with the exact per-token price varying by source and SKU — The disruption story is cost: a frontier-class coding model at a fraction of Western closed-model pricing — though the exact figure varies by source.

I want to be careful here because the pricing story is genuinely messy, and asserting one confident dollar figure would be misleading. The MiniMax blog did not state an explicit price at the time of writing. What exists is a set of reported figures from secondary sources that do not fully agree:

Some secondary sources cite roughly $0.60 per million input tokens and $2.40 per million output tokens, with a promotional rate near $0.30 and $1.20, and a doubled rate for the 512K-to-1M context tier.
Separately, The Information reported a figure near $0.12 per million input tokens, and framed M3 as roughly 40x cheaper than Claude Opus 4.7 at a cited $5 per million input.

Those two stories are not the same number, and I am not going to pretend they are. The "$0.12 per million input" and the "40x cheaper than Opus 4.7" framing comes specifically from The Information; attribute it there, not to MiniMax. The $0.60 / $2.40 figures come from other secondary reporting. The likely explanation for the spread is that pricing varies by SKU — different context tiers, promotional versus standard rates, input versus output — and that until MiniMax publishes an official price sheet, every number floating around is provisional.

What is safe to say at the level the evidence supports: the strategic intent is aggressive low pricing for a frontier-class coding model. Whether the real, settled rate ends up nearer $0.12 or nearer $0.60 per million input, it is dramatically below the per-token cost of the leading Western closed models. That cost gap is the disruption, and it is the part that survives even with the pricing uncertainty.

Where to use it today: API, Agent, and the gateways

At launch, M3 lives in the MiniMax API and the MiniMax Agent product. The notable speed of the ecosystem response is that within a few hours of release, M3 was also available through several third-party platforms — Ollama, OpenRouter, and Novita AI among them. That matters because it means developers could route requests to M3 almost immediately without building a direct MiniMax integration.

The presence of OpenRouter in that list is its own small signal: multi-model gateways have become the default way developers try a new model on day one, and a frontier release that does not show up on them quickly is a release nobody can easily test. Once the promised weights ship, self-hosting via Ollama and similar tooling becomes a real option too — but at launch, these API and gateway routes are the practical front doors.

The market’s mixed verdict

For all the strong technical claims, the financial market was not uniformly impressed. MiniMax’s Hong Kong-listed shares fell roughly 12% on the morning of the launch. A launch-day stock drop is not a verdict on model quality — it can reflect a stack of unrelated things at once: skepticism about self-reported benchmarks, worry that aggressive low pricing compresses margins, plain "sell the news" profit-taking after a pre-launch run-up, or doubt about whether the open-weight promise will actually be kept on schedule.

I read the drop as the market pricing in exactly the uncertainty this article keeps returning to. A model that is brilliant on paper but priced to the floor, with unverified benchmarks and weights that are promised rather than shipped, is a hard thing for investors to value on day one. The technical achievement and the business case are two different scorecards, and they did not move in the same direction on June 1.

What would move this from "promising" to "proven"

Because so much of M3’s story is claim rather than confirmation, it is worth being explicit about what evidence would settle it. Three things, in order of importance:

The open weights actually ship, on a real open license, within the promised window. If the weights arrive in roughly ten days and match the benchmarked model, the open-weight claim becomes true and self-hosting becomes possible. If they slip, narrow, or shrink, that is the story.
Independent benchmark re-tests on neutral harnesses. If outside groups reproduce even rough parity with GPT-5.5, Gemini 3.1 Pro, and Opus 4.7 on coding and agentic tasks — on matched benchmark versions — the "frontier" label is earned. If the gap is large, M3 is still good but overclaimed.
An official, stable price sheet. Once MiniMax publishes real per-token rates by SKU, the cost-disruption argument can be made with a number instead of a range, and the "40x cheaper" framing can be checked against the actual comparison.

Until those three land, the accurate summary is the one I opened with: a genuinely interesting efficiency architecture and an aggressive strategic bet, wrapped in claims that are strong, self-reported, and not yet independently confirmed. That is not a criticism — it is day one. It is just the difference between taking a direction seriously and treating a launch slide as settled fact.

The bottom line

MiniMax M3 is one of the more ambitious launches of 2026 precisely because of what it tries to combine: frontier coding, a million-token window, native multimodality, and aggressive pricing, all on a new sparse-attention architecture. If the open weights ship as promised and independent testing backs even part of the benchmark story, M3 becomes a serious pressure point on the closed Western frontier — the open-weight, low-cost option that does the expensive things. If the weights slip or the benchmarks don’t hold up independently, it is still a capable model that arrived with more confidence than confirmation.

For now, the responsible position is to call it exactly what the evidence supports: open weights promised (not yet downloadable), benchmarks self-reported (not yet verified), pricing reported but divergent (no official sheet), and a sparse-attention architecture whose details are coming in a technical report that has not landed. Watch the next two weeks. That is when the promises either turn into facts or turn into the story.

Frequently asked questions

What is MiniMax M3?

MiniMax M3 is a large language model launched by Chinese AI company MiniMax on June 1, 2026. MiniMax positions it as the first model to combine frontier-level coding, a context window of up to one million tokens, and native multimodal input (text, image, and video) in a single system, built on a new architecture called MiniMax Sparse Attention (MSA). At launch it is available through the MiniMax API and the MiniMax Agent product. MiniMax has promised to release the open weights and a technical report within roughly ten days, so the weights are announced but not yet downloadable.

Are MiniMax M3 weights actually open and downloadable right now?

No. As of the June 1, 2026 launch, MiniMax M3 is reachable only through the MiniMax API and the MiniMax Agent app. MiniMax has said the open weights and a technical report will follow within roughly ten days, but at launch you cannot download or self-host the model. The accurate framing is "open weights promised," not "open weights available." This matters because a model you can call over an API but not download is, for now, functionally closed, and the open-weight claim depends entirely on MiniMax shipping the files it promised.

What is the MSA (MiniMax Sparse Attention) architecture?

MSA stands for MiniMax Sparse Attention, the new attention architecture M3 is built on. Rather than computing attention over every token at every step, a sparse-attention design attends to a selected subset, which cuts the compute and memory cost that normally explodes at long context. MiniMax reports that, at a one-million-token context, MSA decodes about 15.6x faster and prefills about 9.7x faster than its previous M2 architecture. MiniMax has not yet published the full technical report, so the precise mechanism behind those figures is not independently verifiable at launch.

How long is the MiniMax M3 context window?

MiniMax states M3 supports a context window of up to one million tokens, with a minimum guaranteed window of 512K tokens. The MSA sparse-attention architecture is what makes operating at that length cheaper than a dense-attention model of comparable size would be. The 512K-to-1M tier is also where some reported pricing is said to be higher, so the full one-million-token window may carry a premium rate depending on the source.

Is MiniMax M3 really better at coding than GPT-5.5 and Gemini 3.1 Pro?

MiniMax reports that M3 surpasses GPT-5.5 and Gemini 3.1 Pro on coding benchmarks such as SWE-Bench Pro, and approaches or slightly exceeds Claude Opus 4.7 on some agentic measures. These are self-reported vendor numbers published on MiniMax’s own blog. They have not been independently verified, and the community has signaled it intends to re-test M3 on neutral harnesses. Treat the "beats GPT-5.5 and Gemini 3.1 Pro" claim as a vendor claim pending independent confirmation, not as an established fact.

What benchmark scores did MiniMax report for M3?

On its launch blog, MiniMax reported the following self-reported scores: SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, MCP Atlas 74.2%, SWE-fficiency 34.8%, KernelBench Hard 28.8%, OSWorld-Verified 70.06%, Video-MME 84.6 across 512 frames, and BrowseComp 83.5 versus a cited 79.3 for Claude Opus 4.7. Note that the Terminal-Bench figure is on version 2.1, which is not the same as the 2.0 version other models are sometimes scored on, so cross-model comparisons on that benchmark are not apples to apples. All of these numbers are vendor-published and await independent verification.

How much does MiniMax M3 cost?

The reported pricing varies by source and SKU, and the MiniMax blog did not state an explicit dollar price at the time of writing, so there is no single confirmed number. Some secondary sources cite roughly $0.60 per million input tokens and $2.40 per million output tokens, with a promotional rate near $0.30 and $1.20 and a doubled rate for the 512K-to-1M context tier. Separately, The Information reported a figure near $0.12 per million input tokens, framing M3 as roughly 40x cheaper than Claude Opus 4.7 at a cited $5 per million input. Until MiniMax publishes an official price sheet, treat any single figure as provisional.

How is MiniMax M3 different from MiniMax M2 and M2.7?

M3 is a distinct, newer release from June 1, 2026, built on the new MSA sparse-attention architecture, whereas M2 and the self-evolving M2.7 used the prior generation. MiniMax frames M3’s efficiency gains — about 15.6x faster decoding and 9.7x faster prefill at one million tokens — relative to M2 specifically. M3 also adds native multimodal input and the up-to-one-million-token window as headline features, and unlike earlier releases it carries the explicit promise of open weights plus a technical report within roughly ten days.

Is MiniMax M3 multimodal, and can it control a computer?

Yes. MiniMax describes M3 as natively multimodal, accepting text, image, and video as input and producing text as output. It reported a Video-MME score of 84.6 using 512 frames. MiniMax also says M3 can operate a desktop computer, and it reported an OSWorld-Verified score of 70.06% on computer-use tasks. These agentic and multimodal capabilities are part of why MiniMax frames M3 as a single model spanning coding, long context, and multimodal rather than three separate specialized systems.

Where can I use or access MiniMax M3?

At launch, M3 is available through the MiniMax API and the MiniMax Agent product. Within a few hours of release it also became available through several third-party platforms, including Ollama, OpenRouter, and Novita AI, which let developers route requests to the model without a direct MiniMax integration. Once the promised open weights ship, self-hosting should become possible too, but at launch these API and gateway routes are the practical ways to try it.

Why did MiniMax stock fall on launch day?

MiniMax’s Hong Kong-listed shares fell roughly 12% on the morning of the June 1, 2026 launch, a sign that the market reaction was mixed despite the strong technical claims. A launch-day decline can reflect several things at once: skepticism about self-reported benchmarks, concern that aggressive low pricing compresses margins, "sell the news" profit-taking after a run-up, or doubts about whether the open-weight promise will be kept. The drop is a market signal, not a verdict on the model’s quality.

Should I trust MiniMax M3’s claims yet?

Treat the launch claims as a strong but unverified vendor pitch. The benchmark scores are self-reported, the open weights and technical report are promised rather than shipped, and the pricing varies by source with no official sheet at the time of writing. The honest position is to take MiniMax’s direction seriously — a low-cost, long-context, multimodal coding model is a real competitive threat — while waiting for the open weights, the technical report, and independent benchmark re-tests before treating "beats GPT-5.5 and Gemini 3.1 Pro" as settled.

MiniMax M3 Is Here: Open-Weight (Promised) Coding Frontier With 1M Context, Native Multimodal, and the New MSA Architecture