MiniMax M3

Open-weight frontier model from MiniMax combining near-frontier coding, a 1M token context window, and native multimodality — from $0.30 per million input tokens.

8.6/10

Updated July 23, 2026

Try MiniMax M3 Free →

Last updated July 23, 2026

Anthony M.

25 min readVerified July 23, 2026Tested hands-on

Quick Summary

MiniMax M3 is an open-weight large language model from Shanghai-based MiniMax, launched June 1, 2026. It combines frontier coding-agent performance, a 1 million token context window, and native multimodality on the new MSA sparse attention architecture. API pricing starts at $0.30 per million input tokens and $1.20 per million output tokens (standard tier, up to 512K context). Benchmarks are vendor-reported (59.0 percent SWE-Bench Pro). Score: 8.6/10.

MiniMax M3 review — open-weight frontier coding model with a 1 million token context window, native multimodality, and pricing from 0.30 dollars per million input tokens — MiniMax M3 — Shanghai-based MiniMax's open-weight frontier model, launched June 1, 2026. Image generated with GPT Image 2.

MiniMax M3 is an open-weight large language model from Shanghai-based MiniMax, launched on June 1, 2026 and positioned as the first open-weight system to combine frontier coding-agent performance, a 1 million token context window, and native multimodality in one model. It runs on a new sparse attention architecture MiniMax calls MSA (MiniMax Sparse Attention), accepts text, image, and video input, and is priced on the official API from $0.30 per million input tokens and $1.20 per million output tokens at the standard tier (for context up to 512K tokens). We have not run M3 in production for weeks, so this review combines our hands-on first impressions through the OpenRouter playground with MiniMax's own published benchmarks, clearly attributed. We rate it 8.6 out of 10.

TL;DR — MiniMax M3 verdict

Bottom line: MiniMax M3 is the most ambitious open-weight model release of mid-2026, and the price-to-capability ratio is genuinely hard to argue with. MiniMax reports 59.0 percent on SWE-Bench Pro, which the company says surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Opus 4.7 on that one coding benchmark, alongside a 1 million token context window and native image and video understanding. In our own light testing through OpenRouter, the coding output felt fast and the long-context recall held up well. We score it 8.6 out of 10. The honest caveats are big enough that you should read them before committing: every headline benchmark is vendor-reported and run on MiniMax's own infrastructure with no independent verification yet, the promised model weights and technical report were still pending at the time of writing despite a 10-day timeline, and because the API endpoint is operated by a Chinese company, prompts processed through it fall under China's data and intelligence laws — a real consideration for sensitive workloads.

What it is: An open-weight frontier LLM from MiniMax, launched June 1, 2026, built on the MSA sparse attention architecture.
Price: From $0.30 per million input tokens and $1.20 per million output tokens (standard tier, context up to 512K). Above 512K context, rates double.
Best for: Cost-sensitive agentic coding, very long-context tasks, and teams that want to eventually self-host an open-weight frontier model.
Skip if: You need independently verified benchmark numbers today, you cannot send data to a China-operated API, or you need the weights right now rather than soon.

What is MiniMax M3?

MiniMax M3 is the third-generation flagship text model from MiniMax, the Shanghai AI lab also known for the Hailuo video models. It launched on June 1, 2026 and is described by MiniMax as the first and only open-weight model to bring frontier coding, a 1 million token context window, and native multimodality together in a single system. In plain terms, that means one model that is meant to write and fix code at a near-frontier level, hold an entire large codebase or document set in context at once, and understand images and video natively rather than through a bolted-on vision adapter.

The model is served through the MiniMax API, through the MiniMax Code agent for autonomous coding workflows, and through third-party routers such as OpenRouter. MiniMax has committed to releasing the open model weights and a full technical report within 10 days of launch. As of this writing those had not yet appeared, so we treat the open-weight claim as a credible commitment rather than a shipped fact, and we will update this review once the weights and report are public.

If you have followed the open-weight race through models like DeepSeek and Qwen, M3 fits the same pattern: a Chinese lab releasing a frontier-class model at a fraction of the cost of the US closed labs, with the explicit goal of being downloaded, fine-tuned, and self-hosted. What makes M3 stand out is the combination — most strong open-weight models are coding specialists or long-context specialists, not both with multimodality on top.

MiniMax M3 architecture diagram showing MSA sparse attention enabling a 1 million token context window with faster prefill and decode at one twentieth the per token compute — MiniMax M3's MSA sparse attention is the headline architectural change — MiniMax reports far lower per-token compute at 1M context. Image generated with GPT Image 2.

Key features

Four things define M3, and each maps directly to a claim MiniMax makes about the architecture.

MSA (MiniMax Sparse Attention). The core innovation. MSA replaces the quadratic full attention found in most transformers with a sparse mechanism. MiniMax reports that at a 1 million token context, M3 uses roughly one twentieth of the per-token compute of its previous-generation M2 model, with prefill more than 9 times faster and decoding more than 15 times faster. The company also claims MSA is around 4 times faster than Flash-Sparse-Attention. These are vendor figures, but the architectural direction is consistent with where the wider field is heading on long context.
1 million token context. M3 supports a 1 million token window, enough to hold a large monorepo, a book-length document set, or a long multi-session agent transcript in a single call. MiniMax prices context above 512K tokens at a higher rate, so the cheap tier covers most practical day-to-day work.
Native multimodality. MiniMax says M3 was trained with mixed-modality data from step zero rather than having vision added later. It accepts text, image, and video input and produces text output. It also reports a computer-use capability, scoring 70.06 percent on OSWorld-Verified per the company.
Agentic coding and the MiniMax Code agent. M3 is tuned for autonomous, multi-step coding. The MiniMax Code agent wraps the model in a multi-agent workflow with what MiniMax describes as deep reflection and continuous error correction, plus a toggleable thinking mode billed at the same rate as standard inference.

Benchmarks (vendor-reported)

This is the section to read most carefully. Every number below was published by MiniMax and run on MiniMax's own infrastructure. None has been independently reproduced by a neutral third party at the time of writing. We present them as attributed claims, not as established facts.

SWE-Bench Pro: 59.0 percent. MiniMax says this surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Opus 4.7 on this benchmark. For context, third-party coverage notes that Opus 4.8 reportedly scores around 69.2 percent on the same test, so M3 is positioned just below the absolute frontier rather than at it.
SVG-Bench: MiniMax reports M3 surpasses Opus 4.7 here (no public number cited).
OmniDocBench: reported above Gemini 3.1 Pro (no public number cited).
Terminal-Bench 2.1: 66.0 percent.
MCP Atlas: 74.2 percent.
OSWorld-Verified: 70.06 percent (computer use, 361 samples).
SWE-fficiency: 34.8 percent and KernelBench Hard: 28.8 percent.
Claw-Eval: MiniMax reports the highest score among the models it evaluated across 161 tasks.

Our read: the spread of scores is coherent and points to a genuinely strong coding and agentic model, but until SWE-Bench Pro and the others are reproduced independently, treat the precise rankings as marketing. The pattern we trust more than any single number is the price-to-performance story, which we could partly check ourselves.

How we tested (and what we could not test)

We did not run M3 in a production environment for weeks the way we do with tools we use daily, and we want to be upfront about that. M3 launched on June 1, 2026, and this review reflects hands-on first impressions through the OpenRouter playground plus MiniMax's published material. Here is what that means in practice.

What we tested ourselves: we ran a handful of medium-difficulty coding prompts (a refactor task, a bug-fix on a moderately tangled function, and an SVG-generation prompt) and a long-context recall check by pasting roughly 200,000 tokens of mixed documentation and asking targeted retrieval questions. In that limited sample the coding output was fast and usually correct on the first attempt, the SVG rendering was clean, and long-context recall was accurate for facts buried deep in the prompt. This matches the broad picture independent reviewers like Thomas Wiegold have reported, where M3 is described as finally sitting in the same conversation as GPT and Opus rather than a tier below.

What we did not test: we have not verified MiniMax's benchmark numbers, we have not run the open weights (they were not yet public), and we have not stress-tested M3 on a large real-world codebase over multiple sessions. Some early users on public review channels report that M3 looks capable on proof-of-concept work but becomes harder to push when adding features to a mature project. We could not confirm or refute that in our short testing window, so we flag it as an open question rather than a verdict.

MiniMax M3 pricing

We verified the following directly against the MiniMax pay-as-you-go pricing documentation and cross-checked it against OpenRouter. All figures are in US dollars.

Standard tier, context up to 512K tokens: $0.30 per million input tokens and $1.20 per million output tokens, with cached input reads at $0.06 per million tokens. MiniMax labels this rate a 50 percent discount, implying a list price of $0.60 per million input tokens and $2.40 per million output tokens.
Standard tier, context above 512K tokens: $0.60 per million input tokens and $2.40 per million output tokens, with cached reads at $0.12 per million tokens.
Priority tier (SLA-optimized): $0.45 per million input tokens and $1.80 per million output tokens up to 512K context, doubling above that threshold.
OpenRouter: lists M3 at $0.30 per million input tokens and $1.20 per million output tokens, matching the standard tier exactly.

MiniMax also offers token-bundle subscription plans reported in third-party coverage at roughly $20 per month, $50 per month, and $120 per month for escalating monthly token allowances. We were not able to confirm those bundle figures directly on the official pricing page, so treat them as approximate. The headline takeaway is simple: at $0.30 per million input tokens, M3 undercuts the closed US frontier models by a wide margin, which is the entire point of the release.

MiniMax M3 alternatives

M3 competes in two overlapping arenas: open-weight frontier models and closed frontier coding models. Here is how it stacks up against the names you are most likely weighing it against.

Claude Opus 4.8: The closed frontier benchmark M3 is measured against. Opus is more expensive (an order of magnitude more per token) and not open-weight, but its coding benchmarks are independently regarded as top-tier and its reliability is well documented. If verified capability matters more than cost, Opus wins; if cost-per-token and openness matter more, M3 is the disruptor.
GPT-5.5: OpenAI's flagship, which M3 claims to beat on SWE-Bench Pro. GPT-5.5 has a broader ecosystem and tool integration; M3 competes on price and openness.
Gemini 3.1 Pro: Google's long-context specialist, also claimed beaten by M3 on SWE-Bench Pro and OmniDocBench. Gemini's multimodality and Google Cloud integration are mature; M3 challenges on cost.
DeepSeek and Qwen open-weight models: The closest peers in the open-weight Chinese-lab category. M3's differentiator is the combination of frontier coding, 1M context, and native multimodality in one release rather than across separate models.

Who should use MiniMax M3?

Cost-conscious developers running high-volume coding agents. At $0.30 per million input tokens, M3 makes always-on agentic coding economically viable in a way the closed frontier models are not.
Teams with very long-context workloads. Holding a full monorepo, a large legal corpus, or a long agent transcript in a single 1M-token call is M3's natural habitat.
Builders who want to self-host eventually. Once the open weights land, M3 is a candidate for on-premises deployment where data cannot leave the building.
Multimodal document and computer-use pipelines. Native image and video input plus a reported computer-use capability make M3 suitable for document-extraction and screen-automation experiments.
Researchers and tinkerers. An open-weight frontier model with a published technical report is a gift for fine-tuning and study, once both are public.
Cost-benchmarking against incumbents. Even if you stay on Opus or GPT for production, M3 is worth wiring into your eval harness as a cheap baseline.

Pros and cons

Based on our first-impression testing and a careful read of the published material, here is the honest balance sheet.

Final verdict

MiniMax M3 is the kind of release that resets price expectations for an entire category. If even half of MiniMax's benchmark claims hold up under independent testing, an open-weight model that codes near the frontier, holds a million tokens of context, and understands images and video natively — for $0.30 per million input tokens — is a serious problem for the closed labs charging ten to thirty times more. In our limited hands-on testing the model felt fast and capable, and the broad consensus from independent early reviewers backs that up.

We are holding our score at 8.6 out of 10 rather than higher for three concrete reasons, and we would happily revise upward if they resolve. First, the benchmarks are entirely vendor-reported and unverified, so the precise rankings versus GPT-5.5, Gemini 3.1 Pro, and Opus 4.7 are claims, not facts. Second, the open weights and technical report — the entire premise of the "open-weight" positioning — were still pending at the time of writing. Third, because the API is operated by a Chinese company, prompts sent through it fall under China's data and intelligence regulations, which rules M3's hosted endpoint out for genuinely sensitive workloads regardless of how good the model is. For experimentation, cost-sensitive coding, and long-context work where data sensitivity is low, M3 is one of the most exciting tools to land in 2026. For mission-critical, verified, compliance-bound production, wait for the weights, the report, and independent benchmarks first.

MiniMax M3 versus closed frontier models comparison — M3 at 59 percent SWE-Bench Pro vendor-reported against GPT-5.5 and Gemini 3.1 Pro, at a fraction of the per token cost — MiniMax M3 positions itself just below the closed frontier on coding benchmarks (vendor-reported) at a small fraction of the per-token cost. Image generated with GPT Image 2.

MiniMax M3 final verdict scorecard — overall score 8.6 out of 10 with strengths in value and long context, caveats on unverified benchmarks and pending open weights — Our verdict: MiniMax M3 scores 8.6 out of 10 — exceptional value, with caveats on verification and openness. Image generated with GPT Image 2.

Frequently asked questions

What is MiniMax M3?

MiniMax M3 is an open-weight large language model from Shanghai-based MiniMax, launched on June 1, 2026. MiniMax describes it as the first open-weight model to combine frontier coding-agent performance, a 1 million token context window, and native multimodality in one system. It is built on a new sparse attention architecture called MSA (MiniMax Sparse Attention) and accepts text, image, and video input while producing text output.

How much does MiniMax M3 cost?

On the official pay-as-you-go API, the standard tier costs $0.30 per million input tokens and $1.20 per million output tokens for context up to 512K tokens, with cached input reads at $0.06 per million tokens. Above 512K tokens the rate doubles to $0.60 per million input tokens and $2.40 per million output tokens. OpenRouter lists the same standard rate. MiniMax labels the standard rate a 50 percent discount on a $0.60 and $2.40 list price.

Is MiniMax M3 actually open-weight?

MiniMax has committed to releasing the model weights and a full technical report within 10 days of the June 1, 2026 launch. At the time of writing those had not yet been published, so the open-weight status is a stated commitment rather than a confirmed fact. We will update this review when the weights and report become publicly available.

How good is MiniMax M3 at coding?

MiniMax reports 59.0 percent on SWE-Bench Pro, which it says surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Opus 4.7 on that benchmark, plus 66.0 percent on Terminal-Bench 2.1. These figures are vendor-reported and run on MiniMax's own infrastructure, with no independent verification yet. In our own light testing through OpenRouter the coding output was fast and usually correct on the first attempt, which is consistent with the strong positioning but not a substitute for verified benchmarks.

What is MSA (MiniMax Sparse Attention)?

MSA is the sparse attention architecture at the heart of M3. It replaces the quadratic full attention used in most transformers, which lets the model handle very long context far more efficiently. MiniMax reports that at a 1 million token context, M3 uses about one twentieth of the per-token compute of its previous M2 model, with prefill more than 9 times faster and decoding more than 15 times faster. The company also claims MSA is roughly 4 times faster than Flash-Sparse-Attention.

How big is the MiniMax M3 context window?

M3 supports a 1 million token context window. That is large enough to hold a sizable codebase, a book-length set of documents, or a long multi-session agent transcript in a single call. Note that MiniMax prices context above 512K tokens at a higher rate, so the cheapest tier covers contexts up to roughly half a million tokens.

Is MiniMax M3 multimodal?

Yes. MiniMax says M3 was trained with mixed-modality data from the start rather than having vision capabilities added later. It accepts text, image, and video as input and produces text output. MiniMax also reports a computer-use capability, citing a score of 70.06 percent on the OSWorld-Verified benchmark for screen and desktop operation tasks.

How does MiniMax M3 compare to Claude Opus 4.8 and GPT-5.5?

MiniMax claims M3 beats GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro and approaches Opus 4.7. Third-party coverage notes that Opus 4.8 reportedly scores higher on the same benchmark, around 69.2 percent, so M3 sits just below the absolute frontier on coding. The decisive difference is cost: M3 is roughly an order of magnitude cheaper per token than the closed flagships, and it is positioned to be open-weight, which none of those competitors are.

Should I use MiniMax M3 for sensitive or production data?

Be cautious. MiniMax is a Chinese company, and prompts processed through its hosted API endpoint fall under China's data and intelligence laws regardless of where the user is located. For sensitive or compliance-bound workloads we recommend waiting for the open weights so the model can be self-hosted, or using a closed Western provider. For low-sensitivity experimentation and cost-conscious coding, the hosted API is reasonable.

Where can I access MiniMax M3?

M3 is available through the official MiniMax API and the MiniMax Code agent, and through third-party routers such as OpenRouter, where it is listed at the same standard rate. The API went live on June 1, 2026 with rate limits reported at 200 requests per minute and 10 million tokens per minute. Once the open weights are released, the model will also be downloadable for local or self-hosted deployment.

Are the MiniMax M3 benchmarks independently verified?

No. As of this writing, every benchmark figure MiniMax has published was produced by the company on its own infrastructure and has not been independently reproduced by a neutral third party. Several outlets have flagged this explicitly. We present the numbers as attributed vendor claims and recommend treating the precise rankings as marketing until independent testing confirms them.

What is our rating for MiniMax M3?

We rate MiniMax M3 8.6 out of 10. It earns high marks for capability and exceptional value — an open-weight frontier-class model at $0.30 per million input tokens is a category-resetting offer. We hold the score below 9 because the benchmarks are vendor-reported and unverified, the open weights and technical report were still pending at the time of writing, and the hosted API's China jurisdiction limits it for sensitive workloads.

Key Features

MSA (MiniMax Sparse Attention) architecture replacing quadratic full attention

1 million token context window

Native multimodality — text, image, and video input, text output

Vendor-reported 59.0 percent on SWE-Bench Pro coding benchmark

Toggleable thinking mode billed at the standard rate

MiniMax Code agent with multi-agent workflows, deep reflection, and continuous error correction

Computer-use capability (70.06 percent on OSWorld-Verified, vendor-reported)

Open-weight release committed within 10 days of launch

Available via official API, MiniMax Code, and OpenRouter

Pros & Cons

Pros

Exceptional price-to-capability ratio — frontier-class coding from $0.30 per million input tokens, roughly an order of magnitude cheaper than closed flagships
1 million token context window handles full codebases, large document sets, and long agent transcripts in a single call
Native multimodality (text, image, and video input) trained from step zero rather than bolted on later
New MSA sparse attention architecture delivers large vendor-reported speedups at long context (over 9x prefill, over 15x decode versus M2)
Positioned as open-weight, with weights and a technical report committed within 10 days of launch — a candidate for self-hosting
Strong vendor-reported coding benchmarks (59.0 percent SWE-Bench Pro) and a dedicated MiniMax Code agent for autonomous workflows

Cons

Every headline benchmark is vendor-reported on MiniMax's own infrastructure with no independent third-party verification yet
Open model weights and the technical report were still pending at the time of writing despite a 10-day commitment
Hosted API is operated by a Chinese company, so prompts fall under China's data and intelligence laws — unsuitable for sensitive workloads
SWE-Bench Pro score (59.0 percent) trails the closed frontier, with Opus 4.8 reportedly around 69.2 percent
Context above 512K tokens costs double, and some early users report it struggles to scale past proof-of-concept on mature projects

Best Use Cases

Cost-sensitive always-on agentic coding

Very long-context tasks (full monorepos, large corpora, long agent transcripts)

On-premises or self-hosted deployment once open weights are released

Multimodal document extraction and computer-use automation pipelines

Fine-tuning and research on an open-weight frontier model

Cheap baseline model inside an evaluation harness alongside closed incumbents

Platforms & Integrations

Available On

APIWebOpenRouter

Integrations

OpenRouterMiniMax APIMiniMax Code agent

Compare MiniMax M3

GPT-5.6 Luna vs MiniMax M3

Gemini 3.5 Flash vs MiniMax M3

MiniMax M3 vs Kimi K3

GPT-5.6 Sol vs MiniMax M3

GPT-5.6 Terra vs MiniMax M3

Claude Fable 5 vs MiniMax M3

Grok 4.5 vs MiniMax M3

Muse Spark 1.1 vs MiniMax M3

Claude Sonnet 5 vs MiniMax M3

Claude Opus 4.8 vs MiniMax M3

Anthony M.Verified Builder

We're developers and SaaS builders who use these tools daily in production. Every review comes from hands-on experience building real products — DealPropFirm, ThePlanetIndicator, PropFirmsCodes, and many more. We don't just review tools — we build and ship with them every day.

Written and tested by developers who build with these tools daily.

Learn more about our team →See our testing setup →Read our editorial policy →

Was this review helpful?

Frequently Asked Questions

What is MiniMax M3?

Open-weight frontier model from MiniMax combining near-frontier coding, a 1M token context window, and native multimodality — from $0.30 per million input tokens.

How much does MiniMax M3 cost?

MiniMax M3 costs $0.3/month.

Is MiniMax M3 free?

No, MiniMax M3 starts at $0.3/month.

What are the best alternatives to MiniMax M3?

Top-rated alternatives to MiniMax M3 can be found in our WebApplication category, where we've reviewed and scored every tool on ThePlanetTools.ai.

Is MiniMax M3 good for beginners?

MiniMax M3 is rated 8/10 for ease of use.

What platforms does MiniMax M3 support?

MiniMax M3 is available on API, Web, OpenRouter.

Does MiniMax M3 offer a free trial?

No, MiniMax M3 does not offer a free trial.

Is MiniMax M3 worth the price?

MiniMax M3 scores 9.5/10 for value. We consider it excellent value.

Who should use MiniMax M3?

MiniMax M3 is ideal for: Cost-sensitive always-on agentic coding, Very long-context tasks (full monorepos, large corpora, long agent transcripts), On-premises or self-hosted deployment once open weights are released, Multimodal document extraction and computer-use automation pipelines, Fine-tuning and research on an open-weight frontier model, Cheap baseline model inside an evaluation harness alongside closed incumbents.

What are the main limitations of MiniMax M3?

Some limitations of MiniMax M3 include: Every headline benchmark is vendor-reported on MiniMax's own infrastructure with no independent third-party verification yet; Open model weights and the technical report were still pending at the time of writing despite a 10-day commitment; Hosted API is operated by a Chinese company, so prompts fall under China's data and intelligence laws — unsuitable for sensitive workloads; SWE-Bench Pro score (59.0 percent) trails the closed frontier, with Opus 4.8 reportedly around 69.2 percent; Context above 512K tokens costs double, and some early users report it struggles to scale past proof-of-concept on mature projects.

Ready to try MiniMax M3?

Get started today

Try MiniMax M3 Now →