Skip to content
K

Kimi K2.7

Moonshot AI's open-weight 1T-parameter MoE coding model — 32B active, 256K context, Modified MIT, metered at $0.95 in / $4.00 out per million tokens.

8.4/10
Last updated June 19, 2026
Author
Anthony M.
22 min readVerified June 19, 2026Tested hands-on

Quick Summary

Kimi K2.7 (Kimi K2.7-Code) is Moonshot AI's open-weight coding-focused large language model announced June 12, 2026. It is a 1 trillion parameter Mixture-of-Experts model with 32 billion active parameters and a 256K context window, released under a Modified MIT license with weights on HuggingFace. Self-reported scores include 62.0 on Kimi Code Bench v2 and 81.1 on MCP Mark Verified; Moonshot did not submit independent public benchmarks. Metered API: $0.95 per million input tokens cache miss, $0.19 cache hit, $4.00 output.

Kimi K2.7 by Moonshot AI — open-weight 1 trillion parameter MoE coding model, 256K context, metered API, glassmorphism brand visual
Kimi K2.7 (Kimi K2.7-Code) is Moonshot AI's open-weight 1 trillion parameter Mixture-of-Experts coding model, released June 12, 2026.

Kimi K2.7 is Moonshot AI's open-weight coding model: a 1 trillion parameter Mixture-of-Experts system with 32 billion active parameters, a 256K-token context window, and a native MoonViT vision encoder, released June 12, 2026 under a Modified MIT license. It is built for agentic coding and tool use, sold as a metered pay-as-you-go API at $0.95 per million input tokens with no subscription, and its weights are downloadable on day one from HuggingFace.

Verdict: 8.4 out of 10

Kimi K2.7 is one of the most practical open-weight coding models we have tested this year. It pairs frontier-scale architecture with genuinely cheap metered input pricing and same-day downloadable weights, and it shines on agentic tool-use workloads. The catch is honesty around evidence: every benchmark Moonshot published is from its own in-house harness, with no independent public-suite results yet. We score it 8.4 out of 10 — excellent value for teams who want to self-host or pay only for what they use, with eyes open about the unverified scores.

Who it's for: developers and teams who want a cheap, self-hostable, MCP-friendly coding model and can tolerate vendor-reported benchmarks.

Overview: what Kimi K2.7 actually is

Kimi K2.7, also marketed as Kimi K2.7-Code, is the latest open-weight coding model from Moonshot AI, the Beijing-based lab behind the Kimi family. It landed on June 12, 2026 — exactly one day before Zhipu's GLM-5.2 — in what has become a crowded month for Chinese open-weight launches. Where the previous generation, Kimi K2.6, was a strong but token-hungry coder, K2.7 reframes the pitch around efficiency: Moonshot says it reaches higher scores while spending roughly 30 percent fewer reasoning tokens, which directly lowers the cost of every coding task you run through it.

The model is positioned squarely at agentic coding and tool use rather than chat. It ships with an OpenAI-compatible API, native support for tool calls and JSON mode, and a vision encoder so it can read screenshots and UI mockups inside a coding loop. Crucially, the weights are open and available to download from HuggingFace the same day the API went live — there was no staggered release or waiting list, which is not always the case with frontier-scale Chinese models.

In our testing we treated K2.7 the way most teams will use it: as a drop-in coding model behind an OpenAI-compatible endpoint, wired into an agent that calls tools, edits files, and occasionally looks at an image. That is the workload it was clearly built for, and it is where the model is at its most convincing.

Key features

K2.7 is a frontier-scale model that behaves like a cheap one. The headline numbers are large, but the active footprint per token is small, which is what keeps inference affordable.

  • 1 trillion total parameters, 32 billion active. A sparse Mixture-of-Experts design with 384 experts and 8 selected per token (plus 1 shared expert) means only a fraction of the network fires for any given token.
  • 256K context window. 262,144 tokens of context, with automatic context caching that makes repeated long-context calls dramatically cheaper.
  • Native MoonViT vision. A 400 million parameter vision encoder lets the model read images, screenshots, and UI mockups — useful inside coding workflows where you want it to look at a design and produce markup or fix a layout.
  • Open weights, Modified MIT license. You can download and self-host the model today; the license is effectively MIT with an added attribution clause for very large commercial deployments above a user threshold.
  • Agentic, tool-first orientation. OpenAI-compatible API with tool calls and JSON mode, and Moonshot's strongest self-reported category is MCP tool use.
  • Token efficiency. Roughly 30 percent fewer reasoning tokens than Kimi K2.6 to reach a comparable or higher score, which compounds into real savings at scale.

Architecture and specs

K2.7 keeps the same architectural family as Kimi K2.6 but tunes it for efficiency. It is a 61-layer model — 60 Mixture-of-Experts layers plus 1 dense layer — using MLA attention. The MoE router selects 8 of 384 experts per token, with a single shared expert always active, which is how a 1 trillion parameter model runs with only 32 billion parameters active at inference time.

SpecKimi K2.7
Total parameters1 trillion
Active parameters per token32 billion
Experts384 (8 selected per token, 1 shared)
Layers61 (60 MoE + 1 dense), MLA attention
Context window256K (262,144 tokens)
Vision encoderMoonViT, 400 million parameters
LicenseModified MIT (open weights)
Release dateJune 12, 2026
API compatibilityOpenAI-compatible (tool calls, JSON mode)

The practical upshot of the sparse design is that you get frontier-scale capability without frontier-scale inference cost, whether you are renting the metered API or running the weights on your own GPU cluster. The 256K context is generous for most day-to-day coding agents, though it is smaller than some rivals, as we discuss below.

Pricing

Kimi K2.7 is metered, pay-as-you-go, with no subscription tier to commit to. You pay per token, and the rates split between input and output. Input costs $0.95 per million tokens on a cache miss, and automatic context caching drops repeated input to $0.19 per million cached tokens. Output costs $4.00 per million tokens.

Kimi K2.7 metered API pricing — 0.95 dollars per million input tokens, 0.19 cached, 4.00 output, glassmorphism pricing card
Kimi K2.7 metered API pricing: dual-rate token billing with no subscription.

The cache-hit economics matter more than they first appear. Agentic coding sessions reuse a lot of the same context — the same files, the same system prompt, the same tool definitions — over and over. With automatic caching, that repeated input is billed at roughly a fifth of the cache-miss rate, so the effective input cost of a long agent run drifts toward $0.19 per million tokens rather than $0.95.

Where the model is less aggressive is output. At $4.00 per million output tokens, K2.7 is far more expensive on generation than the ultra-cheap open rivals — DeepSeek's V4-Flash, for example, sits around $0.28 per million output tokens. If your workload generates enormous volumes of code, that gap adds up. K2.7's pitch is cheap input plus token efficiency, not the lowest output rate on the market.

Benchmarks (self-reported)

Here is the important caveat first: every benchmark number Moonshot published for K2.7 comes from its own in-house harness. As of release there were no independent third-party results on standard public suites — no SWE-bench Verified, no LiveCodeBench, no Terminal-Bench, no Aider Polyglot, no GPQA Diamond. So K2.7 is not a model "without benchmarks," but the benchmarks it has are vendor-reported and run in Moonshot's own evaluation environment. Treat them as directional, not independently verified.

With that framing, the self-reported table is genuinely informative because Moonshot published comparison columns against its own predecessor and two frontier closed models. The pattern is clear: K2.7 is strongest on agentic and MCP tool-use categories, and trails the closed frontier models on raw coding.

Benchmark (in-house)Kimi K2.7-CodeKimi K2.6Claude Opus 4.8GPT-5.5
Kimi Code Bench v262.050.967.469.0
MCP Mark Verified81.172.876.492.9
MCP Atlas76.069.481.379.4
Program Bench53.648.363.869.1
MLS Bench Lite35.126.742.835.5
Kimi Claw 24/7 Bench46.942.950.452.8
Kimi K2.7 self-reported in-house benchmark comparison versus Kimi K2.6, Claude Opus 4.8, and GPT-5.5, glassmorphism data visual
Kimi K2.7 self-reported in-house benchmark scores versus its predecessor and two frontier closed models.

The most notable result is MCP Mark Verified, where K2.7's self-reported 81.1 edges past Claude Opus 4.8 at 76.4 — though GPT-5.5 still leads the category at 92.9. On the headline Kimi Code Bench v2, K2.7 jumps from K2.6's 50.9 to 62.0, a 21.8 percent improvement, but it sits behind both Opus 4.8 (67.4) and GPT-5.5 (69.0). The honest read: K2.7 narrows the gap to the frontier on its own metrics and leads on tool use, but you should not assume it matches closed models on independent coding suites until someone runs them.

How we tested it

We ran Kimi K2.7 the way a team would in production: behind its OpenAI-compatible endpoint, wired into an agent that reads a repository, calls tools, edits files, and runs commands. We focused on three things — agentic coding loops, MCP-style tool use, and the vision path — over roughly a week of daily use.

On agentic tool use, the model lived up to its strongest self-reported category. In our testing it was reliable about calling the right tool with well-formed arguments, recovering from a failed tool call, and chaining several steps without losing the plot. JSON mode held up; we rarely had to retry for malformed output. This is the workload where K2.7 felt most like a frontier model rather than a budget one.

On raw code generation, it was strong but not best-in-class. For well-scoped tasks — implement this function, refactor this module, write tests for this file — it produced clean, idiomatic code on the first or second try. On gnarlier multi-file changes it occasionally needed more steering than the top closed models, which tracks with the in-house benchmark gap on Program Bench.

The vision path is a genuine differentiator. We handed it screenshots of UI mockups and asked for markup, and MoonViT did the reading well enough to produce usable starting components. It is not a design tool, but for a coding model the ability to look at a screenshot inside the same loop is a real convenience.

The token-efficiency claim also held in practice. Compared with how we remembered K2.6 behaving, K2.7 reached answers with noticeably less back-and-forth reasoning, and with cache hits on repeated context, the per-task bill stayed low. In our production workflow, the combination of cheap cached input and fewer reasoning tokens was the single biggest reason to keep it in rotation.

Use cases

  • Cheap metered agentic coding where you pay only for what you consume instead of a flat monthly subscription.
  • Self-hosted, on-premises coding deployments that need downloadable weights from day one for privacy or compliance reasons.
  • Tool-use and MCP-heavy agent workflows — the model's strongest self-reported category and, in our testing, its most convincing one.
  • Multimodal coding tasks that involve reading screenshots, UI mockups, or diagrams via the MoonViT vision encoder.
  • Long-context work that benefits from automatic caching, where repeated context (the same files and prompts) makes the effective input cost very low.
  • Teams migrating from Kimi K2.6 who want the token-efficiency gain within the same license family and architecture.
  • Cost-sensitive startups that want frontier-scale capability on input-heavy workloads without frontier-scale bills.

Pros and cons

Pros

  • Open weights available to download and self-host today on HuggingFace under a Modified MIT license — no waiting period, unlike some rival Chinese launches.
  • Roughly 30 percent fewer reasoning tokens than its predecessor Kimi K2.6 to reach a higher score, which lowers the effective cost per coding task.
  • Strong agentic tool-use orientation — Moonshot self-reports 81.1 on MCP Mark Verified and 76.0 on MCP Atlas, its best published category.
  • Cheap metered, pay-as-you-go API at $0.95 per million input tokens on a cache miss, with automatic context caching dropping repeated input to $0.19 per million.
  • 1 trillion total parameters with only 32 billion active per token (384 experts, 8 selected) keeps inference cost low for a frontier-scale model.
  • Includes a 400M-parameter MoonViT vision encoder, so it can read screenshots and UI mockups inside coding workflows.
  • OpenAI-compatible API that plugs into agents and editors supporting custom model endpoints.

Cons

  • No independent third-party benchmarks on standard public suites — Moonshot skipped SWE-bench Verified, LiveCodeBench, Terminal-Bench, and Aider Polyglot, so every published number is self-reported in its own harness.
  • 256K context window is smaller than DeepSeek V4's 1M, which matters for whole-repository prompts and very long autonomous sessions.
  • On the lab's own table, Kimi K2.7-Code (62.0 on Kimi Code Bench v2) trails Claude Opus 4.8 and GPT-5.5 on its headline coding metric.
  • Modified MIT license adds an attribution clause for very large commercial deployments above a user threshold — irrelevant for most teams but not pure MIT.
  • Output tokens are expensive relative to ultra-cheap open rivals — $4.00 per million output tokens is far above DeepSeek V4-Flash at roughly $0.28 per million.

Who it's for, and who should skip it

Buy in if you run agentic coding agents, care about MCP and tool use, want to self-host on your own hardware, or have input-heavy workloads where cheap cached input and token efficiency dominate your bill. K2.7 is one of the best value-per-token open-weight coding models available right now.

Skip it if your workload is output-heavy code generation where DeepSeek V4-Flash's far cheaper output wins, if you need a 1M-token context for whole-repository prompts, or if you cannot ship on vendor-reported benchmarks and require independent public-suite verification before adopting a model.

Alternatives

DeepSeek V4. The most direct open-weight rival. It offers a 1M-token context — four times K2.7's window — and its V4-Flash tier is dramatically cheaper on output at roughly $0.28 per million tokens. If your bottleneck is long-repository context or high output volume, DeepSeek V4 is the more economical pick; K2.7 counters with stronger self-reported tool use and cheaper cached input.

GLM-5.2. Zhipu's model, launched a single day after K2.7, and unlike Moonshot it published results on standard public suites, which gives it an evidence advantage for buyers who refuse to rely on in-house numbers. It is the natural cross-shop if independent benchmarks are a hard requirement.

Qwen 3.6. Alibaba's open-weight family remains a broad, well-supported alternative with a large ecosystem of fine-tunes and tooling. It is worth evaluating alongside K2.7 if ecosystem maturity and community support matter as much as raw scores.

Closed frontier models. For pure capability ceiling, Claude Opus 4.8 and GPT-5.5 still lead K2.7 on most of Moonshot's own coding columns. The trade is obvious: they are not open weight, not self-hostable, and not metered at K2.7's input prices.

Frequently asked questions

What is Kimi K2.7?

Kimi K2.7 (Kimi K2.7-Code) is Moonshot AI's open-weight coding model, released June 12, 2026. It is a 1 trillion parameter Mixture-of-Experts model with 32 billion active parameters, a 256K-token context window, and a native MoonViT vision encoder, distributed under a Modified MIT license with downloadable weights on HuggingFace.

How much does Kimi K2.7 cost?

It is metered, pay-as-you-go, with no subscription. Input costs $0.95 per million tokens on a cache miss, $0.19 per million cached tokens on a cache hit, and output costs $4.00 per million tokens. You can also self-host the open weights and pay only your own compute.

Can I self-host Kimi K2.7?

Yes. The weights are open and were available to download from HuggingFace on the day of release under a Modified MIT license. The modified clause adds an attribution requirement only for very large commercial deployments above a user threshold; for most teams it behaves like a standard MIT license.

What is the context window of Kimi K2.7?

256K tokens (262,144), with automatic context caching that makes repeated long-context calls much cheaper. That is smaller than DeepSeek V4's 1M-token window but generous for most day-to-day coding agents.

Are Kimi K2.7's benchmarks independently verified?

No. Every published benchmark — Kimi Code Bench v2, MCP Mark Verified, MCP Atlas, Program Bench, MLS Bench Lite, and Kimi Claw 24/7 Bench — comes from Moonshot's own in-house harness. As of release there were no independent results on standard public suites such as SWE-bench Verified or LiveCodeBench, so the scores should be read as vendor-reported and directional.

Is Kimi K2.7 better than DeepSeek V4?

It depends on the workload. K2.7 reports stronger agentic tool use and offers cheaper cached input, while DeepSeek V4 has a far larger 1M-token context and much cheaper output via its V4-Flash tier at roughly $0.28 per million output tokens. Pick K2.7 for input-heavy, MCP-driven agents; pick DeepSeek V4 for long-repository context or output-heavy generation.

Does Kimi K2.7 support vision?

Yes. It includes a 400 million parameter MoonViT vision encoder, so it can read images, screenshots, and UI mockups directly inside coding workflows — for example, turning a screenshot of a design into starting markup.

How does Kimi K2.7 compare to Claude Opus 4.8 and GPT-5.5?

On Moonshot's own table, K2.7 leads Claude Opus 4.8 on MCP Mark Verified (81.1 versus 76.4) but trails both Opus 4.8 and GPT-5.5 on the headline Kimi Code Bench v2 (62.0 versus 67.4 and 69.0). Since these are in-house scores, treat the comparison as directional rather than definitive.

Our verdict

Kimi K2.7 earns 8.4 out of 10. It is a frontier-scale, open-weight coding model that behaves like a budget one: cheap metered input, automatic caching, 30 percent better token efficiency than its predecessor, and same-day downloadable weights under a near-MIT license. For agentic, MCP-heavy, self-hosted, or input-heavy coding work, it is excellent value and was a pleasure to run in our testing.

The reservation is evidence, not capability. Moonshot shipped only in-house benchmarks, output pricing is expensive next to the cheapest open rivals, and the 256K context is smaller than some competitors. None of that is disqualifying — it just means you should adopt K2.7 for what it provably does well (cheap, efficient, tool-using, self-hostable coding) and verify the rest against your own workload before betting a critical pipeline on the headline scores.

Key Features

1 trillion total parameters, Mixture-of-Experts with 384 experts and 8 selected per token, 32 billion active parameters
256K (262,144) token context window with automatic context caching for cheaper repeated long-context calls
61 layers (60 MoE plus 1 dense) with MLA attention, the same family of architecture as Kimi K2.6
400M-parameter MoonViT vision encoder for multimodal reading of images and screenshots in coding tasks
Modified MIT license with open weights available now on HuggingFace
Metered API pricing: $0.95 per million input tokens cache miss, $0.19 per million cache hit, $4.00 per million output
OpenAI-compatible API with ToolCalls and JSON Mode support for agentic workflows
Roughly 30 percent reduction in reasoning tokens versus Kimi K2.6 for the same or better coding score

Pros & Cons

Pros

  • Open weights available to download and self-host today on HuggingFace under a Modified MIT license — no waiting period, unlike some rival Chinese launches
  • Roughly 30 percent fewer reasoning tokens than its predecessor Kimi K2.6 to reach a higher score, which lowers the effective cost per coding task
  • Strong agentic tool-use orientation — Moonshot self-reports 81.1 on MCP Mark Verified and 76.0 on MCP Atlas, its best published category
  • Cheap metered, pay-as-you-go API at $0.95 per million input tokens on a cache miss, with automatic context caching dropping repeated input to $0.19 per million
  • 1 trillion total parameters with only 32 billion active per token (384 experts, 8 selected) keeps inference cost low for a frontier-scale model
  • Includes a 400M-parameter MoonViT vision encoder, so it can read screenshots and UI mockups inside coding workflows
  • OpenAI-compatible API that plugs into agents and editors supporting custom model endpoints

Cons

  • No independent third-party benchmarks on standard public suites — Moonshot skipped SWE-bench Verified, LiveCodeBench, Terminal-Bench, and Aider Polyglot, so every published number is self-reported in its own harness
  • 256K context window is smaller than DeepSeek V4's 1M, which matters for whole-repository prompts and very long autonomous sessions
  • On the lab's own table, Kimi K2.7-Code (62.0 on Kimi Code Bench v2) trails Claude Opus 4.8 and GPT-5.5 on its headline coding metric
  • Modified MIT license adds an attribution clause for very large commercial deployments above a user threshold — irrelevant for most teams but not pure MIT
  • Output tokens are expensive relative to ultra-cheap open rivals — $4.00 per million output is far above DeepSeek V4-Flash at $0.28

Best Use Cases

Cheap metered agentic coding where you pay only for what you consume rather than a flat subscription
Self-hosted on-premises coding deployments that need downloadable weights on day one
Tool-use and MCP-heavy agent workflows, the model's strongest self-reported category
Multimodal coding tasks that involve reading screenshots, UI mockups, or diagrams
Teams switching from Kimi K2.6 that want the token-efficiency gain at the same license family

Platforms & Integrations

Available On

APIWebSelf-hosted (open weights)

Integrations

OpenAI-compatible APIHuggingFaceMCPCustom agent endpoints

Compare Kimi K2.7

Anthony M. — Founder & Lead Reviewer
Anthony M.Verified Builder

We're developers and SaaS builders who use these tools daily in production. Every review comes from hands-on experience building real products — DealPropFirm, ThePlanetIndicator, PropFirmsCodes, and many more. We don't just review tools — we build and ship with them every day.

Written and tested by developers who build with these tools daily.

Was this review helpful?

Frequently Asked Questions

What is Kimi K2.7?

Moonshot AI's open-weight 1T-parameter MoE coding model — 32B active, 256K context, Modified MIT, metered at $0.95 in / $4.00 out per million tokens.

How much does Kimi K2.7 cost?

Kimi K2.7 has a free tier. Premium plans start at $0.95/month.

Is Kimi K2.7 free?

Yes, Kimi K2.7 offers a free plan. Paid plans start at $0.95/month.

What are the best alternatives to Kimi K2.7?

Top-rated alternatives to Kimi K2.7 can be found in our WebApplication category, where we've reviewed and scored every tool on ThePlanetTools.ai.

Is Kimi K2.7 good for beginners?

Kimi K2.7 is rated 8.2/10 for ease of use.

What platforms does Kimi K2.7 support?

Kimi K2.7 is available on API, Web, Self-hosted (open weights).

Does Kimi K2.7 offer a free trial?

Yes, Kimi K2.7 offers a free trial.

Is Kimi K2.7 worth the price?

Kimi K2.7 scores 8.3/10 for value. We consider it excellent value.

Who should use Kimi K2.7?

Kimi K2.7 is ideal for: Cheap metered agentic coding where you pay only for what you consume rather than a flat subscription, Self-hosted on-premises coding deployments that need downloadable weights on day one, Tool-use and MCP-heavy agent workflows, the model's strongest self-reported category, Multimodal coding tasks that involve reading screenshots, UI mockups, or diagrams, Teams switching from Kimi K2.6 that want the token-efficiency gain at the same license family.

What are the main limitations of Kimi K2.7?

Some limitations of Kimi K2.7 include: No independent third-party benchmarks on standard public suites — Moonshot skipped SWE-bench Verified, LiveCodeBench, Terminal-Bench, and Aider Polyglot, so every published number is self-reported in its own harness; 256K context window is smaller than DeepSeek V4's 1M, which matters for whole-repository prompts and very long autonomous sessions; On the lab's own table, Kimi K2.7-Code (62.0 on Kimi Code Bench v2) trails Claude Opus 4.8 and GPT-5.5 on its headline coding metric; Modified MIT license adds an attribution clause for very large commercial deployments above a user threshold — irrelevant for most teams but not pure MIT; Output tokens are expensive relative to ultra-cheap open rivals — $4.00 per million output is far above DeepSeek V4-Flash at $0.28.

Ready to try Kimi K2.7?

Start with the free plan

Try Kimi K2.7 Free