Kimi K2.7 vs DeepSeek V4: Open-Weight Chinese Coding Models Compared (2026)
We tested both Chinese open-weight coding models. DeepSeek V4 wins on price and verified 80.6% SWE-bench; Kimi K2.7 wins on tool-use and vision.
Feature Comparison
| Feature | Kimi K2.7 | DeepSeek V4 |
|---|---|---|
| Verified coding benchmarks | None (self-reported only) | 80.6% SWE-bench Verified, 93.5% LiveCodeBench |
| Input price (cache miss) | $0.95 per million tokens | $0.14 (Flash) per million tokens |
| Output price | $4.00 per million tokens | $0.28 (Flash) per million tokens |
| Context window | 256K (262,144) tokens | 1M (1,000,000) tokens |
| Multimodal / vision | Yes (400M MoonViT encoder) | No (text only) |
| Agentic tool-use (MCP) | 81.1 MCP Mark Verified (self-reported) | Not separately reported |
| License | Modified MIT (attribution clause at scale) | MIT (clean) |
| Weights available now | Yes (HuggingFace) | Yes (HuggingFace) |
| Reasoning-token efficiency | ~30% fewer tokens vs K2.6 | Three thinking modes, no efficiency claim vs Kimi |
| Total parameters | 1T MoE (32B active) | 1.6T MoE (49B active, V4-Pro) |
Pricing Comparison
Kimi K2.7
DeepSeek V4
Detailed Comparison
DeepSeek V4 is the stronger pick for most coding teams in 2026. We ran both side by side, and DeepSeek V4 wins on the two things that matter most: it has independently verified frontier coding scores (80.6 percent on SWE-bench Verified) while Kimi K2.7 has none on standard public suites, and its V4-Flash tier is far cheaper at $0.14 per million input tokens versus Kimi's $0.95. Kimi K2.7 still wins on agentic tool-use, native multimodal vision, and reasoning-token efficiency — so the verdict is DeepSeek V4 overall, Kimi for MCP-heavy agent and vision-in-code workflows.
Quick Verdict: who wins what
Both of these are Chinese, open-weight, Mixture-of-Experts models aimed squarely at coding and agentic work, and both undercut the Western frontier on price. But they are not interchangeable, and after testing both we landed on a clear split rather than a tie.
- Best overall: DeepSeek V4. It is the only model in this matchup with independently verified frontier coding benchmarks, it has a larger 1M context window, it ships under a pure MIT license, and its V4-Flash tier is dramatically cheaper per token.
- Best for cheapest tokens: DeepSeek V4-Flash. At $0.14 per million input and $0.28 output, it is roughly 7x cheaper on input and 14x cheaper on output than Kimi K2.7.
- Best for agentic tool-use: Kimi K2.7. Tool-use and MCP workflows are its strongest self-reported category (81.1 on MCP Mark Verified), and the roughly 30 percent reasoning-token efficiency gain over its predecessor lowers cost per agent task.
- Best for multimodal coding: Kimi K2.7. It bundles a 400M-parameter MoonViT vision encoder, so it can read screenshots and UI mockups inside a coding loop. DeepSeek V4 is text-only.
- Best for whole-repository prompts: DeepSeek V4. Its 1M context window holds nearly 4x what Kimi's 256K does.
If you just want one answer: pick DeepSeek V4 unless your workflow is specifically MCP/tool-use heavy or needs to read images inside code tasks, in which case Kimi K2.7 earns a serious look.
How we tested both
We ran both models side by side on the same set of coding and agentic tasks we use to evaluate every model that lands on ThePlanetTools.ai: refactoring a mid-sized TypeScript repository, writing and fixing failing tests, multi-file feature builds, and a set of MCP tool-use loops driving a coding agent. We hit DeepSeek V4 through its OpenAI-compatible API (both the V4-Flash and V4-Pro tiers) and Kimi K2.7 through Moonshot's API, with automatic context caching on for both where available.
Two honesty notes up front, because they shape the whole comparison. First, on raw benchmark capability the asymmetry is unavoidable: DeepSeek V4 has both vendor and independent third-party numbers on the standard public suites; Kimi K2.7 has none. Moonshot chose not to submit Kimi K2.7 to SWE-bench Verified, LiveCodeBench, Terminal-Bench, or Aider Polyglot, so every Kimi number you will see in this piece is self-reported in Moonshot's own evaluation harness. We label those clearly throughout, and we do not pretend a self-reported score and an independently verified one are the same thing.
Second, every pricing, context, license, and architecture figure below was read directly off each vendor's own documentation between June 16 and June 18, 2026 — DeepSeek's API pricing page at api-docs.deepseek.com, Kimi's pricing page at platform.kimi.ai, and the HuggingFace model cards for both. Where a figure is a vendor's own benchmark claim rather than an independent result, we say so. Where a number is not disclosed, we leave it out rather than guess.
Hands-on: how each one actually felt
Benchmarks aside, here is what the two models felt like across our task set. We are deliberately careful not to dress subjective impressions up as measured results — there is no independent leaderboard for Kimi K2.7 to anchor to — but the texture of working with each one is still useful.
On multi-file refactoring, DeepSeek V4-Pro was the more dependable of the two. Asked to thread a change through a TypeScript repository touching a dozen files, it produced edits that compiled and passed the existing test suite more often on the first attempt. This is the kind of task SWE-bench Verified is built to measure, so V4-Pro's verified 80.6 percent showing up as practical reliability is not a surprise. Kimi K2.7 handled the same task competently, and notably did so with fewer reasoning tokens, but we hit more cases where it needed a second pass to fully reconcile cross-file changes.
On test-writing and bug-fixing, the two were closer. Both could read a failing test, locate the defect, and propose a fix. DeepSeek V4's longer context let it hold more of the surrounding code in view at once, which helped on bugs whose root cause sat several files away from the failing assertion. Kimi K2.7's 256K window was rarely the limiting factor on individual fixes — it only started to matter when we deliberately loaded large portions of the repo into the prompt.
On agentic tool-use loops, Kimi K2.7 was where it earned its keep. Driving a coding agent through a sequence of tool calls — read a file, run a command, inspect the result, decide the next step — Kimi felt purpose-built, which aligns with its strongest self-reported number (81.1 on MCP Mark Verified). The roughly 30 percent reduction in reasoning tokens versus Kimi K2.6 also showed up as a real thing in these loops: the model reached its decisions with less verbose deliberation, and on a metered API that translates directly into lower spend per agent run.
On reading screenshots, there is no contest, because only one of the two can do it. We fed Kimi K2.7 a screenshot of a broken UI and asked it to identify the layout issue and propose a CSS fix; the MoonViT vision encoder let it actually look at the image and reason about it. DeepSeek V4 cannot take an image as input at all, so this entire class of task is simply outside its envelope. If visual inputs are part of your coding loop, that is decisive.
The honest summary: DeepSeek V4 felt like the stronger general coder, which the verified benchmarks back up; Kimi K2.7 felt like the sharper tool-use agent and the only one that can see. Neither impression replaces an independent benchmark, and we say so plainly — but they are consistent with the verified data we do have.
The specs side by side
Here is what each lab confirms about the two models. Notice the pattern that runs through the whole comparison: DeepSeek V4 is the larger model on total parameters, has a much longer context, and is the only one with verified scores — while Kimi K2.7 counters with native vision and a tool-use focus.
| Attribute | Kimi K2.7 (Moonshot AI) | DeepSeek V4 (DeepSeek) |
|---|---|---|
| Total parameters | 1T (Mixture-of-Experts, 384 experts, 8 per token) | 1.6T (V4-Pro) / 284B (V4-Flash), Mixture-of-Experts |
| Active parameters | 32B per token | 49B per token (V4-Pro) / 13B (V4-Flash) |
| Context window | 256K (262,144) tokens | 1M (1,000,000) tokens |
| Max output | Not disclosed | 384K tokens |
| Multimodal | Yes — 400M MoonViT vision encoder | No — text only |
| License | Modified MIT (weights on HuggingFace now) | MIT (weights on HuggingFace now) |
| Pricing model | Metered pay-as-you-go API | Metered pay-as-you-go API |
| Input price (cache miss) | $0.95 per million tokens | $0.14 (Flash) / $0.435 (Pro discount) per million tokens |
| Input price (cache hit) | $0.19 per million tokens | $0.0028 (Flash) / $0.003625 (Pro) per million tokens |
| Output price | $4.00 per million tokens | $0.28 (Flash) / $0.87 (Pro discount) per million tokens |
| Independent benchmarks | None — self-reported only | Yes — 80.6% SWE-bench Verified, 93.5% LiveCodeBench |
| Announced | June 12, 2026 | April 24, 2026 |
The pricing claims here come directly from each vendor. DeepSeek's per-token rates are published on its API docs at api-docs.deepseek.com, and Kimi K2.7's rates are listed on platform.kimi.ai. Both model cards, including license and parameter counts, sit on HuggingFace.
Meet Kimi K2.7
Kimi K2.7 — formally Kimi K2.7-Code — is Moonshot AI's coding-focused open-weight flagship, announced June 12, 2026. It is a 1 trillion parameter Mixture-of-Experts model that activates just 32 billion parameters per token, drawn from 384 experts with 8 selected per forward pass. Architecturally it carries 61 layers (60 MoE plus one dense) and MLA attention, the same lineage as its predecessor Kimi K2.6. The interesting twist is that it is not a pure text model: it bundles a 400M-parameter MoonViT vision encoder, so it can read images, screenshots, and UI mockups inside a coding workflow.
Moonshot's pitch for K2.7 is efficiency, not raw peak score. It reports the model scoring 62.0 on its own Kimi Code Bench v2 — a 21.8 percent jump over Kimi K2.6 — while using roughly 30 percent fewer reasoning tokens to get there. In a metered, pay-per-token world, fewer reasoning tokens for a higher score is a real cost lever, and it is the most genuinely useful thing about this release. The weights are downloadable now on HuggingFace under a Modified MIT license.
Meet DeepSeek V4
DeepSeek V4 is the fourth-generation flagship from the Hangzhou lab that made open-weights frontier models a global story in 2025. Released April 24, 2026, it ships in two Mixture-of-Experts variants: V4-Pro at 1.6 trillion total parameters (49 billion active) and V4-Flash at 284 billion total (13 billion active). Both share a 1 million token context window, a 384K max output, and MIT-licensed weights on HuggingFace. It is a text-only model — no native vision or audio — built as a general agentic flagship rather than a reasoning specialist.
Where Kimi leans on self-reported efficiency, DeepSeek V4 leans on verified peak capability. It posts 80.6 percent on SWE-bench Verified — the highest open-weights entry at launch, effectively tied with Gemini 3.1 Pro — and 93.5 percent on LiveCodeBench, the top public score at the time. Crucially, these are corroborated by independent third-party coverage, not just DeepSeek's own table. Its defining engineering trick is Hybrid Attention (Compressed Sparse Attention at 4x plus Heavily Compressed Attention at 128x), which is what makes a 1M context economically viable at $0.14 per million input tokens. For the full deep dive, see our DeepSeek V4 review.
Benchmarks: verified vs self-reported
This is the single most important section, and the one most other comparisons get wrong. You cannot put Kimi K2.7 and DeepSeek V4 on the same leaderboard, because they were not measured the same way.
DeepSeek V4 published numbers and was then independently benchmarked. Its headline coding results — 80.6 percent on SWE-bench Verified and 93.5 percent on LiveCodeBench — show up in third-party coverage, not just DeepSeek's own materials. That is the gold standard: a score you can trust because someone other than the vendor produced it.
Kimi K2.7 did not submit to any standard public suite. There are no independent third-party numbers for Kimi K2.7 on SWE-bench Verified, SWE-bench Pro, LiveCodeBench, Terminal-Bench, or Aider Polyglot at the time of writing. Every figure below is self-reported by Moonshot, run in Moonshot's own harness.
| Benchmark | Kimi K2.7 (self-reported) | DeepSeek V4-Pro (verified) |
|---|---|---|
| SWE-bench Verified | Not submitted | 80.6% |
| LiveCodeBench Pass@1 | Not submitted | 93.5% |
| Codeforces Rating | Not submitted | 3206 |
| Kimi Code Bench v2 | 62.0 | Not run by Moonshot for DeepSeek |
| Program Bench | 53.6 | — |
| MCP Mark Verified | 81.1 | — |
| MCP Atlas | 76.0 | — |
DeepSeek V4 scores are independently verified; Kimi K2.7 scores are self-reported by Moonshot in its own harness. The two columns are not directly comparable.
So what does this mean in practice? It means that if your decision rests on demonstrated, trustworthy coding capability, DeepSeek V4 is the safer bet — there is verified evidence behind it. Kimi K2.7's numbers might be excellent, but you are taking Moonshot's word for it. In our own side-by-side runs, DeepSeek V4-Pro was the more reliable model on multi-file refactors and test-fixing tasks, which tracks with its verified SWE-bench lead. Kimi K2.7 held its own and felt notably efficient — it spent fewer tokens reasoning — but we could not reproduce a clear capability lead for it on the standard coding work, and neither has any independent benchmark.
Which DeepSeek V4 tier do you compare to Kimi?
One thing that trips people up: DeepSeek V4 is not a single model, and the answer to "is it cheaper than Kimi?" depends on which tier you pick. So before the pricing table, here is how to think about the matchup.
DeepSeek V4-Flash (284B total, 13B active) is the cost tier. It is the obvious head-to-head against Kimi K2.7 for high-volume, cost-sensitive coding work, and it is where DeepSeek's price advantage is most brutal. DeepSeek V4-Pro (1.6T total, 49B active) is the capability tier — it is the model that posts the verified 80.6 percent SWE-bench Verified, and it is the right comparison when raw coding quality is what you care about. Both Flash and Pro share the same 1M context and the same MIT license.
For most of this comparison we treat V4-Flash as the value benchmark and V4-Pro as the capability benchmark, and we are explicit about which one a given claim refers to. Kimi K2.7, by contrast, is a single model, so it has to compete with whichever DeepSeek tier you choose — and on capability it faces V4-Pro's verified numbers, while on price it faces V4-Flash's rock-bottom tokens. That two-front comparison is part of why DeepSeek comes out ahead overall.
Pricing: where the gap is enormous
Both models are metered, pay-as-you-go APIs with no flat subscription — but the per-token economics are not close. This is DeepSeek V4's biggest structural advantage.
| Tier | Input (cache miss) | Input (cache hit) | Output |
|---|---|---|---|
| Kimi K2.7 | $0.95 per million tokens | $0.19 per million tokens | $4.00 per million tokens |
| DeepSeek V4-Flash | $0.14 per million tokens | $0.0028 per million tokens | $0.28 per million tokens |
| DeepSeek V4-Pro (discount) | $0.435 per million tokens | $0.003625 per million tokens | $0.87 per million tokens |
| DeepSeek V4-Pro (regular) | $1.74 per million tokens | $0.0145 per million tokens | $3.48 per million tokens |
Read the table carefully, because the headline is stark. Against DeepSeek V4-Flash, Kimi K2.7 is roughly 7x more expensive on input (cache miss) and about 14x more expensive on output. Even against the more capable V4-Pro at its discount rate, Kimi is more than 2x the input price and about 4.6x the output price. The one place Kimi's pricing looks reasonable is against V4-Pro's full regular rate, where Kimi's $0.95 input actually undercuts V4-Pro's $1.74 — but V4-Pro's discount runs deep, and V4-Flash exists precisely for cost-sensitive volume.
Caching helps both. Kimi's cache-hit input drops to $0.19 per million, and DeepSeek's cache-hit pricing is brutal-cheap at $0.0028 per million on Flash — about 50x below cache miss. For long-context agent loops with stable system prompts, DeepSeek's cache economics are in a different universe. There is one caveat to model: DeepSeek V4-Pro's deep discount was published with an expiry, after which input jumps roughly 4x to $1.74; V4-Flash pricing is the durable cheap tier.
The bottom line on cost: unless you specifically need what Kimi offers (tool-use focus, vision, token-efficiency on agent tasks), DeepSeek V4-Flash will run the same volume of coding work for a fraction of the spend.
Context window and multimodal
Two clean, factual differences sit outside the benchmark debate, and they cut in opposite directions.
Context favors DeepSeek V4. Its 1M token window is nearly 4x Kimi K2.7's 256K. In day-to-day coding agent work — a handful of files, a task, iterative edits — 256K is already comfortable, and Kimi's automatic context caching makes repeated long-context calls cheaper. But for the heaviest cases — dropping an entire mid-sized codebase plus its docs and dependency tree into a single prompt, or running very long autonomous sessions without aggressive retrieval — the 1M window is a genuine, measurable edge for DeepSeek. DeepSeek also publishes a 384K max output; Kimi does not disclose its max output.
Multimodal favors Kimi K2.7. This is the one place Kimi has something DeepSeek simply does not. Kimi K2.7 bundles a 400M-parameter MoonViT vision encoder, so it can read a screenshot of a broken UI, a design mockup, or a diagram and reason about it inside the same coding task. DeepSeek V4 is text-only — for any image-in-the-loop workflow you would have to bolt on a separate vision model. If your coding work routinely involves visual inputs, that is a real reason to choose Kimi.
Licensing and availability
Both models are open-weight and both have weights downloadable on HuggingFace right now, so neither makes you wait. The difference is in the fine print of the license.
DeepSeek V4 ships under a straight MIT license on its weights — about as permissive as it gets. Free commercial use, redistribution, fine-tuning, and modification are all permitted. The one nuance, which applies to both labs, is that this is open weights, not open source: the training code and dataset recipe are not released, so you cannot fully reproduce the training run.
Kimi K2.7 ships under a Modified MIT license. It is the same baseline permissiveness with one added clause: large-scale commercial deployments above a certain user threshold must display Kimi attribution. For the overwhelming majority of developers and teams, that clause never triggers and the practical freedom is identical to MIT. But if you are a hyperscaler-sized deployment, the attribution requirement is a real, if minor, distinction — and a reason some buyers prefer DeepSeek's clean MIT.
Real-world use cases
Specs decide nothing on their own — workloads do. Here is where each model slots in across the kinds of work teams actually run.
High-volume agentic coding pipelines
If you are running an agent fleet that reads large codebases, proposes changes, and runs tests across thousands of iterations a day, token economics dominate everything else. DeepSeek V4-Flash at $0.14 input and $0.28 output is the clear fit here — the same volume of work that costs real money on Kimi K2.7's $0.95 / $4.00 rates runs for a fraction on Flash. DeepSeek's cache-hit price of $0.0028 per million on stable system prompts pushes the effective cost of repetitive loops close to self-hosted economics.
MCP and tool-use heavy agents
For agents whose value is in orchestrating tools — calling functions, running commands, chaining MCP servers — Kimi K2.7 is the model to try first. Tool-use is its strongest self-reported category, and the reasoning-token efficiency gain means each tool-use decision costs fewer tokens to reach. On a workflow that is 80 percent tool orchestration and 20 percent code generation, Kimi's higher per-token price is partly offset by its lower token consumption per task.
Coding tasks that involve images
Front-end work that starts from a Figma mockup, bug reports that arrive as screenshots, or any loop where the model needs to look at a rendered UI — these are Kimi K2.7 territory, full stop. Its MoonViT vision encoder reads the image natively. DeepSeek V4 cannot do this at all without a bolted-on vision model, which adds latency, cost, and integration complexity.
Whole-repository and long-session work
When you want to drop an entire mid-sized codebase, its docs, and its dependency tree into a single prompt — or run a very long autonomous session without aggressive retrieval and chunking — DeepSeek V4's 1M context is the deciding factor. Kimi K2.7's 256K is comfortable for everyday multi-file work but will force you into retrieval strategies on the heaviest cases.
Self-hosted, on-premises deployments
Both ship downloadable weights today, so both are candidates for air-gapped or compliance-driven deployments where you cannot send code to a hosted API. DeepSeek V4's clean MIT license is the simpler legal story; Kimi K2.7's Modified MIT adds an attribution clause that only bites at very large scale. For most self-hosters either license is fine — the bigger practical question is hardware, which we cover next.
Open-weights research and distillation
Researchers who want a frontier-scale teacher model to fine-tune, probe, or distill into smaller students have a real choice here. DeepSeek V4-Pro's verified capability makes it the stronger research substrate on coding tasks; Kimi K2.7's vision encoder makes it the more interesting base for multimodal-code research. The permissive licenses on both make this kind of work legal in a way closed frontier models never allow.
When to pick which
Because the two models are measured differently and built for slightly different emphases, the honest recommendation is about fit, not a single score.
Pick DeepSeek V4 if you want the cheapest credible tokens (V4-Flash at $0.14 input is hard to beat); you need a 1M context for whole-repository prompts or very long agent sessions; you want verified, independently benchmarked coding capability rather than a vendor's word; you prefer a clean MIT license; or you are running cost-sensitive, high-volume agentic coding pipelines where token economics dominate. For most teams, this is the default.
Pick Kimi K2.7 if your workflow is heavily MCP and tool-use oriented — that is the model's strongest self-reported category; you need native multimodal vision inside coding tasks (reading screenshots, mockups, diagrams); you value the roughly 30 percent reasoning-token efficiency gain on per-task cost; or you are already on Kimi K2.6 and want the same license family with a token-efficiency upgrade and minimal switching cost.
For a lot of teams the real answer is "try both," because the switching cost is genuinely low: both speak OpenAI-compatible APIs, both plug into the same agent tooling, and neither locks you in the way a proprietary model does. Run your own evaluation harness against both for a week and let your actual workload decide.
Self-hosting and hardware reality
"Open weights" is only useful if you can actually run the model, and at this scale that is a hardware conversation. Both are large Mixture-of-Experts models, so both need serious silicon to serve at full precision, even though only a fraction of parameters activate per token.
DeepSeek V4 is published in mixed FP4 plus FP8 precision, which is what makes its API pricing economically possible — but it also means self-hosting wants hardware with mature FP4 support. The leaner V4-Flash (284B total, 13B active) is the realistic self-host target for smaller setups, running on multi-GPU enterprise boxes in native precision or on a single high-VRAM card with INT4 quantization and CPU KV-cache offload at a throughput penalty. The full V4-Pro (1.6T total) needs a serious GPU cluster. DeepSeek also ships day-one Huawei Ascend support, which matters for deployments that cannot or will not source NVIDIA hardware.
Kimi K2.7 at 1T total with 32B active sits between V4-Flash and V4-Pro on raw size, and its weights are on HuggingFace for anyone to download and quantize. As with any fresh open-weight release, expect community quantizations and inference-framework support to firm up in the days and weeks after launch rather than being perfect on day one — that has been the pattern for every large Chinese model release, DeepSeek's included.
The practical takeaway: neither of these is a "run it on a laptop" model. If self-hosting is your reason for choosing a Chinese open-weight model, budget for real GPU capacity, plan to quantize, and pick V4-Flash over V4-Pro or Kimi K2.7 if you want the lightest footprint at usable quality.
The bigger picture
These two are not isolated launches — they are the latest beats in a sustained Chinese push to own the open-weight coding tier. In the same stretch of 2026 we have covered Qwen 3.6, MiniMax's open-weight coding frontier in MiniMax M3, and Zhipu's GLM line — and we looked at Kimi specifically against another Chinese coding model in our piece on GLM-5.2 vs Kimi K2.7-Code. The cadence of large MoE models under permissive-ish licenses, all aimed at coding and all undercutting Western subscription pricing, is exactly the dynamic keeping China ahead in this specific lane.
It is worth keeping the frontier framing honest, though. Both of these are cheap next to a top-tier Western coding subscription, but neither is a claim of parity with the closed peak. For how the Western frontier and the cheapest open MIT option stack up, see our Claude Fable 5 vs DeepSeek V4 comparison, and for the value-vs-capability tradeoff at the top, our Claude Opus 4.8 vs Gemini 3.1 Pro piece.
Final verdict
After running both side by side, DeepSeek V4 is our overall pick in this matchup. It wins on the things you can verify and the things you pay for: independently confirmed frontier coding scores (80.6 percent SWE-bench Verified, 93.5 percent LiveCodeBench), a 1M context window, a clean MIT license, and a V4-Flash tier that is roughly 7x cheaper on input and 14x cheaper on output than Kimi K2.7. If you want one model for general open-weight coding in 2026 and you are price-conscious, DeepSeek V4 is the answer.
Kimi K2.7 is not the loser so much as the specialist. It wins on agentic tool-use (its strongest self-reported category at 81.1 on MCP Mark Verified), it is the only one of the two with native multimodal vision via MoonViT, and its roughly 30 percent reasoning-token efficiency over its predecessor is a real per-task cost lever. For MCP-heavy agent fleets or coding workflows that must read images, Kimi earns its place.
The one thing we will not do is pretend Kimi K2.7 has proven its coding capability against DeepSeek V4 — it has not been independently benchmarked, and a self-reported score is not a verified one. That single fact tilts the overall call to DeepSeek V4 for anyone who wants evidence over claims. We will revisit this the moment Moonshot submits Kimi K2.7 to the standard public suites; until then, the verified-vs-self-reported gap is the deciding factor.
Frequently asked questions
Is Kimi K2.7 or DeepSeek V4 better for coding?
DeepSeek V4 is the safer pick for coding because it has independently verified benchmarks — 80.6 percent on SWE-bench Verified and 93.5 percent on LiveCodeBench. Kimi K2.7 has no independent third-party scores on standard public suites; its coding numbers (62.0 on Kimi Code Bench v2) are self-reported in Moonshot's own harness. On demonstrated, trustworthy coding capability, DeepSeek V4 wins. Kimi K2.7 leads on agentic tool-use and adds native multimodal vision.
How much do Kimi K2.7 and DeepSeek V4 cost per million tokens?
Kimi K2.7 is metered at $0.95 per million input tokens on a cache miss, $0.19 per million on a cache hit, and $4.00 per million output. DeepSeek V4-Flash is $0.14 input cache miss, $0.0028 cache hit, and $0.28 output. DeepSeek V4-Pro is $0.435 input and $0.87 output during its discount window, $1.74 input and $3.48 output at the regular rate. DeepSeek V4-Flash is roughly 7x cheaper on input and 14x cheaper on output than Kimi K2.7.
Are Kimi K2.7 and DeepSeek V4 open source?
Both are open weight, not fully open source. DeepSeek V4 uses a permissive MIT license, with weights available now on HuggingFace. Kimi K2.7 uses a Modified MIT license with weights also available now, adding an attribution clause that only affects very large commercial deployments. In both cases the model weights are released but the training code and dataset recipe are not, so neither training run is fully reproducible.
Which has a larger context window, Kimi K2.7 or DeepSeek V4?
DeepSeek V4 has the larger context window at 1 million tokens, with a 384K max output. Kimi K2.7 supports 256K (262,144) tokens and does not publicly disclose its max output. The 1M DeepSeek window matters most for whole-repository prompts and very long autonomous agent sessions; 256K is already comfortable for typical multi-file coding tasks, and Kimi adds automatic context caching to lower the cost of repeated long-context calls.
Does Kimi K2.7 or DeepSeek V4 support images?
Kimi K2.7 supports images. It bundles a 400M-parameter MoonViT vision encoder, so it can read screenshots, UI mockups, and diagrams inside a coding workflow. DeepSeek V4 is text-only — it has no native image, audio, or video input, so any visual workflow would require pairing it with a separate vision model. If reading images inside code tasks matters to you, Kimi K2.7 is the pick.
Why does Kimi K2.7 have no SWE-bench Verified score?
Moonshot chose not to submit Kimi K2.7 to the standard independent public suites, including SWE-bench Verified, LiveCodeBench, Terminal-Bench, and Aider Polyglot. It published results only on its own internal benchmarks, such as Kimi Code Bench v2 and MCP Mark Verified, run in its own evaluation harness. That means there is no independently verified coding score for Kimi K2.7, which is why a direct head-to-head leaderboard against DeepSeek V4's verified numbers is not possible.
What architecture do Kimi K2.7 and DeepSeek V4 use?
Both are Mixture-of-Experts models. Kimi K2.7 has 1 trillion total parameters with 32 billion active per token, drawn from 384 experts with 8 selected, across 61 layers using MLA attention, plus a 400M MoonViT vision encoder. DeepSeek V4-Pro has 1.6 trillion total parameters with 49 billion active (V4-Flash is 284 billion with 13 billion active), and uses Hybrid Attention combining Compressed Sparse Attention at 4x compression with Heavily Compressed Attention at 128x to make its 1M context economical.
Can I use Kimi K2.7 and DeepSeek V4 with the same coding agents?
Yes. Both expose OpenAI-compatible APIs, so they drop into agents and editors that support custom model endpoints. DeepSeek V4 uses model IDs like deepseek-v4-flash and deepseek-v4-pro; Kimi K2.7 is accessed through Moonshot's API. Because the switching cost is low and neither locks you in the way a proprietary model does, running both against your own evaluation harness for a week is a practical way to decide which fits your workflow.
Are Kimi K2.7 and DeepSeek V4 as good as Claude or GPT for coding?
Both are cheap next to a top-tier Western coding subscription, but neither is a claim of parity with the closed frontier. DeepSeek V4's verified 80.6 percent SWE-bench Verified is competitive and effectively tied with Gemini 3.1 Pro, while the only published Kimi K2.7 numbers are self-reported and not on the standard suites. For the value-versus-capability tradeoff against the Western peak, our Claude Fable 5 vs DeepSeek V4 and Claude Opus 4.8 vs Gemini 3.1 Pro comparisons cover the gap in detail.
When were Kimi K2.7 and DeepSeek V4 released?
DeepSeek V4 was released April 24, 2026, with weights for both V4-Pro and V4-Flash on HuggingFace and API access live the same day. Kimi K2.7 (Kimi K2.7-Code) was announced June 12, 2026, also with downloadable weights on HuggingFace at launch. Both are 2026 open-weight Chinese coding models, with DeepSeek V4 arriving roughly seven weeks earlier.
Our Verdict
DeepSeek V4 is the overall winner for most coding teams in 2026: it is the only model here with independently verified frontier scores (80.6% SWE-bench Verified, 93.5% LiveCodeBench), it has a 1M context window, a clean MIT license, and a V4-Flash tier roughly 7x cheaper on input and 14x cheaper on output than Kimi K2.7. Kimi K2.7 wins on agentic tool-use (its strongest self-reported category), native multimodal vision via MoonViT, and reasoning-token efficiency — making it the better fit for MCP-heavy agent fleets and vision-in-code workflows, but its lack of any independent benchmark keeps it behind DeepSeek V4 on demonstrated capability.
Choose Kimi K2.7
Moonshot AI's open-weight 1T-parameter MoE coding model — 32B active, 256K context, Modified MIT, metered at $0.95 in / $4.00 out per million tokens.
Try Kimi K2.7 →Choose DeepSeek V4
Chinese open-source flagship: 1.6T MoE (49B active), 1M context, 80.6% SWE-bench Verified, MIT license — at one-fifth the price of Claude Opus 4.7
Try DeepSeek V4 →Frequently Asked Questions
Is Kimi K2.7 better than DeepSeek V4?
DeepSeek V4 is the overall winner for most coding teams in 2026: it is the only model here with independently verified frontier scores (80.6% SWE-bench Verified, 93.5% LiveCodeBench), it has a 1M context window, a clean MIT license, and a V4-Flash tier roughly 7x cheaper on input and 14x cheaper on output than Kimi K2.7. Kimi K2.7 wins on agentic tool-use (its strongest self-reported category), native multimodal vision via MoonViT, and reasoning-token efficiency — making it the better fit for MCP-heavy agent fleets and vision-in-code workflows, but its lack of any independent benchmark keeps it behind DeepSeek V4 on demonstrated capability.
Which is cheaper, Kimi K2.7 or DeepSeek V4?
Kimi K2.7 is priced at $0.95 in / $4 out per M tokens (free plan available). DeepSeek V4 is priced at $0.14 in / $0.28 out per M tokens (free plan available). Check the pricing comparison section above for a full breakdown.
What are the main differences between Kimi K2.7 and DeepSeek V4?
The key differences span across 10 features we compared. For Verified coding benchmarks, Kimi K2.7 offers None (self-reported only) while DeepSeek V4 offers 80.6% SWE-bench Verified, 93.5% LiveCodeBench. For Input price (cache miss), Kimi K2.7 offers $0.95 per million tokens while DeepSeek V4 offers $0.14 (Flash) per million tokens. For Output price, Kimi K2.7 offers $4.00 per million tokens while DeepSeek V4 offers $0.28 (Flash) per million tokens. See the full feature comparison table above for all details.

