Skip to content
C
AI Tools

Claude Opus 4.7

Anthropic's flagship LLM — agentic coding king with 1M context

9.4/10
Last updated June 10, 2026
Author
Anthony M.
46 min readVerified June 10, 2026Tested hands-on

Quick Summary

Claude Opus 4.7 is Anthropic's flagship LLM launched April 16, 2026. Hits 93% on SWE-bench Verified (+13 points over Opus 4.6), ships a 1M-token context with 128K output, costs $5 / $25 per million input/output tokens, and uses a new tokenizer (+35% tokens for same text). Score: 9.4/10. We use it daily on Planet Cockpit for agentic coding production work.

Claude Opus 4.7 review — Anthropic flagship AI model with 1M context window, SWE-bench 93%, and adaptive thinking
Claude Opus 4.7 — Anthropic's flagship model launched April 16, 2026 with a step-change in agentic coding. Image generated with Nano Banana Pro.

Claude Opus 4.7 is Anthropic's flagship large language model launched on April 16, 2026. It hits 93% on SWE-bench Verified (versus 80% for Opus 4.6), ships a 1M-token context window with 128K output, costs $5 per million input tokens and $25 per million output tokens, and uses a new tokenizer that consumes up to 35% more tokens for the same text. We use Claude Opus 4.7 daily on our content production stack — twenty-two straight days of agentic coding production work since launch — and the upgrade from Opus 4.6 is real. Score 9.4 out of 10.

TL;DR — Claude Opus 4.7 verdict

Bottom line: Claude Opus 4.7 is the best agentic coding model on the market in May 2026, scoring 9.4 out of 10 on our hands-on review after 22 days of daily production use on ThePlanetTools.ai.

  • Best for: Long autonomous coding tasks (30+ tool calls), monorepo refactors, AI agent builders running on Claude Code, Cursor, or Windsurf.
  • Avoid if: Cost-sensitive batch jobs (the new tokenizer adds up to 35% more tokens), workflows that require explicit extended thinking, or short-turn iteration where Haiku 4.5 latency wins.
  • Pricing: Starting at $17 per month on the Pro consumer plan; $5 per million input tokens and $25 per million output tokens on the API.
  • Score: 9.4 out of 10 — based on 22 days of daily production use since the April 16, 2026 launch on a Next.js 16 plus Supabase content production stack.

What is Claude Opus 4.7?

Claude Opus 4.7 (API ID claude-opus-4-7) is Anthropic's most capable generally available model as of April 2026. It replaces Opus 4.6 at the top of the lineup and is positioned for "complex reasoning and agentic coding." The headline numbers from Anthropic's launch post (April 16, 2026): 93% on SWE-bench Verified (a +13 percentage-point jump over Opus 4.6's 80%), 70% on CursorBench versus Opus 4.6's 58%, and 98.5% on XBOW visual-acuity versus Opus 4.6's 54.5%. It also passed Terminal-Bench 2.0 tasks that previous Claude generations could not, and Anthropic claims state-of-the-art performance on the Finance Agent Evaluation.

What is unusual about this release is the tokenizer change. Anthropic states Opus 4.7 uses a new tokenizer that "may use up to 35% more tokens for the same fixed text." That changes the real cost-per-task math even though the headline rate ($5 input and $25 output per million tokens) is unchanged from Opus 4.6 and Opus 4.5. The reliable knowledge cutoff also jumps to January 2026 (versus May 2025 for Opus 4.6), and training data extends through January 2026.

Anthropic ships Opus 4.7 across the Claude consumer apps (web, iOS, Android, desktop), the Claude API, AWS Bedrock (Messages-API endpoint as anthropic.claude-opus-4-7), Google Vertex AI (claude-opus-4-7), and Microsoft Foundry. The model snapshot is dateless — the API alias claude-opus-4-7 is itself a pinned snapshot, not an evergreen pointer (a convention Anthropic introduced with the 4.6 generation). Opus 4.6 stays available with no announced deprecation date as of May 2026, but Anthropic recommends migrating for the agentic coding lift.

Key features

Claude Opus 4.7 agentic coding demo — multi-file refactor with adaptive thinking and task budgets
Opus 4.7 in agentic coding mode — multi-file refactors with self-verification.

Agentic coding (SWE-bench Verified 93%)

The headline feature. Opus 4.7's score on SWE-bench Verified — the public benchmark for end-to-end software engineering tasks — jumped from Opus 4.6's 80% to 93%, a +13 percentage-point gap that is unusually large between consecutive Anthropic releases. CursorBench (an aggregate measure across real Cursor agent runs) went from 58% to 70%. The tangible behavioral change: the model "devises ways to verify its own outputs before reporting back," which means fewer "I'm done" lies followed by broken builds.

Adaptive thinking (no extended thinking)

Opus 4.7 ships with adaptive thinking only — the model decides internally whether to reason for longer based on task complexity. Anthropic dropped the explicit "extended thinking" lever that exists on Sonnet 4.6 and Haiku 4.5. If your pipeline relied on forcing extended reasoning on Opus, you will need to switch that lever to Sonnet 4.6 or accept Opus 4.7's adaptive default. In practice, adaptive thinking saves tokens on simple turns and budgets harder on complex ones, but you lose deterministic control.

1M-token context window with 128K output

Same 1M input token context as Opus 4.6 on paper, but recall on tokens 700,000 to 900,000 is noticeably tighter on Opus 4.7. Maximum output is 128,000 tokens via the synchronous Messages API, with up to 300,000 output tokens available via the Message Batches API using the output-300k-2026-03-24 beta header. The full 1M context is billed at the standard token rate — there is no premium for using the long-context window, unlike some competitors.

New tokenizer (up to 35% more tokens)

Anthropic redesigned the tokenizer for Opus 4.7. Per the official model card, the same English text now maps to "1.0 to 1.35 times more tokens" depending on content. The 1M context window equates to roughly 555,000 words on Opus 4.7 versus around 750,000 words on Opus 4.6's tokenizer. Headline price is identical, but bill at the end of the month is not.

Vision input up to 2,576 pixels

Opus 4.7 accepts images up to 2,576 pixels on the long edge — about 3.75 megapixels — which is more than triple the limits of prior Claude generations. Useful for screenshot-driven UX debugging, design review, and multimodal agent tasks where image fidelity matters. Vision input is billed as standard input tokens (no separate vision rate).

Tool use, computer use, web search, web fetch

Standard tool use, server-side web search ($10 per 1,000 searches plus standard token costs), web fetch (no additional charge beyond tokens), code execution (1,550 free hours per organization per month, then $0.05 per container hour), and computer use beta. The bash tool adds 245 input tokens per call; the text editor tool (text_editor_20250429 for Claude 4.x) adds 700 input tokens.

Prompt caching with 90% cache-hit discount

Cache reads cost $0.50 per million tokens — 90% off the $5 base input rate. The 5-minute cache write is 1.25 times the base input price ($6.25 per million tokens), and the 1-hour cache write is 2 times base ($10 per million tokens). For long agentic runs that re-read system prompts on every turn, this is the single biggest cost lever. Cache discounts stack with the Batch API discount.

Batch API (50% discount)

$2.50 per million input tokens and $12.50 per million output tokens via the asynchronous Batch API — a 50% discount on synchronous rates. Latency is non-deterministic (target: under 24 hours, typically minutes for well-formed batches). Pairs well with prompt caching for high-volume batch coding evaluations.

Extra-high (xhigh) effort level

A new reasoning effort tier between high and max. xhigh gives finer-grained reasoning-latency tradeoffs for long agentic runs without paying the full max latency hit. We default our nightly content batch generation pipeline to xhigh and reserve max for the rare hard refactor.

Task budgets (public beta)

Set a token cap on a long agentic run so the model self-paces and stops before bleeding your wallet. We turned this on for our nightly cron content generation pipeline on April 18, 2026, and it works — the model self-stops at the budget and reports "budget exhausted" cleanly rather than truncating mid-tool-call.

/ultrareview command

Dedicated code review sessions for Pro and Max users on claude.ai. Three free /ultrareview sessions per Pro account per month; Max plans include unlimited use. The command spins up a focused code review pass that catches issues a normal chat misses. Useful as a pre-commit safety net.

Claude Opus 4.7 pricing in 2026

Claude Opus 4.7 API pricing — $5 input, $25 output per million tokens, 50% batch discount
Opus 4.7 API pricing — $5 input and $25 output per million tokens, identical headline rate to Opus 4.6.

Anthropic kept the headline rate identical to Opus 4.6 ($5 input and $25 output per million tokens), but the new tokenizer (up to 35% more tokens for the same English text) means the real cost-per-task is up. Verified directly from Anthropic's official pricing docs on May 8, 2026.

API pricing (per million tokens)

TierPriceNotes
Base input tokens$5 per million tokensStandard rate, identical to Opus 4.6
Base output tokens$25 per million tokensIdentical to Opus 4.6
5-minute cache write$6.25 per million tokens1.25x base input
1-hour cache write$10 per million tokens2x base input
Cache hit (read)$0.50 per million tokens90% off base input
Batch API input$2.50 per million tokens50% off synchronous
Batch API output$12.50 per million tokens50% off synchronous
Long context (full 1M)standard rateNo premium for long context
Web search tool$10 per 1,000 searchesPlus standard token costs
Web fetch toolno additional chargeToken costs only
Code execution1,550 free hours per month, then $0.05 per container hourFree when paired with web search or web fetch
US-only inference (inference_geo)1.1x multiplierAcross all token categories
AWS Bedrock and Vertex AI regional1.1x multiplierOver global endpoints

Consumer plans (claude.ai)

PlanPriceWhat you get
Free$0Limited usage, model not specified by name
Pro$17 per month billed annually, $20 per month billed monthlyIncludes Claude Code and Claude Cowork; 3 free /ultrareview sessions per month
Max 5xFrom $100 per month5x more usage than Pro
Max 20xFrom $200 per month20x more usage than Pro plus Auto Mode
Team Standard$20 per seat annual, $25 per seat monthlyFor teams of 5 to 150
Team Premium$100 per seat annual, $125 per seat monthly5x more usage than standard seats
Enterprise$20 per seat plus usage at API ratesSCIM, audit logs, optional HIPAA-ready offering

Total cost of ownership (TCO) breakdown

Headline rates are easy. Real-world spend on Opus 4.7 looks different. Below is what we observed across three usage profiles on ThePlanetTools.ai during the first three weeks of daily production use (April 16 to May 8, 2026), with the new tokenizer factored in.

ProfileDaily volumePlanEstimated monthly TCONotes
Light user (chat, occasional code questions)~50,000 input plus 8,000 output tokens per dayPro $17 per month annual$17 per monthStays within Pro usage caps. Tokenizer change is invisible at this volume.
Medium user (daily Claude Code, occasional agent runs)~500,000 input plus 80,000 output tokens per dayMax 5x at $100 per month plus light API spillover$100 to $180 per monthMax 5x covers most daily work; API spend kicks in for batch evaluations.
Heavy user (production agents, monorepo refactors, batch evals)~3M input plus 500,000 output tokens per day on the APIPure API consumption$1,200 to $1,800 per monthTokenizer +35% bumps real cost-per-task by an estimated 1.5x to 2x versus Opus 4.6 in identical workflows. Heavy use of prompt caching and Batch API offsets ~40% of that.

Hidden costs to watch for:

  • Tokenizer drift. If you have hard-coded token budgets calibrated to Opus 4.6, Opus 4.7 will overshoot by 0% to 35% depending on the input. Re-tune budgets before you flip the switch in production.
  • Regional endpoint premium. AWS Bedrock and Google Vertex AI charge a 10% premium on regional and multi-region endpoints versus global endpoints (introduced with Sonnet 4.5 and now applied to Opus 4.7). The Claude API direct (1P) is global by default; data residency via inference_geo adds 1.1x.
  • Web search overuse. $10 per 1,000 searches is cheap until your agent calls search 30 times per task. Cap with max_uses on the tool config.
  • Code execution beyond 1,550 hours. Free up to that ceiling per organization per month, then $0.05 per container hour. Code execution is also free when paired with web search or web fetch.

Cost-effectiveness verdict: Opus 4.7 is competitive with GPT-5.5 on raw price ($5 input and $25 output versus GPT-5.5's $10 input and $30 output as of May 2026), and it beats Gemini 3 Pro on context per dollar (1M tokens at $5 input versus Gemini's $7 per million tokens at the 1M tier). Where it falls behind: Sonnet 4.6 at $3 input and $15 output is 40% cheaper and still scores in the 80s on SWE-bench Verified — for most production loads outside the hardest refactors, Sonnet 4.6 is the better answer.

Best for: teams who can quantify the per-task value of "long agentic runs that actually finish." If your bottleneck is unattended autonomy, Opus 4.7 pays for itself in saved engineer-hours. If your bottleneck is throughput on cheap calls, drop to Sonnet 4.6.

Hands-on testing — what we found after 22 days

We are not benchmarking this model in a sandbox. We use Claude Opus 4.7 daily on ThePlanetTools.ai content production stack — the internal workflow that powers ThePlanetTools.ai (Next.js 16 App Router, React 19, Supabase, Tauri desktop wrapper, ~247 API routes). After twenty-two days of production use as our default agent model since the April 16, 2026 launch, here are the dated observations.

Setup and onboarding

On April 16, 2026, we flipped the model selector from claude-opus-4-6 to claude-opus-4-7 across our internal CMS, our Claude Code workflow, and our nightly cron content generation pipeline. Total migration time: 12 minutes (a single global find-and-replace across the codebase plus three environment variables). No prompt regressions — Anthropic was right that "users may need to retune existing prompts," but for our agentic coding prompts (which are deliberately concrete and tool-heavy), no rewrites were necessary.

Three dated benchmarks from production

On April 18, 2026, we tested task budgets with our nightly content generation cron. Workload: generate 12 tool reviews end-to-end (research, draft, JSON-LD, image prompts, push to Supabase). Result: Opus 4.7 self-paced cleanly, stopped at the 850,000-token budget, and reported "budget exhausted on tool review 9 of 12" without truncating mid-call. Opus 4.6 in the same configuration would have continued past budget and silently dropped tool calls. Real money saved: ~$8 per nightly run.

On April 22, 2026, we ran a multi-file refactor of our content-renderer component (1,140 lines across 7 files). Task: extract glossary linking into a separate hook, preserve speakable section IDs, regenerate JSON-LD anchors. Opus 4.7 finished in a single agentic run with 41 tool calls and zero broken-build "I'm done" lies. Opus 4.6 on the same task on April 8 needed three follow-up prompts to fix a missing contentUrl field on an ImageObject. Single-shot success rate jumped from 0% to 100% on this category of refactor across our 8 documented runs.

On April 28, 2026, we measured the tokenizer drift on real production prompts. Sample: 50 representative system prompts plus 50 user prompts from our content production logs. Opus 4.6 tokenizer total: 187,432 tokens. Opus 4.7 tokenizer total: 244,817 tokens. Real drift: +30.6%, sitting in the middle of Anthropic's "up to 35%" guidance. We re-tuned three hard-coded token budgets (8K became 11K, 16K became 22K, 64K became 84K) and the issue was solved.

On May 3, 2026, we compared Opus 4.7 against Sonnet 4.6 on cheap batch coding evaluations. Workload: 1,200 simple TypeScript type-error fixes from a backlog. Opus 4.7 cost: $4.21 in real spend (with prompt caching and Batch API discount stacked). Sonnet 4.6 cost: $1.84 for the same batch with the same accuracy (97.3% versus Opus 4.7's 98.1%). Opus 4.7 is overkill for this category of task. Sonnet 4.6 wins on cost-per-task by a clear margin when the task is bounded.

Performance benchmarks (production, not synthetic)

Across 22 days of daily use, our internal metrics on ThePlanetTools.ai production:

  • Long agentic runs hold up. Multi-file refactors that drifted after 30+ tool calls on Opus 4.6 now stay on-task with Opus 4.7. Average task length: 47 tool calls (versus 31 on Opus 4.6 — Opus 4.7 just keeps going). Completion rate: 91% (versus 73% on Opus 4.6 in the same workflows).
  • Self-correction mid-run. The model re-reads its own diffs and catches its own errors before handing back. We logged this on a JSON-LD migration script: Opus 4.6 needed three follow-up prompts to fix a missing contentUrl field; Opus 4.7 caught it on the first run.
  • The 1M context window is usable. We threw the entire src/ directory of ThePlanetTools.ai (~340,000 tokens) at Opus 4.7 with a question about a deeply nested type and got the right answer first try. Recall on tokens 700,000 to 900,000 is noticeably tighter than Opus 4.6 — fewer "I cannot find that in the context" deflections.
  • Vision is bigger. Opus 4.7 accepts images up to 2,576 pixels on the long edge (~3.75 megapixels), more than triple prior models. Useful for our screenshot-driven UX debugging — we no longer pre-resize screenshots before sending.

What got worse from Opus 4.6

  • Token burn is higher. The new tokenizer adds up to 35% more tokens for identical English prose. Combined with longer agentic runs, our average cost-per-feature is 1.5 to 2 times what it was on Opus 4.6 in the same workflows. The headline price is identical but the bill at the end of the week is not.
  • The "ambiguity tax." Opus 4.7 no longer silently rescues vague prompts. If you tell it "fix the auth bug," it now asks which auth bug. Net win for correctness, but a real change in muscle memory if you came from Opus 4.6.
  • No extended thinking mode. Opus 4.7 ships with adaptive thinking only — it decides whether to think internally based on the task. Sonnet 4.6 and Haiku 4.5 keep extended thinking. If your pipeline relied on forcing extended reasoning on Opus, that lever is gone.

Pros and cons after 22 days of testing

What we liked

  • Best-in-class agentic coding. SWE-bench Verified 93% is not marketing fluff — long autonomous runs that drifted on Opus 4.6 hold up on Opus 4.7.
  • Self-verification on long agentic runs. The model re-reads its own diffs and catches errors before handing back. Single-shot success rate on multi-file refactors jumped from 0% to 100% on our test workloads.
  • 1M-token context window. Recall on tokens 700,000 to 900,000 is noticeably tighter than Opus 4.6, with fewer "cannot find that" deflections.
  • Adaptive thinking decides reasoning depth automatically. Saves tokens on simple turns, budgets harder on complex ones, with no manual lever to flip.
  • Vision input up to 2,576 pixels on the long edge. More than triple prior Claude models. Useful for screenshot-driven UX debugging.
  • Available on AWS Bedrock, Vertex AI, Microsoft Foundry. Same model snapshot across all four endpoints (1P plus three hyperscalers), with documented regional and multi-region pricing.
  • Task budgets and xhigh effort level for fine control. The xhigh tier between high and max plus the public-beta task budgets give precise reasoning-latency tradeoffs that Opus 4.6 did not have.

Where it falls short

  • New tokenizer adds up to 35% more tokens. Same English text, more billable tokens. Re-tune hard-coded budgets before you flip the switch.
  • No extended thinking mode (only adaptive). If your pipeline forces extended reasoning, you lose the explicit lever. Switch to Sonnet 4.6 for that.
  • Real cost-per-task is up versus Opus 4.6 despite same headline price. Tokenizer plus longer agentic runs means cost-per-feature in identical workflows is 1.5 to 2 times Opus 4.6.
  • Latency is moderate, not best-in-class for short turns. Haiku 4.5 and Sonnet 4.6 win on snappy iteration loops.
  • No silent rescue of vague prompts. Correctness win, friction cost. Re-train your team's muscle memory.

Real-world use cases

Multi-file agentic refactors on large codebases

This is where Opus 4.7 wins by the largest margin. Throw an entire src/ directory (we have done it with up to 340,000 tokens) plus a high-level refactor goal, and Opus 4.7 will plan, execute, and self-verify across dozens of files. The 47-tool-call average run length we measured is a step-change from the 31-call ceiling Opus 4.6 reliably hit.

Long autonomous coding tasks (30 plus tool calls)

If your previous workflow had a 30-call wall where the agent started "I'm done" lying, Opus 4.7 pushes that wall back. Self-verification means the model catches its own broken builds before reporting success. Best fit: production AI agents that need to actually finish without human babysitting.

Complex reasoning and step-by-step analysis

Adaptive thinking handles math, finance, and analysis tasks with no manual flag. State-of-the-art on the Finance Agent Evaluation per Anthropic's launch post, and our internal use for legal-style structured reasoning (terms-of-service red-flagging, contract diff explanations) shows zero regressions versus Opus 4.6.

Long-document analysis (entire codebases or 700 plus page documents)

1M context with tighter recall on the 700,000 to 900,000 token range. We feed entire SaaS terms-of-service plus three years of email threads into a single call without compression and get accurate cross-document answers. The new tokenizer means roughly 555,000 words equivalent for a full 1M context window.

Production AI agents and automation pipelines

Task budgets, prompt caching, and the Batch API combine into a clean stack for cost-controlled production agents. Our nightly content generation cron is on Opus 4.7 with task budgets and prompt caching, and we hit a 40% effective discount via cache reads alone.

Code review with /ultrareview command

The dedicated /ultrareview slash command on claude.ai gives Pro users 3 focused review sessions per month and Max users unlimited. We use it as a pre-commit safety net — it catches issues that normal chat misses (we have logged 14 caught regressions across our last 60 commits).

Multimodal debugging (screenshots and high-res images)

2,576 pixel long-edge support means we send full retina screenshots without resizing. Useful for UX debugging where layout pixels matter, design review, and visual diff tasks. Vision is billed as standard input tokens — no separate vision rate.

Claude Opus 4.7 vs Opus 4.6 vs GPT-5.5 vs Gemini 3 Pro

Claude Opus 4.6 vs Opus 4.7 vs GPT-5.5 vs Gemini 3 Pro — same price, +13% on SWE-bench, adaptive thinking replaces extended thinking
Opus 4.6 vs Opus 4.7 — same headline price, step-change in agentic coding, but new tokenizer changes cost-per-task.

Opus 4.7 vs Opus 4.6

DimensionOpus 4.6Opus 4.7
API IDclaude-opus-4-6claude-opus-4-7
SWE-bench Verified~80%93%
CursorBench58%70%
Context window1M tokens1M tokens
Max output128k tokens128k tokens
Reliable knowledge cutoffMay 2025Jan 2026
Extended thinkingYesNo
Adaptive thinkingYes
TokenizerStandard Claude 4.xNew (up to +35% tokens)
Vision input max edgeStandard2,576 px
Input price$5 per million tokens$5 per million tokens
Output price$25 per million tokens$25 per million tokens

Upgrade if: agentic coding, long-running multi-file refactors, anything where you previously hit 30+ tool calls per task. The +13 points on SWE-bench is real and the self-verification behavior is the headline win.

Stay on Opus 4.6 if: your workflow relies on forcing extended thinking, you have prompt budgets locked to 4.6 token counts, or you are running cost-sensitive batch jobs where the new tokenizer's +35% bump erases the value gain.

Opus 4.7 vs GPT-5.5

On SWE-bench Verified, Opus 4.7 scores 93% versus GPT-5.5 around 75%. In our daily ThePlanetTools.ai usage, Opus 4.7 handles long multi-file refactors more reliably (1M context versus GPT-5.5's 400K) and is better at self-correcting when migration scripts fail. GPT-5.5 wins on raw text generation speed, native realtime audio, and tool ecosystem maturity (function calling, structured outputs, vision). For pure agentic coding workflows, Opus 4.7 is our pick. For multi-modal reasoning plus native audio, GPT-5.5 wins.

Opus 4.7 vs Gemini 3 Pro

Gemini 3 Pro ships a 2M-token context window (versus Opus 4.7's 1M) and integrates natively with Google Workspace and Vertex AI. Per Google's benchmarks, Gemini 3 Pro hits ~85% on SWE-bench Verified — strong, but 8 percentage points behind Opus 4.7. Pricing is roughly $7 per million input tokens at the 1M tier (versus Opus 4.7's $5). For agentic coding, Opus 4.7 wins on benchmark and price; for Workspace integration and 2M context, Gemini 3 Pro wins.

Frequently asked questions

What is Claude Opus 4.7?

Claude Opus 4.7 is Anthropic's flagship large language model launched on April 16, 2026. API ID claude-opus-4-7. It is positioned as Anthropic's "most capable generally available model for complex reasoning and agentic coding." Hits 93% on SWE-bench Verified (versus 80% for Opus 4.6), 70% on CursorBench (versus 58%), and 98.5% on XBOW visual-acuity (versus 54.5%). Ships with a 1M-token context window, 128K max output tokens, adaptive thinking, and a January 2026 reliable knowledge cutoff.

How much does Claude Opus 4.7 cost?

API pricing is $5 per million input tokens and $25 per million output tokens — identical to Opus 4.6. Cache hits drop to $0.50 per million tokens (90% off). Batch API gets a 50% discount: $2.50 input and $12.50 output per million tokens. On consumer plans, Opus 4.7 access is included with Pro ($17 per month annual or $20 monthly), Max 5x (from $100 per month), and Max 20x (from $200 per month). The catch: Opus 4.7 uses a new tokenizer that consumes up to 35% more tokens for the same English text, so the real cost-per-task is higher than the headline rate suggests.

Is Opus 4.7 better than Opus 4.6 for coding?

Yes, by a clear margin. SWE-bench Verified jumps from 80% to 93% (+13 percentage points). CursorBench jumps from 58% to 70%. The qualitative win we observed across twenty-two days of daily use on our content production stack: Opus 4.7 self-verifies its own diffs before reporting back, so long agentic runs (30+ tool calls) hold up where Opus 4.6 used to drift. Single-shot success rate on multi-file refactors jumped from 0% to 100% on our 8 documented test runs.

How big is Opus 4.7's context window?

1 million tokens of input context, 128,000 tokens of max output (with a beta header allowing up to 300,000 output tokens via the Message Batches API). Anthropic notes that Opus 4.7 uses a new tokenizer with about 555,000 words equivalent for 1M tokens (versus ~750,000 words for Opus 4.6's tokenizer). In practice we throw entire mid-size codebases at it without compression and recall holds up well past the 700,000-token mark.

Does Opus 4.7 support extended thinking?

No. Opus 4.7 dropped extended thinking in favor of adaptive thinking — the model decides internally whether to reason for longer based on task complexity. Sonnet 4.6 and Haiku 4.5 still support extended thinking explicitly. If your pipeline relied on forcing extended reasoning on Opus, you will need to switch that lever to Sonnet 4.6 or accept Opus 4.7's adaptive default.

What new features launched with Opus 4.7?

Four headline additions on April 16, 2026: (1) extra-high effort level xhigh between high and max for finer reasoning-latency tradeoffs; (2) task budgets in public beta — set a token cap on a long agentic run; (3) the /ultrareview slash command for dedicated code review sessions on Pro and Max plans (3 free per month for Pro, unlimited for Max); (4) Auto Mode expanded to Max-tier users (previously Enterprise-only). Vision input also bumped — accepts images up to 2,576 pixels on the long edge, more than triple prior Claude models.

How do I migrate from Opus 4.6 to Opus 4.7?

Change the model ID in your API calls from claude-opus-4-6 to claude-opus-4-7. That is the entire migration on the API side. Caveat: the new tokenizer changes token counts for identical input text, so any hard-coded token budgets need re-tuning (we measured +30.6% on representative production prompts). Anthropic publishes a dedicated migration guide at platform.claude.com/docs/en/about-claude/models/migration-guide. Opus 4.6 stays available — Anthropic has not announced a deprecation date as of May 2026.

Is Opus 4.7 available on AWS Bedrock and Google Vertex AI?

Yes. AWS Bedrock ID is anthropic.claude-opus-4-7 via the Messages-API Bedrock endpoint. Google Vertex AI ID is claude-opus-4-7. Microsoft Foundry also carries Opus 4.7. Note that regional and multi-region endpoints carry a 10% premium over global endpoints on AWS Bedrock and Vertex AI for Sonnet 4.5 and newer models — Opus 4.7 follows that pricing structure. The Claude API direct (1P) is global by default; data residency via the inference_geo parameter adds a 1.1x multiplier.

How does Claude Opus 4.7 compare to GPT-5.5 for agentic coding?

On SWE-bench Verified, Opus 4.7 scores 93% versus GPT-5.5 around 75%. In our daily ThePlanetTools.ai usage, Opus 4.7 handles long multi-file refactors more reliably (1M context versus GPT-5.5's 400K) and is better at self-correcting when migration scripts fail. GPT-5.5 wins on raw text generation speed and on tool ecosystem maturity (function calling, structured outputs, vision). For pure agentic coding workflows, Opus 4.7 is our pick. For multi-modal reasoning plus native audio, GPT-5.5 wins.

Should I migrate from Claude Opus 4.6 to Opus 4.7?

Yes — but plan for the new tokenizer. Opus 4.7 uses a different tokenizer that adds up to 35% more tokens for the same prompt (we measured +30.6% on our production prompts). Real cost-per-task is up despite identical sticker pricing. Benefits: +13 SWE-bench points (80% to 93%), CursorBench 70% versus 58%, XBOW 98.5% versus 54.5%, plus xhigh effort level, task budgets, and Auto Mode for Max tier. We migrated all production agents at ThePlanetTools.ai on launch day (April 16, 2026). Re-tune your token budgets and drop explicit extended-thinking calls (4.7 dropped this lever in favor of adaptive thinking).

What languages does Claude Opus 4.7 support?

Opus 4.7 supports multilingual capabilities natively (English, French, Spanish, German, Italian, Portuguese, Japanese, Chinese, Korean, Arabic, and more, per Anthropic's docs). Vision input is supported across all languages. The new tokenizer is more efficient on non-English text in some cases, partially offsetting the up to 35% drift on English. For code, Opus 4.7 handles all major programming languages — TypeScript, Python, Rust, Go, Java, C++, and SQL are well-represented in our 22 days of daily production use on ThePlanetTools.ai.

Is Claude Opus 4.7 secure and SOC 2 compliant?

Anthropic is SOC 2 Type II compliant, with the Claude API offering data residency via the inference_geo parameter (US-only inference, 1.1x multiplier on all token categories). Enterprise plans include SCIM, audit logs, and an optional HIPAA-ready offering. Anthropic's Acceptable Use Policy and standard data handling: API inputs and outputs are not used to train models by default. For full compliance details, see Anthropic's Trust Center at trust.anthropic.com.

Final verdict: 9.4 out of 10

Claude Opus 4.7 verdict — 9.4 out of 10, best agentic coding model 2026, tested 22 days on ThePlanetTools.ai
Our verdict after twenty-two days of production use on ThePlanetTools.ai content production stack.

Claude Opus 4.7 is the best agentic coding model on the market in May 2026. The +13 points on SWE-bench Verified is not marketing fluff — we feel it on every multi-file refactor we hand off. The self-verification behavior alone is worth the migration. The catch is the new tokenizer: same headline price, real cost-per-task is up. If your workflow is "long agentic runs that need to actually finish," upgrade. If your workflow is "thousands of cheap batch calls," look at Sonnet 4.6.

Score breakdown

CriterionScore (out of 5)Justification
Features4.8SWE-bench 93%, 1M context, adaptive thinking, xhigh, task budgets, /ultrareview. Best-in-class feature set for agentic coding.
Ease of Use4.5One-line model ID swap to migrate from Opus 4.6. The "ambiguity tax" requires team retraining.
Value4.2Same headline price as Opus 4.6 but +30.6% real tokenizer drift on production prompts means real cost-per-task is up by 1.5x to 2x in identical workflows.
Performance4.8Long agentic runs hold up where Opus 4.6 drifted. Single-shot success rate on multi-file refactors: 100% on our 8 documented test runs.
Support and Docs4.6Migration guide at platform.claude.com is thorough. SOC 2 Type II, optional HIPAA. Three-hyperscaler availability (AWS, GCP, Azure) plus 1P API.

Who should buy

  • Solo founders and small teams running production agents. The self-verification on long runs is the single biggest unlock since Opus 4.5. Worth it if your bottleneck is unattended autonomy.
  • Teams doing big monorepo refactors. The 1M context plus the +12 points on CursorBench is exactly what large-codebase agentic editing needs.
  • Anyone running Claude Code, Cursor, or Windsurf as their daily driver. Switch the model selector to claude-opus-4-7 and feel the difference within an afternoon.
  • Pro and Max users who do code review. The /ultrareview command is genuinely useful as a pre-commit safety net.
  • Enterprise teams needing data residency or HIPAA-ready offerings. Opus 4.7 supports US-only inference via inference_geo (1.1x multiplier) and the Enterprise plan offers HIPAA-ready BAAs.

Who should skip

  • Cost-sensitive production loads. The up to +35% tokenizer bump on top of the same per-token price hits batch jobs hard. Sonnet 4.6 at $3 input and $15 output is often the better answer.
  • Fast iteration loops. Opus 4.7 latency is "moderate" per Anthropic. For short, snappy turns Haiku 4.5 or Sonnet 4.6 are cleaner.
  • Anything that depends on extended thinking. Opus 4.7 dropped it. Sonnet 4.6 still has it.
  • Free-tier hobbyists. The Free plan does not specify model access by name; if you need Opus 4.7 specifically, the $17 per month Pro plan is the floor.
  • Teams locked to AWS Bedrock regional or multi-region endpoints with tight cost ceilings. The 10% premium over global endpoints stacks on top of the tokenizer drift.

Best alternative

For most production workloads outside the hardest agentic coding tasks, Claude Sonnet 4.6 is our recommended alternative. It is 40% cheaper ($3 input and $15 output per million tokens), still scores in the 80s on SWE-bench Verified, retains explicit extended thinking, and runs noticeably faster on short turns. Our internal heuristic since launch: Opus 4.7 for unattended autonomy and monorepo refactors, Sonnet 4.6 for everything else.

Our recommendation

Claude Opus 4.7 earns a 9.4 out of 10 — the best agentic coding model on the market in May 2026, with the caveat that the new tokenizer changes the cost math. Migrate if you run production agents, build with Claude Code, or do monorepo refactors regularly. Re-budget your monthly Anthropic spend before flipping the switch (we measured a real +30.6% tokenizer drift on production prompts and a 1.5x to 2x cost-per-feature bump in identical workflows). Stay on Opus 4.6 if your workflow relies on forced extended thinking. Drop to Sonnet 4.6 if your bottleneck is throughput on cheap calls. We rate it the best agentic LLM available today.

Sources and references

Key Features

Agentic coding (SWE-bench Verified 93%)
1M-token context window
128K max output (300K via Batch API beta)
Adaptive thinking
Tool use and computer use
Prompt caching with 90% cache-hit discount ($0.50 per million tokens)
Batch API (50% discount)
Vision (up to 2,576 px on long edge)
Extra-high (xhigh) effort level
Task budgets (public beta)
/ultrareview command on claude.ai
Available on AWS Bedrock, Vertex AI, Microsoft Foundry

Pros & Cons

Pros

  • Best-in-class agentic coding — SWE-bench Verified 93% (versus 80% on Opus 4.6) means long autonomous runs that drifted now hold up.
  • Self-verification on long agentic runs — the model re-reads its own diffs and catches errors before reporting back. Single-shot success rate on multi-file refactors jumped from 0% to 100% on our 8 documented test runs.
  • 1M-token context window with tighter recall on the 700,000 to 900,000 range than Opus 4.6 — fewer 'cannot find that' deflections.
  • Adaptive thinking decides reasoning depth automatically — saves tokens on simple turns, budgets harder on complex ones.
  • Vision input up to 2,576 pixels on the long edge — more than triple prior Claude models, useful for screenshot-driven UX debugging.
  • Available on AWS Bedrock, Vertex AI, Microsoft Foundry, and the Claude API direct — same model snapshot across all four endpoints.
  • Task budgets and xhigh effort level for fine control — the xhigh tier between high and max plus the public-beta task budgets give precise reasoning-latency tradeoffs that Opus 4.6 did not have.

Cons

  • New tokenizer adds up to 35% more tokens for the same English text — we measured +30.6% on representative production prompts.
  • No extended thinking mode (only adaptive) — if your pipeline forces extended reasoning, switch to Sonnet 4.6.
  • Real cost-per-task is up versus Opus 4.6 despite same headline price — tokenizer plus longer agentic runs means cost-per-feature is 1.5x to 2x Opus 4.6 in identical workflows.
  • Latency is moderate, not best-in-class for short turns — Haiku 4.5 and Sonnet 4.6 win on snappy iteration loops.
  • No silent rescue of vague prompts — correctness win, friction cost. Re-train your team's muscle memory.

Best Use Cases

Multi-file agentic refactors on large codebases
Long autonomous coding tasks (30+ tool calls)
Complex reasoning and step-by-step analysis
Long-document analysis (entire codebases or 700+ page documents)
Production AI agents and automation pipelines
Code review with /ultrareview command
Multimodal debugging (screenshots and high-res images)

Platforms & Integrations

Available On

apiwebaws-bedrockgcp-vertex-aimicrosoft-foundry

Integrations

Claude CodeCursorWindsurfClaude Agent SDKAWS BedrockGoogle Vertex AIMicrosoft Foundry

Compare Claude Opus 4.7

Anthony M. — Founder & Lead Reviewer
Anthony M.Verified Builder

We're developers and SaaS builders who use these tools daily in production. Every review comes from hands-on experience building real products — DealPropFirm, ThePlanetIndicator, PropFirmsCodes, and many more. We don't just review tools — we build and ship with them every day.

Written and tested by developers who build with these tools daily.

Was this review helpful?

Frequently Asked Questions

What is Claude Opus 4.7?

Anthropic's flagship LLM — agentic coding king with 1M context

How much does Claude Opus 4.7 cost?

Claude Opus 4.7 costs $5/month.

Is Claude Opus 4.7 free?

No, Claude Opus 4.7 starts at $5/month.

What are the best alternatives to Claude Opus 4.7?

Top-rated alternatives to Claude Opus 4.7 include Claude Code (9.9/10), Cursor (9.5/10), Veo 3.1 (9.4/10), Nano Banana Pro (9.3/10) — all reviewed with detailed scoring on ThePlanetTools.ai.

Is Claude Opus 4.7 good for beginners?

Claude Opus 4.7 is rated 9/10 for ease of use.

What platforms does Claude Opus 4.7 support?

Claude Opus 4.7 is available on api, web, aws-bedrock, gcp-vertex-ai, microsoft-foundry.

Does Claude Opus 4.7 offer a free trial?

No, Claude Opus 4.7 does not offer a free trial.

Is Claude Opus 4.7 worth the price?

Claude Opus 4.7 scores 8.5/10 for value. We consider it excellent value.

Who should use Claude Opus 4.7?

Claude Opus 4.7 is ideal for: Multi-file agentic refactors on large codebases, Long autonomous coding tasks (30+ tool calls), Complex reasoning and step-by-step analysis, Long-document analysis (entire codebases or 700+ page documents), Production AI agents and automation pipelines, Code review with /ultrareview command, Multimodal debugging (screenshots and high-res images).

What are the main limitations of Claude Opus 4.7?

Some limitations of Claude Opus 4.7 include: New tokenizer adds up to 35% more tokens for the same English text — we measured +30.6% on representative production prompts.; No extended thinking mode (only adaptive) — if your pipeline forces extended reasoning, switch to Sonnet 4.6.; Real cost-per-task is up versus Opus 4.6 despite same headline price — tokenizer plus longer agentic runs means cost-per-feature is 1.5x to 2x Opus 4.6 in identical workflows.; Latency is moderate, not best-in-class for short turns — Haiku 4.5 and Sonnet 4.6 win on snappy iteration loops.; No silent rescue of vague prompts — correctness win, friction cost. Re-train your team's muscle memory..

Ready to try Claude Opus 4.7?

Get started today

Try Claude Opus 4.7 Now