Claude Opus 4.7 Released: 3× Wins & New xhigh Tier

On April 16, 2026, Anthropic shipped Claude Opus 4.7 — the next iteration of its flagship model, laser-focused on long-context tasks, autonomous reasoning, and disciplined execution across multi-hour engineering projects. The launch arrives just as the entire industry is shifting weight from single-shot prompts to agentic workflows — sessions where a model plans, writes code, fixes bugs, and reports progress for hours or days without human supervision. Opus 4.7 is tuned exactly for that: slower error cascades, better cross-session memory, and visibly stronger results on engineering benchmarks.

Claude Opus 4.7 — coding and agentic benchmark results

What’s New in Claude Opus 4.7

The new model keeps the same pricing as its predecessor ($5 per million input tokens, $25 per million output tokens), but ships meaningful upgrades across architecture and reasoning control. This is not a cosmetic bump — Anthropic explicitly notes the model is “substantially better at following instructions” and warns that some teams will need to re-tune their prompts to get the full benefit.

Key changes versus Opus 4.6:

xhigh effort level — a new reasoning tier sitting between high and max, giving finer control over the cost-to-quality trade-off.
Task budgets — a parameter that caps token spend inside long agentic runs without manual interruption.
Upgraded vision — images up to 2,576 px on the long edge (~3.75 megapixels), more than 3× the previous capacity.
/ultrareview command — a dedicated code-review mode for Claude Pro and Max subscribers.
Auto mode — autonomous decision-making expanded to Max plan users.
Updated tokenizer — better text handling, but the same prompt can now generate 1.0–1.35× more tokens than before, which matters for API budgets.

Benchmarks: Real Production Numbers, Not Marketing

Anthropic backs the release with hard numbers measured alongside production partners. That’s a tonal shift — instead of closed academic evaluations, the company is leaning on data from real engineering environments.

Benchmark / Partner	Opus 4.7	Opus 4.6
Replit (93-task coding test)	+13%	baseline
Rakuten (production tasks)	3× more resolutions	baseline
CursorBench	70% success	58% success
GDPval-AA (finance/legal)	state-of-the-art	lower
XBOW (pentesting)	98.5% visual acuity	—

The Rakuten result deserves attention — a 3× jump in resolved production tasks isn’t just about code quality. It signals the model is dramatically more resistant to “getting stuck” in long workflows. That translates directly to operating cost: fewer iterations, fewer wasted tokens, fewer human corrections needed downstream.

Agentic Behavior and Cross-Session Memory

The most important direction in Opus 4.7 is operating across long time horizons. The model has been tuned for scenarios where an agent runs for hours or days — writing code, running tests, resuming tasks after interruption, and keeping decisions consistent along the way.

The key improvements in this area:

Better cross-session memory use — the model handles earlier notes, logs, and project artifacts more effectively.
Tool coordination — smoother chaining of function calls inside multi-step plans.
Long-context stability — fewer “fatigue hallucinations” when working with large repositories.

For teams building their own integrations with Claude-class models — whether through the API or Claude Code — this means you can push more logic into the model itself and strip out some of the orchestration layers that used to be required to keep things on the rails.

Tokenizer Changes: What This Means for Your API Bill

The updated tokenizer improves text handling, but it comes with a practical consequence: the same prompts can cost more. Anthropic reports a 1.0–1.35× multiplier depending on content, and at xhigh and max effort the model also produces more output tokens — deliberately, in service of higher reliability.

What this means in practice for production teams:

Revisit monthly budgets — especially if you run large contexts or many parallel agents.
Audit your long prompts — non-Latin scripts and code with many special characters are particularly sensitive.
Run A/B cost tests — even at identical per-million rates, the effective cost per request can rise 10–35%.

Safety and Prompt Injection Resistance

The external safety assessment is cautiously positive. Opus 4.7 is judged as “largely well-aligned and trustworthy,” with visible gains in honesty and resistance to prompt injection — one of the biggest attack vectors against agentic systems.

The downside is a modest regression in harm-reduction advice, which Anthropic openly flags in the documentation. For teams building products in regulated spaces — health, finance, legal — that means maintaining additional filter layers and validation, just as with earlier models in this class.

Availability and Integrations

Claude Opus 4.7 is available from day one across every major platform:

Claude.ai — Free, Pro, Max, and Teams/Enterprise users.
Anthropic API — under the model ID claude-opus-4-7.
Amazon Bedrock — in all regions supporting the Claude family.
Google Cloud Vertex AI — full integration with GCP pipelines.
Microsoft Foundry — a new distribution channel in the Microsoft ecosystem.

Pro and Max subscribers also get access to the /ultrareview command, which kicks off a dedicated code-review session where the model spends a larger compute budget on verifying changes, catching regressions, and proposing justified refactors.

Summary

Claude Opus 4.7 is an evolutionary model — but in the places that actually matter for production teams: long agentic sessions, precise instruction-following, discipline across multi-hour engineering projects, and finer control over the cost-to-quality trade-off thanks to the new xhigh tier. The 3× lift in resolved production tasks at partners like Rakuten shows the improvement goes beyond benchmarks and translates into real operational savings.

For companies building workflows on top of AI-powered SEO or their own agentic tooling, Opus 4.7 sends a clear signal: the model layer is absorbing another chunk of autonomy, and the work of orchestration should increasingly focus on boundaries and observability rather than hand-holding the model step by step. The real test, as always, will be the API bill after a few weeks of production use — especially given the tokenizer changes.

Sources

Introducing Claude Opus 4.7 — Anthropic https://www.anthropic.com/news/claude-opus-4-7
Claude Opus 4.7 System Card — Anthropic https://www.anthropic.com/claude-opus-4-7-system-card
Claude API Documentation — Anthropic https://docs.anthropic.com/claude/docs
Amazon Bedrock — Claude Models https://aws.amazon.com/bedrock/claude/
Vertex AI — Claude on Google Cloud https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude

Claude Opus 4.7 Released — 3× More Production Wins, 13% Better Coding, and the New xhigh Reasoning Tier Explained

What’s New in Claude Opus 4.7

Benchmarks: Real Production Numbers, Not Marketing

Agentic Behavior and Cross-Session Memory

Tokenizer Changes: What This Means for Your API Bill

Safety and Prompt Injection Resistance

Availability and Integrations

Summary

Sources

Tags:

Claude Opus 4.7 Released — 3× More Production Wins, 13% Better Coding, and the New xhigh Reasoning Tier Explained

What’s New in Claude Opus 4.7

Benchmarks: Real Production Numbers, Not Marketing

Agentic Behavior and Cross-Session Memory

Tokenizer Changes: What This Means for Your API Bill

Safety and Prompt Injection Resistance

Availability and Integrations

Summary

Sources

Tags:

Related articles

Claude Fable 5 and Mythos 5 — Anthropic's most powerful model, with an asterisk

Claude Design: Anthropic Labs' Conversational Design Tool

Claude Mythos and the Capybara Class: Is This AI Too Powerful for Humanity?