Claude Sonnet 5 (Adaptive Reasoning, Xhigh Effort)

Anthropic / Claude

Description

Claude Sonnet 5 is Anthropic's most agentic Sonnet-class model. It is an upgrade to Sonnet 4.6 that narrows the gap to Opus 4.8 on reasoning, tool use, coding, computer use, and knowledge work while remaining lower priced. It plans, uses tools such as browsers and terminals, and runs autonomously on long-horizon tasks. Reported benchmark results include SWE-Bench Verified (85.2%), SWE-Bench Pro (63.2%), SWE-Bench Multilingual (78.3%), Terminal-Bench 2.1 (80.4%), OSWorld-Verified (81.2%), BrowseComp (84.7% single-agent, 86.6% multi-agent), Humanity's Last Exam with tools (57.4%), USAMO 2026 (79.5%), GDPval-AA v2 (1618 Elo), HealthBench Professional (57.8%), and FrontierCode v1 (38.8%).

It supports adaptive thinking with selectable effort levels up to 'extra high' (xhigh) and a 1M-token context window with context compaction. The safety assessment found lower rates of misaligned behavior, hallucination, and sycophancy than Sonnet 4.6, along with improved prompt-injection robustness. The model ships with cyber safeguards enabled by default and uses an updated tokenizer, under which the same input maps to roughly 1.0x to 1.35x as many tokens as with Sonnet 4.6.

It is the default model on the Free and Pro plans and is available to Max, Team, and Enterprise users, in Claude Code, and on the Claude Platform. It launches with introductory pricing of $2/$10 per million input/output tokens through August 31, 2026, after which pricing is $3/$15. It is available via the Claude API as `claude-sonnet-5`.
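The pricing schedule and tokenizer note in the description can be turned into a rough cost estimate. The following is a minimal sketch, not an official calculator: the function names `request_cost` and `sonnet5_token_range` are hypothetical, and the code assumes the introductory rate applies to any request dated on or before August 31, 2026 (the description does not specify a time zone or cutover time).

```python
from datetime import date

# Prices in USD per million tokens, as stated in the description.
INTRO_PRICES = (2.00, 10.00)     # (input, output), through 2026-08-31
STANDARD_PRICES = (3.00, 15.00)  # (input, output), after the intro period
INTRO_END = date(2026, 8, 31)    # assumed to be inclusive


def request_cost(input_tokens: int, output_tokens: int, on: date) -> float:
    """Estimate the USD cost of one request made on the given date."""
    in_price, out_price = INTRO_PRICES if on <= INTRO_END else STANDARD_PRICES
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def sonnet5_token_range(sonnet46_tokens: int) -> tuple[int, int]:
    """Hypothetical helper: bound the Sonnet 5 token count for input that
    took `sonnet46_tokens` tokens under Sonnet 4.6 (stated ratio 1.0x-1.35x)."""
    return sonnet46_tokens, round(sonnet46_tokens * 1.35)


if __name__ == "__main__":
    low, high = sonnet5_token_range(100_000)
    print("input tokens:", low, "to", high)
    print("cost at upper bound:", request_cost(high, 8_000, date(2026, 7, 15)))
```

Because the tokenizer ratio is a range, budgeting against the upper bound is the more conservative choice when migrating prompts from Sonnet 4.6.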

Release date

2026-06-30

Parameter count

—

Context length

1.0M tokens

Supported modalities

image, pdf, text

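Capability radar chart

[Radar chart; axes: general, coding, reasoning, science (estimated), agents, multimodal; scale maximum 100]

Note: when no dedicated science benchmark is available, the Science axis is estimated using reasoning ability as a proxy.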

Leaderboard ranking

No ranking data available.

Benchmark scores (LLM Stats)

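Agents

GDPval-AA: 1618.00 / 3000 (self-reported)
BrowseComp: 84.7% (self-reported)
OSWorld-Verified: 81.2% (self-reported)
Terminal-Bench 2.0: 80.4% (self-reported)
SWE-Bench Pro: 63.2% (self-reported)
OfficeQA Pro: 59.4% (self-reported)
Toolathlon: 54.3% (self-reported)
FrontierCode: 38.8% (self-reported)
SWE-Bench Multimodal: 28.1% (self-reported)
AutomationBench: 13.5% (self-reported)
Legal Agent Benchmark: 5.8% (self-reported)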

Code

SWE-Bench Verified: 85.2% (self-reported)
SWE-bench Multilingual: 78.3% (self-reported)
BenchCAD: 37.3% (self-reported)

General

GDP.pdf: 81.6% (self-reported)

Healthcare

HealthBench Professional: 57.8% (self-reported)

Math

USAMO 2026: 33.39 / 42 (self-reported)
ArXivMath: 72.2% (self-reported)
Humanity's Last Exam: 57.4% (self-reported)
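The description quotes USAMO 2026 as 79.5%, while the score list gives 33.39 / 42. The two figures agree once the raw score is normalized, as this small sketch shows (the helper name `as_percent` is illustrative, not part of any listed tool):

```python
def as_percent(score: float, max_score: float) -> float:
    """Convert a raw score out of max_score to a percentage, to one decimal."""
    return round(100 * score / max_score, 1)


print(as_percent(33.39, 42))  # 79.5
```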

Multimodal

CharXiv-R: 88.3% (self-reported)
ChartMuseum: 86.7% (self-reported)

AA evaluation index

No AA evaluation data available.

LLM Stats category scores

Finance

100

Legal

100

General

100

Agents

100

Reasoning

100

Frontend Development

Multimodal

Code

Tool Calling

Math

Healthcare

Vision

Pricing

Input price: $3 / 1M tokens

Output price: $15 / 1M tokens

Blended price (3:1): $6 / 1M tokens

Cache read price: $0.2 / 1M tokens

Cache write price: $2.5 / 1M tokens
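The blended figure can be reproduced as a weighted average of the input and output prices. This is a sketch under the assumption that "3:1" means three input tokens for every output token; that reading yields the listed $6, whereas the reverse weighting would not. The function name `blended_price` is illustrative.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted-average price per 1M tokens for an input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


print(blended_price(3.0, 15.0))  # 6.0
```

The same function applied to the introductory $2/$10 rates gives the blended price for the launch period.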

Speed

Tokens/sec: 76.3

Time to first token: 9.68s

Time to first answer: 9.68s
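These two measurements can be combined into a rough end-to-end estimate for a response of a given length. This is a simplified sketch that assumes the listed throughput stays constant after the first token; real latency varies with load, prompt size, and effort level. The function name is illustrative.

```python
TOKENS_PER_SECOND = 76.3      # listed output throughput
FIRST_TOKEN_LATENCY_S = 9.68  # listed time to first token


def estimated_response_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to receive `output_tokens` tokens,
    assuming steady throughput after the first token arrives."""
    return FIRST_TOKEN_LATENCY_S + output_tokens / TOKENS_PER_SECOND


print(round(estimated_response_seconds(2_000), 1))
```

At these figures the fixed first-token latency dominates short replies, while throughput dominates long ones.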

Provider price ranking

1 provider

Provider / Input / Output

1. Anthropic (primary)

$15

Compare this model's pricing across API providers.

External links

Artificial Analysis