Claude Mythos Preview

Anthropic · Claude · Proprietary

Description

Claude Mythos Preview is an unreleased general-purpose frontier model from Anthropic and a new tier above Opus (internal codename 'Capybara'). It identified thousands of zero-day vulnerabilities across every major operating system and web browser as part of Project Glasswing, a cross-industry cybersecurity initiative with 12 partners including AWS, Apple, Microsoft, and Google. It is state-of-the-art on SWE-bench Verified (93.9%), GPQA Diamond (94.6%), USAMO (97.6%), Terminal-Bench 2.0 (82.0%), CyberGym (83.1%), and Cybench (100% pass@1, saturated). It represents a 4.3x increase over the previous trendline for model performance. The model is deployed under the ASL-3 Standard. Anthropic's risk report describes it as the best-aligned Claude model to date, and it received the first-ever 24-hour internal alignment review before deployment. It is not planned for general availability. Pricing for participants is $25 per million input tokens and $125 per million output tokens. The system card runs 244 pages.
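
The participant pricing quoted in the description ($25 per million input tokens, $125 per million output tokens) can be turned into a rough per-request cost estimate. The sketch below is only an illustration: the function name `estimate_cost_usd` and the example token counts are hypothetical, and it assumes simple linear per-token billing with no discounts or other pricing rules.

```python
# Prices quoted in the description (USD per million tokens).
INPUT_PRICE_PER_MTOK = 25.0
OUTPUT_PRICE_PER_MTOK = 125.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
print(f"${estimate_cost_usd(200_000, 20_000):.2f}")  # prints $7.50
```

At these rates an output token costs five times as much as an input token, so output-heavy workloads dominate the estimate.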

Release date
Parameters
Context length
Modalities: image, text

Capability radar

general: 90
coding: 80
reasoning: 80
science (est.): 77
agents: 80
multimodal: 90

Science uses a reasoning-based proxy when dedicated science benchmarks are unavailable.

Rankings

Domain              Rank  Score  Source
Agents & Tools      3     79.0   LS
Multimodal Ranking  3     93.0   LS

Benchmark scores (LLM Stats)

Agents

CyBench: 100.0% (self-reported)
BrowseComp: 86.9% (self-reported)
CyberGym: 83.1% (self-reported)
Terminal-Bench 2.0: 82.0% (self-reported)
OSWorld-Verified: 79.6% (self-reported)
SWE-Bench Pro: 77.8% (self-reported)
SWE-Bench Multimodal: 59.0% (self-reported)

Biology

GPQA: 94.6% (self-reported)

Code

SWE-Bench Verified: 93.9% (self-reported)
SWE-bench Multilingual: 87.3% (self-reported)

General

MMMLU: 92.7% (self-reported)

Healthcare

FigQA: 89.0% (self-reported)

Long Context

Graphwalks BFS >128k: 80.0% (self-reported)

Math

USAMO25: 97.6% (self-reported)
Humanity's Last Exam: 64.7% (self-reported)

Multimodal

CharXiv-R: 93.2% (self-reported)

AA evaluation indices

No AA evaluation data

LLM Stats category scores

Biology: 90
Chemistry: 90
Frontend Development: 90
General: 90
Healthcare: 90
Language: 90
Multimodal: 90
Physics: 90
Reasoning: 90
Safety: 90
Search: 90
Spatial Reasoning: 80
Tool Calling: 80
Vision: 80
Agents: 80
Code: 80
Long Context: 80
Math: 80

Pricing

No pricing data

Speed

No speed data

Available providers

(Internal LS units)

No provider data

External links