
Seed 2.1 Pro

ByteDance · Proprietary

Description

ByteDance's flagship next-generation agent model, built for real-world productivity. A deep-thinking model with strong requirement understanding, long-horizon planning, and continuous self-repair, it delivers reliable end-to-end results across complex coding, long-chain agent tasks, and multi-step engineering workflows. Seed 2.1 Pro also advances knowledge, reasoning, and multimodal understanding, with state-of-the-art (SOTA) results on several video understanding benchmarks. It is served via Volcano Engine as Doubao-Seed-2.1-pro.

Release date
2026-06-24
Parameter count
Context length
Supported modalities

Capability radar chart

General: 80
Coding: 60
Reasoning: 70
Science: 51 (estimated)
Agents: 70
Multimodal: 70

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.

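Leaderboard rankings

Domain | Rank | Score | Source
Agent capability leaderboard | #38 | 60.0 | LS
Multimodal leaderboard | #70 | 70.0 | LS
Reasoning | #79 | 56.0 | LS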

Benchmark scores (LLM Stats)

3d

BLINK: 81.4% (self-reported)

Agents

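GDPval: 87.9% (self-reported)
BrowseComp: 86.2% (self-reported)
MCP Atlas: 83.8% (self-reported)
OSWorld: 78.8% (self-reported)
Web Bench: 78.4% (self-reported)
MobileWorld: 73.1% (self-reported)
OfficeQA Pro: 72.2% (self-reported)
Terminal-Bench 2.1: 71.0% (self-reported)
CyberGym: 70.2% (self-reported)
OneMillion Bench: 68.8% (self-reported)
Agent Startup Bench: 68.8% (self-reported)
SeedClawBench: 66.6% (self-reported)
Trae Error Fix: 63.3% (self-reported)
Trae Code Gen: 62.4% (self-reported)
WildClawBench: 61.7% (self-reported)
xDailyBench: 61.0% (self-reported)
Finance Agent v1.1: 60.7% (self-reported)
SWE-Bench Pro: 57.5% (self-reported)
Repo Env: 55.0% (self-reported)
PresentBench: 54.6% (self-reported)
Workspace Bench: 53.0% (self-reported)
Doubao Multi-Turn Bench: 52.5% (self-reported)
ClawEval-MM: 51.0% (self-reported)
Toolathlon: 50.6% (self-reported)
Program Bench: 50.3% (self-reported)
NL2Repo: 47.0% (self-reported)
CreativeWork: 42.5% (self-reported)
Agents' Last Exam: 41.4% (self-reported)
SWE-Atlas: 35.2% (self-reported)
APEX-Agents: 33.8% (self-reported)
DeepSWE: 32.7% (self-reported)
GameWorld: 31.2% (self-reported)
PostTrainBench: 16.5% (self-reported)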

Biology

SciCode: 59.8% (self-reported)

Chemistry

SuperGPQA: 70.8% (self-reported)
SuperChem: 59.8% (self-reported)

Code

Artifacts Bench: 51.0% (self-reported)
FrontierCS: 46.3% (self-reported)

Coding

AetherCode: 65.8% (self-reported)
Image2FloorPlan: 48.0% (self-reported)

Embodied

EmbSpatialBench: 0.83 / 100 (self-reported)

General

MMMU-Pro: 82.7% (self-reported)
SimpleVQA: 0.74 / 100 (self-reported)
MSQA: 50.2% (self-reported)
KINA: 48.3% (self-reported)

Image To Text

OCRBench_V2: 63.2% (self-reported)

Knowledge

VideoSimpleQA: 76.4% (self-reported)
WorldBench: 67.6% (self-reported)

Long Context

DUDE: 82.8% (self-reported)
LongVideoBench: 80.6% (self-reported)
MMLongBench-128K: 78.3% (self-reported)
LVBench: 78.0% (self-reported)

Math

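MathVision: 94.5% (self-reported)
MathVista: 90.7% (self-reported)
MathVerse: 89.7% (self-reported)
Beyond AIME: 87.0% (self-reported)
EMMA: 79.3% (self-reported)
FrontierScience Olympiad: 75.0% (self-reported)
DynaMath: 73.1% (self-reported)
IMO 2025: 0.65 / 42 (self-reported)
Humanity's Last Exam: 55.7% (self-reported)
IMOProof-Adv: 54.3% (self-reported)
MathArena Apex: 31.3% (self-reported)
LiveMathematicianBench: 20.9% (self-reported)
HorizonMath: 2.0% (self-reported)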

Multimodal

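CharXiv-D: 95.5% (self-reported)
Video-MME: 89.2% (self-reported)
CharXiv-R: 86.4% (self-reported)
VLMsAreBiased: 83.6% (self-reported)
OVOBench: 80.7% (self-reported)
TVBench: 80.5% (self-reported)
TOMATO: 79.5% (self-reported)
LiveSports-3K: 76.8% (self-reported)
MotionBench: 74.9% (self-reported)
BabyVision: 73.7% (self-reported)
TreeBench: 71.1% (self-reported)
ChartQAPro: 70.9% (self-reported)
Minerva: 70.7% (self-reported)
OVBench: 70.0% (self-reported)
VideoHolmes: 68.2% (self-reported)
CrossVid: 65.0% (self-reported)
ContPhy: 63.6% (self-reported)
MeasureBench: 62.9% (self-reported)
ZEROBench: 0.56 / 100 (self-reported)
VisuLogic: 0.54 / 100 (self-reported)
WorldVQA: 53.0% (self-reported)
VisFactor: 51.4% (self-reported)
MMSIBench: 35.9% (self-reported)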

Physics

IPhO 2025: 79.3% (self-reported)

Reasoning

ERQA: 72.0% (self-reported)
ArcAGI2: 62.5% (self-reported)
FrontierScience Research: 28.3% (self-reported)

Spatial Reasoning

RealWorldQA: 86.7% (self-reported)

AA evaluation index

No AA evaluation data available yet.

LLM Stats category scores

Structured Output: 100
Search: 90
Legal: 80
Long Context: 80
Spatial Reasoning: 80
Embodied: 80
Finance: 80
General: 80
3d: 80
Image To Text: 70
Math: 70
Multimodal: 70
Physics: 70
Reasoning: 70
Safety: 70
Healthcare: 70
Chemistry: 70
Economics: 70
Tool Calling: 70
Video: 70
Vision: 70
Agents: 60
Biology: 60
Code: 60
Frontend Development: 50
Coding: 50
Science: 30
Systems: 20

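Pricing

No pricing data available yet.

Speed

No speed data available yet.

Provider price ranking

No provider data available yet.

External links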