Seed 2.1 Pro

ByteDance · Proprietary

Description

ByteDance's flagship next-generation agent model, built for real-world productivity. It is a deep-thinking model with strong requirement understanding, long-horizon planning, and continuous self-repair, and it is described as delivering reliable end-to-end results across complex coding, long-chain agent tasks, and multi-step engineering workflows. Seed 2.1 Pro also advances knowledge, reasoning, and multimodal understanding, with reported state-of-the-art results on several video understanding benchmarks. It is served via Volcano Engine as Doubao-Seed-2.1-pro.

Release date
2026-06-24
Parameter count
Context length
Supported modalities

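Capability radar chart

general: 80
coding: 60
reasoning: 70
science (estimated): 51
agents: 70
multimodal: 70

When no dedicated science evaluation is available, the Science score is estimated using reasoning ability as a proxy.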

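Leaderboard rankings

Domain | Rank | Score | Source
Agent capabilities leaderboard | 38 | 60.0 | LS
Multimodal leaderboard | 70 | 70.0 | LS
Reasoning | 79 | 56.0 | LS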

Benchmark scores (LLM Stats)

3D

BLINK: 81.4% (self-reported)

Agents

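GDPval: 87.9% (self-reported)
BrowseComp: 86.2% (self-reported)
MCP Atlas: 83.8% (self-reported)
OSWorld: 78.8% (self-reported)
Web Bench: 78.4% (self-reported)
MobileWorld: 73.1% (self-reported)
OfficeQA Pro: 72.2% (self-reported)
Terminal-Bench 2.1: 71.0% (self-reported)
CyberGym: 70.2% (self-reported)
OneMillion Bench: 68.8% (self-reported)
Agent Startup Bench: 68.8% (self-reported)
SeedClawBench: 66.6% (self-reported)
Trae Error Fix: 63.3% (self-reported)
Trae Code Gen: 62.4% (self-reported)
WildClawBench: 61.7% (self-reported)
xDailyBench: 61.0% (self-reported)
Finance Agent v1.1: 60.7% (self-reported)
SWE-Bench Pro: 57.5% (self-reported)
Repo Env: 55.0% (self-reported)
PresentBench: 54.6% (self-reported)
Workspace Bench: 53.0% (self-reported)
Doubao Multi-Turn Bench: 52.5% (self-reported)
ClawEval-MM: 51.0% (self-reported)
Toolathlon: 50.6% (self-reported)
Program Bench: 50.3% (self-reported)
NL2Repo: 47.0% (self-reported)
CreativeWork: 42.5% (self-reported)
Agents' Last Exam: 41.4% (self-reported)
SWE-Atlas: 35.2% (self-reported)
APEX-Agents: 33.8% (self-reported)
DeepSWE: 32.7% (self-reported)
GameWorld: 31.2% (self-reported)
PostTrainBench: 16.5% (self-reported)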

Biology

SciCode: 59.8% (self-reported)

Chemistry

SuperGPQA: 70.8% (self-reported)
SuperChem: 59.8% (self-reported)

Code

Artifacts Bench: 51.0% (self-reported)
FrontierCS: 46.3% (self-reported)

Coding

AetherCode: 65.8% (self-reported)
Image2FloorPlan: 48.0% (self-reported)

Embodied

EmbSpatialBench: 0.83 / 100 (self-reported)

General

MMMU-Pro: 82.7% (self-reported)
SimpleVQA: 0.74 / 100 (self-reported)
MSQA: 50.2% (self-reported)
KINA: 48.3% (self-reported)

Image To Text

OCRBench_V2: 63.2% (self-reported)

Knowledge

VideoSimpleQA: 76.4% (self-reported)
WorldBench: 67.6% (self-reported)

Long Context

DUDE: 82.8% (self-reported)
LongVideoBench: 80.6% (self-reported)
MMLongBench-128K: 78.3% (self-reported)
LVBench: 78.0% (self-reported)

Math

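MathVision: 94.5% (self-reported)
MathVista: 90.7% (self-reported)
MathVerse: 89.7% (self-reported)
Beyond AIME: 87.0% (self-reported)
EMMA: 79.3% (self-reported)
FrontierScience Olympiad: 75.0% (self-reported)
DynaMath: 73.1% (self-reported)
IMO 2025: 0.65 / 42 (self-reported)
Humanity's Last Exam: 55.7% (self-reported)
IMOProof-Adv: 54.3% (self-reported)
MathArena Apex: 31.3% (self-reported)
LiveMathematicianBench: 20.9% (self-reported)
HorizonMath: 2.0% (self-reported)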

Multimodal

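CharXiv-D: 95.5% (self-reported)
Video-MME: 89.2% (self-reported)
CharXiv-R: 86.4% (self-reported)
VLMsAreBiased: 83.6% (self-reported)
OVOBench: 80.7% (self-reported)
TVBench: 80.5% (self-reported)
TOMATO: 79.5% (self-reported)
LiveSports-3K: 76.8% (self-reported)
MotionBench: 74.9% (self-reported)
BabyVision: 73.7% (self-reported)
TreeBench: 71.1% (self-reported)
ChartQAPro: 70.9% (self-reported)
Minerva: 70.7% (self-reported)
OVBench: 70.0% (self-reported)
VideoHolmes: 68.2% (self-reported)
CrossVid: 65.0% (self-reported)
ContPhy: 63.6% (self-reported)
MeasureBench: 62.9% (self-reported)
ZEROBench: 0.56 / 100 (self-reported)
VisuLogic: 0.54 / 100 (self-reported)
WorldVQA: 53.0% (self-reported)
VisFactor: 51.4% (self-reported)
MMSIBench: 35.9% (self-reported)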

Physics

IPhO 2025: 79.3% (self-reported)

Reasoning

ERQA: 72.0% (self-reported)
ArcAGI2: 62.5% (self-reported)
FrontierScience Research: 28.3% (self-reported)

Spatial Reasoning

RealWorldQA: 86.7% (self-reported)

AA evaluation index

No AA evaluation data available yet.

LLM Stats category scores

Structured Output: 100
Search: 90
Legal: 80
Long Context: 80
Spatial Reasoning: 80
Embodied: 80
Finance: 80
General: 80
3D: 80
Image To Text: 70
Math: 70
Multimodal: 70
Physics: 70
Reasoning: 70
Safety: 70
Healthcare: 70
Chemistry: 70
Economics: 70
Tool Calling: 70
Video: 70
Vision: 70
Agents: 60
Biology: 60
Code: 60
Frontend Development: 50
Coding: 50
Science: 30
Systems: 20

Pricing

No pricing data available yet.

Speed

No speed data available yet.

Provider price ranking

No provider data available yet.
