Seed 2.1 Pro

ByteDance · Proprietary

Description

ByteDance's flagship next-generation agent model, built for real-world productivity. A deep-thinking model with a strong grasp of user requirements, long-horizon planning, and continuous self-repair, it delivers reliable end-to-end results across complex coding, long-chain agent tasks, and multi-step engineering workflows. Seed 2.1 Pro also advances knowledge, reasoning, and multimodal understanding, with state-of-the-art results on several video understanding benchmarks. It is served via Volcano Engine as Doubao-Seed-2.1-pro.

Release date
2026-06-24
Parameters
Context length
Modalities

Capability radar

general: 80
coding: 60
reasoning: 70
science (est.): 51
agents: 70
multimodal: 70

Science uses a reasoning proxy when dedicated science benchmarks are not available.

Rankings

Domain                Rank   Score   Source
Agentic capability    #38    60.0    LS
Multimodal ranking    #70    70.0    LS
Reasoning             #79    56.0    LS

Benchmark scores (LLM Stats)

3d

BLINK: 81.4% (self-reported)

Agents

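GDPval: 87.9% (self-reported)
BrowseComp: 86.2% (self-reported)
MCP Atlas: 83.8% (self-reported)
OSWorld: 78.8% (self-reported)
Web Bench: 78.4% (self-reported)
MobileWorld: 73.1% (self-reported)
OfficeQA Pro: 72.2% (self-reported)
Terminal-Bench 2.1: 71.0% (self-reported)
CyberGym: 70.2% (self-reported)
OneMillion Bench: 68.8% (self-reported)
Agent Startup Bench: 68.8% (self-reported)
SeedClawBench: 66.6% (self-reported)
Trae Error Fix: 63.3% (self-reported)
Trae Code Gen: 62.4% (self-reported)
WildClawBench: 61.7% (self-reported)
xDailyBench: 61.0% (self-reported)
Finance Agent v1.1: 60.7% (self-reported)
SWE-Bench Pro: 57.5% (self-reported)
Repo Env: 55.0% (self-reported)
PresentBench: 54.6% (self-reported)
Workspace Bench: 53.0% (self-reported)
Doubao Multi-Turn Bench: 52.5% (self-reported)
ClawEval-MM: 51.0% (self-reported)
Toolathlon: 50.6% (self-reported)
Program Bench: 50.3% (self-reported)
NL2Repo: 47.0% (self-reported)
CreativeWork: 42.5% (self-reported)
Agents' Last Exam: 41.4% (self-reported)
SWE-Atlas: 35.2% (self-reported)
APEX-Agents: 33.8% (self-reported)
DeepSWE: 32.7% (self-reported)
GameWorld: 31.2% (self-reported)
PostTrainBench: 16.5% (self-reported)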

Biology

SciCode: 59.8% (self-reported)

Chemistry

SuperGPQA: 70.8% (self-reported)
SuperChem: 59.8% (self-reported)

Code

Artifacts Bench: 51.0% (self-reported)
FrontierCS: 46.3% (self-reported)

Coding

AetherCode: 65.8% (self-reported)
Image2FloorPlan: 48.0% (self-reported)

Embodied

EmbSpatialBench: 0.83 / 100 (self-reported)

General

MMMU-Pro: 82.7% (self-reported)
SimpleVQA: 0.74 / 100 (self-reported)
MSQA: 50.2% (self-reported)
KINA: 48.3% (self-reported)

Image To Text

OCRBench_V2: 63.2% (self-reported)

Knowledge

VideoSimpleQA: 76.4% (self-reported)
WorldBench: 67.6% (self-reported)

Long Context

DUDE: 82.8% (self-reported)
LongVideoBench: 80.6% (self-reported)
MMLongBench-128K: 78.3% (self-reported)
LVBench: 78.0% (self-reported)

Math

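MathVision: 94.5% (self-reported)
MathVista: 90.7% (self-reported)
MathVerse: 89.7% (self-reported)
Beyond AIME: 87.0% (self-reported)
EMMA: 79.3% (self-reported)
FrontierScience Olympiad: 75.0% (self-reported)
DynaMath: 73.1% (self-reported)
IMO 2025: 0.65 / 42 (self-reported)
Humanity's Last Exam: 55.7% (self-reported)
IMOProof-Adv: 54.3% (self-reported)
MathArena Apex: 31.3% (self-reported)
LiveMathematicianBench: 20.9% (self-reported)
HorizonMath: 2.0% (self-reported)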

Multimodal

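CharXiv-D: 95.5% (self-reported)
Video-MME: 89.2% (self-reported)
CharXiv-R: 86.4% (self-reported)
VLMsAreBiased: 83.6% (self-reported)
OVOBench: 80.7% (self-reported)
TVBench: 80.5% (self-reported)
TOMATO: 79.5% (self-reported)
LiveSports-3K: 76.8% (self-reported)
MotionBench: 74.9% (self-reported)
BabyVision: 73.7% (self-reported)
TreeBench: 71.1% (self-reported)
ChartQAPro: 70.9% (self-reported)
Minerva: 70.7% (self-reported)
OVBench: 70.0% (self-reported)
VideoHolmes: 68.2% (self-reported)
CrossVid: 65.0% (self-reported)
ContPhy: 63.6% (self-reported)
MeasureBench: 62.9% (self-reported)
ZEROBench: 0.56 / 100 (self-reported)
VisuLogic: 0.54 / 100 (self-reported)
WorldVQA: 53.0% (self-reported)
VisFactor: 51.4% (self-reported)
MMSIBench: 35.9% (self-reported)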

Physics

IPhO 2025: 79.3% (self-reported)

Reasoning

ERQA: 72.0% (self-reported)
ArcAGI2: 62.5% (self-reported)
FrontierScience Research: 28.3% (self-reported)

Spatial Reasoning

RealWorldQA: 86.7% (self-reported)

AA evaluation indices

No AA evaluation data available

LLM Stats category scores

Structured Output: 100
Search: 90
Legal: 80
Long Context: 80
Spatial Reasoning: 80
Embodied: 80
Finance: 80
General: 80
3d: 80
Image To Text: 70
Math: 70
Multimodal: 70
Physics: 70
Reasoning: 70
Safety: 70
Healthcare: 70
Chemistry: 70
Economics: 70
Tool Calling: 70
Video: 70
Vision: 70
Agents: 60
Biology: 60
Code: 60
Frontend Development: 50
Coding: 50
Science: 30
Systems: 20

Pricing

No pricing data available

Speed

No speed data available

Price ranking by provider

No provider data available

External sources