Seed 2.1 Pro

ByteDance · Proprietary

Description

ByteDance's flagship next-generation agent model, built for real-world productivity. It is a deep-thinking model with a strong grasp of user requirements, long-horizon planning, and continuous self-repair, and it delivers reliable end-to-end results across complex coding, long-chain agent tasks, and multi-step engineering workflows. Seed 2.1 Pro also advances knowledge, reasoning, and multimodal understanding, with state-of-the-art results on several video understanding benchmarks. It is served via Volcano Engine as Doubao-Seed-2.1-pro.

Release date

2026-06-24

Parameters

—

Context length

—

Modalities

—

Capability radar

general

coding

reasoning

science (est.)

agents

multimodal

Science uses a reasoning proxy when dedicated science benchmarks are not available.
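
The fallback described in the note above can be sketched as follows. This is only an illustration under stated assumptions: the function name `science_axis`, the use of a plain mean over science benchmark scores, and the returned "is estimate" flag are all hypothetical and are not taken from the site's actual implementation.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def science_axis(
    science_scores: Sequence[float],
    reasoning_score: Optional[float],
) -> Tuple[Optional[float], bool]:
    """Return (value, is_estimate) for the radar's science axis.

    Assumption: dedicated science benchmarks are averaged when present;
    otherwise the reasoning score stands in and the value is flagged as
    an estimate (the "est." label on the radar).
    """
    if science_scores:
        # Dedicated science benchmarks exist: use them directly.
        return mean(science_scores), False
    if reasoning_score is not None:
        # No science benchmarks: fall back to the reasoning proxy.
        return reasoning_score, True
    # Nothing to plot for this axis.
    return None, False


# Hypothetical inputs: no science benchmarks, reasoning score of 56.0.
value, is_estimate = science_axis([], 56.0)
```

With these inputs the axis would take the reasoning value and carry the estimate flag, which matches how the radar labels its science axis.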

Rankings

Domain	Rank	Score	Source
Agentic capability	38	60.0	LS
Multimodal	70	70.0	LS
Reasoning	79	56.0	LS

Benchmark scores (LLM Stats)

3D

BLINK	81.4%	Aut.

Agents

GDPval	87.9%	Aut.
BrowseComp	86.2%	Aut.
MCP Atlas	83.8%	Aut.
OSWorld	78.8%	Aut.
Web Bench	78.4%	Aut.
MobileWorld	73.1%	Aut.
OfficeQA Pro	72.2%	Aut.
Terminal-Bench 2.1	71.0%	Aut.
CyberGym	70.2%	Aut.
OneMillion Bench	68.8%	Aut.
Agent Startup Bench	68.8%	Aut.
SeedClawBench	66.6%	Aut.
Trae Error Fix	63.3%	Aut.
Trae Code Gen	62.4%	Aut.
WildClawBench	61.7%	Aut.
xDailyBench	61.0%	Aut.
Finance Agent v1.1	60.7%	Aut.
SWE-Bench Pro	57.5%	Aut.
Repo Env	55.0%	Aut.
PresentBench	54.6%	Aut.
Workspace Bench	53.0%	Aut.
Doubao Multi-Turn Bench	52.5%	Aut.
ClawEval-MM	51.0%	Aut.
Toolathlon	50.6%	Aut.
Program Bench	50.3%	Aut.
NL2Repo	47.0%	Aut.
CreativeWork	42.5%	Aut.
Agents' Last Exam	41.4%	Aut.
SWE-Atlas	35.2%	Aut.
APEX-Agents	33.8%	Aut.
DeepSWE	32.7%	Aut.
GameWorld	31.2%	Aut.
PostTrainBench	16.5%	Aut.

Biology

SciCode	59.8%	Aut.

Chemistry

SuperGPQA	70.8%	Aut.
SuperChem	59.8%	Aut.

Code

Artifacts Bench	51.0%	Aut.
FrontierCS	46.3%	Aut.

Coding

AetherCode	65.8%	Aut.
Image2FloorPlan	48.0%	Aut.

Embodied

EmbSpatialBench	0.83 / 100	Aut.

General

MMMU-Pro	82.7%	Aut.
SimpleVQA	0.74 / 100	Aut.
MSQA	50.2%	Aut.
KINA	48.3%	Aut.

Image To Text

OCRBench_V2	63.2%	Aut.

Knowledge

VideoSimpleQA	76.4%	Aut.
WorldBench	67.6%	Aut.

Long Context

DUDE	82.8%	Aut.
LongVideoBench	80.6%	Aut.
MMLongBench-128K	78.3%	Aut.
LVBench	78.0%	Aut.

Math

MathVision	94.5%	Aut.
MathVista	90.7%	Aut.
MathVerse	89.7%	Aut.
Beyond AIME	87.0%	Aut.
EMMA	79.3%	Aut.
FrontierScience Olympiad	75.0%	Aut.
DynaMath	73.1%	Aut.
IMO 2025	0.65 / 42	Aut.
Humanity's Last Exam	55.7%	Aut.
IMOProof-Adv	54.3%	Aut.
MathArena Apex	31.3%	Aut.
LiveMathematicianBench	20.9%	Aut.
HorizonMath	2.0%	Aut.

Multimodal

CharXiv-D	95.5%	Aut.
Video-MME	89.2%	Aut.
CharXiv-R	86.4%	Aut.
VLMsAreBiased	83.6%	Aut.
OVOBench	80.7%	Aut.
TVBench	80.5%	Aut.
TOMATO	79.5%	Aut.
LiveSports-3K	76.8%	Aut.
MotionBench	74.9%	Aut.
BabyVision	73.7%	Aut.
TreeBench	71.1%	Aut.
ChartQAPro	70.9%	Aut.
Minerva	70.7%	Aut.
OVBench	70.0%	Aut.
VideoHolmes	68.2%	Aut.
CrossVid	65.0%	Aut.
ContPhy	63.6%	Aut.
MeasureBench	62.9%	Aut.
ZEROBench	0.56 / 100	Aut.
VisuLogic	0.54 / 100	Aut.
WorldVQA	53.0%	Aut.
VisFactor	51.4%	Aut.
MMSIBench	35.9%	Aut.

Physics

IPhO 2025	79.3%	Aut.

Reasoning

ERQA	72.0%	Aut.
ArcAGI2	62.5%	Aut.
FrontierScience Research	28.3%	Aut.

Spatial Reasoning

RealWorldQA	86.7%	Aut.

AA evaluation indices

No AA evaluation data available

LLM Stats category scores

Structured Output

100

Legal

Long Context

Spatial Reasoning

Embodied

Finance

General

Image To Text

Math

Multimodal

Physics

Reasoning

Safety

Healthcare

Chemistry

Economics

Tool Calling

Video

Vision

Agents

Biology

Code

Frontend Development

Coding

Science

Systems

Pricing

No pricing data available

Speed

No speed data available

Price ranking by provider

No provider data available

External sources

LLM Stats, Artificial Analysis