Step3 VL 10B

StepFun · Open Weight · Apache 2.0 · Commercial OK

Description

STEP3-VL-10B is a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. It is built with a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder. It also features Parallel Coordinated Reasoning (PaCoRe), which scales test-time compute for complex perceptual reasoning.

Release Date

2026-01-20

Parameters

10.0B

Context Length

—

Modalities

—

Capability Radar

Axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	#Rank	Score	Source
Code Ranking	479	4.0	AA
General Ranking	418	24.0	AA
Multimodal Ranking	7	92.0	LS
Science	221	46.0	AA

Benchmark Scores (LLM Stats)

Category	Benchmark	Score
Communication	Multi-Challenge	62.6% SR
General	MMMU	78.1% SR
Math	AIME 2025	87.7% SR
Math	MathVista	84.0% SR
Math	MathVision	70.8% SR
Multimodal	MMBench	91.8% SR

AA Evaluation Indices

Index	Score
Intelligence Index	9.5
GPQA	0.7
IFBench	0.5
SciCode	0.3
Tau2	0.2
HLE	0.1
Terminal-Bench Hard	0.1
LCR	0.0

LLM Stats Category Scores

Categories: Math, Multimodal, Reasoning, General, Healthcare, Vision, Communication

Pricing

Input Price: Free

Output Price: Free

Blended Price (3:1): Free
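
The blended figure can be read as a weighted average of the input and output prices. The sketch below assumes the "3:1" label means three input tokens for every output token; that interpretation, the function name `blended_price`, and the non-zero prices in the second call are illustrative assumptions, not values from this page.

```python
def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Weighted average price, assuming `ratio` input tokens per output token.

    Prices are in any consistent unit (for example USD per million tokens).
    The 3:1 input:output weighting is an assumption about what the page's
    "Blended Price (3:1)" label means.
    """
    return (ratio * input_price + output_price) / (ratio + 1.0)


# Open-weight model listed as free: both prices are zero, so the blend is zero.
free_blend = blended_price(0.0, 0.0)

# Hypothetical paid prices, for illustration only (not from this page).
paid_blend = blended_price(1.0, 5.0)

print(free_blend, paid_blend)
```

With both prices at zero the blend is zero, which matches the "Free" entry above.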

Speed

Tokens/sec: 0.0

Time to First Token: 0.00s

Time to Answer: 0.00s

Provider Price Ranking

No provider data available

External Sources

LLM Stats · Artificial Analysis