Qwen2.5 VL 72B Instruct

Alibaba Cloud / Qwen Team | Qwen | Open Weight | tongyi-qianwen

Description

Qwen2.5-VL is Qwen's new flagship vision-language model and a significant improvement over Qwen2-VL. It excels at recognizing objects, analyzing text, charts, and layouts in images, acting as a visual agent, understanding long videos (over 1 hour) with event pinpointing, performing visual localization (bounding boxes or points), and generating structured outputs from documents.

Release Date

2025-01-26

Parameters

72.0B

Context Length

131K

Modalities

image, text

Capability Radar

Axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	#Rank	Score	Source
Agentic Capability	98	45.0	LS
Multimodal Ranking	59	73.0	LS
Reasoning	79	55.0	LS

Benchmark Scores (LLM Stats)

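Category	Benchmark	Score
Agents	AITZ_EM	83.2% (SR)
Agents	MobileMiniWob++_SR	68.0% (SR)
Agents	AndroidWorld_SR	35.0% (SR)
Agents	OSWorld	8.8% (SR)
General	MMVet	76.2% (SR)
General	MLVU-M	74.6% (SR)
General	MMStar	70.8% (SR)
General	MMMU	70.2% (SR)
General	MMMU-Pro	51.1% (SR)
Grounding	ScreenSpot	87.1% (SR)
Grounding	ScreenSpot Pro	43.6% (SR)
Image To Text	DocVQA	96.4% (SR)
Image To Text	OCRBench	88.5% (SR)
Image To Text	OCRBench-V2 (en)	61.5% (SR)
Long Context	EgoSchema	76.2% (SR)
Long Context	LVBench	47.3% (SR)
Math	MathVista-Mini	74.8% (SR)
Math	MathVision	38.1% (SR)
Multimodal	Android Control Low_EM	93.7% (SR)
Multimodal	ChartQA	89.5% (SR)
Multimodal	AI2D	88.4% (SR)
Multimodal	MMBench	88.0% (SR)
Multimodal	CC-OCR	79.8% (SR)
Multimodal	TempCompass	74.8% (SR)
Multimodal	VideoMME w/o sub.	73.3% (SR)
Multimodal	PerceptionTest	73.2% (SR)
Multimodal	MVBench	70.4% (SR)
Multimodal	Android Control High_EM	67.4% (SR)
Multimodal	MMBench-Video	2.0% (SR)
Reasoning	Hallusion Bench	55.2% (SR)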

AA Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores

Categories: Image To Text, Structured Output, Text-to-image, Reasoning, Spatial Reasoning, Grounding, Healthcare, Long Context, Math, Multimodal, Vision, General, Video, Agents

Pricing

Input Price: $2.8 / 1M tokens

Output Price: $8.4 / 1M tokens

Blended Price (3:1): $4.2 / 1M tokens
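
The blended figure is consistent with a weighted average of the input and output prices, assuming "3:1" means three input tokens for every output token. A minimal sketch of that arithmetic follows; the function name `blended_price` and its parameters are illustrative and not part of any provider API.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output token ratio.

    Assumption: the ratio is input tokens to output tokens (3:1 by default).
    """
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# Listed prices for the primary provider, in USD per 1M tokens.
primary = blended_price(2.8, 8.4)
print(f"${primary:.2f} / 1M tokens")
```

With the listed $2.8 input and $8.4 output prices this gives (3 × 2.8 + 8.4) / 4 = 4.2, matching the blended price shown. Other ratios can be tried by changing `input_parts` and `output_parts`.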

Speed

No speed data available

Provider Price Ranking

12 providers

Cheapest: Nebius Token Factory. Most Expensive: LLM Gateway.

Rank	Provider	Price (Input / Output)
1	Nebius Token Factory (Cheapest)	$0.25 / $0.75
2	SiliconFlow (China)	$0.59
3	SiliconFlow	$0.59
4	NanoGPT	$0.69989
5	OpenRouter	$0.8
6	NovitaAI	$0.8
7	Kilo Gateway	$0.8
8	OVHcloud AI Endpoints	$1.01
9	Alibaba (China)	$2.294 / $6.881
10	Alibaba Cloud / Qwen Team (Primary)	$2.8 / $8.4
11	Alibaba	$2.8 / $8.4
12	LLM Gateway	$2.8 / $8.4

Compare pricing across different API providers for this model.
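
As a rough illustration of such a comparison, the sketch below estimates the cost of one workload across the providers that list both an input and an output price. It assumes the figures are USD per 1M tokens, as in the Pricing section. The workload size and the helper names `workload_cost` and `rank_providers` are hypothetical.

```python
# (input, output) prices copied from the provider listing above.
# Assumption: values are USD per 1M tokens. Providers that show only a
# single figure are left out because the split is not stated.
PRICES = {
    "Nebius Token Factory": (0.25, 0.75),
    "Alibaba (China)": (2.294, 6.881),
    "Alibaba Cloud / Qwen Team": (2.8, 8.4),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost in USD for a workload, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_providers(input_tokens: int, output_tokens: int, prices=PRICES):
    """Return (provider, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: workload_cost(input_tokens, output_tokens, inp, out)
        for name, (inp, out) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical workload: 10M input tokens and 2M output tokens.
for name, cost in rank_providers(10_000_000, 2_000_000):
    print(f"{name}: ${cost:.2f}")
```

For this hypothetical workload the ordering follows the listed ranking, with Nebius Token Factory cheapest at 10 × 0.25 + 2 × 0.75 = $4.00.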

External Sources

LLM Stats, Artificial Analysis