Qwen2.5 VL 7B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open Weight · Apache 2.0 · Commercial OK

Description

Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.
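The description lists visual localization and structured output as capabilities. As a hedged illustration, the Python sketch below turns a grounding reply into labeled boxes. It assumes the reply is a JSON list of objects with a `bbox_2d` key ([x1, y1, x2, y2]) and an optional `label` key, possibly wrapped in a Markdown code fence tagged `json`. That shape is modeled on public Qwen2.5-VL grounding examples and should be checked against the official documentation. `parse_boxes` is a hypothetical helper, not part of any Qwen library.

```python
import json
import re
from typing import List, Tuple

# ASSUMED reply format (verify against the Qwen2.5-VL docs):
# [{"bbox_2d": [x1, y1, x2, y2], "label": "cat"}, ...], possibly wrapped in a
# Markdown code fence tagged "json".
Box = Tuple[str, Tuple[int, int, int, int]]

# Three backticks, an optional "json" tag, the payload, three backticks.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_boxes(reply: str) -> List[Box]:
    """Return (label, (x1, y1, x2, y2)) pairs parsed from a grounding reply."""
    match = FENCE_RE.search(reply)
    payload = match.group(1) if match else reply  # fall back to the raw text
    boxes: List[Box] = []
    for item in json.loads(payload):
        x1, y1, x2, y2 = (int(v) for v in item["bbox_2d"])
        boxes.append((item.get("label", ""), (x1, y1, x2, y2)))
    return boxes


fence = "`" * 3
reply = fence + 'json\n[{"bbox_2d": [10, 20, 110, 220], "label": "cat"}]\n' + fence
print(parse_boxes(reply))  # [('cat', (10, 20, 110, 220))]
```

Stripping the fence before calling `json.loads` keeps the parser tolerant of replies that do or do not wrap their JSON in Markdown.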

Release Date

2025-01-26

Parameters

8.3B

Context Length

131K

Modalities

image, text

Capability Radar: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	Rank	Score	Source
Agentic Capability	27	62.0	LLM Stats
Multimodal Ranking	67	71.0	LLM Stats
Reasoning	87	53.0	LLM Stats

Benchmark Scores (LLM Stats)

Agents

MobileMiniWob++_SR	91.4% (SR)
AITZ_EM	81.9% (SR)
AndroidWorld_SR	25.5% (SR)

General

MMVet	67.1% (SR)
MMStar	63.9% (SR)
MMT-Bench	63.6% (SR)
MMMU	58.6% (SR)
MMMU-Pro	38.3% (SR)

Grounding

ScreenSpot	84.7% (SR)
ScreenSpot Pro	29.0% (SR)

Image To Text

DocVQA	95.7% (SR)
OCRBench	86.4% (SR)
TextVQA	84.9% (SR)

Language

CharadesSTA	43.6% (SR)

Long Context

MLVU	70.2% (SR)
LongVideoBench	54.7% (SR)
LVBench	45.3% (SR)

Math

MathVista-Mini	68.2% (SR)
MathVision	25.1% (SR)

Multimodal

Android Control Low_EM	91.4% (SR)
ChartQA	87.3% (SR)
MMBench	84.3% (SR)
InfoVQA	82.6% (SR)
CC-OCR	77.8% (SR)
TempCompass	71.7% (SR)
VideoMME (with subtitles)	71.6% (SR)
PerceptionTest	70.5% (SR)
MVBench	69.6% (SR)
VideoMME (without subtitles)	65.1% (SR)
Android Control High_EM	60.1% (SR)
MMBench-Video	1.8 (SR; scored on a 0 to 3 scale, not a percentage)

Reasoning

Hallusion Bench	52.9% (SR)

Artificial Analysis (AA) Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores: Image To Text, Structured Output, Text-to-image, Long Context, Multimodal, Reasoning, Spatial Reasoning, Grounding, Healthcare, Vision, Math, General, Agents, Video, Language

Pricing

Input price: $0.35 / 1M tokens
Output price: $1.05 / 1M tokens
Blended price (3:1): $0.525 / 1M tokens
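The blended price is a weighted average that assumes three input tokens for every output token: (3 × $0.35 + 1 × $1.05) / 4 = $0.525 per 1M tokens. A minimal Python sketch of that arithmetic, with the 3:1 weights exposed as parameters (`blended_price` is an illustrative name, not a published API):

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3, output_weight: float = 1) -> float:
    """Weighted-average price per 1M tokens (default mix: 3 input : 1 output)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# List prices in USD per 1M tokens; round() hides binary floating-point noise.
print(round(blended_price(0.35, 1.05), 4))  # 0.525
```

Changing the weights models other traffic mixes; for example, a 1:1 mix gives (0.35 + 1.05) / 2 = $0.70 per 1M tokens.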

Speed

No speed data available

Provider Price Ranking

4 providers. Cheapest: SiliconFlow; most expensive: Alibaba.

Prices in USD per 1M tokens:

1. SiliconFlow (cheapest): $0.05
2. Alibaba (China): input $0.287, output $0.717
3. Alibaba Cloud / Qwen Team (primary): input $0.35, output $1.05
4. Alibaba: input $0.35, output $1.05
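To compare providers for a concrete job, cost = (input tokens × input price + output tokens × output price) / 1,000,000. The Python sketch below applies that formula to the providers that list both an input and an output price; SiliconFlow is left out because only a single $0.05 figure appears for it. The 20M-input / 5M-output workload is hypothetical, and `workload_cost` is an illustrative name.

```python
from typing import Dict, Tuple

# (input, output) prices in USD per 1M tokens, copied from the provider list.
# SiliconFlow is omitted: only a single $0.05 figure is listed for it.
PRICES: Dict[str, Tuple[float, float]] = {
    "Alibaba (China)": (0.287, 0.717),
    "Alibaba Cloud / Qwen Team": (0.35, 1.05),
    "Alibaba": (0.35, 1.05),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  price: Tuple[float, float]) -> float:
    """USD cost of a job, given per-1M-token (input, output) prices."""
    input_price, output_price = price
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 20M input tokens and 5M output tokens.
IN_TOKENS, OUT_TOKENS = 20_000_000, 5_000_000
ranked = sorted(PRICES.items(), key=lambda kv: workload_cost(IN_TOKENS, OUT_TOKENS, kv[1]))
for name, price in ranked:
    print(f"{name}: ${workload_cost(IN_TOKENS, OUT_TOKENS, price):.2f}")
```

With these numbers the two $0.35 / $1.05 listings tie, and Alibaba (China) comes out cheapest among the providers that list both prices.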


External Sources

LLM Stats · Artificial Analysis