Qwen2.5 VL 7B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open Weight · Apache 2.0 · Commercial OK

Description

Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.

Release Date: 2025-01-26
Parameters: 8.3B
Context Length: 131K
Modalities: image, text

Capability Radar

General: 50
Coding: 0
Reasoning: 50
Science (est.): 51
Agents: 50
Multimodal: 90

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain | Rank | Score | Source
Agentic Capability | #27 | 62.0 | LS
Multimodal Ranking | #67 | 71.0 | LS
Reasoning | #87 | 53.0 | LS

Benchmark Scores (LLM Stats)

Agents

MobileMiniWob++_SR: 91.4% (SR)
AITZ_EM: 81.9% (SR)
AndroidWorld_SR: 25.5% (SR)

General

MMVet: 67.1% (SR)
MMStar: 63.9% (SR)
MMT-Bench: 63.6% (SR)
MMMU: 58.6% (SR)
MMMU-Pro: 38.3% (SR)

Grounding

ScreenSpot: 84.7% (SR)
ScreenSpot Pro: 29.0% (SR)

Image To Text

DocVQA: 95.7% (SR)
OCRBench: 86.4% (SR)
TextVQA: 84.9% (SR)

Language

CharadesSTA: 43.6% (SR)

Long Context

MLVU: 70.2% (SR)
LongVideoBench: 54.7% (SR)
LVBench: 45.3% (SR)

Math

MathVista-Mini: 68.2% (SR)
MathVision: 25.1% (SR)

Multimodal

Android Control Low_EM: 91.4% (SR)
ChartQA: 87.3% (SR)
MMBench: 84.3% (SR)
InfoVQA: 82.6% (SR)
CC-OCR: 77.8% (SR)
TempCompass: 71.7% (SR)
VideoMME w sub.: 71.6% (SR)
PerceptionTest: 70.5% (SR)
MVBench: 69.6% (SR)
VideoMME w/o sub.: 65.1% (SR)
Android Control High_EM: 60.1% (SR)
MMBench-Video: 1.8 (SR; a mean score on the benchmark's 0-3 scale, not a percentage)

Reasoning

Hallusion Bench: 52.9% (SR)

AA Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores

Image To Text: 90
Structured Output: 80
Text-to-image: 80
Long Context: 60
Multimodal: 60
Reasoning: 60
Spatial Reasoning: 60
Grounding: 60
Healthcare: 60
Vision: 60
Math: 50
General: 50
Agents: 50
Video: 50
Language: 40

Pricing

Input Price: $0.35 / 1M tokens
Output Price: $1.05 / 1M tokens
Blended Price (3:1): $0.525 / 1M tokens
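
The blended figure is consistent with a weighted average of the input and output prices. A minimal sketch of that arithmetic follows, assuming the 3:1 ratio means three input tokens for every output token; the function name `blended_price` is illustrative and does not come from the source.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given input:output ratio.

    Assumption: the "3:1" label means 3 parts input tokens to 1 part output.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Listed prices for the primary provider, in USD per 1M tokens.
price = blended_price(0.35, 1.05)
print(f"${price:.3f} / 1M tokens")
```

With the listed prices this gives (3 × 0.35 + 1 × 1.05) / 4 = 0.525, matching the blended price shown above.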

Speed

No speed data available

Provider Price Ranking

4 providers

Cheapest: SiliconFlow · Most Expensive: Alibaba

# | Provider | Input | Output
1 | SiliconFlow (Cheapest) | $0.05 | $0.05
2 | Alibaba (China) | $0.287 | $0.717
3 | Alibaba Cloud / Qwen Team (PRIMARY) | $0.35 | $1.05
4 | Alibaba | $0.35 | $1.05
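
As a rough illustration of how these prices translate into spend, the sketch below costs out a hypothetical workload of 3M input tokens and 1M output tokens per provider. It assumes the listed prices are in USD per 1M tokens, as in the Pricing section; the workload size and the helper name `workload_cost` are made up for illustration.

```python
# Provider prices as listed: (input, output), assumed to be USD per 1M tokens.
providers = {
    "SiliconFlow": (0.05, 0.05),
    "Alibaba (China)": (0.287, 0.717),
    "Alibaba Cloud / Qwen Team": (0.35, 1.05),
    "Alibaba": (0.35, 1.05),
}


def workload_cost(input_price: float, output_price: float,
                  input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a workload, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 3M input tokens, 1M output tokens.
costs = {
    name: workload_cost(inp, out, 3_000_000, 1_000_000)
    for name, (inp, out) in providers.items()
}

# Print providers from cheapest to most expensive for this workload.
ranked = sorted(costs.items(), key=lambda item: item[1])
for name, cost in ranked:
    print(f"{name}: ${cost:.3f}")
```

The ordering can change with the input:output mix, since providers differ in how much more they charge for output tokens than for input tokens.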

Compare pricing across different API providers for this model.

External Sources