Qwen2.5 VL 7B Instruct
Alibaba Cloud / Qwen Team · Qwen · Open Weight · Apache 2.0 · Commercial OK
Description
Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.
Release Date
2025-01-26
Parameters
8.3B
Context Length
131K
Modalities
image, text
Capability Radar
| Capability | Score |
|---|---|
| general | 50 |
| coding | 0 |
| reasoning | 50 |
| science (est.) | 51 |
| agents | 50 |
| multimodal | 90 |
Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
Rankings
| Domain | #Rank | Score | Source |
|---|---|---|---|
| Agentic Capability | 27 | 62.0 | LS |
| Multimodal Ranking | 67 | 71.0 | LS |
| Reasoning | 87 | 53.0 | LS |
Benchmark Scores (LLM Stats)
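| Category | Benchmark | Score |
|---|---|---|
| Agents | MobileMiniWob++_SR | 91.4% SR |
| Agents | AITZ_EM | 81.9% SR |
| Agents | AndroidWorld_SR | 25.5% SR |
| General | MMVet | 67.1% SR |
| General | MMStar | 63.9% SR |
| General | MMT-Bench | 63.6% SR |
| General | MMMU | 58.6% SR |
| General | MMMU-Pro | 38.3% SR |
| Grounding | ScreenSpot | 84.7% SR |
| Grounding | ScreenSpot Pro | 29.0% SR |
| Image To Text | DocVQA | 95.7% SR |
| Image To Text | OCRBench | 86.4% SR |
| Image To Text | TextVQA | 84.9% SR |
| Language | CharadesSTA | 43.6% SR |
| Long Context | MLVU | 70.2% SR |
| Long Context | LongVideoBench | 54.7% SR |
| Long Context | LVBench | 45.3% SR |
| Math | MathVista-Mini | 68.2% SR |
| Math | MathVision | 25.1% SR |
| Multimodal | Android Control Low_EM | 91.4% SR |
| Multimodal | ChartQA | 87.3% SR |
| Multimodal | MMBench | 84.3% SR |
| Multimodal | InfoVQA | 82.6% SR |
| Multimodal | CC-OCR | 77.8% SR |
| Multimodal | TempCompass | 71.7% SR |
| Multimodal | VideoMME w sub. | 71.6% SR |
| Multimodal | PerceptionTest | 70.5% SR |
| Multimodal | MVBench | 69.6% SR |
| Multimodal | VideoMME w/o sub. | 65.1% SR |
| Multimodal | Android Control High_EM | 60.1% SR |
| Multimodal | MMBench-Video | 1.8% SR |
| Reasoning | Hallusion Bench | 52.9% SR |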
AA Evaluation Indices
No AA evaluation data available
LLM Stats Category Scores
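| Category | Score |
|---|---|
| Image To Text | 90 |
| Structured Output | 80 |
| Text-to-image | 80 |
| Long Context | 60 |
| Multimodal | 60 |
| Reasoning | 60 |
| Spatial Reasoning | 60 |
| Grounding | 60 |
| Healthcare | 60 |
| Vision | 60 |
| Math | 50 |
| General | 50 |
| Agents | 50 |
| Video | 50 |
| Language | 40 |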
Pricing
| Metric | Price |
|---|---|
| Input Price | $0.35 / 1M tokens |
| Output Price | $1.05 / 1M tokens |
| Blended Price (3:1) | $0.525 / 1M tokens |
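The blended figure appears to be a weighted average of the input and output prices. The sketch below assumes that "3:1" means three input tokens for every output token. That reading reproduces the listed $0.525, but the page does not state the formula. The function name `blended_price` and its parameters are illustrative.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    Assumption: the 3:1 ratio is input tokens to output tokens.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices taken from the Pricing section (USD per 1M tokens).
price = blended_price(0.35, 1.05)
print(f"${price:.3f} / 1M tokens")
```

With the listed prices this works out to (3 × 0.35 + 1 × 1.05) / 4 = 0.525.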
Speed
No speed data available
Provider Price Ranking
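4 providers
Cheapest: SiliconFlow · Most Expensive: Alibaba
| Rank | Provider | Input ($ / 1M tokens) | Output ($ / 1M tokens) |
|---|---|---|---|
| 1 | SiliconFlow (cheapest) | $0.05 | $0.05 |
| 2 | Alibaba (China) | $0.287 | $0.717 |
| 3 | Alibaba Cloud / Qwen Team (primary) | $0.35 | $1.05 |
| 4 | Alibaba | $0.35 | $1.05 |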
Compare pricing across different API providers for this model.
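As a rough way to compare providers for a concrete workload, the sketch below applies the listed prices to a hypothetical job of 3M input tokens and 1M output tokens. It assumes the provider prices are quoted per 1M tokens, as in the Pricing section. The workload size and the helper name `workload_cost` are illustrative, not part of the source data.

```python
# (input price, output price) in USD, assumed to be per 1M tokens.
PROVIDERS = {
    "SiliconFlow": (0.05, 0.05),
    "Alibaba (China)": (0.287, 0.717),
    "Alibaba Cloud / Qwen Team": (0.35, 1.05),
    "Alibaba": (0.35, 1.05),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Total cost in USD for a given number of input and output tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 3M input tokens, 1M output tokens.
costs = {
    name: workload_cost(3_000_000, 1_000_000, in_price, out_price)
    for name, (in_price, out_price) in PROVIDERS.items()
}

# Sort providers from cheapest to most expensive for this workload.
ranked = sorted(costs.items(), key=lambda item: item[1])
for name, cost in ranked:
    print(f"{name}: ${cost:.3f}")
```

For this workload the ordering matches the ranking in the table. A job with a very different input-to-output ratio would change the absolute costs, so it is worth rerunning the calculation with realistic token counts.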