Qwen2.5 VL 7B Instruct
Alibaba Cloud / Qwen Team · Qwen · Open Weight · Apache 2.0 · Commercial OK
Description
Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.
Release date: 2025-01-26
Parameters: 8.3B
Context length: not listed
Supported modalities: not listed
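Capability radar chart
general: 50
coding: 0
reasoning: 50
science (estimated): 51
agents: 50
multimodal: 90
Note: when no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.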
Leaderboard ranking
Benchmark scores (LLM Stats)
Agents
MobileMiniWob++_SR: 91.4% (self-reported)
AITZ_EM: 81.9% (self-reported)
AndroidWorld_SR: 25.5% (self-reported)
General
MMVet: 67.1% (self-reported)
MMStar: 63.9% (self-reported)
MMT-Bench: 63.6% (self-reported)
MMMU: 58.6% (self-reported)
MMMU-Pro: 38.3% (self-reported)
Grounding
ScreenSpot: 84.7% (self-reported)
ScreenSpot Pro: 29.0% (self-reported)
Image To Text
DocVQA: 95.7% (self-reported)
OCRBench: 86.4% (self-reported)
TextVQA: 84.9% (self-reported)
Language
CharadesSTA: 43.6% (self-reported)
Long Context
MLVU: 70.2% (self-reported)
LongVideoBench: 54.7% (self-reported)
LVBench: 45.3% (self-reported)
Math
MathVista-Mini: 68.2% (self-reported)
MathVision: 25.1% (self-reported)
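Multimodal
Android Control Low_EM: 91.4% (self-reported)
ChartQA: 87.3% (self-reported)
MMBench: 84.3% (self-reported)
InfoVQA: 82.6% (self-reported)
CC-OCR: 77.8% (self-reported)
TempCompass: 71.7% (self-reported)
VideoMME w sub.: 71.6% (self-reported)
PerceptionTest: 70.5% (self-reported)
MVBench: 69.6% (self-reported)
VideoMME w/o sub.: 65.1% (self-reported)
Android Control High_EM: 60.1% (self-reported)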
MMBench-Video: 1.8 (self-reported; a mean score on the benchmark's 0 to 3 grading scale, not a percentage)
Reasoning
Hallusion Bench: 52.9% (self-reported)
AA Evaluation Index
No AA evaluation data available.
LLM Stats category scores
Image To Text: 90
Structured Output: 80
Text-to-image: 80
Spatial Reasoning: 60
Vision: 60
Grounding: 60
Healthcare: 60
Long Context: 60
Multimodal: 60
Reasoning: 60
Video: 50
Agents: 50
General: 50
Math: 50
Language: 40
Pricing
No pricing data available.
Speed
No speed data available.
Available providers
(LS internal pricing units) No provider data available.