Qwen2.5 VL 7B Instruct
Alibaba Cloud / Qwen Team · Qwen · Open Weight · Apache 2.0 · Commercial OK
Description
Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.
Release date
2025-01-26
Parameters
8.3B
Context length
—
Supported modalities
—
Capability radar chart
general: 50
coding: 0
reasoning: 50
science (estimated): 51
agents: 50
multimodal: 90
The science score is estimated using the reasoning score as a proxy when no dedicated science benchmark is available.
Leaderboard ranking
Benchmark scores (LLM Stats)
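Agents
MobileMiniWob++_SR: 91.4% (self-reported)
AITZ_EM: 81.9% (self-reported)
AndroidWorld_SR: 25.5% (self-reported)
General
MMVet: 67.1% (self-reported)
MMStar: 63.9% (self-reported)
MMT-Bench: 63.6% (self-reported)
MMMU: 58.6% (self-reported)
MMMU-Pro: 38.3% (self-reported)
Grounding
ScreenSpot: 84.7% (self-reported)
ScreenSpot Pro: 29.0% (self-reported)
Image To Text
DocVQA: 95.7% (self-reported)
OCRBench: 86.4% (self-reported)
TextVQA: 84.9% (self-reported)
Language
CharadesSTA: 43.6% (self-reported)
Long Context
MLVU: 70.2% (self-reported)
LongVideoBench: 54.7% (self-reported)
LVBench: 45.3% (self-reported)
Math
MathVista-Mini: 68.2% (self-reported)
MathVision: 25.1% (self-reported)
Multimodal
Android Control Low_EM: 91.4% (self-reported)
ChartQA: 87.3% (self-reported)
MMBench: 84.3% (self-reported)
InfoVQA: 82.6% (self-reported)
CC-OCR: 77.8% (self-reported)
TempCompass: 71.7% (self-reported)
VideoMME w sub.: 71.6% (self-reported)
PerceptionTest: 70.5% (self-reported)
MVBench: 69.6% (self-reported)
VideoMME w/o sub.: 65.1% (self-reported)
Android Control High_EM: 60.1% (self-reported)
MMBench-Video: 1.8% (self-reported)
Reasoning
Hallusion Bench: 52.9% (self-reported)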
AA evaluation index
No AA evaluation data available
LLM Stats category scores
Image To Text: 90
Structured Output: 80
Text-to-image: 80
Spatial Reasoning: 60
Vision: 60
Grounding: 60
Healthcare: 60
Long Context: 60
Multimodal: 60
Reasoning: 60
Video: 50
Agents: 50
General: 50
Math: 50
Language: 40
Pricing
No pricing data available
Speed
No speed data available
Available providers
(LS internal pricing unit) No provider data available