Qwen2.5 VL 7B Instruct
Alibaba Cloud / Qwen Team · Qwen · Open weights · Apache 2.0 · Commercial use permitted
Description
Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.
Release date: 2025-01-26
Parameter count: 8.3B
Context length: 131K
Supported modalities: image, text
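Capability radar chart
general: 50
coding: 0
reasoning: 50
science (estimated): 51
agents: 50
multimodal: 90
Science is estimated from reasoning ability as a proxy when no dedicated science evaluation is available.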
Leaderboard ranking
Benchmark scores (LLM Stats)
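Agents
MobileMiniWob++_SR: 91.4% (self-reported)
AITZ_EM: 81.9% (self-reported)
AndroidWorld_SR: 25.5% (self-reported)
General
MMVet: 67.1% (self-reported)
MMStar: 63.9% (self-reported)
MMT-Bench: 63.6% (self-reported)
MMMU: 58.6% (self-reported)
MMMU-Pro: 38.3% (self-reported)
Grounding
ScreenSpot: 84.7% (self-reported)
ScreenSpot Pro: 29.0% (self-reported)
Image To Text
DocVQA: 95.7% (self-reported)
OCRBench: 86.4% (self-reported)
TextVQA: 84.9% (self-reported)
Language
CharadesSTA: 43.6% (self-reported)
Long Context
MLVU: 70.2% (self-reported)
LongVideoBench: 54.7% (self-reported)
LVBench: 45.3% (self-reported)
Math
MathVista-Mini: 68.2% (self-reported)
MathVision: 25.1% (self-reported)
Multimodal
Android Control Low_EM: 91.4% (self-reported)
ChartQA: 87.3% (self-reported)
MMBench: 84.3% (self-reported)
InfoVQA: 82.6% (self-reported)
CC-OCR: 77.8% (self-reported)
TempCompass: 71.7% (self-reported)
VideoMME w sub.: 71.6% (self-reported)
PerceptionTest: 70.5% (self-reported)
MVBench: 69.6% (self-reported)
VideoMME w/o sub.: 65.1% (self-reported)
Android Control High_EM: 60.1% (self-reported)
MMBench-Video: 1.8% (self-reported)
Reasoning
Hallusion Bench: 52.9% (self-reported)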
AA evaluation index
No AA evaluation data available yet.
LLM Stats category scores
Image To Text: 90
Structured Output: 80
Text-to-image: 80
Long Context: 60
Multimodal: 60
Reasoning: 60
Spatial Reasoning: 60
Grounding: 60
Healthcare: 60
Vision: 60
Math: 50
General: 50
Agents: 50
Video: 50
Language: 40
Pricing
Input price: $0.35 / 1M tokens
Output price: $1.05 / 1M tokens
Blended price (3:1): $0.525 / 1M tokens
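The blended figure is consistent with a token-weighted average of the input and output prices at a 3:1 input-to-output ratio. The sketch below reproduces that arithmetic, assuming this weighting is how the page derives the number; the function name `blended_price` is illustrative and does not come from the source.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Token-weighted average price in USD per 1M tokens.

    Assumes a workload with `input_parts` input tokens for every
    `output_parts` output tokens (3:1 by default).
    """
    total_parts = input_parts + output_parts
    weighted_sum = input_price * input_parts + output_price * output_parts
    return weighted_sum / total_parts


# Listed input and output prices for this model.
print(round(blended_price(0.35, 1.05), 3))  # 0.525
```

Under this assumption, (3 × $0.35 + 1 × $1.05) / 4 = $0.525, which matches the listed blended price.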
Speed
No speed data available yet.
Provider price ranking
4 providers
Cheapest: SiliconFlow · Most expensive: Alibaba
Rank · Provider · Input · Output (USD per 1M tokens)
1 · SiliconFlow (cheapest) · $0.05 · $0.05
2 · Alibaba (China) · $0.287 · $0.717
3 · Alibaba Cloud / Qwen Team (primary) · $0.35 · $1.05
4 · Alibaba · $0.35 · $1.05
Compare this model's pricing across different API providers.
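One way to make that comparison concrete is to reduce each provider's input and output prices to a single blended figure and sort on it. The sketch below does this with the four listed providers. The 3:1 input-to-output weighting and the helper name `blended` are assumptions for illustration; the page does not state how it orders providers.

```python
# (provider, input price, output price) in USD per 1M tokens,
# copied from the provider listing.
providers = [
    ("SiliconFlow", 0.05, 0.05),
    ("Alibaba (China)", 0.287, 0.717),
    ("Alibaba Cloud / Qwen Team", 0.35, 1.05),
    ("Alibaba", 0.35, 1.05),
]


def blended(input_price, output_price, input_parts=3, output_parts=1):
    """Token-weighted average price (assumed 3:1 input:output mix)."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# sorted() is stable, so providers with equal prices keep their listed order.
ranked = sorted(providers, key=lambda p: blended(p[1], p[2]))

for rank, (name, input_price, output_price) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: ${blended(input_price, output_price):.4f} blended")
```

With these numbers the blended ordering matches the listed ranking. A workload with a different input-to-output mix can pass its own `input_parts` and `output_parts` values.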