Qwen2.5 VL 7B Instruct
Alibaba Cloud / Qwen Team · Qwen · Open weights · Apache 2.0 · Commercial use permitted
Description
Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.
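The description mentions bounding-box grounding and structured output. Below is a minimal client-side sketch for consuming such output. It assumes the model was prompted to answer with a JSON list whose entries carry a `bbox_2d` key and a `label`. The key names, the fence-stripping behavior and the helper `parse_grounding_output` are all hypothetical illustrations, not a documented Qwen API.

```python
import json

FENCE = "`" * 3  # built at runtime so this example contains no literal code fence


def parse_grounding_output(text: str) -> list[dict]:
    """Parse a grounding reply into labeled boxes.

    Hypothetical format (an assumption, not a documented contract): a JSON
    list such as [{"bbox_2d": [x1, y1, x2, y2], "label": "cat"}], possibly
    wrapped in a Markdown json code fence.
    """
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    boxes = []
    for item in json.loads(cleaned):
        # Coordinates are converted to floats; their meaning (pixels or
        # normalized) depends on how the model was prompted.
        x1, y1, x2, y2 = (float(v) for v in item["bbox_2d"])
        boxes.append({"label": item.get("label", ""), "box": (x1, y1, x2, y2)})
    return boxes


# Example reply text, made up for illustration (not real model output).
reply = FENCE + 'json\n[{"bbox_2d": [10, 20, 110, 220], "label": "cat"}]\n' + FENCE
print(parse_grounding_output(reply))
```

Tolerating an optional code fence is a defensive choice. Chat models often wrap JSON in Markdown even when asked not to.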
Release date: 2025-01-26
Parameters: 8.3B
Context length: 131K tokens
Modalities: image, text
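Capability radar chart
General: 50
Coding: 0
Reasoning: 50
Science: 51 (estimated)
Agents: 50
Multimodal: 90
Science is estimated from reasoning ability as a proxy when no dedicated science benchmarks are available.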
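Benchmark scores (LLM Stats)
Agents
MobileMiniWob++_SR: 91.4% (self-reported)
AITZ_EM: 81.9% (self-reported)
AndroidWorld_SR: 25.5% (self-reported)
General
MMVet: 67.1% (self-reported)
MMStar: 63.9% (self-reported)
MMT-Bench: 63.6% (self-reported)
MMMU: 58.6% (self-reported)
MMMU-Pro: 38.3% (self-reported)
Grounding
ScreenSpot: 84.7% (self-reported)
ScreenSpot Pro: 29.0% (self-reported)
Image To Text
DocVQA: 95.7% (self-reported)
OCRBench: 86.4% (self-reported)
TextVQA: 84.9% (self-reported)
Language
CharadesSTA: 43.6% (self-reported)
Long Context
MLVU: 70.2% (self-reported)
LongVideoBench: 54.7% (self-reported)
LVBench: 45.3% (self-reported)
Math
MathVista-Mini: 68.2% (self-reported)
MathVision: 25.1% (self-reported)
Multimodal
Android Control Low_EM: 91.4% (self-reported)
ChartQA: 87.3% (self-reported)
MMBench: 84.3% (self-reported)
InfoVQA: 82.6% (self-reported)
CC-OCR: 77.8% (self-reported)
TempCompass: 71.7% (self-reported)
VideoMME (with subtitles): 71.6% (self-reported)
PerceptionTest: 70.5% (self-reported)
MVBench: 69.6% (self-reported)
VideoMME (without subtitles): 65.1% (self-reported)
Android Control High_EM: 60.1% (self-reported)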
MMBench-Video: 1.8 (self-reported; mean score on a 0 to 3 scale, not a percentage)
Reasoning
HallusionBench: 52.9% (self-reported)
AA evaluation index
No AA evaluation data available
LLM Stats category scores
Image To Text: 90
Structured Output: 80
Text-to-image: 80
Long Context: 60
Multimodal: 60
Reasoning: 60
Spatial Reasoning: 60
Grounding: 60
Healthcare: 60
Vision: 60
Math: 50
General: 50
Agents: 50
Video: 50
Language: 40
Pricing
Input price: $0.35 / 1M tokens
Output price: $1.05 / 1M tokens
Blended price (3:1 input:output): $0.525 / 1M tokens
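The blended figure is consistent with a weighted average of three parts input price to one part output price. Here $P_{\text{in}}$ and $P_{\text{out}}$ are notation introduced for this check. With the listed prices:

```latex
P_{\text{blended}} = \frac{3\,P_{\text{in}} + 1\,P_{\text{out}}}{3 + 1}
                   = \frac{3 \times 0.35 + 1.05}{4}
                   = \frac{2.10}{4}
                   = 0.525 \quad \text{(USD per 1M tokens)}
```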
Speed
No speed data available
Provider price ranking
4 providers
Cheapest: SiliconFlow · Most expensive: Alibaba
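Provider: input / output (USD per 1M tokens)
1. SiliconFlow (cheapest): $0.05 / $0.05
2. Alibaba (China): $0.287 / $0.717
3. Alibaba Cloud / Qwen Team (primary): $0.35 / $1.05
4. Alibaba: $0.35 / $1.05
Compares this model's pricing across API providers.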
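The sketch below recomputes the 3:1 blended price for each provider listed above and sorts providers by it. It assumes every price is in USD per 1M tokens. `ProviderPrice`, `blended` and `cost` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPrice:
    name: str
    input_usd_per_m: float   # USD per 1M input tokens
    output_usd_per_m: float  # USD per 1M output tokens

    def blended(self, input_weight: int = 3, output_weight: int = 1) -> float:
        """Weighted average price per 1M tokens (default: 3:1 input:output mix)."""
        total = input_weight + output_weight
        return (input_weight * self.input_usd_per_m
                + output_weight * self.output_usd_per_m) / total

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD for a given number of input and output tokens."""
        return (input_tokens * self.input_usd_per_m
                + output_tokens * self.output_usd_per_m) / 1_000_000


PROVIDERS = [
    ProviderPrice("SiliconFlow", 0.05, 0.05),
    ProviderPrice("Alibaba (China)", 0.287, 0.717),
    ProviderPrice("Alibaba Cloud / Qwen Team", 0.35, 1.05),
    ProviderPrice("Alibaba", 0.35, 1.05),
]

# sorted() is stable, so the two providers tied at $0.525 keep their listed order.
for p in sorted(PROVIDERS, key=lambda p: p.blended()):
    print(f"{p.name:<28} blended ${p.blended():.4f} / 1M tokens")
```

With these prices, SiliconFlow comes out at $0.05 blended and Alibaba (China) at about $0.39. The two $0.35 / $1.05 listings tie at $0.525. For example, `PROVIDERS[2].cost(2_000_000, 500_000)` prices a job with 2M input and 0.5M output tokens at about $1.23.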