
Qwen2.5 VL 7B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open weights · Apache 2.0 · Commercial use permitted

Description

Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.

Release date: 2025-01-26
Parameters: 8.3B
Context length: 131K tokens
Modalities: image, text

Capability radar chart

General: 50
Coding: 0
Reasoning: 50
Science (estimated): 51
Agents: 50
Multimodal: 90

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.

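Leaderboard rankings

Agents leaderboard: rank #27, score 62.0 (source: LS)
Multimodal leaderboard: rank #67, score 71.0 (source: LS)
Reasoning leaderboard: rank #87, score 53.0 (source: LS)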

Benchmark scores (LLM Stats)

All scores below are self-reported.

Agents

MobileMiniWob++_SR: 91.4%
AITZ_EM: 81.9%
AndroidWorld_SR: 25.5%

General

MMVet: 67.1%
MMStar: 63.9%
MMT-Bench: 63.6%
MMMU: 58.6%
MMMU-Pro: 38.3%

Grounding

ScreenSpot: 84.7%
ScreenSpot Pro: 29.0%

Image To Text

DocVQA: 95.7%
OCRBench: 86.4%
TextVQA: 84.9%

Language

CharadesSTA: 43.6%

Long Context

MLVU: 70.2%
LongVideoBench: 54.7%
LVBench: 45.3%

Math

MathVista-Mini: 68.2%
MathVision: 25.1%

Multimodal

Android Control Low_EM: 91.4%
ChartQA: 87.3%
MMBench: 84.3%
InfoVQA: 82.6%
CC-OCR: 77.8%
TempCompass: 71.7%
VideoMME (with subtitles): 71.6%
PerceptionTest: 70.5%
MVBench: 69.6%
VideoMME (without subtitles): 65.1%
Android Control High_EM: 60.1%
MMBench-Video: 1.8 (MMBench-Video is scored on a 0 to 3 scale, so this value is not a percentage)

Reasoning

Hallusion Bench: 52.9%

AA evaluation index

No AA evaluation data available.

LLM Stats category scores

Image To Text: 90
Structured Output: 80
Text-to-image: 80
Long Context: 60
Multimodal: 60
Reasoning: 60
Spatial Reasoning: 60
Grounding: 60
Healthcare: 60
Vision: 60
Math: 50
General: 50
Agents: 50
Video: 50
Language: 40

Pricing

Input price: $0.35 / 1M tokens
Output price: $1.05 / 1M tokens
Blended price (3:1 input-to-output): $0.525 / 1M tokens
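The blended figure matches a weighted average of three input tokens per output token: (3 × $0.35 + 1 × $1.05) / 4 = $0.525. A minimal Python sketch of that arithmetic, assuming this is how the 3:1 blend is defined (the helper name `blended_price` is hypothetical, not part of any provider API):

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output token mix.

    Assumes the blend is a simple weighted mean (3:1 input:output by default).
    """
    total = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total


# Listed prices for this model, USD per 1M tokens
price = blended_price(0.35, 1.05)
print(f"${price:.3f} / 1M tokens")
```

Changing `input_parts` and `output_parts` gives the blended price for other traffic mixes, for example an output-heavy 1:1 workload.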

Speed

No speed data available.

Provider price ranking

4 providers

Cheapest: SiliconFlow · Most expensive: Alibaba
Provider: input / output price (USD per 1M tokens)
1. SiliconFlow (cheapest): $0.05 / $0.05
2. Alibaba (China): $0.287 / $0.717
3. Alibaba Cloud / Qwen Team (primary): $0.35 / $1.05
4. Alibaba: $0.35 / $1.05

This ranking compares the model's pricing across API providers.
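To compare providers on a single number, the same 3:1 input-to-output blend used in the pricing section can be applied to each provider's prices. The sketch below assumes all prices are USD per 1M tokens, uses list prices only (no volume discounts or provider-specific billing rules), and introduces hypothetical names (`ProviderPrice`, `workload_cost`) purely for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPrice:
    """Hypothetical record of one provider's list prices."""
    name: str
    input_usd: float   # USD per 1M input tokens (assumed unit)
    output_usd: float  # USD per 1M output tokens (assumed unit)

    def blended(self, input_parts: int = 3, output_parts: int = 1) -> float:
        """Weighted average price per 1M tokens for an input:output token mix."""
        total = input_parts + output_parts
        return (input_parts * self.input_usd + output_parts * self.output_usd) / total


def workload_cost(p: ProviderPrice, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of a workload at the provider's list prices."""
    return (input_tokens * p.input_usd + output_tokens * p.output_usd) / 1_000_000


# Prices copied from the provider ranking
PROVIDERS = [
    ProviderPrice("SiliconFlow", 0.05, 0.05),
    ProviderPrice("Alibaba (China)", 0.287, 0.717),
    ProviderPrice("Alibaba Cloud / Qwen Team", 0.35, 1.05),
    ProviderPrice("Alibaba", 0.35, 1.05),
]

# Cheapest first by 3:1 blended price; sorted() is stable, so ties keep list order
ranked = sorted(PROVIDERS, key=lambda p: p.blended())

for p in ranked:
    cost = workload_cost(p, input_tokens=10_000_000, output_tokens=2_000_000)
    print(f"{p.name}: blended ${p.blended():.4f} / 1M tokens, "
          f"10M in + 2M out = ${cost:.2f}")
```

The two Alibaba listings have identical list prices, so they tie; a ranking like this reflects list price only, not latency, rate limits, or regional availability.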
