
Qwen2.5 VL 72B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open Weight · tongyi-qianwen

Description

Qwen2.5-VL is Qwen's new flagship vision-language model and a significant improvement over Qwen2-VL. It excels at recognizing objects, analyzing text, charts, and layouts in images, acting as a visual agent, understanding long videos (over 1 hour) with event pinpointing, performing visual localization (bounding boxes or points), and generating structured outputs from documents.
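
As a rough illustration of consuming the bounding-box and structured-output capabilities described above, the sketch below parses a model reply into typed boxes. This page does not document the reply format, so the JSON shape, the `bbox_2d` and `label` key names, and the sample reply are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Box:
    label: str
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def area(self):
        # Pixel area; degenerate boxes count as zero.
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)


def parse_boxes(reply):
    """Parse a JSON list of {"bbox_2d": [x1, y1, x2, y2], "label": str} items.

    The key names are assumed, not taken from this page.
    """
    boxes = []
    for item in json.loads(reply):
        x1, y1, x2, y2 = item["bbox_2d"]
        boxes.append(Box(item["label"], x1, y1, x2, y2))
    return boxes


# Made-up reply, standing in for real model output.
sample_reply = '[{"bbox_2d": [10, 20, 110, 220], "label": "invoice total"}]'
boxes = parse_boxes(sample_reply)
print(boxes[0].label, boxes[0].area)  # invoice total 20000
```

In practice the reply may wrap the JSON in extra text, so a real client would likely need to extract the JSON span before calling `json.loads`.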

Release date: 2025-01-26
Parameters: 72.0B
Context length: 32K
Supported modalities: image, text

Capability radar chart

general: 50
coding: 0
reasoning: 60
science (estimated): 60
agents: 40
multimodal: 80

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.
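
The fallback described above can be sketched as follows. The page does not give the exact formula, so this assumes the reasoning score is copied directly, which is consistent with both axes showing 60; the function name and the dictionary layout are hypothetical.

```python
def science_score(scores):
    """Return (value, is_estimated) for the science axis.

    Assumption: when no science score exists, the reasoning score
    is used unchanged as the proxy.
    """
    if scores.get("science") is not None:
        return scores["science"], False
    return scores["reasoning"], True


# Radar values from this page, with no dedicated science score.
radar = {"general": 50, "coding": 0, "reasoning": 60, "agents": 40, "multimodal": 80}
value, estimated = science_score(radar)
print(value, estimated)  # 60 True
```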

Leaderboard rankings

Domain              Rank   Score   Source
Agents and tools    #82    45.0    LS
Multimodal          #54    73.0    LS
Reasoning           #74    55.0    LS

Benchmark scores (LLM Stats)

Agents

AITZ_EM: 83.2% (self-reported)
MobileMiniWob++_SR: 68.0% (self-reported)
AndroidWorld_SR: 35.0% (self-reported)
OSWorld: 8.8% (self-reported)

General

MMVet: 76.2% (self-reported)
MLVU-M: 74.6% (self-reported)
MMStar: 70.8% (self-reported)
MMMU: 70.2% (self-reported)
MMMU-Pro: 51.1% (self-reported)

Grounding

ScreenSpot: 87.1% (self-reported)
ScreenSpot Pro: 43.6% (self-reported)

Image To Text

DocVQA: 96.4% (self-reported)
OCRBench: 88.5% (self-reported)
OCRBench-V2 (en): 61.5% (self-reported)

Long Context

EgoSchema: 76.2% (self-reported)
LVBench: 47.3% (self-reported)

Math

MathVista-Mini: 74.8% (self-reported)
MathVision: 38.1% (self-reported)

Multimodal

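Android Control Low_EM: 93.7% (self-reported)
ChartQA: 89.5% (self-reported)
AI2D: 88.4% (self-reported)
MMBench: 88.0% (self-reported)
CC-OCR: 79.8% (self-reported)
TempCompass: 74.8% (self-reported)
VideoMME w/o sub.: 73.3% (self-reported)
PerceptionTest: 73.2% (self-reported)
MVBench: 70.4% (self-reported)
Android Control High_EM: 67.4% (self-reported)
MMBench-Video: 2.0% (self-reported)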

Reasoning

Hallusion Bench: 55.2% (self-reported)

AA evaluation index

No AA evaluation data available.

LLM Stats category scores

Structured Output: 80
Text-to-image: 80
Image To Text: 80
Spatial Reasoning: 70
Grounding: 70
Healthcare: 70
Reasoning: 70
Vision: 60
Long Context: 60
Math: 60
Multimodal: 60
Video: 50
General: 50
Agents: 40

Pricing

Input price: $0.25 / 1M tokens
Output price: $0.75 / 1M tokens
Blended price (3:1): $0.375 / 1M tokens
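
The blended figure can be reproduced as a weighted average of the two prices. This is a minimal sketch that assumes "3:1" means three input tokens for every output token, an interpretation the page does not state but which matches the listed $0.375; the function name is hypothetical.

```python
from decimal import Decimal


def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average of per-1M-token prices at an input:output token ratio."""
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


# Prices in USD per 1M tokens, as listed on this page.
price = blended_price(Decimal("0.25"), Decimal("0.75"))
print(price)  # 0.375
```

`Decimal` is used instead of `float` so that the result prints exactly as a currency amount rather than a binary approximation.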

Speed

No speed data available.

Available providers

(LS internal pricing unit)

No provider data available.

External links