Qwen2.5 VL 7B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open weights · Apache 2.0 · Commercial use permitted

Description

Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.

Release date

2025-01-26

Parameters

8.3B

Context length

131K

Supported modalities

image, text

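Capability radar chart

Axes: general, coding, reasoning, science (estimated), agents, multimodal.

The science score is estimated from reasoning ability as a proxy when no dedicated science evaluation is available.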

Leaderboard rankings

Domain	Rank	Score	Source
Agent capability leaderboard	27	62.0	LS
Multimodal leaderboard	67	71.0	LS
Reasoning	87	53.0	LS

Benchmark scores (LLM Stats)

Agents

MobileMiniWob++_SR	91.4% (self-reported)
AITZ_EM	81.9% (self-reported)
AndroidWorld_SR	25.5% (self-reported)

General

MMVet	67.1% (self-reported)
MMStar	63.9% (self-reported)
MMT-Bench	63.6% (self-reported)
MMMU	58.6% (self-reported)
MMMU-Pro	38.3% (self-reported)

Grounding

ScreenSpot	84.7% (self-reported)
ScreenSpot Pro	29.0% (self-reported)

Image To Text

DocVQA	95.7% (self-reported)
OCRBench	86.4% (self-reported)
TextVQA	84.9% (self-reported)

Language

CharadesSTA	43.6% (self-reported)

Long Context

MLVU	70.2% (self-reported)
LongVideoBench	54.7% (self-reported)
LVBench	45.3% (self-reported)

Math

MathVista-Mini	68.2% (self-reported)
MathVision	25.1% (self-reported)

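Multimodal

Android Control Low_EM	91.4% (self-reported)
ChartQA	87.3% (self-reported)
MMBench	84.3% (self-reported)
InfoVQA	82.6% (self-reported)
CC-OCR	77.8% (self-reported)
TempCompass	71.7% (self-reported)
VideoMME w sub.	71.6% (self-reported)
PerceptionTest	70.5% (self-reported)
MVBench	69.6% (self-reported)
VideoMME w/o sub.	65.1% (self-reported)
Android Control High_EM	60.1% (self-reported)
MMBench-Video	1.8 (self-reported; mean score on MMBench-Video's 0 to 3 scale, not a percentage)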

Reasoning

Hallusion Bench	52.9% (self-reported)

AA evaluation index

No AA evaluation data available.

LLM Stats category scores

Categories: Image To Text, Structured Output, Text-to-image, Long Context, Multimodal, Reasoning, Spatial Reasoning, Grounding, Healthcare, Vision, Math, General, Agents, Video, Language.

Pricing

Input price: $0.35 / 1M tokens

Output price: $1.05 / 1M tokens

Blended price (3:1): $0.525 / 1M tokens
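The blended figure is consistent with a 3:1 weighting of input to output tokens. A minimal sketch of that arithmetic, assuming the ratio means three input tokens for every output token; the function name `blended_price` is illustrative and not part of any provider API:

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Listed prices in USD per 1M tokens.
price = blended_price(0.35, 1.05)
print(f"${price:.3f} / 1M tokens")  # prints "$0.525 / 1M tokens"
```

A different traffic mix changes the result: an output-heavy workload moves the blended price toward the $1.05 output rate.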

Speed

No speed data available.

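Provider price ranking

4 providers

Cheapest: SiliconFlow · Most expensive: Alibaba

Rank	Provider	Input	Output
1	SiliconFlow (cheapest)	$0.05	(not listed)
2	Alibaba (China)	$0.287	$0.717
3	Alibaba Cloud / Qwen Team (primary)	$0.35	$1.05
4	Alibaba	$0.35	$1.05

Compare this model's pricing across API providers.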
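List prices only go so far; for a concrete workload it can help to multiply each rate by the expected token counts. A rough sketch using the provider prices listed above. The workload size is a made-up example, and the single SiliconFlow figure is assumed here to be its input price, so its total is reported as unknown:

```python
from typing import Optional

# USD per 1M tokens, copied from the provider table.
# None marks a price that is not listed (assumption: $0.05 is SiliconFlow's input price).
PROVIDERS: dict[str, tuple[float, Optional[float]]] = {
    "SiliconFlow": (0.05, None),
    "Alibaba (China)": (0.287, 0.717),
    "Alibaba Cloud / Qwen Team": (0.35, 1.05),
    "Alibaba": (0.35, 1.05),
}


def job_cost(input_tokens: int, output_tokens: int,
             input_price: float, output_price: Optional[float]) -> Optional[float]:
    """Cost in USD, or None when the output price is not listed."""
    if output_price is None:
        return None
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 3M input tokens and 1M output tokens.
for name, (inp, out) in PROVIDERS.items():
    cost = job_cost(3_000_000, 1_000_000, inp, out)
    label = "unknown (output price not listed)" if cost is None else f"${cost:.2f}"
    print(f"{name}: {label}")
```

Returning `None` rather than guessing a missing price keeps incomplete rows visible in the comparison instead of silently ranking them as cheapest.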

External links

LLM Stats · Artificial Analysis