Qwen2.5 VL 7B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open weights · Apache 2.0 · Commercial use permitted

Description

Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.

Release date

2025-01-26

Parameters

8.3B

Context length

131K

Supported modalities

image, text

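Capability radar chart (axes: general, coding, reasoning, science (estimated), agents, multimodal)

Science is estimated from reasoning performance as a proxy when no dedicated science evaluation is available.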

Leaderboard rankings

Domain	Rank	Score	Source
Agents leaderboard	27	62.0	LS
Multimodal leaderboard	67	71.0	LS
Reasoning	87	53.0	LS

Benchmark scores (LLM Stats)

Agents

MobileMiniWob++_SR	91.4% (self-reported)
AITZ_EM	81.9% (self-reported)
AndroidWorld_SR	25.5% (self-reported)

General

MMVet	67.1% (self-reported)
MMStar	63.9% (self-reported)
MMT-Bench	63.6% (self-reported)
MMMU	58.6% (self-reported)
MMMU-Pro	38.3% (self-reported)

Grounding

ScreenSpot	84.7% (self-reported)
ScreenSpot Pro	29.0% (self-reported)

Image To Text

DocVQA	95.7% (self-reported)
OCRBench	86.4% (self-reported)
TextVQA	84.9% (self-reported)

Language

CharadesSTA	43.6% (self-reported)

Long Context

MLVU	70.2% (self-reported)
LongVideoBench	54.7% (self-reported)
LVBench	45.3% (self-reported)

Math

MathVista-Mini	68.2% (self-reported)
MathVision	25.1% (self-reported)

Multimodal

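Android Control Low_EM	91.4% (self-reported)
ChartQA	87.3% (self-reported)
MMBench	84.3% (self-reported)
InfoVQA	82.6% (self-reported)
CC-OCR	77.8% (self-reported)
TempCompass	71.7% (self-reported)
VideoMME w sub.	71.6% (self-reported)
PerceptionTest	70.5% (self-reported)
MVBench	69.6% (self-reported)
VideoMME w/o sub.	65.1% (self-reported)
Android Control High_EM	60.1% (self-reported)
MMBench-Video	1.8% (self-reported)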

Reasoning

Hallusion Bench	52.9% (self-reported)

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Categories: Image To Text, Structured Output, Text-to-image, Long Context, Multimodal, Reasoning, Spatial Reasoning, Grounding, Healthcare, Vision, Math, General, Agents, Video, Language

Pricing

Input price: $0.35 / 1M tokens

Output price: $1.05 / 1M tokens

Blended price (3:1): $0.525 / 1M tokens
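The 3:1 blended figure is consistent with a weighted average of three parts input price to one part output price. A minimal sketch of that arithmetic follows; the function name `blended_price` and its parameters are illustrative assumptions, not part of any listing API.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices, in USD per 1M tokens."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Listed prices: $0.35 input, $1.05 output, blended at 3:1.
price = blended_price(0.35, 1.05)
print(f"${price:.3f} / 1M tokens")  # $0.525 / 1M tokens
```

A different input-to-output ratio can be passed through the two weight parameters when a workload is more output-heavy.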

Speed

No speed data available

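Provider pricing ranking

4 providers

Cheapest: SiliconFlow · Most expensive: Alibaba

Rank	Provider	Input	Output
1	SiliconFlow (cheapest)	$0.05	(not listed)
2	Alibaba (China)	$0.287	$0.717
3	Alibaba Cloud / Qwen Team (primary)	$0.35	$1.05
4	Alibaba	$0.35	$1.05

Compares this model's pricing across API providers.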
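To make the provider comparison concrete, the sketch below estimates what a fixed workload would cost at each listed provider. It assumes the prices are USD per 1M tokens, as in the pricing section, and that the single SiliconFlow figure is its input price. The workload size and the names `PROVIDERS` and `workload_cost` are hypothetical.

```python
from typing import Optional

# (provider, input price, output price) in USD per 1M tokens.
# None marks a price that the listing does not show.
PROVIDERS = [
    ("SiliconFlow", 0.05, None),  # assumption: the single listed figure is the input price
    ("Alibaba (China)", 0.287, 0.717),
    ("Alibaba Cloud / Qwen Team", 0.35, 1.05),
    ("Alibaba", 0.35, 1.05),
]


def workload_cost(input_price: Optional[float], output_price: Optional[float],
                  input_tokens: int, output_tokens: int) -> Optional[float]:
    """Cost in USD for a workload, or None when either price is missing."""
    if input_price is None or output_price is None:
        return None
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
costs = {name: workload_cost(inp, out, 10_000_000, 2_000_000)
         for name, inp, out in PROVIDERS}

for name, cost in costs.items():
    print(name, "n/a" if cost is None else f"${cost:.2f}")
```

Providers with an incomplete price pair are reported as `n/a` rather than guessed.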

External links

LLM Stats · Artificial Analysis