Qwen2.5 VL 32B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open weights · Apache 2.0 · Commercial use permitted

Description

Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.

Release date

2025-02-28

Parameters

33.5B

Context length

—

Modality

—

Capability radar

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science benchmark is available, the Science score is estimated using a reasoning proxy.

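Benchmark scores (LLM Stats)

Category	Benchmark	Score	Source
Agents	AITZ_EM	83.1%	Self-reported
Agents	AndroidWorld_SR	22.0%	Self-reported
Agents	OSWorld	5.9%	Self-reported
Biology	GPQA	46.0%	Self-reported
Code	HumanEval	91.5%	Self-reported
Finance	MMLU	78.4%	Self-reported
Finance	MMLU-Pro	68.8%	Self-reported
General	MBPP	0.84 / 100	Self-reported
General	MMMU	70.0%	Self-reported
General	MMStar	69.5%	Self-reported
General	MMMU-Pro	49.5%	Self-reported
Grounding	ScreenSpot	88.5%	Self-reported
Grounding	ScreenSpot Pro	39.4%	Self-reported
Image To Text	DocVQA	94.8%	Self-reported
Image To Text	OCRBench-V2 (zh)	59.1%	Self-reported
Image To Text	OCRBench-V2 (en)	57.2%	Self-reported
Language	CharadesSTA	54.2%	Self-reported
Long Context	LVBench	49.0%	Self-reported
Math	MATH	82.2%	Self-reported
Math	MathVista-Mini	74.7%	Self-reported
Math	MathVision	38.4%	Self-reported
Multimodal	Android Control Low_EM	93.3%	Self-reported
Multimodal	InfoVQA	83.4%	Self-reported
Multimodal	VideoMME w sub.	77.9%	Self-reported
Multimodal	CC-OCR	77.1%	Self-reported
Multimodal	VideoMME w/o sub.	70.5%	Self-reported
Multimodal	Android Control High_EM	69.6%	Self-reported
Multimodal	MMBench-Video	1.9%	Self-reported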

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Categories: Code, Structured Output, Text-to-image, Image To Text, Language, Legal, Math, Finance, Healthcare, Multimodal, Reasoning, Spatial Reasoning, Grounding, Vision, Long Context, Physics, General, Biology, Chemistry, Video, Agents

Pricing

No pricing data available

Speed

No speed data available

Provider price ranking

6 providers

Cheapest: IO.NET · Most expensive: LLM Gateway

Rank	Provider	Input	Output
1	IO.NET (cheapest)	$0.05	$0.22
2	Chutes	$0.0543	$0.2174
3	Meganova	$0.2	$0.6
4	SiliconFlow (China)	$0.27
5	SiliconFlow	$0.27
6	LLM Gateway	$1.4	$4.2

Compare prices for this model across different API providers.
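The listed input and output prices can be turned into a rough per-workload cost estimate. The sketch below is illustrative only: the page does not state the pricing unit, so it assumes USD per 1 million tokens, and the helper names `estimate_cost` and `cheapest` are hypothetical. The two SiliconFlow entries are left out because only a single price is shown for them.

```python
# Prices copied from the provider table above as (input, output).
# ASSUMPTION: the unit is USD per 1,000,000 tokens; the page does not say.
PRICES = {
    "IO.NET": (0.05, 0.22),
    "Chutes": (0.0543, 0.2174),
    "Meganova": (0.2, 0.6),
    "LLM Gateway": (1.4, 4.2),
}

TOKENS_PER_UNIT = 1_000_000  # assumed billing unit


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one workload on one provider."""
    input_price, output_price = PRICES[provider]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_UNIT


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Provider with the lowest estimated cost for the given token mix."""
    return min(PRICES, key=lambda p: estimate_cost(p, input_tokens, output_tokens))


if __name__ == "__main__":
    for mix in [(1_000_000, 1_000_000), (1_000_000, 2_000_000)]:
        print(mix, cheapest(*mix))
```

Under that unit assumption, the ranking by input price does not settle which provider is cheapest overall: IO.NET has the lower input price, Chutes the lower output price, so the better choice depends on how output-heavy the workload is.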

External links

LLM Stats · Artificial Analysis

Domain	Rank	Score	Source
Agent capability	115	33.0	LS
Multimodal ranking	74	66.0	LS