Qwen2-VL-72B-Instruct

Alibaba Cloud / Qwen Team | Qwen | Open weights | tongyi-qianwen

Description

An instruction-tuned large multimodal model built for visual understanding and step-by-step reasoning. It accepts image and video input, handles inputs at dynamic resolution, and uses Multimodal Rotary Position Embedding (M-RoPE). These features support complex problem solving, multilingual text recognition in images, and agent-like interaction in video contexts.
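To make the M-RoPE idea concrete, here is a minimal sketch of how three-component position IDs (temporal, height, width) could be assigned to a mixed text and image sequence. It assumes the scheme described for Qwen2-VL: text tokens share one index across all three components, image tokens keep a constant temporal index while height and width follow the token's grid position, and each new segment starts one past the largest ID used so far. The function name `mrope_position_ids` and the segment format are hypothetical, chosen only for illustration; this is not the model's actual code.

```python
from typing import List, Sequence, Tuple, Union

Position = Tuple[int, int, int]  # (temporal, height, width)
Segment = Tuple[str, Union[int, Tuple[int, int]]]


def mrope_position_ids(segments: Sequence[Segment]) -> List[Position]:
    """Assign illustrative M-RoPE style position IDs to a token sequence.

    Each segment is either ("text", n_tokens) or ("image", (grid_h, grid_w)).
    This is a simplified, hypothetical helper, not the model's implementation.
    """
    positions: List[Position] = []
    start = 0
    for kind, size in segments:
        if kind == "text":
            # Text: all three components carry the same 1D index.
            seg = [(start + i, start + i, start + i) for i in range(size)]
        elif kind == "image":
            grid_h, grid_w = size
            # Image: temporal ID is constant; height/width follow the grid.
            seg = [
                (start, start + row, start + col)
                for row in range(grid_h)
                for col in range(grid_w)
            ]
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
        positions.extend(seg)
        if seg:
            # The next segment starts one past the largest ID used so far.
            start = max(max(p) for p in seg) + 1
    return positions


if __name__ == "__main__":
    # Two text tokens, a 2x3 grid of image tokens, then two more text tokens.
    for pos in mrope_position_ids([("text", 2), ("image", (2, 3)), ("text", 2)]):
        print(pos)
```

For text-only input the three components are identical, so the scheme reduces to ordinary one-dimensional rotary positions.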

Release date

2024-08-29

Parameters

73.4B

Context length

—

Modality

—

Capability radar

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science benchmark is available, the Science axis is estimated using reasoning as a proxy.

Benchmark scores (LLM Stats)

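Category	Benchmark	Score	Source
General	MMVet (GPT-4-Turbo)	74.0%	Self-reported
General	MMMU (val)	64.5%	Self-reported
General	MMMU-Pro	46.2%	Self-reported
Image To Text	OCRBench	87.7%	Self-reported
Image To Text	TextVQA	85.5%	Self-reported
Long Context	EgoSchema	77.9%	Self-reported
Math	MathVista-Mini	70.5%	Self-reported
Multimodal	DocVQA (test)	96.5%	Self-reported
Multimodal	ChartQA	88.3%	Self-reported
Multimodal	MMBench	86.5%	Self-reported
Multimodal	InfoVQA (test)	84.5%	Self-reported
Multimodal	MVBench	73.6%	Self-reported
Multimodal	MTVQA	30.9%	Self-reported
Reasoning	VCR_en_easy	91.9%	Self-reported
Spatial Reasoning	RealWorldQA	77.8%	Self-reported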
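For readers who want to work with these numbers, the sketch below groups the self-reported scores listed above by category and takes an unweighted mean per category. The averaging rule is an assumption made only for illustration; the page does not say how LLM Stats derives its category or ranking scores, and they may be computed differently. The names `SCORES` and `category_means` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (category, benchmark, score in percent), copied from the listing above.
SCORES: List[Tuple[str, str, float]] = [
    ("General", "MMVet (GPT-4-Turbo)", 74.0),
    ("General", "MMMU (val)", 64.5),
    ("General", "MMMU-Pro", 46.2),
    ("Image To Text", "OCRBench", 87.7),
    ("Image To Text", "TextVQA", 85.5),
    ("Long Context", "EgoSchema", 77.9),
    ("Math", "MathVista-Mini", 70.5),
    ("Multimodal", "DocVQA (test)", 96.5),
    ("Multimodal", "ChartQA", 88.3),
    ("Multimodal", "MMBench", 86.5),
    ("Multimodal", "InfoVQA (test)", 84.5),
    ("Multimodal", "MVBench", 73.6),
    ("Multimodal", "MTVQA", 30.9),
    ("Reasoning", "VCR_en_easy", 91.9),
    ("Spatial Reasoning", "RealWorldQA", 77.8),
]


def category_means(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Return the unweighted mean score per category (illustrative only)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for category, _benchmark, score in rows:
        grouped[category].append(score)
    return {category: mean(values) for category, values in grouped.items()}


if __name__ == "__main__":
    for category, value in sorted(category_means(SCORES).items()):
        print(f"{category}: {value:.1f}%")
```

An unweighted mean treats every benchmark as equally important, so a single low score such as MTVQA pulls the Multimodal average down noticeably.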

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Categories: Image To Text, Long Context, Multimodal, Spatial Reasoning, Vision, Math, Reasoning, Video, General, Healthcare

Pricing

No pricing data available

Speed

No speed data available

Provider price ranking

No provider data available

External links

LLM Stats, Artificial Analysis

Domain	Rank	Score	Source
Multimodal ranking	40	78.0	LS
Reasoning	6	92.0	LS