Qwen2-VL-72B-Instruct
Alibaba Cloud / Qwen Team · Qwen · Open Weight · tongyi-qianwen
Description
An instruction-tuned large multimodal model that excels at visual understanding and step-by-step reasoning. It accepts image and video input, handles images at dynamic resolution, and uses improved positional embeddings (M-RoPE, Multimodal Rotary Position Embedding). These enable advanced capabilities such as complex problem solving, multilingual text recognition in images, and agent-like interaction in video contexts.
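The description names two mechanisms: dynamic resolution and M-RoPE. The sketch below is a simplified illustration that assumes the design described in the Qwen2-VL technical report. Images are cut into 14-pixel ViT patches, and each 2x2 group of patch tokens is merged into one LLM token, so one token covers a 28x28 pixel area. Position IDs are split into temporal, height, and width components. The names `visual_grid` and `mrope_positions` are hypothetical. Real preprocessors also rescale images to fit a min/max pixel budget, and this sketch omits that step.

```python
from typing import List, Sequence, Tuple, Union

# Assumed constants, following the Qwen2-VL technical report (not a library API).
PATCH = 14               # ViT patch side in pixels
MERGE = 2                # 2x2 neighbouring patch tokens are merged into one LLM token
FACTOR = PATCH * MERGE   # 28 px of image side per merged token

Pos = Tuple[int, int, int]  # (temporal, height, width) position IDs
Segment = Tuple[str, Union[int, Tuple[int, int, int]]]


def visual_grid(height: int, width: int) -> Tuple[int, int]:
    """Dynamic resolution, simplified: map an image size to a (rows, cols) token grid.

    Each side is rounded to the nearest multiple of 28 (at least 28). Real
    preprocessors also rescale to stay within a min/max pixel budget; omitted here.
    """
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    return h // FACTOR, w // FACTOR


def mrope_positions(segments: Sequence[Segment]) -> List[Pos]:
    """Assign 3-component M-RoPE position IDs to a mixed text/vision sequence.

    segments: ("text", n_tokens) or ("vision", (frames, rows, cols)); an image
    is a vision segment with frames=1.
    """
    ids: List[Pos] = []
    start = 0  # first position ID available to the next segment
    for kind, spec in segments:
        if kind == "text":
            # Text tokens use the same ID in all three components (plain 1D RoPE).
            for i in range(spec):
                ids.append((start + i, start + i, start + i))
            start += spec
        elif kind == "vision":
            frames, rows, cols = spec
            # Temporal ID advances per frame; height/width IDs follow grid position.
            for f in range(frames):
                for r in range(rows):
                    for c in range(cols):
                        ids.append((start + f, start + r, start + c))
            # Next segment starts one past the largest ID used in this segment.
            start += max(frames, rows, cols)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return ids


if __name__ == "__main__":
    rows, cols = visual_grid(224, 224)
    print(rows * cols)  # 64 merged visual tokens for a 224x224 image
    seq = mrope_positions([("text", 3), ("vision", (1, 2, 3)), ("text", 2)])
    print(seq)
```

In the usage example, the text after the 2x3 image resumes at position 6, not 9. Position IDs therefore grow more slowly than the raw token count for large images and videos. The Qwen2-VL report credits this with helping the model extrapolate to longer sequences.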
Release date
2024-08-29
Parameters
73.4B
Context length
—
Supported modalities
—
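Capability radar chart
General: 60
Coding: 0
Reasoning: 70
Science (estimated): 51
Agents: 0
Multimodal: 90
When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.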
Leaderboard rankings
Benchmark scores (LLM Stats; all self-reported)
General
MMVet (GPT-4 Turbo): 74.0%
MMMU (val): 64.5%
MMMU-Pro: 46.2%
Image To Text
OCRBench: 87.7%
TextVQA: 85.5%
Long Context
EgoSchema: 77.9%
Math
MathVista-Mini: 70.5%
Multimodal
DocVQA (test): 96.5%
ChartQA: 88.3%
MMBench: 86.5%
InfoVQA (test): 84.5%
MVBench: 73.6%
MTVQA: 30.9%
Reasoning
VCR_en_easy: 91.9%
Spatial Reasoning
RealWorldQA: 77.8%
AA evaluation index
No AA evaluation data available.
LLM Stats category scores
Image To Text: 90
Spatial Reasoning: 80
Vision: 80
Long Context: 80
Multimodal: 80
Video: 70
Math: 70
Reasoning: 70
General: 60
Healthcare: 60
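Pricing
No pricing data available.
Speed
No speed data available.
Available providers
No provider data available (prices are listed in LS internal pricing units).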