Qwen2-VL-72B-Instruct
Alibaba Cloud / Qwen Team · Qwen · Open Weight · tongyi-qianwen
Description
An instruction-tuned large multimodal model that excels at visual understanding and step-by-step reasoning. It accepts image and video input, handles dynamic input resolution, and uses multimodal rotary position embeddings (M-RoPE). Together these support complex problem solving, multilingual text recognition in images, and agent-like interactions in video contexts.
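As a rough illustration of the multimodal rotary position embedding idea mentioned in the description, the sketch below assigns each token a (temporal, height, width) position triple: text tokens share one index across all three components, while image patches keep a constant temporal index and vary along height and width. The function name `mrope_position_ids` and its parameters are hypothetical, and this is a simplified sketch for a single image, not the model's actual implementation.

```python
from typing import List, Tuple

PositionId = Tuple[int, int, int]  # (temporal, height, width)


def mrope_position_ids(
    text_before: int, grid_h: int, grid_w: int, text_after: int
) -> List[PositionId]:
    """Illustrative position ids for: text tokens, one image grid, text tokens.

    Hypothetical helper; assumes one image of grid_h x grid_w visual tokens.
    """
    ids: List[PositionId] = []

    # Text tokens: all three components carry the same 1D index.
    for i in range(text_before):
        ids.append((i, i, i))

    # Image tokens: temporal index stays fixed, height and width
    # indices follow the patch's row and column in the grid.
    start = text_before
    for row in range(grid_h):
        for col in range(grid_w):
            ids.append((start, start + row, start + col))

    # Trailing text resumes after the largest index used so far.
    next_index = max(max(p) for p in ids) + 1 if ids else 0
    for i in range(text_after):
        ids.append((next_index + i, next_index + i, next_index + i))

    return ids


if __name__ == "__main__":
    for position in mrope_position_ids(2, 2, 3, 1):
        print(position)
```

Because the trailing text resumes from the largest index used by the image rather than from the token count, an image occupying many tokens consumes only a small range of position indices in this scheme.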
Release date
2024-08-29
Parameters
73.4B
Context length
—
Supported modalities
—
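Capability radar chart
general: 60
coding: 0
reasoning: 70
science (estimated): 51
agents: 0
multimodal: 90
When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.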
Leaderboard ranking
Benchmark scores (LLM Stats)
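General
  MMVet (GPT-4 Turbo): 74.0% (self-reported)
  MMMU (val): 64.5% (self-reported)
  MMMU-Pro: 46.2% (self-reported)
Image To Text
  OCRBench: 87.7% (self-reported)
  TextVQA: 85.5% (self-reported)
Long Context
  EgoSchema: 77.9% (self-reported)
Math
  MathVista-Mini: 70.5% (self-reported)
Multimodal
  DocVQA (test): 96.5% (self-reported)
  ChartQA: 88.3% (self-reported)
  MMBench: 86.5% (self-reported)
  InfoVQA (test): 84.5% (self-reported)
  MVBench: 73.6% (self-reported)
  MTVQA: 30.9% (self-reported)
Reasoning
  VCR_en_easy: 91.9% (self-reported)
Spatial Reasoning
  RealWorldQA: 77.8% (self-reported)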
AA evaluation index
No AA evaluation data available
LLM Stats category scores
Image To Text: 90
Spatial Reasoning: 80
Vision: 80
Long Context: 80
Multimodal: 80
Video: 70
Math: 70
Reasoning: 70
General: 60
Healthcare: 60
Pricing
No pricing data available
Speed
No speed data available
Available providers
(LS internal pricing unit) No provider data available