
Qwen2.5 VL 32B Instruct

Alibaba Cloud / Qwen Team · Qwen · Open Weight · Apache 2.0 · Commercial OK

Description

Qwen2.5-VL is a vision-language model from the Qwen family. Key enhancements include visual understanding (objects, text, charts, layouts), visual agent capabilities (tool use, computer/phone control), long video comprehension with event pinpointing, visual localization (bounding boxes/points), and structured output generation.

Release date: 2025-02-28
Parameters: 33.5B
Context length: (not listed)
Supported modalities: (not listed)

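Capability radar chart

general: 50
coding: 90
reasoning: 70
science: 43 (estimated)
agents: 40
multimodal: 70

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.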

Leaderboard rankings

Agents & Tools: rank #94, score 33.0 (source: LS)
Multimodal: rank #65, score 66.0 (source: LS)

Benchmark scores (LLM Stats)

Agents

AITZ_EM: 83.1% (self-reported)
AndroidWorld_SR: 22.0% (self-reported)
OSWorld: 5.9% (self-reported)

Biology

GPQA: 46.0% (self-reported)

Code

HumanEval: 91.5% (self-reported)

Finance

MMLU: 78.4% (self-reported)
MMLU-Pro: 68.8% (self-reported)

General

MBPP: 0.84 / 100 (self-reported)
MMMU: 70.0% (self-reported)
MMStar: 69.5% (self-reported)
MMMU-Pro: 49.5% (self-reported)

Grounding

ScreenSpot: 88.5% (self-reported)
ScreenSpot Pro: 39.4% (self-reported)

Image To Text

DocVQA: 94.8% (self-reported)
OCRBench-V2 (zh): 59.1% (self-reported)
OCRBench-V2 (en): 57.2% (self-reported)

Language

CharadesSTA: 54.2% (self-reported)

Long Context

LVBench: 49.0% (self-reported)

Math

MATH: 82.2% (self-reported)
MathVista-Mini: 74.7% (self-reported)
MathVision: 38.4% (self-reported)

Multimodal

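Android Control Low_EM: 93.3% (self-reported)
InfoVQA: 83.4% (self-reported)
VideoMME w sub.: 77.9% (self-reported)
CC-OCR: 77.1% (self-reported)
VideoMME w/o sub.: 70.5% (self-reported)
Android Control High_EM: 69.6% (self-reported)
MMBench-Video: 1.9% (self-reported)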

AA evaluation index

No AA evaluation data available.

LLM Stats category scores

Code: 90
Structured Output: 80
Text-to-image: 80
Finance: 70
Healthcare: 70
Image To Text: 70
Language: 70
Legal: 70
Math: 70
Spatial Reasoning: 60
Vision: 60
Grounding: 60
Multimodal: 60
Reasoning: 60
Video: 50
Biology: 50
Chemistry: 50
General: 50
Long Context: 50
Physics: 50
Agents: 40

Pricing

No pricing data available.

Speed

No speed data available.

Available providers

(LS internal pricing unit)

No provider data available.
