Qwen2.5-Omni-7B

Alibaba Cloud / Qwen Team · Qwen · Open Weight · Apache 2.0 · Commercial OK

Description

Qwen2.5-Omni is the flagship end-to-end multimodal model in the Qwen series. It processes diverse inputs including text, images, audio, and video, delivering real-time streaming responses through text generation and natural speech synthesis using a novel Thinker-Talker architecture.

Release Date: 2025-03-27
Parameters: 7.0B
Context Length: 33K
Modalities: audio, image, text, video

Capability Radar

general: 50
coding: 80
reasoning: 60
science (est.): 26
agents: 66
multimodal: 80

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Multimodal Ranking: rank #57, score 74.0 (source: LS)

Benchmark Scores (LLM Stats)

Audio

VocalSound: 93.9% (SR)
GiantSteps Tempo: 88.0% (SR)
MMAU Music: 69.2% (SR)
MMAU Sound: 67.9% (SR)
MMAU: 65.6% (SR)
MMAU Speech: 59.8% (SR)
OmniBench Music: 52.8% (SR)
CoVoST2 en-zh: 0.41 / 100 (SR)
MusicCaps: 32.8% (SR)
Common Voice 15: 0.08 / 100 (SR)

Biology

GPQA: 30.8% (SR)

Code

HumanEval: 78.7% (SR)

Communication

VoiceBench Avg: 74.1% (SR)
MM-MT-Bench: 0.06 / 100 (SR)

Creativity

Meld: 57.0% (SR)

Finance

MMLU-Pro: 47.0% (SR)

General

MBPP: 0.73 / 100 (SR)
MMLU-Redux: 71.0% (SR)
MultiPL-E: 65.8% (SR)
MMStar: 64.0% (SR)
MME-RealWorld: 61.6% (SR)
MMMU: 59.2% (SR)
MMMU-Pro: 36.6% (SR)
LiveBench: 29.6% (SR)
NMOS: 0.05 / 100 (SR)

Grounding

PointGrounding: 66.5% (SR)

Healthcare

CRPErelation: 76.5% (SR)

Image To Text

DocVQA: 95.2% (SR)
TextVQA: 84.4% (SR)
OCRBench_V2: 57.8% (SR)

Language

FLEURS: 95.9% (SR)

Long Context

EgoSchema: 68.6% (SR)

Math

GSM8k: 88.7% (SR)
MATH: 71.5% (SR)
MathVista: 67.9% (SR)
MathVision: 25.0% (SR)

Multimodal

ChartQA: 85.3% (SR)
AI2D: 83.2% (SR)
MMBench-V1.1: 81.8% (SR)
VideoMME w sub.: 72.4% (SR)
MVBench: 70.3% (SR)
MuirBench: 59.2% (SR)
OmniBench: 56.1% (SR)

Spatial Reasoning

RealWorldQA: 70.3% (SR)

Vision

ODinW: 42.4% (SR)

AA Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores

Speech To Text: 100
Image To Text: 80
Code: 80
Language: 70
Long Context: 70
Spatial Reasoning: 70
Video: 70
Vision: 70
Math: 60
Multimodal: 60
Reasoning: 60
Legal: 50
Finance: 50
General: 50
Healthcare: 50
Physics: 30
Biology: 30
Chemistry: 30
Communication: 10

Pricing

Input Price: $0.1 / 1M tokens
Output Price: $0.4 / 1M tokens
Blended Price (3:1): $0.175 / 1M tokens
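
The blended figure is consistent with a weighted average of three parts input tokens to one part output tokens. A minimal sketch of that arithmetic, assuming this is how the 3:1 blend is defined (the function name `blended_price` is illustrative and not part of any API):

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average price per 1M tokens for a given input:output token ratio."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Listed prices in USD per 1M tokens.
blended = blended_price(0.10, 0.40)
print(f"${blended:.3f} / 1M tokens")  # prints "$0.175 / 1M tokens"
```

Changing the weights shows how sensitive the cost is to workload shape: an output-heavy workload moves the blended price toward the $0.4 output rate.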

Speed

No speed data available

Provider Price Ranking

3 providers

Cheapest: Alibaba (China)
Most Expensive: Alibaba

1. Alibaba (China) (Cheapest): input $0.087, output $0.345
2. Alibaba Cloud / Qwen Team (Primary): input $0.1, output $0.4
3. Alibaba: input $0.1, output $0.4

Compare pricing across different API providers for this model.
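
As a rough illustration of such a comparison, the sketch below ranks the three listed providers by a 3:1 blended price. It assumes the provider prices are in USD per 1M tokens (as in the Pricing section) and that the blend is a 3:1 input-to-output weighted average; the names `providers` and `blended` are hypothetical.

```python
# (provider, input price, output price), assumed to be USD per 1M tokens.
providers = [
    ("Alibaba (China)", 0.087, 0.345),
    ("Alibaba Cloud / Qwen Team", 0.10, 0.40),
    ("Alibaba", 0.10, 0.40),
]


def blended(input_price, output_price, ratio=3):
    """Blended price assuming `ratio` input tokens per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)


# sorted() is stable, so providers with equal prices keep their listed order.
ranked = sorted(providers, key=lambda p: blended(p[1], p[2]))

for name, input_price, output_price in ranked:
    print(f"{name}: ${blended(input_price, output_price):.4f} blended (3:1)")

cheapest_name = ranked[0][0]
```

With identical prices for the second and third entries, the ordering between them reflects only their position in the list, not a real price difference.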

External Sources