
MiMo-V2-Omni

Xiaomi · Proprietary

Description

MiMo-V2-Omni is Xiaomi's omni foundation model, combining frontier multimodal understanding with strong agentic capability. It fuses dedicated image, video, and audio encoders into a single shared backbone that processes all modalities simultaneously. The model natively supports structured tool calling, function execution, and UI grounding, and it handles over 10 hours of continuous audio and a 256K-token context window.

Release Date: 2026-03-19
Parameters: not listed
Context Length: 262K
Modalities: audio, image, text, video

Capability Radar

general: 38
coding: 36
reasoning: 83
science (est.): 54
agents: 100
multimodal: 85

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain            #Rank   Score   Source
Agents & Tools    61      54.0    LS
Code Ranking      75      66.0    AA
General Ranking   74      73.0    AA
Science           82      64.0    AA

Benchmark Scores (LLM Stats)

Agents

GDPval-AA: 1410.00 / 3000 (SR)
PinchBench: 81.2% (SR)
Claw-Eval: 54.8% (SR)
MM-BrowserComp: 52.0% (SR)
OmniGAIA: 49.8% (SR)

Code

SWE-Bench Verified: 74.8% (SR)

AA Evaluation Indices

Intelligence Index: 43.4
Coding Index: 35.5
Tau2: 0.9
GPQA: 0.8
LCR: 0.7
IFBench: 0.5
SciCode: 0.4
Terminal-Bench Hard: 0.3
HLE: 0.2

LLM Stats Category Scores

Finance: 100
General: 100
Legal: 100
Reasoning: 100
Agents: 100
Code: 70
Coding: 70
Frontend Development: 70

Pricing

Input Price: Free
Output Price: Free
Blended Price (3:1): Free
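
A 3:1 blended price is commonly computed as a weighted average in which input tokens count three times as heavily as output tokens. The sketch below assumes that convention, which this page does not confirm. The function name `blended_price` is hypothetical. The 400K and 2.0M inputs are the Xiaomi figures listed under Available Providers, kept in whatever internal units LS uses.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices (assumed 3:1 by default)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Free input and output give a free blended price.
free_blended = blended_price(0.0, 0.0)

# Provider figures from this page, in LS internal units (unit meaning is not stated).
xiaomi_blended = blended_price(400_000, 2_000_000)

print(free_blended, xiaomi_blended)
```

Under this assumption, any model with free input and free output is also free on the blended measure, which matches the listing above.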

Speed

Tokens/sec: 120.9
Time to First Token: 1.35s
Time to Answer: 17.89s
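
As a rough illustration of how the throughput and first-token figures combine, the sketch below estimates response time as time to first token plus output length divided by tokens per second. This simple linear model is an assumption, and the function name `estimated_response_time` is hypothetical. The listed Time to Answer is a separate measurement whose output length is not stated here, so the sketch does not try to reproduce it.

```python
def estimated_response_time(output_tokens: int,
                            tokens_per_sec: float = 120.9,
                            ttft_sec: float = 1.35) -> float:
    """Estimate seconds to stream `output_tokens`, assuming a constant decode rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_sec + output_tokens / tokens_per_sec


# Example: a 500-token reply at the listed speed figures.
t_500 = estimated_response_time(500)
print(f"{t_500:.2f} s")
```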

Available Providers

(LS internal units)
Provider   Input Price   Output Price
Xiaomi     400K          2.0M
