
MiMo-V2-Omni

Xiaomi · Proprietary

Description

MiMo-V2-Omni is Xiaomi's omni-modal foundation model, combining frontier multimodal understanding with strong agentic capability. It fuses dedicated image, video, and audio encoders into a single shared backbone that processes all modalities simultaneously. The model natively supports structured tool calling, function execution, and UI grounding, understands more than 10 hours of continuous audio, and has a 256K-token context window.

Release Date: 2026-03-19
Parameters: not listed
Context Length: 262K
Modalities: audio, image, PDF, text, video
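The description's "256K" and the listed "262K" describe the same window if "256K" is read as 256 × 1,024 tokens and "262K" as decimal thousands (an assumption about the page's rounding). A quick check, plus a hypothetical helper that assumes prompt and output tokens share the window:

```python
# Assumption: "256K" uses binary K (1,024 tokens); "262K" is rounded decimal thousands.
BINARY_K = 1024
CONTEXT_TOKENS = 256 * BINARY_K               # 262,144 tokens
decimal_k = round(CONTEXT_TOKENS / 1000)      # 262

def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_TOKENS) -> bool:
    """Hypothetical check: assumes prompt and generated tokens count against one window."""
    return prompt_tokens + max_output_tokens <= window

print(CONTEXT_TOKENS, f"{decimal_k}K")
print(fits_in_context(250_000, 8_000))
```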

Capability Radar

General: 32
Coding: 37
Reasoning: 83
Science (est.): 54
Agents: 100
Multimodal: 85

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain: rank, score (source)
Agentic Capability: #66, 54.0 (LS)
Code Ranking: #73, 72.0 (AA)
General Ranking: #91, 67.0 (AA)
Science: #101, 61.0 (AA)

Sources: LS = LLM Stats; AA = Artificial Analysis.

Benchmark Scores (LLM Stats)

Agents

GDPval-AA: 1410.00 / 3000 (SR)
PinchBench: 81.2% (SR)
Claw-Eval: 54.8% (SR)
MM-BrowserComp: 52.0% (SR)
OmniGAIA: 49.8% (SR)

Code

SWE-Bench Verified: 74.8% (SR)

AA Evaluation Indices

Intelligence Index: 35.0
Tau2: 0.9
GPQA: 0.8
LCR: 0.7
IFBench: 0.5
SciCode: 0.4
Terminal-Bench Hard: 0.3
HLE (Humanity's Last Exam): 0.2

LLM Stats Category Scores

Legal: 100
Finance: 100
General: 100
Reasoning: 100
Agents: 100
Frontend Development: 70
Code: 70
Coding: 70

Pricing

Input Price: Free
Output Price: Free
Blended Price (3:1 input:output): Free
Cache Read Price: $0.08 / 1M tokens
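A blended price is a weighted average of input and output prices; the "3:1" label conventionally means three input tokens for every output token. A minimal sketch of that formula, assuming the 3:1 weighting and plugging in the $0.4 input / $2 output per-1M-token prices from the provider table below (the page shows Free for this model here):

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average price per 1M tokens, assuming a 3:1 input:output token mix."""
    total = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total

# (3 * 0.4 + 1 * 2.0) / 4 = 3.2 / 4
price = blended_price(0.4, 2.0)
print(f"${price:.2f} per 1M tokens")
```

Under these assumed prices the blend is $0.80 per 1M tokens; a workload with a different input:output mix should pass its own weights.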

Speed

Tokens/sec: 70.9
Time to First Token: 2.79 s
Time to Answer: 31.00 s
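Under a simple latency model (an assumption; the page does not define how Time to Answer is measured), total time is the time to first token plus the number of generated tokens divided by the output speed. Inverting that model with the three figures above gives a rough sense of how many tokens the reported answer time implies:

```python
TTFT_S = 2.79            # Time to First Token, seconds
TOKENS_PER_S = 70.9      # output speed
TIME_TO_ANSWER_S = 31.00

def response_time(n_tokens: int, ttft: float = TTFT_S, tps: float = TOKENS_PER_S) -> float:
    """Estimated seconds to stream n_tokens: first-token latency plus decode time."""
    return ttft + n_tokens / tps

# Tokens implied by the reported Time to Answer under this simple model
implied_tokens = (TIME_TO_ANSWER_S - TTFT_S) * TOKENS_PER_S
print(round(implied_tokens))
print(f"{response_time(500):.1f} s for a 500-token reply")
```

Treat the implied token count (about 2,000) as a consistency check on the numbers, not as a documented benchmark setting.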

Provider Price Ranking

6 providers

Cheapest: NanoGPT. Most expensive: Xiaomi. (As listed, all six providers charge the same input and output prices.)
Provider: input / output (per 1M tokens)
1. NanoGPT (cheapest): $0.40 / $2.00
2. OpenCode Go: $0.40 / $2.00
3. ZenMux: $0.40 / $2.00
4. Kilo Gateway: $0.40 / $2.00
5. LLM Gateway: $0.40 / $2.00
6. Xiaomi: $0.40 / $2.00
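To turn per-1M-token list prices into a per-request cost, multiply each token count by its rate. The sketch below assumes the $0.40 input / $2.00 output rates shared by the providers above and the $0.08 cache-read rate from the Pricing block, and further assumes that cached input tokens are billed at the cache-read rate instead of the full input rate; individual providers may bill differently. The `Rates` class, `request_cost` function, and token counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rates:
    """Prices in USD per 1M tokens (assumed from the listing above)."""
    input_per_m: float = 0.40
    output_per_m: float = 2.00
    cache_read_per_m: float = 0.08

def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0, rates: Rates = Rates()) -> float:
    """USD cost of one request; cached_tokens is the part of input_tokens read from cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * rates.input_per_m
            + cached_tokens * rates.cache_read_per_m
            + output_tokens * rates.output_per_m) / 1_000_000

# Hypothetical request: 100K-token prompt (80K of it cached) and a 2K-token answer
print(f"${request_cost(100_000, 2_000, cached_tokens=80_000):.4f}")
print(f"${request_cost(100_000, 2_000):.4f} without caching")
```

In this example, caching 80% of the prompt lowers the estimate from $0.0440 to $0.0184.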

Compare pricing across different API providers for this model.

External Sources