
MiMo-V2.5-TTS

Xiaomi

Description

MiMo-V2.5 is Xiaomi's native omnimodal sparse Mixture-of-Experts model with 310B total parameters, 15B activated parameters, and a 1M-token context window. Built on the MiMo-V2-Flash backbone, it adds dedicated vision and audio encoders for text, image, video, and audio understanding, and is post-trained with SFT, agentic reinforcement learning, and Multi-Teacher On-Policy Distillation for multimodal perception, long-context reasoning, and agentic workflows.

Release date: 2026-04-22
Parameters: 310B total, 15B activated
Context length: 1.0M tokens
Modalities: audio, image, text, video

Capability radar chart

General: 80
Coding: 60
Reasoning: 70
Science (estimated): 60
Agents: 70
Multimodal: 88

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.

Leaderboard rankings

Audio: rank #35, score 65.0 (source: AA)

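Benchmark scores (LLM Stats)

Agents

MiMo Coding Bench: 71.8% (self-reported)
Terminal-Bench 2.0: 65.8% (self-reported)
Claw-Eval: 63.2% (self-reported)
SWE-Bench Pro: 56.1% (self-reported)
Finance Agent v2: 36.7% (self-reported)
ResearchClawBench: 16.9% (self-reported)

Document Understanding

OmniDocBench: 87.2% (self-reported)

General

MMMU-Pro: 77.9% (self-reported)

Long Context

GraphWalks: 87.0% (self-reported)

Multimodal

HR-Bench (4k): 88.5% (self-reported)
Video-MME: 87.7% (self-reported)
DailyOmni: 83.5% (self-reported)
CharXiv-R: 81.0% (self-reported)
VideoHolmes: 64.0% (self-reported)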

AA evaluation index

No AA evaluation data available.

LLM Stats category scores

Long Context: 90
Multimodal: 80
General: 80
Vision: 80
Reasoning: 70
Tool Calling: 70
Agents: 60
Code: 60
Coding: 60
Finance: 40

Pricing

Input price: $0.4 / 1M tokens
Output price: $2 / 1M tokens
Blended price (3:1 input:output): $0.8 / 1M tokens
Cache read price: $0.08 / 1M tokens
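The blended figure matches a 3:1 weighting of input to output tokens: (3 × $0.4 + 1 × $2) / 4 = $0.8 per 1M tokens. The following minimal sketch reproduces that average and also estimates the cost of a single request. The helper names are hypothetical. The request estimate assumes that cached input tokens are billed at the cache-read rate instead of the normal input rate; this is an illustrative assumption, not something taken from Xiaomi's billing documentation.

```python
# Hypothetical helpers; prices are the per-1M-token list prices above (USD).
INPUT_PER_M = 0.4
OUTPUT_PER_M = 2.0
CACHE_READ_PER_M = 0.08


def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average price per 1M tokens (default 3:1 input:output)."""
    total = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cache-read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHE_READ_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000


print(round(blended_price(INPUT_PER_M, OUTPUT_PER_M), 4))  # 0.8
print(round(request_cost(100_000, 2_000, cached_tokens=80_000), 6))  # 0.0184
```

Under these assumptions, a 100K-token prompt with 80K tokens served from cache costs about $0.0184 including 2K output tokens. The same request with no cache hits would cost $0.044, so a high cache-hit rate dominates the savings for long, repeated contexts.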

Speed

No speed data available.

Provider pricing ranking

1 provider

1. Xiaomi (primary): input $0.4, output $2 per 1M tokens

Compare this model's pricing across API providers.

External links