MiMo-V2.5-TTS

Xiaomi

Description

MiMo-V2.5 is Xiaomi's native omnimodal sparse Mixture-of-Experts model with 310B total parameters, 15B activated parameters, and a 1M-token context window. Built on the MiMo-V2-Flash backbone, it adds dedicated vision and audio encoders for text, image, video, and audio understanding, and is post-trained with SFT, agentic reinforcement learning, and Multi-Teacher On-Policy Distillation for multimodal perception, long-context reasoning, and agentic workflows.

Release date

2026-04-22

Parameter count

Not listed

Context length

1.0M tokens

Supported modalities

audio, image, text, video

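Capability radar chart

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science evaluation is available, the Science score is estimated using reasoning ability as a proxy.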

Leaderboard rankings

Domain	Rank	Score	Source
Audio	35	65.0	AA

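Benchmark scores (LLM Stats)

Category	Benchmark	Score	Source
Agents	MiMo Coding Bench	71.8%	Self-reported
Agents	Terminal-Bench 2.0	65.8%	Self-reported
Agents	Claw-Eval	63.2%	Self-reported
Agents	SWE-Bench Pro	56.1%	Self-reported
Agents	Finance Agent v2	36.7%	Self-reported
Agents	ResearchClawBench	16.9%	Self-reported
Document Understanding	OmniDocBench	87.2%	Self-reported
General	MMMU-Pro	77.9%	Self-reported
Long Context	GraphWalks	87.0%	Self-reported
Multimodal	HR-Bench (4k)	88.5%	Self-reported
Multimodal	Video-MME	87.7%	Self-reported
Multimodal	DailyOmni	83.5%	Self-reported
Multimodal	CharXiv-R	81.0%	Self-reported
Multimodal	VideoHolmes	64.0%	Self-reported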

AA evaluation index

No AA evaluation data available.

LLM Stats category scores

Categories: Long Context, Multimodal, General, Vision, Reasoning, Tool Calling, Agents, Code, Coding, Finance

Pricing

Input price: $0.4 / 1M tokens

Output price: $2 / 1M tokens

Blended price (3:1): $0.8 / 1M tokens

Cache read price: $0.08 / 1M tokens
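The blended figure matches a weighted average of three parts input to one part output, which is an assumption inferred from the listed numbers rather than something the page states. The sketch below reproduces that arithmetic and estimates the cost of a single request. The function names, and the treatment of cached tokens as a subset of input tokens billed at the cache read price, are hypothetical and used only for illustration.

```python
# Listed prices in USD per 1M tokens, copied from the pricing section.
INPUT_PRICE = 0.4
OUTPUT_PRICE = 2.0
CACHE_READ_PRICE = 0.08


def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output mix.

    Assumption: the "3:1" label means three parts input to one part output.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    Assumption: cached_tokens is the portion of input_tokens that is billed
    at the cache read price instead of the regular input price.
    """
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * INPUT_PRICE
             + cached_tokens * CACHE_READ_PRICE
             + output_tokens * OUTPUT_PRICE)
    return total / 1_000_000


if __name__ == "__main__":
    # Reproduces the listed blended price of $0.8 per 1M tokens.
    print(round(blended_price(INPUT_PRICE, OUTPUT_PRICE), 4))
    # Example request: 30k input tokens, 10k output tokens, no cache hits.
    print(round(request_cost(30_000, 10_000), 4))
```

Under these assumptions, a workload that is heavier on output than 3:1 will cost more per token than the blended figure suggests, since output tokens are priced at five times the input rate.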

Speed

No speed data available.

Provider price ranking

1 provider

#	Provider	Input	Output
1	Xiaomi (primary)	$0.4

Compare this model's pricing across API providers.

External links

Artificial Analysis