MiMo-V2.5-TTS

Xiaomi

Description

MiMo-V2.5 is Xiaomi's native omnimodal sparse Mixture-of-Experts model with 310B total parameters, 15B activated parameters, and a 1M-token context window. Built on the MiMo-V2-Flash backbone, it adds dedicated vision and audio encoders for text, image, video, and audio understanding, and is post-trained with SFT, agentic reinforcement learning, and Multi-Teacher On-Policy Distillation for multimodal perception, long-context reasoning, and agentic workflows.

Release date

2026-04-22

Parameter count

—

Context length

1.0M

Supported modalities

audio, image, text, video

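Capability radar chart

[Radar chart; axes: general, coding, reasoning, science (estimated), agents, multimodal]

When no dedicated science benchmark is available, the Science axis is estimated using reasoning ability as a proxy.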

Leaderboard rankings

Domain	Rank	Score	Source
Audio	35	65.0	AA

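Benchmark scores (LLM Stats)

Category	Benchmark	Score	Source
Agents	MiMo Coding Bench	71.8%	Self-reported
Agents	Terminal-Bench 2.0	65.8%	Self-reported
Agents	Claw-Eval	63.2%	Self-reported
Agents	SWE-Bench Pro	56.1%	Self-reported
Agents	Finance Agent v2	36.7%	Self-reported
Agents	ResearchClawBench	16.9%	Self-reported
Document Understanding	OmniDocBench	87.2%	Self-reported
General	MMMU-Pro	77.9%	Self-reported
Long Context	GraphWalks	87.0%	Self-reported
Multimodal	HR-Bench (4k)	88.5%	Self-reported
Multimodal	Video-MME	87.7%	Self-reported
Multimodal	DailyOmni	83.5%	Self-reported
Multimodal	CharXiv-R	81.0%	Self-reported
Multimodal	VideoHolmes	64.0%	Self-reported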

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Categories: Long Context, Multimodal, General, Vision, Reasoning, Tool Calling, Agents, Code, Coding, Finance

Pricing

Input price: $0.4 / 1M tokens

Output price: $2 / 1M tokens

Blended price (3:1): $0.8 / 1M tokens

Cache read price: $0.08 / 1M tokens
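
The blended figure appears to be a weighted average of the input and output prices. The sketch below assumes that "3:1" means three input tokens for every output token. That reading reproduces the listed $0.8, but the page does not state it. The function name `blended_price` and its parameters are illustrative, not part of any provider API.

```python
from decimal import Decimal

def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average price per 1M tokens.

    Assumes the ratio is input:output (an assumption; the page only says "3:1").
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts

# Listed prices per 1M tokens, taken from the pricing section above.
INPUT_PRICE = Decimal("0.4")
OUTPUT_PRICE = Decimal("2")

blended = blended_price(INPUT_PRICE, OUTPUT_PRICE)
print(f"Blended price (3:1): ${blended} / 1M tokens")
```

Under this assumption the calculation is (3 × 0.4 + 1 × 2) / 4 = 0.8, which matches the listed blended price. `Decimal` is used rather than `float` so the comparison against the listed value is exact.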

Speed

No speed data available

Provider price ranking

1 provider

Rank	Provider	Input	Output
1	Xiaomi (primary)	$0.4

Compare this model's pricing across API providers.

External links

Artificial Analysis