MAI-Thinking-1

Microsoft · Proprietary

Description

MAI-Thinking-1 is Microsoft AI's first in-house reasoning model, a 35B-active / ~1T-total parameter sparse Mixture of Experts model (base model MAI-Base-1) trained from scratch without distillation from third-party models. Built with Microsoft's Hill-Climbing Machine pipeline, it was pre-trained on 30T tokens of clean, commercially licensed, human-generated data (plus 3.55T mid-training tokens), then post-trained via reinforcement learning across STEM, agentic coding, and helpfulness/safety specialists consolidated into a single model. It delivers strong mathematical reasoning and software-engineering performance for its weight class, going toe-to-toe with Claude Opus 4.6 on SWE-Bench Pro and reaching 97.0% on AIME 2025. It supports a 256k token context window, function calling, and developer instructions, and is preferred over Claude Sonnet 4.6 in blind human side-by-side evaluations.
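The parameter and token figures quoted above imply two derived numbers: the share of parameters active per token and the combined training-token count. The sketch below works through that arithmetic, assuming "~1T" means exactly 1e12 parameters (the source gives only the approximate figure, so the ratio is approximate too). The constant and function names are illustrative, not taken from any Microsoft material.

```python
# Figures quoted in the description. TOTAL_PARAMS is an assumption:
# the source says "~1T", so 1e12 is used as a round stand-in.
ACTIVE_PARAMS = 35e9        # 35B active parameters per token
TOTAL_PARAMS = 1e12         # ~1T total parameters (approximate)
PRETRAIN_TOKENS = 30e12     # 30T pre-training tokens
MIDTRAIN_TOKENS = 3.55e12   # 3.55T mid-training tokens


def active_fraction(active: float, total: float) -> float:
    """Share of a sparse MoE model's parameters used for each token."""
    return active / total


def total_tokens(*stages: float) -> float:
    """Sum the token counts of the listed training stages."""
    return sum(stages)


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    tokens = total_tokens(PRETRAIN_TOKENS, MIDTRAIN_TOKENS)
    print(f"Active share per token: {frac:.1%}")
    print(f"Pre- plus mid-training tokens: {tokens / 1e12:.2f}T")
```

Under these assumptions, roughly 3.5% of the parameters are active for any given token, and pre-training plus mid-training together cover about 33.55T tokens.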

Release date: 2026-06-02
Parameters: 1.0T
Context length:
Modalities:

Capability radar

general: 80
coding: 60
reasoning: 90
science (est.): 68
agents: 60
multimodal: 40

The science score uses a reasoning-based proxy when dedicated science benchmarks are unavailable.

Rankings

Domain | Rank | Score | Source
Agentic capabilities | 45 | 60.0 | LS

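Benchmark scores (LLM Stats)

Agents

BFCL-v3: 72.0% (self-reported)
SWE-Bench Pro: 52.8% (self-reported)
Terminal-Bench 2.0: 46.0% (self-reported)

Biology

GPQA: 84.2% (self-reported)

Code

SWE-Bench Verified: 73.5% (self-reported)
CyberSecEval 4: 63.0% (self-reported)

Communication

Multi-Challenge: 53.0% (self-reported)

Factuality

LongFact: 98.0% (self-reported)
SimpleQA Verified: 31.0% (self-reported)

Finance

TruthfulQA: 88.0% (self-reported)
MMLU-Pro: 85.0% (self-reported)

General

LiveCodeBench v6: 87.7% (self-reported)
AdvancedIF: 85.0% (self-reported)
CorpusQA: 82.0% (self-reported)
IFBench: 69.0% (self-reported)
LongBench v2: 61.0% (self-reported)

Healthcare

MedXpertQA: 43.0% (self-reported)
HealthBench Professional: 35.0% (self-reported)

Long Context

GraphWalks: 90.0% (self-reported)

Math

AIME 2025: 97.0% (self-reported)
AIME 2026: 94.5% (self-reported)
HMMT Feb 26: 84.9% (self-reported)

Safety

AIR-Bench: 88.0% (self-reported)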

AA evaluation indices

No AA evaluation data

LLM Stats category scores

Legal: 90
Math: 90
Finance: 80
General: 80
Language: 80
Physics: 80
Biology: 80
Chemistry: 80
Structured Output: 70
Frontend Development: 70
Instruction Following: 70
Reasoning: 70
Tool Calling: 60
Healthcare: 60
Long Context: 60
Agents: 60
Code: 60
Communication: 50
Vision: 40
Multimodal: 40

Pricing

No pricing data

Speed

No speed data

Provider price ranking

No provider data
