
MAI-Thinking-1

Microsoft · Proprietary

Description

MAI-Thinking-1 is Microsoft AI's first in-house reasoning model: a sparse Mixture-of-Experts model with 35B active and roughly 1T total parameters (base model MAI-Base-1), trained from scratch without distillation from third-party models. Built with Microsoft's Hill-Climbing Machine pipeline, it was pre-trained on 30T tokens of clean, commercially licensed, human-generated data (plus 3.55T mid-training tokens). It was then post-trained with reinforcement learning using STEM, agentic-coding, and helpfulness/safety specialists, which were consolidated into a single model. The vendor reports strong mathematical-reasoning and software-engineering performance for its weight class, describing the model as competitive with Claude Opus 4.6 on SWE-Bench Pro and citing 97.0% on AIME 2025. It supports a 256k-token context window, function calling, and developer instructions, and is reported to be preferred over Claude Sonnet 4.6 in blind human side-by-side evaluations.
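To put the parameter and token counts in the description into proportion, the short sketch below works through the arithmetic. It uses only the figures quoted above and treats "~1T" as exactly 1.0e12, which is an approximation. The function and constant names are illustrative and not part of any Microsoft tooling.

```python
# Figures quoted in the description (vendor-reported).
ACTIVE_PARAMS = 35e9       # parameters active per token
TOTAL_PARAMS = 1.0e12      # "~1T" total, treated here as exactly 1e12 (assumption)
PRETRAIN_TOKENS = 30e12    # pre-training tokens
MIDTRAIN_TOKENS = 3.55e12  # mid-training tokens


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that is used for any single token."""
    return active / total


def total_training_tokens(pretrain: float, midtrain: float) -> float:
    """Pre-training plus mid-training tokens (post-training is not counted)."""
    return pretrain + midtrain


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    tokens = total_training_tokens(PRETRAIN_TOKENS, MIDTRAIN_TOKENS)
    print(f"active fraction: {frac:.1%}")                 # 3.5%
    print(f"training tokens: {tokens / 1e12:.2f}T")       # 33.55T
    print(f"tokens per total parameter: {tokens / TOTAL_PARAMS:.2f}")
```

Under these assumptions, roughly 3.5% of the parameters are active for a given token, and the combined pre-training and mid-training corpus is about 33.55T tokens.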

Release date
2026-06-02
Parameters
1.0T
Context length
Modality

Capability radar

general: 80
coding: 60
reasoning: 90
science (estimated): 68
agents: 60
multimodal: 40

When no dedicated science benchmark is available, the Science score is estimated using a reasoning proxy.

Rankings

Domain | Rank | Score | Source
Agentic capability | #45 | 60.0 | LS

Benchmark scores (LLM Stats)

Agents

BFCL-v3: 72.0% (self-reported)
SWE-Bench Pro: 52.8% (self-reported)
Terminal-Bench 2.0: 46.0% (self-reported)

Biology

GPQA: 84.2% (self-reported)

Code

SWE-Bench Verified: 73.5% (self-reported)
CyberSecEval 4: 63.0% (self-reported)

Communication

Multi-Challenge: 53.0% (self-reported)

Factuality

LongFact: 98.0% (self-reported)
SimpleQA Verified: 31.0% (self-reported)

Finance

TruthfulQA: 88.0% (self-reported)
MMLU-Pro: 85.0% (self-reported)

General

LiveCodeBench v6: 87.7% (self-reported)
AdvancedIF: 85.0% (self-reported)
CorpusQA: 82.0% (self-reported)
IFBench: 69.0% (self-reported)
LongBench v2: 61.0% (self-reported)

Healthcare

MedXpertQA: 43.0% (self-reported)
HealthBench Professional: 35.0% (self-reported)

Long Context

GraphWalks: 90.0% (self-reported)

Math

AIME 2025: 97.0% (self-reported)
AIME 2026: 94.5% (self-reported)
HMMT Feb 26: 84.9% (self-reported)

Safety

AIR-Bench: 88.0% (self-reported)

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Legal: 90
Math: 90
Finance: 80
General: 80
Language: 80
Physics: 80
Biology: 80
Chemistry: 80
Structured Output: 70
Frontend Development: 70
Instruction Following: 70
Reasoning: 70
Tool Calling: 60
Healthcare: 60
Long Context: 60
Agents: 60
Code: 60
Communication: 50
Vision: 40
Multimodal: 40

Pricing

No pricing data available

Speed

No speed data available

Provider pricing ranking

No provider data available
