
o3-mini

OpenAI · OpenAI o-series · Proprietary

Description

A smaller variant of o3, expected to offer enhanced multimodal capabilities, improved reasoning, and more efficient resource utilization than previous models while maintaining strong performance on core tasks.

Release date: 2025-01-31
Parameter count: (not listed)
Context length: 200K
Supported modalities: file, text

Capability radar chart

general: 39
coding: 39
reasoning: 83
science (estimated): 49
agents: 40
multimodal: 85

When no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.

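Leaderboard rankings

Coding leaderboard: rank 230, score 37.0 (source: AA)
General capability leaderboard: rank 214, score 48.0 (source: AA)
Math reasoning: rank 50, score 89.0 (source: AA)
Reasoning: rank 78, score 54.0 (source: LS)
Science: rank 146, score 54.0 (source: AA)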

Benchmark scores (LLM Stats)

Biology

GPQA: 77.2% (self-reported)

Code

Aider-Polyglot: 66.7% (self-reported)
Aider-Polyglot Edit: 60.4% (self-reported)
SWE-Bench Verified: 49.3% (self-reported)
SWE-Lancer: 18.0% (self-reported)
SWE-Lancer (IC-Diamond subset): 7.4% (self-reported)

Communication

Multi-IF: 79.5% (self-reported)
TAU-bench Retail: 57.6% (self-reported)
Multi-Challenge: 39.9% (self-reported)
TAU-bench Airline: 32.4% (self-reported)

Factuality

SimpleQA: 15.0% (self-reported)

Finance

MMLU: 86.9% (self-reported)

General

IFEval: 93.9% (self-reported)
LiveBench: 84.6% (self-reported)
Multilingual MMLU: 80.7% (self-reported)
Internal API instruction following (hard): 50.0% (self-reported)

Language

COLLIE: 98.7% (self-reported)

Long Context

OpenAI-MRCR: 2 needle 128k: 18.7% (self-reported)
ComplexFuncBench: 17.6% (self-reported)

Math

MATH: 97.9% (self-reported)
MGSM: 92.0% (self-reported)
AIME 2024: 87.3% (self-reported)
FrontierMath: 9.2% (self-reported)

Reasoning

Graphwalks parents <128k: 58.3% (self-reported)
Graphwalks BFS <128k: 51.0% (self-reported)

AA evaluation indices

Intelligence Index: 25.9
Coding Index: 17.9
MATH-500: 1.0
MMLU-Pro: 0.8
AIME: 0.8
GPQA: 0.7
LiveCodeBench: 0.7
SciCode: 0.4
Tau2: 0.3
HLE: 0.1
Terminal-Bench Hard: 0.1

LLM Stats category scores

Writing: 100
Finance: 90
Healthcare: 90
Instruction Following: 90
Language: 90
Legal: 90
Biology: 80
Chemistry: 80
Math: 80
Physics: 80
General: 70
Structured Output: 60
Reasoning: 60
Spatial Reasoning: 50
Communication: 50
Frontend Development: 50
Tool Calling: 40
Code: 40
Long Context: 20
Factuality: 10

Pricing

Input price: $1.1 / 1M tokens
Output price: $4.4 / 1M tokens
Blended price (3:1 input to output): $1.925 / 1M tokens
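
The blended figure is consistent with a weighted average of three parts input price to one part output price. A minimal sketch of that arithmetic follows; the function name `blended_price` and its parameters are illustrative assumptions, not part of any published API.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens (hypothetical helper).

    Assumes the 3:1 ratio means 3 input tokens for every 1 output token.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Listed prices in USD per 1M tokens.
price = blended_price(1.1, 4.4)
print(f"${price:.3f} / 1M tokens")  # (3 * 1.1 + 4.4) / 4 = 1.925
```

Reversing the ratio (one part input to three parts output) would give $3.575, which does not match the listed value, so the input-heavy reading appears to be the intended one.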

Speed

Tokens per second: 135.1 tokens/s
Time to first token: 10.07 s
Time to first answer: 10.07 s
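
These two figures can be combined into a rough end-to-end latency estimate. The sketch below assumes a simple model that the page does not state: generation proceeds at the listed steady rate once the first token arrives. The function name `estimate_response_seconds` is hypothetical.

```python
TOKENS_PER_SECOND = 135.1   # listed output speed
FIRST_TOKEN_SECONDS = 10.07  # listed time to first token


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate total response time (hypothetical helper).

    Assumption: constant throughput after the first token; real latency
    varies with load, prompt length, and reasoning effort.
    """
    return FIRST_TOKEN_SECONDS + output_tokens / TOKENS_PER_SECOND


# A 500-token answer: 10.07 s of waiting plus about 3.7 s of generation.
print(f"{estimate_response_seconds(500):.1f} s")
```

Under this model the initial wait dominates short answers: for a 500-token reply, roughly three quarters of the total time passes before the first token appears.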

Available providers

(LS internal pricing unit)

No provider data available.

External links