
GPT-4.1 mini

OpenAI · GPT · Proprietary

Description

GPT-4.1 mini provides a balance between intelligence, speed, and cost. It's a significant leap in small model performance, even beating GPT-4o in many benchmarks while reducing latency and cost.

Release date: 2025-04-14
Parameter count: not listed
Context length: 1.0M tokens
Supported modalities: file, image, text

Capability radar chart

general: 37
coding: 31
reasoning: 54
science (estimated): 45
agents: 50
multimodal: 85

When no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.

Leaderboard rankings

Domain            Rank   Score   Source
Coding            229    37.0    AA
General           207    49.0    AA
Math reasoning    161    56.0    AA
Multimodal        49     75.0    LS
Reasoning         61     62.0    LS
Science           196    48.0    AA

Benchmark scores (LLM Stats)

Biology

GPQA    65.0%  (self-reported)

Code

Aider-Polyglot         34.7%  (self-reported)
Aider-Polyglot Edit    31.6%  (self-reported)
SWE-Bench Verified     23.6%  (self-reported)

Communication

Multi-IF             67.0%  (self-reported)
TAU-bench Retail     55.8%  (self-reported)
TAU-bench Airline    36.0%  (self-reported)
Multi-Challenge      35.8%  (self-reported)

Finance

MMLU    87.5%  (self-reported)

General

IFEval                                        84.1%  (self-reported)
MMMLU                                         78.5%  (self-reported)
MMMU                                          72.7%  (self-reported)
Internal API instruction following (hard)     45.1%  (self-reported)

Language

COLLIE    54.6%  (self-reported)

Long Context

ComplexFuncBench               49.3%  (self-reported)
OpenAI-MRCR: 2 needle 128k     47.2%  (self-reported)
OpenAI-MRCR: 2 needle 1M       33.3%  (self-reported)
Graphwalks BFS >128k           15.0%  (self-reported)
Graphwalks parents >128k       11.0%  (self-reported)

Math

MathVista               73.1%  (self-reported)
AIME 2024               49.6%  (self-reported)
AIME 2025               40.2%  (self-reported)
HMMT 2025               35.0%  (self-reported)
Humanity's Last Exam    3.7%   (self-reported)

Multimodal

CharXiv-D    88.4%  (self-reported)
CharXiv-R    56.8%  (self-reported)

Reasoning

Graphwalks BFS <128k        61.7%  (self-reported)
Graphwalks parents <128k    60.5%  (self-reported)

AA evaluation indices

Math Index: 46.3
Intelligence Index: 22.9
Coding Index: 18.5
MATH-500: 0.9
MMLU-Pro: 0.8
GPQA: 0.7
Tau2: 0.5
LiveCodeBench: 0.5
AIME 25: 0.5
AIME: 0.4
LCR: 0.4
SciCode: 0.4
IFBench: 0.4
Terminal-Bench Hard: 0.1
HLE: 0.0

LLM Stats category scores

Finance: 90
Legal: 90
Healthcare: 80
Instruction Following: 80
Structured Output: 70
Biology: 70
Chemistry: 70
Language: 70
Multimodal: 70
Physics: 70
Vision: 60
General: 60
Tool Calling: 50
Writing: 50
Communication: 50
Math: 50
Reasoning: 50
Spatial Reasoning: 40
Code: 30
Long Context: 30
Frontend Development: 20

Pricing

Input price: $0.40 / 1M tokens
Output price: $1.60 / 1M tokens
Blended price (3:1 input:output): $0.70 / 1M tokens
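
The blended figure is consistent with a token-weighted average of the input and output prices at a 3:1 input-to-output ratio. The sketch below reproduces that arithmetic and applies the same prices to a single request. It assumes this is how the page derives the blended price; the function names `blended_price` and `request_cost` are illustrative and do not come from the source.

```python
INPUT_PRICE_PER_M = 0.40   # USD per 1M input tokens (from the pricing section)
OUTPUT_PRICE_PER_M = 1.60  # USD per 1M output tokens (from the pricing section)


def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Token-weighted average price per 1M tokens (assumed 3:1 input:output mix)."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token prices."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)


if __name__ == "__main__":
    # (3 * 0.40 + 1 * 1.60) / 4 = 0.70
    print(f"Blended: ${blended_price(INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M):.2f} / 1M tokens")
    # 3,000 input tokens and 1,000 output tokens
    print(f"Request: ${request_cost(3_000, 1_000):.4f}")
```

A workload with a different input-to-output mix will see a different effective price, so the blended number is best read as a convenience summary rather than a quote.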

Speed

Output speed: 78.8 tokens/s
Time to first token: 0.52 s
Time to first answer: 0.52 s
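
As a rough illustration of how these figures combine, total response time can be approximated as the time to first token plus the output length divided by throughput. This is a simplification that assumes steady streaming at the listed rate; real latency varies with load, prompt length, and provider. The function name `estimate_response_seconds` is hypothetical.

```python
def estimate_response_seconds(output_tokens: int,
                              ttft_s: float = 0.52,
                              tokens_per_s: float = 78.8) -> float:
    """Approximate wall-clock time for a streamed response.

    Defaults are the speed figures listed above; the model is
    time-to-first-token plus output_tokens / throughput.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(f"{n} output tokens: ~{estimate_response_seconds(n):.1f} s")
```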

Available providers

(LS internal pricing units)
Provider    Input price    Output price
OpenAI      400K           1.6M
