
GPT-4.1

OpenAI · GPT · Proprietary

Description

GPT-4.1 is OpenAI's latest and most advanced flagship model, significantly improving upon GPT-4 Turbo in performance across benchmarks, speed, and cost-effectiveness.

Release date: 2025-04-14
Parameter count: (not listed)
Context length: 1.0M tokens
Supported modalities: file, image, text

Capability radar chart

General: 39
Coding: 32
Reasoning: 49
Science (estimated): 44
Agents: 60
Multimodal: 85

When no dedicated science evaluation is available, the Science score is estimated using reasoning ability as a proxy.

Leaderboard rankings

Domain            Rank   Score   Source
Coding            177    45.0    AA
General ability   181    52.0    AA
Math reasoning    188    48.0    AA
Multimodal        53     74.0    LS
Reasoning         63     60.0    LS
Science           206    47.0    AA

Benchmark scores (LLM Stats)

Biology

GPQA    66.3%    (self-reported)

Code

SWE-Bench Verified     54.6%    (self-reported)
Aider-Polyglot Edit    52.9%    (self-reported)
Aider-Polyglot         51.6%    (self-reported)

Communication

Multi-IF             70.8%    (self-reported)
TAU-bench Retail     68.0%    (self-reported)
TAU-bench Airline    49.4%    (self-reported)
Multi-Challenge      38.3%    (self-reported)

Finance

MMLU    90.2%    (self-reported)

General

IFEval                                       87.4%    (self-reported)
MMMLU                                        87.3%    (self-reported)
MMMU                                         74.8%    (self-reported)
Internal API instruction following (hard)    49.1%    (self-reported)

Language

COLLIE    65.8%    (self-reported)

Long Context

ComplexFuncBench               65.5%    (self-reported)
OpenAI-MRCR: 2 needle 128k     57.2%    (self-reported)
OpenAI-MRCR: 2 needle 1M       46.3%    (self-reported)
Graphwalks parents >128k       25.0%    (self-reported)
Graphwalks BFS >128k           19.0%    (self-reported)

Math

MathVista               72.2%    (self-reported)
AIME 2024               48.1%    (self-reported)
AIME 2025               46.4%    (self-reported)
HMMT 2025               28.9%    (self-reported)
Humanity's Last Exam    5.4%     (self-reported)

Multimodal

CharXiv-D                         87.9%    (self-reported)
Video-MME (long, no subtitles)    72.0%    (self-reported)
CharXiv-R                         56.7%    (self-reported)

Reasoning

Graphwalks BFS <128k        61.7%    (self-reported)
Graphwalks parents <128k    58.0%    (self-reported)

AA evaluation indices

Math Index             34.7
Intelligence Index     26.3
Coding Index           21.8
MATH-500               0.9
MMLU-Pro               0.8
GPQA                   0.7
LCR                    0.6
Tau2                   0.5
LiveCodeBench          0.5
AIME                   0.4
IFBench                0.4
SciCode                0.4
AIME 25                0.3
Terminal-Bench Hard    0.1
HLE                    0.0

LLM Stats category scores

Finance                  90
Legal                    90
Healthcare               80
Instruction Following    80
Language                 80
Structured Output        70
Writing                  70
Biology                  70
Chemistry                70
General                  70
Multimodal               70
Physics                  70
Tool Calling             60
Vision                   60
Communication            60
Reasoning                60
Code                     50
Frontend Development     50
Math                     50
Spatial Reasoning        40
Long Context             40

Pricing

Input price: $2 / 1M tokens
Output price: $8 / 1M tokens
Blended price (3:1): $3.5 / 1M tokens
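
The blended figure is consistent with a token-weighted average of the input and output prices at a 3:1 input-to-output ratio. The sketch below reproduces that arithmetic and applies the same prices to a single request. The function names `blended_price` and `request_cost` are hypothetical, and reading "3:1" as input:output is an assumption that happens to match the listed $3.5.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output token mix.

    Assumes "3:1" means 3 input tokens for every 1 output token.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 2.0, output_price: float = 8.0) -> float:
    """Cost in USD of one request, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


print(blended_price(2.0, 8.0))       # 3.5
print(request_cost(30_000, 10_000))  # 0.14
```

A request whose mix differs from 3:1 will have a different effective rate, so per-request cost is better computed from the separate input and output prices than from the blended figure.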

Speed

Throughput: 108.1 tokens/s
Time to first token: 0.55 s
Time to first answer: 0.55 s
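
As a rough illustration of how these two figures combine, the sketch below estimates end-to-end response time as time to first token plus generation time at the listed throughput. This linear model is an assumption, not something the page states. Real latency varies with load, prompt length, and provider, and `estimated_response_time` is a hypothetical helper.

```python
def estimated_response_time(output_tokens: int,
                            ttft_s: float = 0.55,
                            tokens_per_s: float = 108.1) -> float:
    """Approximate seconds to receive a full response.

    Assumes steady throughput after the first token arrives.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


# A 1,000-token answer at the listed speed:
print(f"{estimated_response_time(1000):.1f} s")  # 9.8 s
```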

Available providers

(LS internal pricing units)
Provider    Input price    Output price
OpenAI      2.0M           8.0M
