
o1

OpenAI | OpenAI o-series | Proprietary

Description

A research preview model focused on mathematical and logical reasoning capabilities, demonstrating improved performance on tasks requiring step-by-step reasoning, mathematical problem-solving, and code generation. The model shows enhanced capabilities in formal reasoning while maintaining strong general capabilities.

Release date: 2024-12-05
Parameters: (not listed)
Context length: 200K
Supported modalities: file, image, text

Capability radar chart

general: 43
coding: 39
reasoning: 80
science (estimated): 48
agents: 60
multimodal: 70

When no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.

Leaderboard rankings

Domain               Rank   Score   Source
Coding               157    49.0    AA
General capability   104    66.0    AA
Math reasoning       55     87.0    AA
Science              169    51.0    AA

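Benchmark scores (LLM Stats)

Biology

GPQA: 78.0% (self-reported)
GPQA Biology: 69.2% (self-reported)

Chemistry

GPQA Chemistry: 64.7% (self-reported)

Code

HumanEval: 88.1% (self-reported)
SWE-Bench Verified: 41.0% (self-reported)

Communication

TAU-bench Retail: 70.8% (self-reported)
TAU-bench Airline: 50.0% (self-reported)

Factuality

SimpleQA: 47.0% (self-reported)

Finance

MMLU: 91.8% (self-reported)

General

MMMLU: 87.7% (self-reported)
MMMU: 77.6% (self-reported)
LiveBench: 67.0% (self-reported)

Math

GSM8k: 97.1% (self-reported)
MATH: 96.4% (self-reported)
MGSM: 89.3% (self-reported)
AIME 2024: 74.3% (self-reported)
MathVista: 71.8% (self-reported)
FrontierMath: 5.5% (self-reported)

Physics

GPQA Physics: 92.8% (self-reported)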

AA evaluation indices

Intelligence Index: 30.8
Coding Index: 20.5
MATH-500: 1.0
MMLU-Pro: 0.8
GPQA: 0.7
AIME: 0.7
IFBench: 0.7
LiveCodeBench: 0.7
Tau2: 0.6
LCR: 0.6
SciCode: 0.4
Terminal-Bench Hard: 0.1
HLE: 0.1

LLM Stats category scores

Finance: 90
Language: 90
Legal: 90
Biology: 80
Chemistry: 80
Healthcare: 80
Math: 80
Physics: 80
Vision: 70
General: 70
Multimodal: 70
Reasoning: 70
Tool Calling: 60
Code: 60
Communication: 60
Factuality: 50
Frontend Development: 40

Pricing

Input price: $15 / 1M tokens
Output price: $60 / 1M tokens
Blended price (3:1 input:output): $26.25 / 1M tokens
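
The blended figure is consistent with a token-weighted average of the input and output prices at a 3:1 input-to-output ratio: (3 × 15 + 1 × 60) / 4 = 26.25. A minimal sketch of that arithmetic follows; the function names `blended_price` and `request_cost` are hypothetical helpers written for illustration, not part of any provider API, and the weighting scheme is an assumption inferred from the numbers listed here.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given input:output token ratio."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


INPUT_PRICE = 15.0   # USD per 1M input tokens, from the listing
OUTPUT_PRICE = 60.0  # USD per 1M output tokens, from the listing

print(blended_price(INPUT_PRICE, OUTPUT_PRICE))               # 26.25
print(request_cost(3_000, 1_000, INPUT_PRICE, OUTPUT_PRICE))  # 0.105
```

For workloads whose input-to-output ratio differs from 3:1, passing other values for `input_parts` and `output_parts` gives a more representative per-token price.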

Speed

Tokens per second: 111.0 tokens/s
Time to first token: 22.15 s
Time to first answer: 22.15 s
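
As a rough illustration of how these two figures combine, the total wall-clock time for a response can be approximated as the first-token latency plus the output length divided by throughput. This is a simplified sketch: it assumes the listed throughput applies uniformly after the first token, and the function name `estimated_response_seconds` is hypothetical.

```python
def estimated_response_seconds(output_tokens: int,
                               first_token_latency_s: float = 22.15,
                               tokens_per_second: float = 111.0) -> float:
    """Approximate total response time: latency before the first token,
    plus streaming time for the remaining output at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency_s + output_tokens / tokens_per_second


# A 1,110-token answer streams for about 10 s after the initial wait.
print(round(estimated_response_seconds(1_110), 2))  # 32.15
```

Real latency varies with load and prompt length, so the result is best read as an order-of-magnitude estimate.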

Available providers

(LS internal pricing unit)

No provider data available.
