o1
OpenAI · OpenAI o-series · Proprietary
Description
A research preview model focused on mathematical and logical reasoning capabilities, demonstrating improved performance on tasks requiring step-by-step reasoning, mathematical problem-solving, and code generation. The model shows enhanced capabilities in formal reasoning while maintaining strong general capabilities.
Release date
2024-12-05
Parameter count
Not listed
Context length
200K
Supported modalities
file, image, text
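Capability radar chart
general: 43
coding: 39
reasoning: 80
science (estimated): 48
agents: 60
multimodal: 70
Note: when no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.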
Leaderboard rankings
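Benchmark scores (LLM Stats)
Biology
GPQA: 78.0% (self-reported)
GPQA Biology: 69.2% (self-reported)
Chemistry
GPQA Chemistry: 64.7% (self-reported)
Code
HumanEval: 88.1% (self-reported)
SWE-Bench Verified: 41.0% (self-reported)
Communication
TAU-bench Retail: 70.8% (self-reported)
TAU-bench Airline: 50.0% (self-reported)
Factuality
SimpleQA: 47.0% (self-reported)
Finance
MMLU: 91.8% (self-reported)
General
MMMLU: 87.7% (self-reported)
MMMU: 77.6% (self-reported)
LiveBench: 67.0% (self-reported)
Math
GSM8k: 97.1% (self-reported)
MATH: 96.4% (self-reported)
MGSM: 89.3% (self-reported)
AIME 2024: 74.3% (self-reported)
MathVista: 71.8% (self-reported)
FrontierMath: 5.5% (self-reported)
Physics
GPQA Physics: 92.8% (self-reported)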
AA evaluation indices
Intelligence Index: 30.8
Coding Index: 20.5
MATH 500: 1.0
MMLU Pro: 0.8
GPQA: 0.7
AIME: 0.7
IFBench: 0.7
LiveCodeBench: 0.7
Tau2: 0.6
LCR: 0.6
SciCode: 0.4
TerminalBench Hard: 0.1
HLE: 0.1
LLM Stats category scores
Finance: 90
Language: 90
Legal: 90
Biology: 80
Chemistry: 80
Healthcare: 80
Math: 80
Physics: 80
Vision: 70
General: 70
Multimodal: 70
Reasoning: 70
Tool Calling: 60
Code: 60
Communication: 60
Factuality: 50
Frontend Development: 40
Pricing
Input price: $15 / 1M tokens
Output price: $60 / 1M tokens
Blended price (3:1): $26.25 / 1M tokens
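The 3:1 blended figure is consistent with a weighted average of the two listed prices, assuming the ratio means three input tokens for every output token. A minimal sketch of that arithmetic follows; the function name `blended_price` is illustrative and not part of any LLM Stats API.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# Listed o1 prices in USD per 1M tokens (assumed 3:1 input:output weighting).
print(blended_price(15.0, 60.0))  # 26.25
```

A different workload mix (for example, 1:1) can be checked by changing `input_parts` and `output_parts`.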
Speed
Tokens per second: 111.0 tokens/s
Time to first token: 22.15s
Time to first answer: 22.15s
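As a rough illustration of how the latency and throughput figures combine, the total time for a response can be approximated as the first-token latency plus the output length divided by throughput. This is a simplifying assumption (steady generation rate, no network or queueing overhead), and `estimate_response_seconds` is a hypothetical helper rather than anything the page defines.

```python
def estimate_response_seconds(output_tokens: int,
                              first_token_latency_s: float = 22.15,
                              tokens_per_second: float = 111.0) -> float:
    """Approximate wall-clock time: wait for the first token, then stream the rest."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency_s + output_tokens / tokens_per_second


# Example: a 1,110-token answer at the listed speed (assumed constant).
print(round(estimate_response_seconds(1110), 2))  # 32.15
```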
Available providers
(LS internal pricing unit) No provider data available