o1

OpenAI · OpenAI o-series · Proprietary

Description

A research preview model focused on mathematical and logical reasoning capabilities, demonstrating improved performance on tasks requiring step-by-step reasoning, mathematical problem-solving, and code generation. The model shows enhanced capabilities in formal reasoning while maintaining strong general capabilities.

Release date

2024-12-05

Parameter count

Not listed

Context length

200K

Supported modalities

image, pdf, text

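Capability radar chart

Axes: general, coding, reasoning, science (estimated), agents, multimodal.

When no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.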

Leaderboard rankings

Domain	Rank	Score	Source
Coding	151	55.0	AA
General capability	105	63.0	AA
Mathematical reasoning	55	87.0	AA
Science	195	49.0	AA

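Benchmark scores (LLM Stats)

Category	Benchmark	Score	Source
Biology	GPQA	78.0%	Self-reported
Biology	GPQA Biology	69.2%	Self-reported
Chemistry	GPQA Chemistry	64.7%	Self-reported
Code	HumanEval	88.1%	Self-reported
Code	SWE-Bench Verified	41.0%	Self-reported
Communication	TAU-bench Retail	70.8%	Self-reported
Communication	TAU-bench Airline	50.0%	Self-reported
Factuality	SimpleQA	47.0%	Self-reported
Finance	MMLU	91.8%	Self-reported
General	MMMLU	87.7%	Self-reported
General	MMMU	77.6%	Self-reported
General	LiveBench	67.0%	Self-reported
Math	GSM8k	97.1%	Self-reported
Math	MATH	96.4%	Self-reported
Math	MGSM	89.3%	Self-reported
Math	AIME 2024	74.3%	Self-reported
Math	MathVista	71.8%	Self-reported
Math	FrontierMath	5.5%	Self-reported
Physics	GPQA Physics	92.8%	Self-reported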

AA evaluation indices

Metric	Value
Coding Index	39.7
Intelligence Index	23.4
MATH-500	1.0
MMLU-Pro	0.8
GPQA	0.7
AIME	0.7
IFBench	0.7
LiveCodeBench	0.7
Tau2	0.6
LCR	0.6
SciCode	0.4
Terminal-Bench Hard	0.1
HLE	0.1

LLM Stats category scores

Categories: Language, Legal, Finance, Math, Physics, Healthcare, Biology, Chemistry, Multimodal, Reasoning, General, Vision, Code, Communication, Tool Calling, Factuality, Frontend Development.

Pricing

Input price: $15 / 1M tokens

Output price: $60 / 1M tokens

Blended price (3:1 input:output): $26.25 / 1M tokens

Cached read price: $7.50 / 1M tokens
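The blended figure matches a token-weighted average of the input and output prices at three input tokens per output token. The sketch below reproduces that arithmetic. The function name `blended_price` and its parameters are illustrative, and the weighting scheme is an assumption inferred from the listed numbers rather than a documented formula.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens for an input:output token ratio.

    Assumption: "3:1" means 3 input tokens for every 1 output token.
    """
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# Listed o1 prices in USD per 1M tokens.
INPUT_PRICE = 15.0
OUTPUT_PRICE = 60.0

print(blended_price(INPUT_PRICE, OUTPUT_PRICE))  # 26.25
```

Workloads that generate more output per input token will see a higher effective price than the 3:1 figure suggests, because output tokens cost four times as much as input tokens here.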

Speed

Tokens per second: 148.8

Time to first token: 12.79 s

Time to first answer: 12.79 s
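As a rough illustration of how these two measurements combine, the sketch below estimates wall-clock time for a response of a given length. It assumes a simple linear model (first-token latency plus output tokens divided by throughput), which is a simplification and not a method stated by the source. The function name and the example token count are hypothetical.

```python
def estimated_response_seconds(output_tokens: int,
                               tokens_per_second: float = 148.8,
                               first_token_latency_s: float = 12.79) -> float:
    """Estimate total response time under a simple linear model.

    Assumption: generation runs at a constant rate once the first token
    arrives. Real latency varies with load, prompt size, and reasoning effort.
    """
    return first_token_latency_s + output_tokens / tokens_per_second


# Hypothetical response length of 1,488 visible output tokens.
print(f"{estimated_response_seconds(1488):.2f} s")  # 22.79 s
```

At these figures, the first-token latency dominates short answers: a reply has to exceed roughly 1,900 tokens before generation time outweighs the initial wait.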

Provider price ranking

13 providers

Cheapest: Poe. Most expensive: Merge Gateway.

Rank	Provider	Input ($ / 1M tokens)	Output ($ / 1M tokens)
1	Poe (cheapest)	$14	$54
2	NanoGPT	$14.994	$59.993
3	OpenAI (primary)	$15	$60
4	OpenRouter	$15	$60
5	Kilo Gateway	$15	$60
6	Cloudflare AI Gateway	$15	$60
7	Helicone	$15	$60
8	Azure Cognitive Services	$15	$60
9	DigitalOcean	$15	$60
10	Vercel AI Gateway	$15	$60
11	LLM Gateway	$15	$60
12	Azure	$15	$60
13	Merge Gateway	$15	$60

Compare this model's pricing across API providers.
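To make the comparison concrete, the sketch below computes the cost of a single workload at a few of the listed provider prices and picks the cheapest. The prices are copied from the ranking above. The helper names, the workload size, and the assumption that every provider bills purely per token with no extra fees are illustrative.

```python
# (input, output) prices in USD per 1M tokens, copied from the ranking above.
PROVIDER_PRICES = {
    "Poe": (14.0, 54.0),
    "NanoGPT": (14.994, 59.993),
    "OpenAI": (15.0, 60.0),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD, assuming plain per-token billing with no other fees."""
    input_price, output_price = PROVIDER_PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest_provider(input_tokens: int, output_tokens: int) -> str:
    """Return the provider with the lowest cost for this workload."""
    return min(PROVIDER_PRICES,
               key=lambda name: request_cost(name, input_tokens, output_tokens))


# Hypothetical monthly workload: 3M input tokens and 1M output tokens.
for name in PROVIDER_PRICES:
    print(f"{name}: ${request_cost(name, 3_000_000, 1_000_000):.2f}")
print("Cheapest:", cheapest_provider(3_000_000, 1_000_000))
```

For this workload the gap between Poe and the $15 / $60 providers is $9 on a $105 bill, so the choice of provider matters less than the input-to-output ratio of the traffic itself.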

External links

LLM Stats, Artificial Analysis