GPT-4o (Aug '24)
OpenAI · GPT · Proprietary
Description
GPT-4o ('o' for 'omni') is a multimodal AI model that accepts text, audio, image, and video inputs, and generates text, audio, and image outputs. It matches GPT-4 Turbo performance on text and code, with improvements in non-English languages, vision, and audio understanding.
Release date
2024-08-06
Parameters
—
Context length
128K
Modalities
file, image, text
Capability radar
general: 15
coding: 24
reasoning: 40
science (est.): 36
agents: 50
multimodal: 90
Science uses a reasoning-based proxy when dedicated science benchmarks are unavailable.
Rankings
| Domain | Rank | Score | Source |
|---|---|---|---|
| Code Ranking | 265 | 30.0 | AA |
| General Ranking | 377 | 28.0 | AA |
| Math Reasoning | 196 | 46.0 | AA |
| Multimodal Ranking | 27 | 81.0 | LS |
| Reasoning | 94 | 37.0 | LS |
| Science | 290 | 37.0 | AA |
Benchmark scores (LLM Stats)
Biology
GPQA: 70.1% (self-reported)
Code
SWE-Bench Verified: 33.2% (self-reported)
SWE-Lancer: 32.6% (self-reported)
Aider-Polyglot: 30.7% (self-reported)
Aider-Polyglot Edit: 18.2% (self-reported)
SWE-Lancer (IC-Diamond subset): 12.4% (self-reported)
Communication
Tau2 Retail: 63.4% (self-reported)
Multi-IF: 60.9% (self-reported)
TAU-bench Retail: 60.3% (self-reported)
Tau2 Airline: 45.5% (self-reported)
TAU-bench Airline: 42.8% (self-reported)
Multi-Challenge: 40.3% (self-reported)
Tau2 Telecom: 23.5% (self-reported)
Factuality
SimpleQA: 38.2% (self-reported)
Finance
MMLU: 85.7% (self-reported)
MMLU-Pro: 74.7% (self-reported)
General
MMMLU: 81.4% (self-reported)
IFEval: 81.0% (self-reported)
MMMU: 72.2% (self-reported)
MMMU-Pro: 59.9% (self-reported)
Internal API instruction following (hard): 29.2% (self-reported)
Healthcare
VideoMMMU: 61.2% (self-reported)
Image To Text
DocVQA: 92.8% (self-reported)
Language
COLLIE: 61.0% (self-reported)
Long Context
EgoSchema: 72.2% (self-reported)
ComplexFuncBench: 66.5% (self-reported)
OpenAI-MRCR: 2 needle 128k: 31.9% (self-reported)
Math
MathVista: 61.4% (self-reported)
AIME 2024: 13.1% (self-reported)
Humanity's Last Exam: 5.3% (self-reported)
Multimodal
AI2D: 94.2% (self-reported)
ChartQA: 85.7% (self-reported)
CharXiv-D: 85.3% (self-reported)
CharXiv-R: 58.8% (self-reported)
Reasoning
Graphwalks BFS <128k: 41.7% (self-reported)
Graphwalks parents <128k: 35.4% (self-reported)
ERQA: 35.2% (self-reported)
Video
ActivityNet: 61.9% (self-reported)
AA evaluation indices
Intelligence Index: 18.6
Coding Index: 16.6
Math 500: 0.8
GPQA: 0.5
IFBench: 0.4
LCR: 0.3
SciCode: 0.3
LiveCodeBench: 0.3
Tau2: 0.3
AIME: 0.1
TerminalBench Hard: 0.1
HLE: 0.0
LLM Stats category scores
Image To Text: 90
Finance: 80
Legal: 80
Vision: 70
Biology: 70
Chemistry: 70
Healthcare: 70
Instruction Following: 70
Language: 70
Multimodal: 70
Physics: 70
Structured Output: 60
Writing: 60
General: 60
Long Context: 60
Tool Calling: 50
Communication: 50
Math: 50
Reasoning: 50
Spatial Reasoning: 40
Factuality: 40
Code: 30
Frontend Development: 30
Pricing
Input price: $2.5 / 1M tokens
Output price: $10 / 1M tokens
Blended price (3:1): $4.375 / 1M tokens
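The 3:1 blended figure is consistent with a weighted average of the listed input and output prices, three parts input to one part output. The sketch below reproduces that arithmetic; the function name `blended_price` and its parameters are hypothetical, and the weighting is an assumption inferred from the "3:1" label rather than something the page spells out.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (assumed 3:1 input:output mix)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Listed prices in USD per 1M tokens.
price = blended_price(2.5, 10.0)
print(price)  # 4.375
```

With the listed prices, (3 × 2.5 + 1 × 10) / 4 = 4.375, which matches the blended figure shown.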
Speed
Tokens/sec: 102.1 tokens/s
Time to first token: 0.65s
Time to first response: 0.65s
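As a rough way to read the speed figures together, the sketch below estimates end-to-end response time for a given output length. It assumes a simple additive model (time to first token plus output tokens divided by throughput), which the page does not state; the function name `estimate_response_seconds` is hypothetical, and real latency varies with load and prompt size.

```python
def estimate_response_seconds(output_tokens: int,
                              tokens_per_second: float = 102.1,
                              first_token_latency_s: float = 0.65) -> float:
    """Assumed model: first-token latency plus steady-state generation time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency_s + output_tokens / tokens_per_second


# Example: a 500-token answer at the listed throughput and latency.
seconds = estimate_response_seconds(500)
print(f"{seconds:.2f} s")
```

Under these assumptions a 500-token response takes roughly 5.5 seconds.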
Available providers
(LS internal units)

| Provider | Input price | Output price |
|---|---|---|
| OpenAI | 2.5M | 10.0M |
| Azure | 2.5M | 10.0M |