
DeepSeek R1 Zero

DeepSeek · Open weights · MIT license (commercial use permitted)

Description

DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Release date: 2025-01-20
Parameters: 671.0B
Context length: not listed
Supported modalities: not listed

Capability radar chart

General: 60
Coding: 50
Reasoning: 90
Science (estimated): 60
Agents: 78
Multimodal: 0

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.

Leaderboard ranking

No ranking data available

Benchmark scores (LLM Stats)

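Biology

GPQA: 73.3% (self-reported)

Code

LiveCodeBench: 50.0% (self-reported)

Math

MATH-500: 95.9% (self-reported)
AIME 2024: 86.7% (self-reported; matches the cons@64 majority-vote figure DeepSeek reports for this model)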

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Math: 90
Reasoning: 80
Physics: 70
Biology: 70
Chemistry: 70
General: 60
Code: 50

Pricing

No pricing data available

Speed

No speed data available

Provider price ranking

No provider data available

External links