DeepSeek R1 Zero

DeepSeek · Open Weight · MIT · Commercial OK

Description

DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Release date: 2025-01-20
Parameters: 671.0B
Context length:
Supported modalities:

Capability radar chart

general: 60
coding: 50
reasoning: 90
science (estimated): 60
agents: 0
multimodal: 0

When no dedicated science benchmark is available, the science score is estimated using reasoning ability as a proxy.

Leaderboard rankings

No ranking data available

Benchmark scores (LLM Stats)

Biology

GPQA: 73.3% (self-reported)

Code

LiveCodeBench: 50.0% (self-reported)

Math

MATH-500: 95.9% (self-reported)
AIME 2024: 86.7% (self-reported)

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Math: 90
Reasoning: 80
Biology: 70
Chemistry: 70
Physics: 70
General: 60
Code: 50

Pricing

No pricing data available

Speed

No speed data available

Available providers

(LS internal pricing unit)

No provider data available

External links

DeepSeek R1 Zero — DeepSeek | AITier