DeepSeek R1 Zero

DeepSeek · Open weights · MIT license (commercial use permitted)

Description

DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, and it demonstrates remarkable reasoning performance. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, the model suffers from endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, DeepSeek introduced DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Release date

2025-01-20

Parameters

671.0B

Context length

—

Supported modalities

—

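Capability radar chart

Axes: general, coding, reasoning, science (estimated), agents, multimodal.

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.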

Leaderboard ranking

No ranking data available

Benchmark scores (LLM Stats)

Biology

GPQA: 73.3% (self-reported)

Code

LiveCodeBench: 50.0% (self-reported)

Math

MATH-500: 95.9% (self-reported)

AIME 2024: 86.7% (self-reported)

Artificial Analysis (AA) evaluation index

No AA evaluation data available

LLM Stats category scores

Categories: Math, Reasoning, Physics, Biology, Chemistry, General, Code

Pricing

No pricing data available

Speed

No speed data available

Provider price ranking

No provider data available

External links

LLM Stats · Artificial Analysis