DeepSeek R1 Zero

DeepSeek · Open Weight · MIT · Commercial OK

Description

DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Release date: 2025-01-20
Parameters: 671.0B
Context length: (no value listed)
Supported modalities: (no value listed)

Capability radar chart

general: 60
coding: 50
reasoning: 90
science (estimated): 60
agents: 0
multimodal: 0

The Science score is estimated using reasoning ability as a proxy when no dedicated science benchmark is available.

Leaderboard ranking

No ranking data available

Benchmark scores (LLM Stats)

Biology

GPQA: 73.3% (self-reported)

Code

LiveCodeBench: 50.0% (self-reported)

Math

MATH-500: 95.9% (self-reported)
AIME 2024: 86.7% (self-reported)

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Math: 90
Reasoning: 80
Biology: 70
Chemistry: 70
Physics: 70
General: 60
Code: 50

Pricing

No pricing data available

Speed

No speed data available

Available providers

(LS internal pricing unit)

No provider data available

External links