DeepSeek R1 Zero
DeepSeek · Open Weight · MIT · Commercial OK
Description
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
Release date
2025-01-20
Parameters
671.0B
Context length
—
Supported modalities
—
Capability radar chart
General: 60
Coding: 50
Reasoning: 90
Science (estimated): 60
Agents: 0
Multimodal: 0
When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.
Leaderboard ranking
No ranking data available
Benchmark scores (LLM Stats)
Biology
GPQA: 73.3% (self-reported)
Code
LiveCodeBench: 50.0% (self-reported)
Math
MATH-500: 95.9% (self-reported)
AIME 2024: 86.7% (self-reported)
AA evaluation index
No AA evaluation data available
LLM Stats category scores
Math: 90
Reasoning: 80
Biology: 70
Chemistry: 70
Physics: 70
General: 60
Code: 50
Pricing
No pricing data available
Speed
No speed data available
Available providers
(LS internal pricing unit) No provider data available