DeepSeek R1 Zero
Description
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, it faces challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
Capability Radar
Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
Rankings
No ranking data available
Benchmark Scores (LLM Stats)
Benchmark categories: Biology, Code, Math
AA Evaluation Indices
No AA evaluation data available
LLM Stats Category Scores
Pricing
No pricing data available
Speed
No speed data available
Available Providers
No provider data available