DeepSeek R1 Zero

DeepSeek · Open weights · MIT · Commercial use permitted

Description

DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Release date

2025-01-20

Parameter count

671.0B

Context length

—

Supported modalities

—

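Capability radar chart

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.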
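The fallback described in the note above can be sketched roughly as follows. This is only an illustration of the stated rule, not the site's actual scoring code: the function name `science_axis_score`, the dictionary keys, and the choice to reuse the reasoning score unchanged are all assumptions.

```python
from typing import Dict, Optional, Tuple


def science_axis_score(
    scores: Dict[str, Optional[float]]
) -> Tuple[Optional[float], bool]:
    """Return (score, is_estimated) for the science axis.

    Hypothetical helper: prefers a dedicated science score and falls back
    to the reasoning score as a proxy when none is present.
    """
    direct = scores.get("science")
    if direct is not None:
        # A dedicated science evaluation exists, so no estimate is needed.
        return direct, False

    proxy = scores.get("reasoning")
    if proxy is None:
        # Nothing to estimate from; report the axis as missing.
        return None, False

    # Assumption: the proxy is the reasoning score used as-is.
    return proxy, True
```

Returning the `is_estimated` flag alongside the score lets a chart mark the axis as "estimated", as the radar chart does for science.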

Leaderboard ranking

No ranking data available

Benchmark scores (LLM Stats)

Biology
GPQA: 73.3% (self-reported)

Code
LiveCodeBench: 50.0% (self-reported)

Math
MATH-500: 95.9% (self-reported)
AIME 2024: 86.7% (self-reported)

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Categories: Math, Reasoning, Physics, Biology, Chemistry, General, Code

Pricing

No pricing data available

Speed

No speed data available

Provider price ranking

No provider data available

External links

LLM Stats · Artificial Analysis