DeepSeek V3 0324
DeepSeek · Open Weight · MIT + Model License (commercial use allowed)
Description
A powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens with strong performance in reasoning, math, and code tasks.
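The "37B activated per token" figure follows from MoE routing: the router selects only a few experts per token, so only their parameters run in the forward pass. A minimal toy sketch of top-k expert routing (dense toy experts, softmax over the selected gates only; DeepSeek V3's actual router uses sigmoid affinity scores with bias-based, auxiliary-loss-free balancing, which is not modeled here):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k experts of a toy MoE layer.

    x:       (d,) token activation
    gate_w:  (d, n_experts) router weights
    experts: list of (d, d) toy dense expert matrices
    """
    logits = x @ gate_w                     # router score per expert
    topk = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                # normalize over selected experts only
    # Only k of n_experts run, so activated parameters << total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
```

With k=2 of 4 experts, half the expert parameters are touched per token; at V3's scale the same idea yields 37B activated out of 671B total.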
Release Date
2025-03-25
Parameters
671.0B
Context Length
164K
Modalities
text
Capability Radar
| Capability | Score |
|---|---|
| General | 38 |
| Coding | 30 |
| Reasoning | 54 |
| Science (est.) | 43 |
| Agents | 0 |
| Multimodal | 0 |
Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
Rankings
| Domain | Rank | Score | Source |
|---|---|---|---|
| Code Ranking | 217 | 39.0 | AA |
| General Ranking | 209 | 49.0 | AA |
| Math Reasoning | 164 | 54.0 | AA |
| Science | 232 | 45.0 | AA |
Benchmark Scores (LLM Stats)
| Category | Benchmark | Score |
|---|---|---|
| Biology | GPQA | 68.4% (SR) |
| Code | LiveCodeBench | 49.2% (SR) |
| Finance | MMLU-Pro | 81.2% (SR) |
| Math | MATH-500 | 94.0% (SR) |
| Math | AIME 2024 | 59.4% (SR) |
AA Evaluation Indices
| Index | Score |
|---|---|
| Math Index | 41.0 |
| Intelligence Index | 22.3 |
| Coding Index | 22.0 |
| MATH-500 | 0.9 |
| MMLU-Pro | 0.8 |
| GPQA | 0.7 |
| AIME | 0.5 |
| Tau2 | 0.5 |
| AIME 25 | 0.4 |
| IFBench | 0.4 |
| LCR | 0.4 |
| LiveCodeBench | 0.4 |
| SciCode | 0.4 |
| TerminalBench Hard | 0.2 |
| HLE | 0.1 |
LLM Stats Category Scores
| Category | Score |
|---|---|
| Finance | 80 |
| Healthcare | 80 |
| Language | 80 |
| Legal | 80 |
| Math | 80 |
| Biology | 70 |
| Chemistry | 70 |
| General | 70 |
| Physics | 70 |
| Reasoning | 70 |
| Code | 50 |
Pricing
Input Price: $1.195 / 1M tokens
Output Price: $1.25 / 1M tokens
Blended Price (3:1): $1.209 / 1M tokens
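The blended price is a weighted average of the input and output prices at the stated 3:1 input-to-output token ratio. A small sketch of that calculation (function name and signature are illustrative, not from the source):

```python
def blended_price(input_price, output_price, input_ratio=3, output_ratio=1):
    """Blended $/1M-token price for a fixed input:output token ratio."""
    total = input_ratio + output_ratio
    return (input_ratio * input_price + output_ratio * output_price) / total

# (3 * 1.195 + 1 * 1.25) / 4 = 1.20875, i.e. ~$1.209 / 1M tokens as listed.
blend = blended_price(1.195, 1.25)
```

A 3:1 ratio reflects typical usage where prompts (input) carry several times more tokens than completions (output), so the blend sits closer to the input price.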
Speed
Tokens/sec: 0.0
Time to First Token: 0.00s
Time to Answer: 0.00s
Available Providers
Prices listed in LS internal units.
| Provider | Input Price | Output Price |
|---|---|---|
| Novita | 280K | 1.1M |