
DeepSeek V3 0324

DeepSeek · Open Weight · MIT + Model License (commercial use allowed)

Description

A powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens with strong performance in reasoning, math, and code tasks.
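The 671B-total / 37B-active split described above is the hallmark of MoE routing: a gating network scores all experts per token, but only the top-k experts actually run. The sketch below shows generic top-k gating with renormalized weights; it is illustrative only (the expert count, top_k value, and random logits are assumptions, not DeepSeek's actual router, which also uses auxiliary-loss-free load balancing).

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_logits, top_k=2):
    """Select the top_k experts by gate probability.

    Only the chosen experts would run their feed-forward pass for this
    token; their gate weights are renormalized to sum to 1.
    """
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]  # (expert index, weight)

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]  # 8 hypothetical experts
print(route_token(logits, top_k=2))
```

Because only the routed experts execute, per-token compute scales with the 37B activated parameters rather than the full 671B.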

Release Date: 2025-03-25
Parameters: 671.0B
Context Length: 164K
Modalities: text

Capability Radar

general: 38
coding: 30
reasoning: 54
science (est.): 43
agents: 0
multimodal: 0

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain / Rank / Score / Source
Code: #217 / 39.0 / AA
General: #209 / 49.0 / AA
Math Reasoning: #164 / 54.0 / AA
Science: #232 / 45.0 / AA

Benchmark Scores (LLM Stats)

Biology

GPQA: 68.4% (SR)

Code

LiveCodeBench: 49.2% (SR)

Finance

MMLU-Pro: 81.2% (SR)

Math

MATH-500: 94.0% (SR)
AIME 2024: 59.4% (SR)

AA Evaluation Indices

Math Index: 41.0
Intelligence Index: 22.3
Coding Index: 22.0
MATH-500: 0.9
MMLU-Pro: 0.8
GPQA: 0.7
AIME: 0.5
Tau2: 0.5
AIME 25: 0.4
IFBench: 0.4
LCR: 0.4
LiveCodeBench: 0.4
SciCode: 0.4
TerminalBench Hard: 0.2
HLE: 0.1

LLM Stats Category Scores

Finance: 80
Healthcare: 80
Language: 80
Legal: 80
Math: 80
Biology: 70
Chemistry: 70
General: 70
Physics: 70
Reasoning: 70
Code: 50

Pricing

Input Price: $1.195 / 1M tokens
Output Price: $1.25 / 1M tokens
Blended Price (3:1): $1.209 / 1M tokens
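The blended price is a 3:1 weighted average of the input and output prices, reflecting an assumed workload with three input tokens per output token. A minimal sketch of that arithmetic (the function name is ours, not part of any pricing API):

```python
def blended_price(input_price, output_price, input_ratio=3, output_ratio=1):
    """Weighted-average price per 1M tokens for a given input:output token mix."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# Listed DeepSeek V3 0324 prices: $1.195 in, $1.25 out
print(round(blended_price(1.195, 1.25), 3))  # → 1.209
```

(3 × 1.195 + 1 × 1.25) / 4 = 1.20875, which rounds to the listed $1.209 / 1M tokens.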

Speed

Tokens/sec: 0.0 tokens/s
Time to First Token: 0.00s
Time to Answer: 0.00s

Available Providers

(LS internal units)
Provider / Input Price / Output Price
Novita: 280K / 1.1M

External Sources