DeepSeek V4 Pro (Non-reasoning)
Description
DeepSeek-V4-Pro-Max is the maximum reasoning effort mode of DeepSeek-V4-Pro, a 1.6T-parameter MoE model with 49B activated parameters and a 1M-token context window. It introduces a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for dramatically improved long-context efficiency, requiring only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2 at 1M-token context. The model also incorporates Manifold-Constrained Hyper-Connections (mHC) for stable signal propagation and is trained with the Muon optimizer for faster convergence. Pre-trained on more than 32T tokens, V4-Pro-Max significantly advances open-source knowledge capabilities, achieves top-tier performance in coding benchmarks, and bridges the gap with leading closed-source models on reasoning and agentic tasks.
Capability Radar
Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
Rankings
| Domain | #Rank | Score | Source |
|---|---|---|---|
| Code Ranking | 137 | 59.0 | AA |
| General Ranking | 125 | 60.0 | AA |
| Science | 164 | 52.0 | AA |
Benchmark Scores (LLM Stats)
Agents
Biology
Code
Factuality
Finance
General
Math
AA Evaluation Indices
LLM Stats Category Scores
Pricing
Speed
Provider Price Ranking
Provider Price Ranking
21 providers
Compare pricing across different API providers for this model.