Nemotron 3 Ultra (550B A55B)
Description
Nemotron 3 Ultra is NVIDIA's frontier-scale open model with 550B total / 55B active parameters, built for agentic reasoning, long-context analysis, tool use, and high-stakes RAG. It uses a hybrid Latent Mixture-of-Experts (LatentMoE) architecture interleaving Mamba-2, MoE, and select Attention layers, with Multi-Token Prediction (MTP) for native speculative decoding, and is pre-trained on ~20T tokens with an NVFP4 recipe. Reasoning is configurable on/off (plus a medium-effort mode) via the chat template. It supports up to a 1M-token context and 10 languages (English, French, Spanish, Italian, German, Japanese, Hindi, Korean, Brazilian Portuguese, Chinese). Released with open weights, training data, and recipes under the OpenMDW-1.1 license.
Capability Radar
Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
Rankings
| Domain | #Rank | Score | Source |
|---|---|---|---|
| Agentic Capability | 93 | 48.0 | LS |
| Reasoning | 21 | 85.0 | LS |
Benchmark Scores (LLM Stats)
Agents
Biology
Code
Communication
Finance
General
Knowledge
Language
Long Context
Math
Reasoning
AA Evaluation Indices
No AA evaluation data available
LLM Stats Category Scores
Pricing
Speed
No speed data available
Provider Price Ranking
Provider Price Ranking
4 providers
Compare pricing across different API providers for this model.