Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)

NVIDIA | Llama | Open Weight | Llama 3.1 Community License

Description

A 253B-parameter derivative of Meta Llama 3.1 405B Instruct, developed by NVIDIA using Neural Architecture Search (NAS) and vertical compression. It underwent multi-phase post-training (SFT for Math, Code, Reasoning, Chat, and Tool Calling, followed by RL with GRPO) to improve reasoning and instruction following. It is optimized for the accuracy/efficiency tradeoff on NVIDIA GPUs and supports a 128k-token context.

Release Date

2025-04-07

Parameters

253.0B

Context Length

128k tokens

Modalities

—

Capability Radar

Axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	#Rank	Score	Source
Code Ranking	307	28.0	AA
General Ranking	314	34.0	AA
Math Reasoning	108	73.0	AA
Science	192	49.0	AA

Benchmark Scores (LLM Stats)

Category	Benchmark	Score
Biology	GPQA	76.0% (SR)
Code	LiveCodeBench	66.3% (SR)
General	IFEval	89.5% (SR)
General	BFCL v2	74.1% (SR)
Math	MATH-500	97.0% (SR)
Math	AIME 2025	72.5% (SR)

AA Evaluation Indices

Metric	Value
Math Index	63.7
Intelligence Index	9.1
MATH-500	1.0
MMLU-Pro	0.8
AIME	0.7
GPQA	0.7
LiveCodeBench	0.6
AIME 2025	0.6
IFBench	0.4
SciCode	0.3
Tau2	0.1
HLE	0.1
LCR	0.1
Terminal-Bench Hard	0.0

LLM Stats Category Scores

Categories charted: Instruction Following, Structured Output, Math, Physics, Reasoning, General, Biology, Chemistry, Code, Tool Calling

Pricing

Input Price: $0.6 / 1M tokens

Output Price: $1.8 / 1M tokens

Blended Price (3:1): $0.9 / 1M tokens
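
The blended figure is consistent with a weighted average of the input and output prices at a 3:1 input-to-output token ratio. The sketch below reproduces that arithmetic and shows how a per-request cost could be estimated from the listed prices. The function names and the example token counts are illustrative assumptions, not part of any provider's API.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output token ratio."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices listed above, in USD per 1M tokens.
INPUT_PRICE = 0.6
OUTPUT_PRICE = 1.8

blended = blended_price(INPUT_PRICE, OUTPUT_PRICE)
print(f"Blended (3:1): ${blended:.2f} / 1M tokens")  # prints 0.90

# Hypothetical request: 3,000 prompt tokens and 1,000 generated tokens.
cost = request_cost(3_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
print(f"Example request cost: ${cost:.4f}")
```

For a reasoning model, generated reasoning tokens are typically billed as output, so real workloads may skew further toward the output price than the 3:1 ratio suggests.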

Speed

Tokens/sec: 52.2

Time to First Token: 0.70s

Time to Answer: 39.03s
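
The gap between Time to First Token and Time to Answer can be read as time spent streaming tokens (for a reasoning model, presumably reasoning tokens) before the answer begins. The sketch below estimates how many tokens that gap implies. It assumes constant throughput at the listed tokens/sec and assumes Time to Answer is measured from request start. Both are simplifications, and the function names are hypothetical.

```python
def implied_tokens_before_answer(time_to_answer_s: float, ttft_s: float,
                                 tokens_per_s: float) -> float:
    """Tokens streamed between the first token and the answer, assuming constant throughput."""
    return max(0.0, (time_to_answer_s - ttft_s) * tokens_per_s)


def estimated_total_time(ttft_s: float, tokens_per_s: float,
                         output_tokens: int) -> float:
    """Rough wall-clock time in seconds to stream `output_tokens` tokens."""
    return ttft_s + output_tokens / tokens_per_s


# Figures listed above.
TOKENS_PER_S = 52.2
TTFT_S = 0.70
TIME_TO_ANSWER_S = 39.03

tokens = implied_tokens_before_answer(TIME_TO_ANSWER_S, TTFT_S, TOKENS_PER_S)
print(f"Implied tokens before the answer: about {tokens:.0f}")

# Hypothetical 522-token response at the same speed.
print(f"Estimated time for 522 tokens: {estimated_total_time(TTFT_S, TOKENS_PER_S, 522):.1f}s")
```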

Provider Price Ranking

3 providers

Cheapest: NVIDIA
Most Expensive: LLM Gateway

Rank	Provider	Input ($ / 1M tokens)	Output ($ / 1M tokens)
1	NVIDIA (primary)	$0.6	$1.8
2	Nebius Token Factory	$0.6	$1.8
3	LLM Gateway	$0.6	$1.8


External Sources

LLM Stats, Artificial Analysis