Mercury 2

Developer: Inception
License: Proprietary

Description

Mercury 2 is a reasoning LLM from Inception, which describes it as the fastest reasoning LLM available. It is built on a diffusion-based large language model (dLLM) architecture: instead of generating text token by token, it refines multiple blocks of text simultaneously. Inception reports over 1,000 tokens per second on NVIDIA Blackwell GPUs, about 5x faster than leading speed-optimized LLMs. The model supports tool use and JSON output and has a 128K-token context window.
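
To make the contrast with token-by-token generation concrete, here is a toy Python sketch of confidence-based parallel unmasking, one common way diffusion-style language models decode. It is not Inception's algorithm: `toy_predict` is a hypothetical stand-in for a real denoising network that simply "knows" the target text and assigns made-up confidences. A real dLLM conditions each prediction on the tokens already committed, and samplers differ in how many tokens they commit per step; this toy ignores context entirely.

```python
MASK = "<mask>"


def toy_predict(target, seq):
    """Hypothetical stand-in for a denoising model (not Mercury's network).

    One call predicts every masked position at once, returning
    (position, token, confidence) triples. The "prediction" is simply the
    target token, and confidence is an arbitrary deterministic score that
    favors earlier positions.
    """
    n = len(seq)
    return [(i, target[i], 1.0 - i / n) for i, tok in enumerate(seq) if tok == MASK]


def parallel_refine(target, tokens_per_step):
    """Start from an all-masked sequence; each step commits the
    `tokens_per_step` most confident predictions. Returns (tokens, steps)."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq:
        preds = toy_predict(target, seq)              # one parallel "forward pass"
        preds.sort(key=lambda p: p[2], reverse=True)  # most confident first
        for i, tok, _conf in preds[:tokens_per_step]:
            seq[i] = tok                              # commit several tokens at once
        steps += 1
    return seq, steps


def autoregressive_steps(target):
    """Token-by-token decoding needs one forward pass per generated token."""
    return len(target)


text = "diffusion models refine many tokens in parallel".split()
tokens, steps = parallel_refine(text, tokens_per_step=3)
print(" ".join(tokens))
print("parallel passes:", steps, "| token-by-token passes:", autoregressive_steps(text))
```

With 7 words and 3 tokens committed per step, the toy loop finishes in 3 passes instead of the 7 that token-by-token decoding would need; that reduction in sequential passes is the intuition behind the throughput claim, not a model of Mercury 2's actual speed.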

Release Date: 2026-02-20
Parameters: not listed
Context Length: 128K tokens
Modalities: text

Capability Radar

General: 23
Coding: 39
Reasoning: 77
Science (estimated): 51
Agents: 50
Multimodal: 0

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain    Rank   Score   Source
Code      #220   45.0    AA
General   #132   59.0    AA
Science   #124   57.0    AA

Benchmark Scores (LLM Stats)

Biology

GPQA: 74.0% (SR)
SciCode: 38.0% (SR)

Code

LiveCodeBench: 67.0% (SR)

Communication

Tau2 Airline: 53.0% (SR)

General

IFBench: 71.0% (SR)

Math

AIME 2025: 91.1% (SR)

AA Evaluation Indices

Intelligence Index: 25.3
GPQA: 0.8
Tau2: 0.7
IFBench: 0.7
SciCode: 0.4
LCR: 0.4
Terminal-Bench Hard: 0.3
HLE: 0.2

LLM Stats Category Scores

Instruction Following: 70
General: 70
Math: 60
Physics: 60
Reasoning: 60
Biology: 60
Chemistry: 60
Code: 50
Communication: 50
Tool Calling: 50

Pricing

Input price: $0.25 / 1M tokens
Output price: $0.75 / 1M tokens
Blended price (3:1 input-to-output tokens): $0.375 / 1M tokens
Cache read price: $0.025 / 1M tokens
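
The blended figure is the weighted average of the input and output rates at a 3:1 token ratio: (3 × $0.25 + 1 × $0.75) / 4 = $0.375. The Python sketch below reproduces that and also estimates the cost of a single request. The request sizes are illustrative, and the cache handling is an assumption: it treats cached tokens as a subset of the input tokens that is billed at the cache read rate instead of the input rate.

```python
def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted-average $/1M tokens for a given input:output token ratio."""
    return (input_parts * input_price + output_parts * output_price) / (input_parts + output_parts)


def request_cost(input_tokens, output_tokens, cached_tokens=0,
                 input_price=0.25, output_price=0.75, cache_read_price=0.025):
    """Dollar cost of one request at the listed $/1M-token rates.

    Assumption (not stated on this page): `cached_tokens` is the part of
    `input_tokens` read from the prompt cache, billed at `cache_read_price`
    instead of `input_price`.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    # tokens x ($ per 1M tokens) gives millionths of a dollar
    micro_dollars = (uncached * input_price
                     + cached_tokens * cache_read_price
                     + output_tokens * output_price)
    return micro_dollars / 1_000_000


print(blended_price(0.25, 0.75))  # 0.375
print(f"${request_cost(30_000, 2_000):.6f}")
print(f"${request_cost(30_000, 2_000, cached_tokens=25_000):.6f}")
```

Under that assumption, a request with 30,000 input and 2,000 output tokens costs $0.009, and $0.003375 if 25,000 of the input tokens are cache reads.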

Speed

Tokens/sec: 1,239.8
Time to first token: 3.43 s
Time to answer: 3.43 s
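
Throughput and time to first token combine into a rough end-to-end estimate: latency ≈ TTFT + output tokens / tokens per second. The Python sketch below uses the listed figures as defaults; it assumes a constant output rate after the first token, which is a simplification rather than how the measurements were taken.

```python
def estimate_latency_s(output_tokens, tokens_per_sec=1239.8, ttft_s=3.43):
    """Rough end-to-end time: wait `ttft_s` for the first token, then emit the
    remaining output at a constant `tokens_per_sec` (a simplifying assumption)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return ttft_s + output_tokens / tokens_per_sec


for n in (500, 2_000, 8_000):
    print(f"{n:>5} output tokens: about {estimate_latency_s(n):.2f} s")
```

Under these assumptions a 2,000-token answer takes about 5 s, of which 3.43 s is the wait for the first token, so for short outputs the time to first token dominates and for long outputs the throughput does.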

Provider Price Ranking

6 providers

Cheapest: Inception
Most expensive: Venice AI

Rank  Provider            Input ($/1M tokens)  Output ($/1M tokens)
1     Inception           $0                   $0         (cheapest)
2     NanoGPT             $0.25                $0.75
3     OpenRouter          $0.25                $0.75
4     Kilo Gateway        $0.25                $0.75
5     Vercel AI Gateway   $0.25                $0.75
6     Venice AI           $0.3125              $0.9375

Compare pricing across different API providers for this model.
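
To see what the per-provider rates mean for a concrete workload, this Python sketch totals the cost of a hypothetical monthly volume (30M input and 10M output tokens, an illustrative assumption) at each provider's listed price and sorts the results. Prices are copied from the provider table as shown, including the $0 entry for Inception, which differs from the $0.25/$0.75 rate listed under Pricing.

```python
# Listed $/1M-token prices from the provider table: (input, output).
PROVIDERS = {
    "Inception": (0.0, 0.0),
    "NanoGPT": (0.25, 0.75),
    "OpenRouter": (0.25, 0.75),
    "Kilo Gateway": (0.25, 0.75),
    "Vercel AI Gateway": (0.25, 0.75),
    "Venice AI": (0.3125, 0.9375),
}


def workload_cost(prices, input_millions, output_millions):
    """Dollar cost for a workload measured in millions of tokens."""
    in_price, out_price = prices
    return in_price * input_millions + out_price * output_millions


def rank_providers(input_millions, output_millions, providers=PROVIDERS):
    """Return (provider, cost) pairs, cheapest first (ties keep table order)."""
    costs = {name: workload_cost(p, input_millions, output_millions)
             for name, p in providers.items()}
    return sorted(costs.items(), key=lambda kv: kv[1])


for name, cost in rank_providers(input_millions=30, output_millions=10):
    print(f"{name:<18} ${cost:,.2f}")
```

At these listed rates Venice AI costs 1.25x as much as the four $0.25/$0.75 providers: $18.75 versus $15.00 for this workload.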

External Sources