Qwen3 Next 80B A3B Instruct
Description
Qwen3-Next-80B-A3B-Instruct is the first model in the Qwen3-Next series. Its architecture combines three components: Hybrid Attention, which pairs Gated DeltaNet with Gated Attention for efficient ultra-long-context modeling; a High-Sparsity MoE with 512 experts (10 activated + 1 shared), which gives an extremely low activation ratio; and Multi-Token Prediction, which improves performance and speeds up inference. With 80B total parameters and only 3B activated, it outperforms Qwen3-32B-Base at 10% of the training cost and with 10x the throughput on contexts of 32K+ tokens. The model performs on par with Qwen3-235B-A22B-Instruct-2507 and excels at ultra-long-context tasks of up to 256K tokens (extensible to 1M with YaRN). Architecture: 48 layers, 15T training tokens, and a hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)).
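The layout formula and the sparsity figures in the description can be checked with a little arithmetic. The sketch below expands 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)) into a flat layer list and computes two activation ratios. The block labels and the function name `build_layout` are illustrative only and do not come from any library. The ratios use the nominal figures quoted above (10 of 512 experts, 3B of 80B parameters), and the shared expert is left out of the expert ratio, which is an assumption.

```python
from collections import Counter

def build_layout(groups: int = 12, deltanet_per_group: int = 3) -> list[str]:
    """Expand 12 * (3 * (Gated DeltaNet -> MoE) -> (Gated Attention -> MoE)).

    The string labels are hypothetical names used only for this illustration.
    """
    layers: list[str] = []
    for _ in range(groups):
        # Three Gated DeltaNet blocks, each followed by an MoE block.
        layers.extend(["gated_deltanet+moe"] * deltanet_per_group)
        # One Gated Attention block, also followed by an MoE block.
        layers.append("gated_attention+moe")
    return layers

layout = build_layout()
counts = Counter(layout)

# Nominal figures from the description; the shared expert is not counted
# here (assumption: it sits outside the 512 routed experts).
TOTAL_EXPERTS = 512
ACTIVATED_EXPERTS = 10
expert_activation_ratio = ACTIVATED_EXPERTS / TOTAL_EXPERTS

# Nominal parameter counts, in billions.
param_activation_ratio = 3 / 80

print(f"layers: {len(layout)}")
print(f"per-type counts: {dict(counts)}")
print(f"expert activation ratio: {expert_activation_ratio:.4f}")
print(f"parameter activation ratio: {param_activation_ratio:.4f}")
```

Under these assumptions the expansion gives 48 layers, matching the stated depth, with a 3:1 split between Gated DeltaNet and Gated Attention blocks.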
Capability Radar
Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
Rankings
| Domain | Rank | Score | Source |
|---|---|---|---|
| Agents & Tools | 15 | 70.0 | LS |
| Code Ranking | 193 | 42.0 | AA |
| General Ranking | 246 | 42.0 | AA |
| Math Reasoning | 130 | 67.0 | AA |
| Science | 204 | 47.0 | AA |
Benchmark Scores (LLM Stats)
[Charts not recovered. Score categories: Agents, Biology, Chemistry, Code, Communication, Creativity, Finance, General, Math. Panels: AA Evaluation Indices, LLM Stats Category Scores, Pricing, Speed.]
Available Providers
(LS internal units)
| Provider | Input Price | Output Price |
|---|---|---|
| Novita | 150K | 1.5M |