Qwen3 Next 80B A3B Instruct
Description
Qwen3-Next-80B-A3B-Instruct is the first model in the Qwen3-Next series and introduces several architectural changes. It uses Hybrid Attention, which combines Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling; a High-Sparsity MoE with 512 experts (10 activated + 1 shared), which gives an extremely low activation ratio; and Multi-Token Prediction for improved performance and faster inference. With 80B total parameters and only 3B activated, it outperforms Qwen3-32B-Base at 10% of the training cost and with 10x the throughput for contexts of 32K+ tokens. The model performs on par with Qwen3-235B-A22B-Instruct-2507 while excelling at ultra-long-context tasks of up to 256K tokens (extensible to 1M with YaRN). Architecture: 48 layers, 15T training tokens, and a hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)).
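The layout notation and the sparsity figures in the description can be checked with a little arithmetic. The sketch below is illustrative only: the function name `build_layout` and the layer labels are hypothetical, and it assumes the notation means 12 repeated groups, each made of three Gated DeltaNet blocks followed by one Gated Attention block, with every block followed by an MoE layer. It also assumes the 10 activated experts are drawn from the 512, with the shared expert always active on top.

```python
from collections import Counter

def build_layout(groups=12, deltanet_per_group=3):
    """Expand 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)) into a flat list.

    Each entry stands for one layer (a token mixer followed by an MoE block).
    The labels are illustrative, not identifiers from any library.
    """
    layers = []
    for _ in range(groups):
        layers.extend(["gated_deltanet"] * deltanet_per_group)
        layers.append("gated_attention")
    return layers

layout = build_layout()
counts = Counter(layout)

# Sparsity figures quoted in the description.
TOTAL_EXPERTS = 512
ACTIVATED_EXPERTS = 10   # routed experts selected per token (assumed reading)
routed_ratio = ACTIVATED_EXPERTS / TOTAL_EXPERTS

TOTAL_PARAMS_B = 80
ACTIVE_PARAMS_B = 3
param_ratio = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(len(layout))                                          # 48
print(counts["gated_deltanet"], counts["gated_attention"])  # 36 12
print(f"{routed_ratio:.2%}")                                # 1.95%
print(f"{param_ratio:.2%}")                                 # 3.75%
```

Under these assumptions the expansion yields the stated 48 layers, three quarters of which use Gated DeltaNet, and roughly 2% of the routed experts and 3.75% of the parameters are active at once.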
Capability Radar
Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
Rankings
| Domain | Rank | Score | Source |
|---|---|---|---|
| Agentic Capability | 13 | 70.0 | LS |
| Code Ranking | 172 | 52.0 | AA |
| General Ranking | 272 | 39.0 | AA |
| Math Reasoning | 130 | 67.0 | AA |
| Science | 225 | 46.0 | AA |
Benchmark Scores (LLM Stats)
Categories: Agents, Biology, Chemistry, Code, Communication, Creativity, Finance, General, Math.
AA Evaluation Indices
LLM Stats Category Scores
Pricing
Speed
Provider Price Ranking
13 providers
Compare pricing across different API providers for this model.