Qwen3 Next 80B A3B Instruct
Description
Qwen3-Next-80B-A3B-Instruct is the first model in the Qwen3-Next series and introduces several architectural changes. It uses Hybrid Attention, which combines Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling; a High-Sparsity MoE with 512 experts (10 activated + 1 shared), which yields an extremely low activation ratio; and Multi-Token Prediction for improved performance and faster inference. With 80B total parameters and only 3B activated, it outperforms Qwen3-32B-Base at 10% of the training cost and with 10x the throughput for contexts of 32K+ tokens. The model performs on par with Qwen3-235B-A22B-Instruct-2507 while excelling at ultra-long-context tasks up to 256K tokens (extensible to 1M with YaRN). Architecture: 48 layers, 15T training tokens, hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)).
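The layout notation and the sparsity figures in the description can be checked with a little arithmetic. The following is a minimal sketch, not the model's actual code: the layer labels and the `build_layout` helper are hypothetical names introduced here for illustration, the 80B and 3B figures are the rounded numbers from the description, and the routed-expert ratio assumes the shared expert is counted separately from the 512 routed experts.

```python
# Illustrative sketch only: expands the layout string
#   12 * (3 * (Gated DeltaNet -> MoE) -> (Gated Attention -> MoE))
# into a flat list of layer types. Names are hypothetical, not from the model code.

BLOCKS = 12              # outer repetition count in the layout string
DELTANET_PER_BLOCK = 3   # Gated DeltaNet layers before each Gated Attention layer


def build_layout(blocks: int = BLOCKS, deltanet_per_block: int = DELTANET_PER_BLOCK) -> list[str]:
    """Return the attention type of every layer (each layer is followed by an MoE block)."""
    layout: list[str] = []
    for _ in range(blocks):
        layout.extend(["gated_deltanet"] * deltanet_per_block)
        layout.append("gated_attention")
    return layout


layout = build_layout()
num_layers = len(layout)                              # 12 * (3 + 1)
num_gated_attention = layout.count("gated_attention")

# Sparsity figures, using the rounded numbers quoted in the description.
ROUTED_EXPERTS = 512     # assumption: the shared expert is not one of these 512
ACTIVE_ROUTED = 10
routed_expert_ratio = ACTIVE_ROUTED / ROUTED_EXPERTS  # fraction of routed experts used per token
param_activation_ratio = 3 / 80                       # 3B activated out of 80B total (nominal)

print(f"layers: {num_layers}, gated-attention layers: {num_gated_attention}")
print(f"routed experts active per token: {routed_expert_ratio:.2%}")
print(f"parameters active per token (nominal): {param_activation_ratio:.2%}")
```

Under these assumptions the expansion gives 48 layers, matching the stated layer count, with one Gated Attention layer for every three Gated DeltaNet layers.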
Capabilities radar
Science uses a reasoning proxy when dedicated science benchmarks are not available.
Rankings
| Domain | Rank | Score | Source |
|---|---|---|---|
| Agentic capability | 13 | 70.0 | LS |
| Coding | 172 | 52.0 | AA |
| Overall | 272 | 39.0 | AA |
| Mathematical reasoning | 130 | 67.0 | AA |
| Science | 225 | 46.0 | AA |
Benchmark scores (LLM Stats)
Agents
Biology
Chemistry
Code
Communication
Creativity
Finance
General
Math
AA evaluation indices
LLM Stats scores by category
Pricing
Speed
Price ranking by provider
13 providers
Compare prices across different API providers for this model.