Qwen3 Next 80B A3B (Reasoning)
Описание
Qwen3-Next-80B-A3B-Thinking is the thinking variant of the Qwen3-Next series, featuring the same groundbreaking architecture as the instruct model. Leveraging GSPO, it addresses stability and efficiency challenges of hybrid attention + high-sparsity MoE in RL training. It uses Hybrid Attention combining Gated DeltaNet and Gated Attention for efficient ultra-long context modeling, High-Sparsity MoE with 512 experts (10 activated + 1 shared), and Multi-Token Prediction. With 80B total parameters and only 3B activated, it demonstrates outstanding performance on complex reasoning tasks — outperforming Qwen3-30B-A3B-Thinking-2507, Qwen3-32B-Thinking, and even the proprietary Gemini-2.5-Flash-Thinking across multiple benchmarks. Architecture: 48 layers, 15T training tokens, hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)). Supports only thinking mode with automatic <think> tag inclusion, may generate longer thinking content.
Радар способностей
Science использует прокси на основе рассуждений, когда специализированные научные бенчмарки недоступны.
Рейтинги
| Домен | #Место | Оценка | Источник |
|---|---|---|---|
| Agents & Tools | 11 | 72.0 | LS |
| Code Ranking | 156 | 49.0 | AA |
| General Ranking | 166 | 56.0 | AA |
| Math Reasoning | 66 | 85.0 | AA |
| Reasoning | 100 | 30.0 | LS |
| Science | 133 | 56.0 | AA |
Оценки бенчмарков (LLM Stats)
Agents
Biology
Chemistry
Code
Communication
Creativity
Finance
General
Math
Reasoning
Индексы оценки AA
Оценки категорий LLM Stats
Цены
Скорость
Доступные провайдеры
(Внутренние единицы LS)| Провайдер | Цена ввода | Цена вывода |
|---|---|---|
| Novita | 150K | 1.5M |