Qwen3 235B A22B 2507 (Reasoning)

Alibaba · Qwen · Open weights · Apache 2.0 · Commercial use permitted

Description

Qwen3-235B-A22B-Thinking-2507 is a thinking-only Mixture-of-Experts (MoE) model with 235B total parameters, of which 22B are activated per token. It has 94 layers and 128 experts (8 activated), and natively supports a 262K-token context (262,144 tokens, the same window referred to as 256K below). This version delivers significantly improved reasoning performance, achieving state-of-the-art results among open-source thinking models on logical reasoning, mathematics, science, coding, and academic benchmarks. Other enhancements include markedly better general capabilities (instruction following, tool usage, text generation), enhanced 256K long-context understanding, and increased thinking depth. The model supports only thinking mode, and the <think> tag is included automatically.
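Because the model always emits its reasoning before the answer, client code usually needs to separate the two. The following is a minimal sketch, not an official parser. It assumes the reasoning ends at the first `</think>` tag, and that the opening `<think>` tag may or may not appear in the generated text depending on how the serving stack applies the chat template. The function name `split_thinking` is hypothetical.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, final_answer).

    Assumption: the reasoning section ends at the first "</think>" tag.
    The opening "<think>" tag is treated as optional.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the final answer.
        return "", text.strip()
    # Drop an optional leading "<think>" tag from the reasoning part.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
    reasoning, answer = split_thinking(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Splitting on the closing tag rather than the opening one keeps the function working in both cases, at the cost of misparsing an answer whose reasoning itself contains a literal `</think>`.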

Release date

2025-07-25

Parameters

235.0B

Context length

262K

Modality

text

Capability radar

Axes: general, coding, reasoning, science (estimated), agents, multimodal.

When no dedicated science benchmark is available, Science is estimated using a reasoning proxy.

Rankings

Domain	Rank	Score	Source (LS = LLM Stats, AA = Artificial Analysis)
Agentic capability	9	72.0	LS
Coding	101	66.0	AA
Overall	164	55.0	AA
Math reasoning	19	95.0	AA
Reasoning	103	33.0	LS
Science	118	59.0	AA

Benchmark scores (LLM Stats)

Category	Benchmark	Score (self-reported)
Agents	BFCL-v3	71.9%
Biology	GPQA	81.1%
Chemistry	SuperGPQA	64.9%
Code	CFEval	2134.00 / 10000
Communication	WritingBench	88.3%
Communication	Multi-IF	80.6%
Communication	Tau2 Retail	71.9%
Communication	TAU-bench Retail	67.8%
Communication	Tau2 Airline	58.0%
Communication	TAU-bench Airline	46.0%
Communication	Tau2 Telecom	45.6%
Creativity	Creative Writing v3	86.1%
Creativity	Arena-Hard v2	79.7%
Finance	MMLU-Pro	84.4%
Finance	MMLU-ProX	81.0%
General	MMLU-Redux	93.8%
General	IFEval	87.8%
General	Include	81.0%
General	LiveBench 20241125	78.4%
General	LiveCodeBench v6	74.1%
Math	AIME 2025	92.3%
Math	HMMT25	83.9%
Math	PolyMATH	60.1%
Math	Humanity's Last Exam	18.2%
Reasoning	OJBench	32.5%

AA evaluation indices

Index	Value
Math Index	91.0
Intelligence Index	22.3

Benchmark	Score (fraction of 1, rounded)
MATH-500	1.0
AIME	0.9
AIME 25	0.9
MMLU-Pro	0.8
GPQA	0.8
LiveCodeBench	0.8
LCR	0.7
Tau2	0.5
IFBench	0.5
SciCode	0.4
HLE	0.1
TerminalBench Hard	0.1

LLM Stats category scores

Chart categories: Instruction Following, Language, Legal, Structured Output, Finance, General, Healthcare, Biology, Creativity, Writing, Math, Physics, Reasoning, Agents, Chemistry, Communication, Multimodal, Spatial Reasoning, Economics, Tool Calling, Vision.

Pricing

Input price: $0.40 / 1M tokens

Output price: $2.15 / 1M tokens

Blended price (3:1 input:output): $0.838 / 1M tokens
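The blended figure is consistent with a weighted average of the input and output prices at a 3:1 input-to-output token ratio. The sketch below reproduces that arithmetic; the function name `blended_price` and its parameters are illustrative, not part of any provider API.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given input:output ratio."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


if __name__ == "__main__":
    # Prices per 1M tokens, taken from the listing above.
    price = blended_price(0.40, 2.15)
    print(f"${price:.4f} / 1M tokens")  # $0.8375 / 1M tokens
```

The result, 0.8375, rounds to the listed $0.838. A workload with a different input-to-output ratio can be priced by changing the two weights.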

Speed

Tokens per second: 83.4

Time to first token: 1.19 s

First response latency: 25.16 s
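One way to read these three figures together is sketched below. It assumes, without confirmation from the listing, that the first response latency equals the time to first token plus the time spent generating reasoning tokens at the listed throughput. Under that assumption the gap between the two latencies implies roughly 2,000 reasoning tokens before the answer begins. Both function names are hypothetical.

```python
def implied_reasoning_tokens(ttft_s: float, first_response_s: float,
                             tokens_per_s: float) -> float:
    """Tokens generated between the first token and the first answer token.

    Assumption: throughput is constant and the whole gap is spent reasoning.
    """
    return (first_response_s - ttft_s) * tokens_per_s


def estimated_total_time(ttft_s: float, tokens_per_s: float,
                         output_tokens: int) -> float:
    """Rough end-to-end time in seconds for a given number of output tokens."""
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # Figures taken from the listing above.
    tokens = implied_reasoning_tokens(1.19, 25.16, 83.4)
    print(f"implied reasoning tokens: {tokens:.0f}")
    # Example: a response with 3,000 output tokens in total (reasoning + answer).
    print(f"estimated total time: {estimated_total_time(1.19, 83.4, 3000):.1f} s")
```

Real latency varies by provider, load, and prompt length, so these numbers are only a back-of-the-envelope guide.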

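Provider price ranking

4 providers

Lowest price: Amazon Bedrock · Highest price: Nebius Token Factory

Rank	Provider	Input ($ / 1M tokens)	Output ($ / 1M tokens)
1	Amazon Bedrock (lowest price)	$0.22	$0.88
2	Vercel AI Gateway	$0.22	$0.88
3	Alibaba (primary)	$0.4	$2.15
4	Nebius Token Factory	$0.5	(not listed)

Price comparison across API providers for this model.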

External links

LLM Stats · Artificial Analysis