Qwen3 VL 235B A22B (Reasoning)
Description
Qwen3-VL-235B-A22B-Thinking is the most powerful vision-language model in the Qwen series, featuring 236B parameters with MoE architecture for reasoning-enhanced multimodal understanding. Key capabilities include: Visual Agent (operates PC/mobile GUIs, recognizes elements, invokes tools), Visual Coding (generates Draw.io/HTML/CSS/JS from images/videos), Advanced Spatial Perception (2D grounding and 3D grounding for spatial reasoning and embodied AI), Long Context & Video Understanding (native 256K context expandable to 1M, handles hours-long video with second-level indexing), Enhanced Multimodal Reasoning (excels in STEM/Math with causal analysis), Upgraded Visual Recognition (celebrities, anime, products, landmarks, flora/fauna), and Expanded OCR (32 languages, robust in low light/blur/tilt). Architecture innovations include Interleaved-MRoPE for positional embeddings, DeepStack for multi-level ViT feature fusion, and Text-Timestamp Alignment for precise video temporal modeling.
Capability Radar
When no dedicated science benchmark is available, the Science score is estimated using a reasoning proxy.
Rankings
| Domain | Rank | Score | Source |
|---|---|---|---|
| Agents & Tools | 24 | 66.0 | LS |
| Code Ranking | 171 | 47.0 | AA |
| General Ranking | 146 | 59.0 | AA |
| Math Reasoning | 49 | 89.0 | AA |
| Multimodal Ranking | 64 | 67.0 | LS |
| Reasoning | 37 | 75.0 | LS |
| Science | 137 | 56.0 | AA |
Benchmark Scores (LLM Stats)
Categories: 3D, Agents, Chemistry, Code, Communication, Creativity, Embodied, Factuality, Finance, General, Grounding, Healthcare, Image To Text, Instruction Following, Language, Long Context, Math, Multimodal, Reasoning, Spatial Reasoning, Vision
AA Evaluation Index
LLM Stats Category Scores
Pricing
Speed
Available Providers
Prices are shown in LS internal units.

| Provider | Input Price | Output Price |
|---|---|---|
| DeepInfra | 450K | 3.5M |
| Novita | 980K | 4.0M |
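The provider figures can be compared for a given workload with a short script. The following is a minimal Python sketch, not part of the original listing. It assumes that the input and output columns share one unit and scale linearly with token count; the source does not define what an "LS internal unit" is, so only the ordering of the results is meaningful, not the absolute values. The names `parse_units`, `relative_cost`, and `cheaper_provider` are hypothetical helpers introduced for illustration.

```python
# Figures copied from the provider table above (LS internal units, meaning unspecified).
PROVIDERS = {
    "DeepInfra": {"input": "450K", "output": "3.5M"},
    "Novita": {"input": "980K", "output": "4.0M"},
}


def parse_units(text: str) -> float:
    """Convert a string such as '450K' or '3.5M' into a plain number."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text)


def relative_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Weighted sum of the listed prices (assumption: prices are linear in tokens)."""
    prices = PROVIDERS[provider]
    return (
        parse_units(prices["input"]) * input_tokens
        + parse_units(prices["output"]) * output_tokens
    )


def cheaper_provider(input_tokens: int, output_tokens: int) -> str:
    """Return the provider with the lowest relative cost for this token mix."""
    return min(
        PROVIDERS,
        key=lambda name: relative_cost(name, input_tokens, output_tokens),
    )


if __name__ == "__main__":
    # Example workload: a long multimodal prompt with a shorter reasoning answer.
    print(cheaper_provider(input_tokens=50_000, output_tokens=4_000))
```

Because DeepInfra lists the lower figure in both columns, it comes out ahead for any token mix under these assumptions; the helper becomes more useful once providers with crossing input and output prices are added to the dictionary.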