
Qwen3 4B 2507 Instruct

Alibaba / Qwen
Release date: 2025-08-06
Parameters: not listed
Context length: 262K
Modalities: audio, image, text, video

Capability radar

general: 25
coding: 33
reasoning: 52
science (est.): 31
agents: 60
multimodal: 80

Science uses a reasoning-based proxy when dedicated science benchmarks are unavailable.

Rankings

Domain                  Rank   Score   Source
Coding                  383    19.0    AA
Overall                 374    30.0    AA
Mathematical reasoning  172    53.0    AA
Science                 388    29.0    AA

Benchmark scores (LLM Stats)

3d

SUNRGBD: 0.36 / 100 (self-reported)
Hypersim: 0.13 / 100 (self-reported)

Agents

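GDPval-AA: 985.00 / 3000 (self-reported)
t2-bench: 79.5% (self-reported)
BFCL-V4: 72.2% (self-reported)
AndroidWorld_SR: 66.4% (self-reported)
BrowseComp: 63.8% (self-reported)
FullStackBench en: 62.6% (self-reported)
WideSearch: 60.5% (self-reported)
FullStackBench zh: 58.7% (self-reported)
OSWorld-Verified: 58.0% (self-reported)
TIR-Bench: 53.2% (self-reported)
Terminal-Bench 2.0: 49.4% (self-reported)
VITA-Bench: 33.6% (self-reported)
DeepPlanning: 24.1% (self-reported)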

Biology

GPQA: 86.6% (self-reported)

Chemistry

SuperGPQA: 67.1% (self-reported)

Code

SWE-Bench Verified: 72.0% (self-reported)

Communication

Multi-Challenge: 61.5% (self-reported)

Embodied

EmbSpatialBench: 0.84 / 100 (self-reported)

Finance

MMLU-Pro: 86.7% (self-reported)
MMLU-ProX: 82.2% (self-reported)

General

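MMLU-Redux: 94.0% (self-reported)
IFEval: 93.4% (self-reported)
C-Eval: 91.9% (self-reported)
Global PIQA: 88.4% (self-reported)
MAXIFE: 87.9% (self-reported)
MMMLU: 86.7% (self-reported)
MMMU: 83.9% (self-reported)
MMStar: 82.9% (self-reported)
Include: 82.8% (self-reported)
LiveCodeBench v6: 78.9% (self-reported)
MMMU-Pro: 76.9% (self-reported)
IFBench: 76.1% (self-reported)
SimpleVQA: 0.62 / 100 (self-reported)
LongBench v2: 60.2% (self-reported)
NOVA-63: 58.6% (self-reported)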

Grounding

RefCOCO-avg: 0.91 / 100 (self-reported)
ScreenSpot Pro: 70.4% (self-reported)
RefSpatialBench: 0.69 / 100 (self-reported)

Healthcare

VideoMMMU: 82.0% (self-reported)
SlakeVQA: 81.6% (self-reported)
MedXpertQA: 67.3% (self-reported)
PMC-VQA: 63.3% (self-reported)

Image To Text

OCRBench: 92.1% (self-reported)

Language

LingoQA: 80.8% (self-reported)
WMT24++: 78.3% (self-reported)

Long Context

MLVU: 87.3% (self-reported)
LVBench: 74.4% (self-reported)
AA-LCR: 66.9% (self-reported)
MMLongBench-Doc: 0.59 / 100 (self-reported)

Math

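HMMT 2025: 91.4% (self-reported)
HMMT25: 90.3% (self-reported)
MathVista-Mini: 87.4% (self-reported)
MathVision: 86.2% (self-reported)
DynaMath: 85.9% (self-reported)
CodeForces: 0.85 / 3000 (self-reported)
PolyMATH: 68.9% (self-reported)
Humanity's Last Exam: 47.5% (self-reported)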

Multimodal

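VLMsAreBlind: 96.7% (self-reported)
AI2D: 93.3% (self-reported)
V*: 93.2% (self-reported)
MMBench-V1.1: 92.8% (self-reported)
OmniDocBench 1.5: 89.8% (self-reported)
VideoMME w sub.: 87.3% (self-reported)
VideoMME w/o sub.: 83.9% (self-reported)
CC-OCR: 81.8% (self-reported)
CharXiv-R: 77.2% (self-reported)
MVBench: 76.6% (self-reported)
MMVU: 74.7% (self-reported)
BabyVision: 40.2% (self-reported)
ZEROBench-Sub: 0.36 / 100 (self-reported)
Nuscene: 15.4% (self-reported)
ZEROBench: 0.09 / 100 (self-reported)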

Reasoning

CountBench: 0.97 / 100 (self-reported)
BrowseComp-zh: 69.9% (self-reported)
Hallusion Bench: 67.6% (self-reported)
ERQA: 62.0% (self-reported)
Seal-0: 44.1% (self-reported)
OJBench: 39.5% (self-reported)

Spatial Reasoning

RealWorldQA: 85.1% (self-reported)

Vision

ODinW: 44.5% (self-reported)

AA evaluation indices

Math Index: 52.3
Intelligence Index: 7.1
MMLU-Pro: 0.7
AIME 25: 0.5
GPQA: 0.5
LiveCodeBench: 0.4
IFBench: 0.3
Tau2: 0.3
SciCode: 0.2
LCR: 0.1
HLE: 0.0
Terminal-Bench Hard: 0.0

LLM Stats category scores

Legal: 100
Finance: 100
Agents: 76
General: 46
Reasoning: 19
Biology: 90
Image To Text: 80
Instruction Following: 80
Language: 80
Math: 80
Physics: 80
Structured Output: 80
Embodied: 80
Grounding: 80
Healthcare: 80
Chemistry: 80
Text-to-image: 80
Video: 80
Long Context: 70
Multimodal: 70
Spatial Reasoning: 70
Frontend Development: 70
Economics: 70
Vision: 70
Search: 60
Code: 60
Communication: 60
Tool Calling: 60
Spatial: 20
3d: 20

Pricing

Input price: Free
Output price: Free
Blended price (3:1): Free
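
The page does not say how the 3:1 blended price is computed. A minimal sketch, assuming it is a weighted average with three parts input price to one part output price; the function name `blended_price` and its parameters are hypothetical and do not come from the page:

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output token prices.

    Assumption: "3:1" means three parts input to one part output.
    Prices can be in any unit, as long as both use the same one.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# With both prices listed as free, the blend is also zero.
print(blended_price(0.0, 0.0))
```

Under this assumption, a model that is free for both input and output is free at any blend ratio, which matches the three values listed above.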

Speed

Tokens/sec: 0.0
First-token latency: 0.00s
Time to first answer: 0.00s

Provider price ranking

No provider data
