Qwen Chat 14B

AlibabaQwen
Release date: 2023-09-25
Parameters: not listed
Context length: 262K
Modalities: audio, image, text, video

Capability radar

general: 2
coding: 60
reasoning: 80
science (est.): 77
agents: 60
multimodal: 80

Science uses a reasoning-based proxy when dedicated science benchmarks are unavailable.

Rankings

Domain | Rank | Score | Source
Overall | #533 | 2.0 | AA

Benchmark scores (LLM Stats)

3d

SUNRGBD: 0.36 / 100 (self-reported)
Hypersim: 0.13 / 100 (self-reported)

Agents

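GDPval-AA: 985.00 / 3000 (self-reported)
t2-bench: 79.5% (self-reported)
BFCL-V4: 72.2% (self-reported)
AndroidWorld_SR: 66.4% (self-reported)
BrowseComp: 63.8% (self-reported)
FullStackBench en: 62.6% (self-reported)
WideSearch: 60.5% (self-reported)
FullStackBench zh: 58.7% (self-reported)
OSWorld-Verified: 58.0% (self-reported)
TIR-Bench: 53.2% (self-reported)
Terminal-Bench 2.0: 49.4% (self-reported)
VITA-Bench: 33.6% (self-reported)
DeepPlanning: 24.1% (self-reported)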

Biology

GPQA: 86.6% (self-reported)

Chemistry

SuperGPQA: 67.1% (self-reported)

Code

SWE-Bench Verified: 72.0% (self-reported)

Communication

Multi-Challenge: 61.5% (self-reported)

Embodied

EmbSpatialBench: 0.84 / 100 (self-reported)

Finance

MMLU-Pro: 86.7% (self-reported)
MMLU-ProX: 82.2% (self-reported)

General

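MMLU-Redux: 94.0% (self-reported)
IFEval: 93.4% (self-reported)
C-Eval: 91.9% (self-reported)
Global PIQA: 88.4% (self-reported)
MAXIFE: 87.9% (self-reported)
MMMLU: 86.7% (self-reported)
MMMU: 83.9% (self-reported)
MMStar: 82.9% (self-reported)
Include: 82.8% (self-reported)
LiveCodeBench v6: 78.9% (self-reported)
MMMU-Pro: 76.9% (self-reported)
IFBench: 76.1% (self-reported)
SimpleVQA: 0.62 / 100 (self-reported)
LongBench v2: 60.2% (self-reported)
NOVA-63: 58.6% (self-reported)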

Grounding

RefCOCO-avg: 0.91 / 100 (self-reported)
ScreenSpot Pro: 70.4% (self-reported)
RefSpatialBench: 0.69 / 100 (self-reported)

Healthcare

VideoMMMU: 82.0% (self-reported)
SlakeVQA: 81.6% (self-reported)
MedXpertQA: 67.3% (self-reported)
PMC-VQA: 63.3% (self-reported)

Image To Text

OCRBench: 92.1% (self-reported)

Language

LingoQA: 80.8% (self-reported)
WMT24++: 78.3% (self-reported)

Long Context

MLVU: 87.3% (self-reported)
LVBench: 74.4% (self-reported)
AA-LCR: 66.9% (self-reported)
MMLongBench-Doc: 0.59 / 100 (self-reported)

Math

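HMMT 2025: 91.4% (self-reported)
HMMT25: 90.3% (self-reported)
MathVista-Mini: 87.4% (self-reported)
MathVision: 86.2% (self-reported)
DynaMath: 85.9% (self-reported)
CodeForces: 0.85 / 3000 (self-reported)
PolyMATH: 68.9% (self-reported)
Humanity's Last Exam: 47.5% (self-reported)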

Multimodal

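VLMsAreBlind: 96.7% (self-reported)
AI2D: 93.3% (self-reported)
V*: 93.2% (self-reported)
MMBench-V1.1: 92.8% (self-reported)
OmniDocBench 1.5: 89.8% (self-reported)
VideoMME w sub.: 87.3% (self-reported)
VideoMME w/o sub.: 83.9% (self-reported)
CC-OCR: 81.8% (self-reported)
CharXiv-R: 77.2% (self-reported)
MVBench: 76.6% (self-reported)
MMVU: 74.7% (self-reported)
BabyVision: 40.2% (self-reported)
ZEROBench-Sub: 0.36 / 100 (self-reported)
Nuscene: 15.4% (self-reported)
ZEROBench: 0.09 / 100 (self-reported)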

Reasoning

CountBench: 0.97 / 100 (self-reported)
BrowseComp-zh: 69.9% (self-reported)
Hallusion Bench: 67.6% (self-reported)
ERQA: 62.0% (self-reported)
Seal-0: 44.1% (self-reported)
OJBench: 39.5% (self-reported)

Spatial Reasoning

RealWorldQA: 85.1% (self-reported)

Vision

ODinW: 44.5% (self-reported)

AA evaluation indices

Intelligence Index: 2.1

LLM Stats category scores

Legal: 100
Finance: 100
Agents: 76
General: 46
Reasoning: 19
Biology: 90
Image To Text: 80
Instruction Following: 80
Language: 80
Math: 80
Physics: 80
Structured Output: 80
Embodied: 80
Grounding: 80
Healthcare: 80
Chemistry: 80
Text-to-image: 80
Video: 80
Long Context: 70
Multimodal: 70
Spatial Reasoning: 70
Frontend Development: 70
Economics: 70
Vision: 70
Search: 60
Code: 60
Communication: 60
Tool Calling: 60
Spatial: 20
3d: 20

Pricing

Input price: Free
Output price: Free
Blended price (3:1): Free
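
The page does not define how the 3:1 blended price is computed. A minimal sketch, assuming the ratio means three parts input tokens to one part output tokens and that the blend is a weighted average of the two per-token prices; the function name and the example prices are hypothetical, not taken from the page.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices.

    Assumption: "3:1" is read as input:output token weights.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# With both prices listed as free, the blend is also zero.
free_blend = blended_price(0.0, 0.0)

# Hypothetical prices, for illustration only: (3 * 1.0 + 1 * 5.0) / 4
example_blend = blended_price(1.0, 5.0)
```

Under this reading, a model that is free on both input and output is necessarily free on the blended figure as well, which is consistent with the values listed above.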

Speed

Tokens/sec: 0.0
First-token latency: 0.00s
Time to first answer: 0.00s

Provider price ranking

No provider data
