Qwen3 VL 4B Instruct

Alibaba · Qwen · Open weights · Apache 2.0 · Commercial use allowed

Description

Qwen3-VL is a family of large multimodal models that unifies vision, language, and reasoning across text, images, and video, aiming for human-level perception and cognition. The family's flagship is built on a 235B-parameter architecture; this 4B Instruct model is a much smaller member of the same family. Qwen3-VL uses early joint training of the visual and textual modalities for strong language grounding, supports a native 256K-token context window expandable to 1 million tokens, and targets visual understanding, spatial reasoning, long-video comprehension, and tool-based interaction. It can generate code from images, perform precise 2D/3D object grounding, and operate digital interfaces as a visual agent. The flagship “Instruct” version is reported to rival Gemini 2.5 Pro on perception benchmarks, while the “Thinking” version leads on multimodal reasoning and STEM tasks. With multilingual OCR, creative writing, and fine-grained scene interpretation, Qwen3-VL sets a new open-source frontier for integrated vision-language models.

Release date

2025-10-14

Parameters

4.0B

Context length

—

Modalities

image, text

Capability radar (chart axes): general, coding, reasoning, science (estimated), agents, multimodal

Science uses a reasoning-based proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	Rank	Score	Source
Agentic capabilities	94	48.0	LS
Coding	407	16.0	AA
Overall	405	26.0	AA
Mathematical reasoning	237	37.0	AA
Multimodal	63	72.0	LS
Reasoning	95	48.0	LS
Science	444	20.0	AA

LS = LLM Stats; AA = Artificial Analysis.

Benchmark scores (LLM Stats; all self-reported)

Category	Benchmark	Score
3D	BLINK	65.8%
Agents	BFCL-v3	63.3%
Agents	OSWorld	26.2%
Chemistry	SuperGPQA	40.3%
Communication	MM-MT-Bench	7.50 / 10
Communication	WritingBench	82.5%
Factuality	SimpleQA	48.0%
Finance	MMLU	77.2%
Finance	MMLU-Pro	67.1%
Finance	MMLU-ProX	59.4%
General	IFEval	82.3%
General	MMLU-Redux	81.5%
General	MLVU-M	75.3%
General	MMStar	69.8%
General	MMMU (val)	67.4%
General	Include	61.4%
General	LiveBench 20241125	60.9%
General	MMMU-Pro	53.2%
General	LiveCodeBench v6	37.9%
Grounding	ScreenSpot	94.0%
Grounding	ScreenSpot Pro	59.5%
Healthcare	VideoMMMU	56.2%
Image To Text	OCRBench	88.1%
Image To Text	OCRBench-V2 (en)	63.7%
Image To Text	OCRBench-V2 (zh)	57.6%
Language	CharadesSTA	55.5%
Long Context	LVBench	56.2%
Math	MathVista-Mini	73.7%
Math	MathVision	51.6%
Math	AIME 2025	46.6%
Math	HMMT25	30.7%
Math	PolyMATH	28.8%
Multimodal	DocVQA (test)	95.3%
Multimodal	MMBench-V1.1	85.1%
Multimodal	AI2D	84.1%
Multimodal	InfoVQA (test)	80.3%
Multimodal	CharXiv-D	76.2%
Multimodal	CC-OCR	76.2%
Multimodal	MVBench	68.9%
Multimodal	MuirBench	63.8%
Multimodal	CharXiv-R	39.7%
Reasoning	HallusionBench	57.6%
Reasoning	ERQA	41.3%
Spatial Reasoning	RealWorldQA	70.9%
Vision	ODinW	48.2%

Artificial Analysis evaluation indices

Index / benchmark	Value
Math Index	37.0
Intelligence Index	4.1
MMLU-Pro	0.6
GPQA	0.4
AIME 2025	0.4
IFBench	0.3
LiveCodeBench	0.3
τ²-Bench	0.2
SciCode	0.1
AA-LCR	0.1
HLE (Humanity's Last Exam)	0.0
Terminal-Bench Hard	0.0

LLM Stats category scores (chart categories): Communication, Multimodal, Instruction Following, Grounding, Creativity, Text-to-image, Writing, Image To Text, Language, Legal, Structured Output, Long Context, Math, Reasoning, Spatial Reasoning, Finance, General, Healthcare, Tool Calling, Video, Vision, Factuality, Physics, Agents, Chemistry, Economics

Pricing

Input price: Free

Output price: Free

Blended price (3:1): Free
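The blended price follows the 3:1 convention used by Artificial Analysis: a weighted average of the input and output prices that assumes three input tokens for every output token. A minimal sketch in Python, assuming both prices are quoted in the same unit (for example USD per 1M tokens); the function name `blended_price` and the example prices are hypothetical and are not taken from this page:

```python
def blended_price(input_price: float, output_price: float,
                  input_ratio: float = 3.0, output_ratio: float = 1.0) -> float:
    """Weighted average of input and output prices for a given token mix (default 3:1)."""
    # Weight each price by its share of tokens in the assumed input:output mix.
    total = input_ratio + output_ratio
    return (input_ratio * input_price + output_ratio * output_price) / total


# Hypothetical prices in USD per 1M tokens, chosen only to illustrate the arithmetic.
print(blended_price(1.00, 4.00))  # (3 * 1.00 + 1 * 4.00) / 4 = 1.75
print(blended_price(0.0, 0.0))    # both prices free -> 0.0
```

With both input and output listed as free, the 3:1 blend is free as well, matching the figure above.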

Speed

Tokens/sec: 0.0

Time to first token: 0.00 s

Time to first answer: 0.00 s
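Time to first token and output speed are commonly computed from client-side timestamps while streaming a response: latency runs from sending the request to receiving the first token, and throughput is measured over the generation phase after that. A minimal sketch, assuming this definition; the `StreamTiming` record, both helper functions, and the timing values are hypothetical, and providers or aggregators may count tokens or time windows slightly differently (for example, some exclude the first token from the throughput count):

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical timestamps (in seconds) recorded by a client while streaming one response."""
    request_sent: float
    first_token: float
    last_token: float
    output_tokens: int


def time_to_first_token(t: StreamTiming) -> float:
    # Latency from sending the request until the first output token arrives.
    return t.first_token - t.request_sent


def output_tokens_per_second(t: StreamTiming) -> float:
    # Throughput over the generation phase only (first token to last token).
    gen_time = t.last_token - t.first_token
    if gen_time <= 0:
        return 0.0  # no measurable generation window
    return t.output_tokens / gen_time


timing = StreamTiming(request_sent=0.0, first_token=0.5, last_token=2.5, output_tokens=100)
print(time_to_first_token(timing))       # 0.5
print(output_tokens_per_second(timing))  # 50.0
```

Separating the two phases keeps a slow first token from dragging down the throughput figure, which is why latency and speed are reported as distinct metrics.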

Provider price ranking

1 provider

Provider	Input	Output

1	DeepInfra

Comparison of prices across API providers for this model.

External links

LLM Stats · Artificial Analysis