Qwen3 VL 8B Instruct

AlibabaQwenОткрытые весаApache 2.0 · Коммерческое использование

Описание

Qwen3-VL is a large multimodal model that unifies vision, language, and reasoning to achieve human-level perception and cognition across text, images, and video. Built on a 235B-parameter architecture, it integrates early joint training of visual and textual modalities for strong language grounding. The model supports up to a 1 million-token context window and excels at visual understanding, spatial reasoning, long video comprehension, and tool-based interaction. It can generate code from images, perform precise 2D/3D object grounding, and operate digital interfaces like a visual agent. The “Instruct” version rivals Gemini 2.5 Pro in perception benchmarks, while the “Thinking” version leads in multimodal reasoning and STEM tasks. With multilingual OCR, creative writing, and fine-grained scene interpretation, Qwen3-VL establishes a new open-source frontier for integrated vision-language intelligence.

Дата выхода

2025-10-14

Параметры

9.0B

Длина контекста

131K

Модальности

image, text, video

Радар способностей

general

coding

reasoning

scienceоцен.

agents

100

multimodal

Science использует прокси на основе рассуждений, когда специализированные научные бенчмарки недоступны.

Рейтинги

Домен	#Место	Оценка	Источник
Агентные возможности	73	53.0	LS
Рейтинг кодинга	372	20.0	AA
Общий рейтинг	351	32.0	AA
Математическое мышление	273	28.0	AA
Мультимодальный рейтинг	55	74.0	LS
Рассуждения	88	52.0	LS
Наука	419	24.0	AA

Оценки бенчмарков (LLM Stats)

3d

BLINK

69.1%Сам.

Agents

BFCL-v3

66.3%Сам.

OSWorld

33.9%Сам.

Chemistry

SuperGPQA

44.5%Сам.

Communication

MM-MT-Bench

7.70 / 100Сам.

WritingBench

83.1%Сам.

Multi-IF

75.1%Сам.

Finance

MMLU

80.7%Сам.

MMLU-Pro

71.6%Сам.

MMLU-ProX

65.4%Сам.

General

MMLU-Redux

84.9%Сам.

IFEval

83.7%Сам.

MLVU-M

78.1%Сам.

MMStar

70.9%Сам.

MMMU (val)

69.6%Сам.

Include

67.0%Сам.

LiveBench 20241125

62.0%Сам.

MMMU-Pro

55.9%Сам.

LiveCodeBench v6

39.3%Сам.

Grounding

ScreenSpot

94.4%Сам.

ScreenSpot Pro

54.6%Сам.

Healthcare

VideoMMMU

65.3%Сам.

Image To Text

OCRBench

89.6%Сам.

OCRBench-V2 (en)

65.4%Сам.

OCRBench-V2 (zh)

61.2%Сам.

Language

CharadesSTA

56.0%Сам.

Long Context

LVBench

58.0%Сам.

Math

MathVista-Mini

77.2%Сам.

MathVision

53.9%Сам.

AIME 2025

45.9%Сам.

HMMT25

32.5%Сам.

PolyMATH

30.4%Сам.

Multimodal

DocVQAtest

96.1%Сам.

AI2D

85.7%Сам.

MMBench-V1.1

85.0%Сам.

InfoVQAtest

83.1%Сам.

CharXiv-D

83.0%Сам.

CC-OCR

79.9%Сам.

Video-MME

71.4%Сам.

MVBench

68.7%Сам.

MuirBench

64.4%Сам.

CharXiv-R

46.4%Сам.

Reasoning

Hallusion Bench

61.1%Сам.

ERQA

45.8%Сам.

Spatial Reasoning

RealWorldQA

71.5%Сам.

Vision

ODinW

44.7%Сам.

Индексы оценки AA

Math Index

27.3

Intelligence Index

8.4

Mmlu Pro

0.7

Gpqa

0.4

Livecodebench

0.3

Ifbench

0.3

Tau2

0.3

Aime 25

0.3

Scicode

0.2

Lcr

0.2

Hle

0.0

Terminalbench Hard

0.0

Оценки категорий LLM Stats

Communication

Multimodal

100

Instruction Following

Structured Output

Creativity

Text-to-image

Writing

Image To Text

Language

Legal

Finance

Grounding

Healthcare

Tool Calling

Vision

Long Context

Math

Reasoning

Spatial Reasoning

General

Video

Agents

Physics

Chemistry

Economics

Цены

Цена ввода$0.18 / 1M токенов

Цена вывода$0.7 / 1M токенов

Смешанная цена (3:1)$0.31 / 1M токенов

Скорость

Токенов/сек140.8

Задержка первого токена0.94s

Время до первого ответа0.94s

Рейтинг цен провайдеров

9 провайдеров

Самый дешевый: NovitaСамый дорогой: SiliconFlow

ПровайдерВводВывод

1NovitaСамый дешевый

2DeepInfra

3OpenRouter

$0.08

$0.5

4NovitaAI

$0.08

$0.5

5Kilo Gateway

$0.08

$0.5

6LLM Gateway

$0.08

$0.5

7AlibabaОсновной

$0.18

$0.7

8SiliconFlow (China)

$0.18

$0.68

9SiliconFlow

$0.18

$0.68

Сравнение цен разных API-провайдеров для этой модели.

Внешние ссылки

LLM Stats Artificial Analysis