Phi-3.5-vision-instruct

Microsoft · Phi · Open Weight · MIT · Commercial OK

Description

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image comparison, summarization, and even video analysis. The model underwent safety post-training for improved instruction-following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.

Release date

2024-08-23

Parameters

4.2B

Context length

—

Modalities

—

Capability radar

Axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning-based proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	Rank	Score	Source
Multimodal Ranking	30	80.0	LS

Benchmark scores (LLM Stats)

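Category	Benchmark	Score	Source
General	MMMU	43.0%	Self-reported
Image To Text	TextVQA	72.0%	Self-reported
Math	ScienceQA	91.3%	Self-reported
Math	MathVista	43.9%	Self-reported
Math	InterGPS	36.3%	Self-reported
Multimodal	POPE	86.1%	Self-reported
Multimodal	MMBench	81.9%	Self-reported
Multimodal	ChartQA	81.8%	Self-reported
Multimodal	AI2D	78.1%	Self-reported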

AA evaluation indices

No AA evaluation data

LLM Stats category scores

Categories: Vision, Image To Text, Multimodal, Reasoning, General, Healthcare, Math

Pricing

No pricing data

Speed

No speed data

Available providers

(LS internal units)

No provider data

External links

LLM Stats