Phi-3.5-vision-instruct

Microsoft · Phi · Open weights · MIT · Commercial use permitted

Description

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image comparison, summarization, and even video analysis. The model underwent safety post-training for improved instruction-following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.

Release date

2024-08-23

Parameters

4.2B

Context length

Up to 128K tokens

Modality

—

Capability radar

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science benchmark is available, the Science score is estimated using reasoning as a proxy.

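Benchmark scores (LLM Stats)

General

MMMU: 43.0% (self-reported)

Image To Text

TextVQA: 72.0% (self-reported)

Math

ScienceQA: 91.3% (self-reported)

MathVista: 43.9% (self-reported)

InterGPS: 36.3% (self-reported)

Multimodal

POPE: 86.1% (self-reported)

MMBench: 81.9% (self-reported)

ChartQA: 81.8% (self-reported)

AI2D: 78.1% (self-reported)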
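
As a rough way to compare the categories listed above, the self-reported scores can be averaged per category. The following is only a minimal sketch: the unweighted mean is an assumption made for illustration and is not stated to be how LLM Stats derives its category scores, and the names `SCORES` and `category_means` are hypothetical.

```python
from statistics import mean

# Self-reported scores (percent) copied from the benchmark list above,
# grouped under the category headings used on the page.
SCORES = {
    "General": {"MMMU": 43.0},
    "Image To Text": {"TextVQA": 72.0},
    "Math": {"ScienceQA": 91.3, "MathVista": 43.9, "InterGPS": 36.3},
    "Multimodal": {"POPE": 86.1, "MMBench": 81.9, "ChartQA": 81.8, "AI2D": 78.1},
}


def category_means(scores):
    """Return the unweighted mean score per category, rounded to one decimal.

    Assumption: a plain average, chosen for illustration only.
    """
    return {
        category: round(mean(benchmarks.values()), 1)
        for category, benchmarks in scores.items()
    }


if __name__ == "__main__":
    for category, avg in category_means(SCORES).items():
        print(f"{category}: {avg:.1f}%")
```

Because the benchmarks differ in difficulty and scoring, such an average is at best a coarse summary and should not be read as an official category score.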

AA evaluation index

No AA evaluation data available

LLM Stats category scores

Categories: Image To Text, Multimodal, Reasoning, Vision, Math, General, Healthcare

Pricing

No pricing data available

Speed

No speed data available

Provider price ranking

No provider data available

External links

LLM Stats · Artificial Analysis