Phi-4 Multimodal Instruct

Microsoft · Phi · Open weights · MIT · Commercial use permitted

Description

Phi-4-multimodal-instruct is a lightweight (5.57B-parameter) open multimodal foundation model that builds on research and datasets from Phi-3.5 and 4.0. It accepts text, image, and audio inputs, generates text outputs, and supports a 128K-token context length. The model was refined with supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF) to improve instruction following and safety.

Release date

2025-02-26

Parameters

5.6B

Context length

128K tokens

Modalities

image, text

Capability radar

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science benchmark is available, the Science score is estimated using reasoning as a proxy.

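Benchmark scores (LLM Stats)

Category	Benchmark	Score	Source
3D	BLINK	61.3%	Self-reported
General	MMMU	55.1%	Self-reported
General	MMMU-Pro	38.5%	Self-reported
Image To Text	DocVQA	93.2%	Self-reported
Image To Text	OCRBench	84.4%	Self-reported
Image To Text	TextVQA	75.6%	Self-reported
Math	MathVista	62.4%	Self-reported
Math	InterGPS	48.6%	Self-reported
Multimodal	ScienceQA Visual	97.5%	Self-reported
Multimodal	MMBench	86.7%	Self-reported
Multimodal	POPE	85.6%	Self-reported
Multimodal	AI2D	82.3%	Self-reported
Multimodal	ChartQA	81.4%	Self-reported
Multimodal	InfoVQA	72.7%	Self-reported
Multimodal	Video-MME	55.0%	Self-reported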
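To compare categories at a glance, the self-reported scores above can be averaged per category. The sketch below assumes a simple unweighted mean; the page does not say how LLM Stats derives its own category scores, so this is only an illustrative summary. The names `SCORES` and `category_means` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Self-reported scores copied from the benchmark list above (percent).
SCORES = [
    ("3D", "BLINK", 61.3),
    ("General", "MMMU", 55.1),
    ("General", "MMMU-Pro", 38.5),
    ("Image To Text", "DocVQA", 93.2),
    ("Image To Text", "OCRBench", 84.4),
    ("Image To Text", "TextVQA", 75.6),
    ("Math", "MathVista", 62.4),
    ("Math", "InterGPS", 48.6),
    ("Multimodal", "ScienceQA Visual", 97.5),
    ("Multimodal", "MMBench", 86.7),
    ("Multimodal", "POPE", 85.6),
    ("Multimodal", "AI2D", 82.3),
    ("Multimodal", "ChartQA", 81.4),
    ("Multimodal", "InfoVQA", 72.7),
    ("Multimodal", "Video-MME", 55.0),
]


def category_means(scores):
    """Return the unweighted mean score per category, rounded to one decimal.

    Assumption: every benchmark in a category counts equally. This is not
    necessarily how LLM Stats computes its category scores.
    """
    grouped = defaultdict(list)
    for category, _benchmark, value in scores:
        grouped[category].append(value)
    return {category: round(mean(values), 1) for category, values in grouped.items()}


if __name__ == "__main__":
    for category, average in category_means(SCORES).items():
        print(f"{category}: {average:.1f}%")
```

Under this assumption, a single weak result (for example Video-MME in the Multimodal group) pulls the category mean down noticeably, which is worth keeping in mind when reading aggregated scores.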

Artificial Analysis (AA) evaluation indices

Metric	Score
Intelligence Index	4.5
MATH-500	0.7
MMLU-Pro	0.5
GPQA	0.3
LiveCodeBench	0.1
SciCode	0.1
AIME	0.1
HLE	0.0

LLM Stats category scores

Categories: Image To Text, Multimodal, Reasoning, Vision, Math, Spatial Reasoning, Healthcare, General

Pricing

Input price: Free

Output price: Free

Blended price (3:1): Free

Speed

Tokens per second: 16.0

Time to first token: 1.34 s

Time to first answer: 1.34 s
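The two speed figures above can be combined into a rough end-to-end response time. The sketch below assumes the model streams at a constant 16.0 tokens per second once the first token arrives after 1.34 s; real throughput varies with load, prompt length, and provider. The function name `estimated_response_seconds` is hypothetical.

```python
TOKENS_PER_SECOND = 16.0        # output throughput reported above
FIRST_TOKEN_LATENCY_S = 1.34    # time to first token reported above


def estimated_response_seconds(
    output_tokens,
    tokens_per_second=TOKENS_PER_SECOND,
    first_token_latency_s=FIRST_TOKEN_LATENCY_S,
):
    """Estimate total response time as latency plus generation time.

    Assumption: throughput is constant after the first token, so
    total = first_token_latency + output_tokens / tokens_per_second.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    for tokens in (100, 500, 2000):
        seconds = estimated_response_seconds(tokens)
        print(f"{tokens} output tokens: about {seconds:.2f} s")
```

At this throughput, generation time dominates: the 1.34 s startup latency matters mainly for very short answers.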

Provider price ranking

4 providers

Cheapest: NanoGPT · Most expensive: evroc

Rank	Provider	Input	Output
1	NanoGPT (cheapest)	$0.07	$0.11
2	Azure Cognitive Services	$0.08	$0.32
3	Azure	$0.08	$0.32
4	evroc	$0.24	$0.47

Compare this model's prices across API providers.
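The pricing section quotes a blended price at a 3:1 ratio. A minimal sketch of how such a figure could be derived from the provider prices above, assuming 3:1 means three parts input to one part output combined as a weighted average. The page does not state the pricing unit, so results are in whatever unit the table uses. The names `PROVIDERS` and `blended_price` are hypothetical.

```python
# (input price, output price) per provider, copied from the ranking above.
PROVIDERS = {
    "NanoGPT": (0.07, 0.11),
    "Azure Cognitive Services": (0.08, 0.32),
    "Azure": (0.08, 0.32),
    "evroc": (0.24, 0.47),
}


def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average of input and output prices.

    Assumption: a 3:1 blend weights input tokens three times as heavily
    as output tokens.
    """
    total_parts = input_parts + output_parts
    if total_parts <= 0:
        raise ValueError("input_parts + output_parts must be positive")
    return (input_parts * input_price + output_parts * output_price) / total_parts


if __name__ == "__main__":
    ranked = sorted(PROVIDERS.items(), key=lambda item: blended_price(*item[1]))
    for name, (input_price, output_price) in ranked:
        print(f"{name}: ${blended_price(input_price, output_price):.4f}")
```

Because input is weighted more heavily, a provider with a low input price can rank well on the blended figure even when its output price is comparatively high.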

External links

LLM Stats · Artificial Analysis

Domain	Rank	Score	Source
Coding ranking	416	14.0	AA
Overall ranking	440	21.0	AA
Mathematical reasoning	228	39.0	AA
Multimodal ranking	28	82.0	LS
Science	453	17.0	AA