Phi-4 Multimodal Instruct

Microsoft · Phi · Open weights · MIT (commercial use permitted)

Description

Phi-4-multimodal-instruct is a lightweight (5.57B-parameter) open multimodal foundation model that builds on research and datasets from Phi-3.5 and 4.0. It accepts text, image, and audio inputs, generates text outputs, and supports a 128K-token context length. It was refined with supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF) to improve instruction following and safety.

Release date

2025-02-26

Parameters

5.6B

Context length

128K tokens

Supported modalities

text, image, audio (input); text (output)

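Capability radar chart

[Radar chart. Axes: general, coding, reasoning, science (estimated), agents, multimodal.]

When no dedicated science evaluation is available, the science score is estimated using reasoning ability as a proxy.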

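Leaderboard rankings

Category	Rank	Score	Source
Coding	416	14.0	AA
General ability	440	21.0	AA
Math reasoning	228	39.0	AA
Multimodal	28	82.0	LS
Science	453	17.0	AA

Sources: AA = Artificial Analysis, LS = LLM Stats.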

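Benchmark scores (LLM Stats)

All scores in this table are self-reported.

Category	Benchmark	Score
3D	BLINK	61.3%
General	MMMU	55.1%
General	MMMU-Pro	38.5%
Image To Text	DocVQA	93.2%
Image To Text	OCRBench	84.4%
Image To Text	TextVQA	75.6%
Math	MathVista	62.4%
Math	InterGPS	48.6%
Multimodal	ScienceQA Visual	97.5%
Multimodal	MMBench	86.7%
Multimodal	POPE	85.6%
Multimodal	AI2D	82.3%
Multimodal	ChartQA	81.4%
Multimodal	InfoVQA	72.7%
Multimodal	Video-MME	55.0%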

AA evaluation indices

Metric	Score
Intelligence Index	4.5
MATH-500	0.7
MMLU-Pro	0.5
GPQA	0.3
LiveCodeBench	0.1
SciCode	0.1
AIME	0.1
HLE	0.0

LLM Stats category scores

[Chart. Categories: Image To Text, Multimodal, Reasoning, Vision, Math, Spatial Reasoning, Healthcare, General.]

Pricing

Input price: Free

Output price: Free

Blended price (3:1): Free

Speed

Tokens per second: 16.0

Time to first token: 1.34 s

Time to first answer: 1.34 s
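
To put the speed figures in perspective, the sketch below estimates how long a complete response might take. It assumes a simple model that the page does not state: total time is roughly the first-token latency plus the output length divided by the token throughput. The function name `estimate_response_seconds` is hypothetical, and real latency will vary with provider, load, and prompt size.

```python
def estimate_response_seconds(
    output_tokens: int,
    first_token_latency_s: float = 1.34,  # first-token latency from the page
    tokens_per_second: float = 16.0,      # throughput from the page
) -> float:
    """Rough end-to-end time: wait for the first token, then stream the rest.

    Assumption (not from the page): throughput is constant while streaming.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return first_token_latency_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(f"{n} output tokens: about {estimate_response_seconds(n):.1f} s")
```

Under these assumptions, a 500-token answer works out to 1.34 + 500 / 16 = 32.59 seconds, so at this throughput the generation time dominates the first-token wait for anything beyond a short reply.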

Provider price ranking

4 providers. Cheapest: NanoGPT. Most expensive: evroc.

Rank	Provider	Input	Output
1	NanoGPT	$0.07	$0.11
2	Azure Cognitive Services	$0.08	$0.32
3	Azure	$0.08	$0.32
4	evroc	$0.24	$0.47

Pricing for this model compared across API providers.
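
The pricing section lists a 3:1 blended price, but the provider table gives only separate input and output prices. The sketch below shows one way to blend them. It assumes that 3:1 means three parts input to one part output, which the page does not spell out. The prices are copied from the table in whatever unit the page uses (the unit is not stated), and `blended_price` is a hypothetical helper.

```python
# Input and output prices per provider, copied from the table above.
PROVIDERS = {
    "NanoGPT": (0.07, 0.11),
    "Azure Cognitive Services": (0.08, 0.32),
    "Azure": (0.08, 0.32),
    "evroc": (0.24, 0.47),
}


def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices.

    Assumption: the 3:1 ratio weights input tokens 3x relative to output tokens.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


if __name__ == "__main__":
    # Rank providers from cheapest to most expensive by blended price.
    ranked = sorted(PROVIDERS.items(), key=lambda item: blended_price(*item[1]))
    for name, (inp, out) in ranked:
        print(f"{name}: ${blended_price(inp, out):.4f}")
```

With this weighting, NanoGPT comes to (3 × 0.07 + 0.11) / 4 = $0.08 and evroc to (3 × 0.24 + 0.47) / 4 = $0.2975, so the blended ordering matches the ranking in the table.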

External links

LLM Stats · Artificial Analysis