Phi-3.5-vision-instruct

MicrosoftPhi开源权重MIT · 商用许可

描述

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image comparison, summarization, and even video analysis. The model underwent safety post-training for improved instruction-following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.

发布日期

2024-08-23

参数规模

4.2B

上下文长度

—

支持模态

—

能力雷达图

general

coding

reasoning

science估算

agents

multimodal

Science 在缺少专门科学评测时使用推理能力代理估算。

排行榜排名

领域	#排名	分数	来源
多模态榜	34	80.0	LS

基准测试分数 (LLM Stats)

General

MMMU

43.0%自报

Image To Text

TextVQA

72.0%自报

Math

ScienceQA

91.3%自报

MathVista

43.9%自报

InterGPS

36.3%自报

Multimodal

POPE

86.1%自报

MMBench

81.9%自报

ChartQA

81.8%自报

AI2D

78.1%自报

AA 评测指数

暂无 AA 评测数据

LLM Stats 分类评分

Image To Text

Multimodal

Reasoning

Vision

Math

General

Healthcare

定价

暂无定价数据

速度

暂无速度数据

供应商价格排行

暂无提供商数据

外部链接

LLM Stats Artificial Analysis