Phi-3.5-vision-instruct

Microsoft · Phi · Open weights · MIT · Commercial use permitted

Description

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image comparison, summarization, and even video analysis. The model underwent safety post-training for improved instruction-following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.

Release date

2024-08-23

Parameter count

4.2B

Context length

128K tokens

Supported modalities

Text, image

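Capability radar chart

[Radar chart axes: general, coding, reasoning, science (estimated), agents, multimodal]

When no dedicated science evaluation is available, the science axis is estimated using reasoning ability as a proxy.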

Leaderboard rankings

Category	Rank	Score	Source
Multimodal leaderboard	34	80.0	LS

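Benchmark scores (LLM Stats)

Category	Benchmark	Score	Source
General	MMMU	43.0%	Self-reported
Image To Text	TextVQA	72.0%	Self-reported
Math	ScienceQA	91.3%	Self-reported
Math	MathVista	43.9%	Self-reported
Math	InterGPS	36.3%	Self-reported
Multimodal	POPE	86.1%	Self-reported
Multimodal	MMBench	81.9%	Self-reported
Multimodal	ChartQA	81.8%	Self-reported
Multimodal	AI2D	78.1%	Self-reported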

AA evaluation index

No AA evaluation data available

LLM Stats category scores

[Chart categories: Image To Text, Multimodal, Reasoning, Vision, Math, General, Healthcare]

Pricing

No pricing data available

Speed

No speed data available

Provider price ranking

No provider data available

External links

LLM Stats · Artificial Analysis