Phi-3.5-MoE-instruct

Microsoft · Phi · Open Weight · MIT · Commercial OK

Description

Phi-3.5-MoE-instruct is a mixture-of-experts model with ~42B total parameters (6.6B active) and a 128K context window. It performs well on reasoning, math, coding, and multilingual tasks, outperforming larger dense models on many benchmarks. It underwent safety post-training (SFT + DPO) and is licensed under MIT. The model suits scenarios that require both efficiency and high performance, particularly multilingual or reasoning-intensive tasks.
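To make the efficiency claim concrete, the share of parameters used per token can be worked out from the two figures quoted in the description. The sketch below is illustrative arithmetic only: the constant and function names are hypothetical, and both parameter counts are the approximate values stated above.

```python
# Approximate figures taken from the description above (assumptions, not exact counts).
TOTAL_PARAMS_B = 42.0   # ~42B total parameters
ACTIVE_PARAMS_B = 6.6   # 6.6B parameters active per token


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters active per token, as a value in (0, 1]."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {frac:.1%}")
```

With these inputs, roughly one sixth of the weights take part in each forward pass, which is the basis for comparing the model's compute cost against dense models of similar total size.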

Release date

2024-08-23

Parameters

~42B total (6.6B active)

Context length

128K

Modality

—

Capability radar

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science benchmark is available, the Science score is estimated using a reasoning proxy.

Benchmark scores (LLM Stats)

Biology

GPQA: 36.8% (self-reported)

Code

RepoQA: 85.0% (self-reported)

HumanEval: 70.7% (self-reported)

Creativity

Social IQa: 78.0% (self-reported)

Arena Hard: 37.9% (self-reported)

Finance

MMLU: 78.9% (self-reported)

TruthfulQA: 77.5% (self-reported)

MMLU-Pro: 45.3% (self-reported)

General

ARC-C: 91.0% (self-reported)

OpenBookQA: 89.6% (self-reported)

PIQA: 88.6% (self-reported)

MBPP: 0.81 / 100 (self-reported)

MMMLU: 69.9% (self-reported)

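Language

BoolQ: 84.6% (self-reported)

MEGA XStoryCloze: 82.8% (self-reported)

Winogrande: 81.3% (self-reported)

BIG-Bench Hard: 79.1% (self-reported)

MEGA XCOPA: 76.6% (self-reported)

MEGA TyDi QA: 67.1% (self-reported)

MEGA MLQA: 65.3% (self-reported)

MEGA UDPOS: 60.4% (self-reported)

SQuALITY: 24.1% (self-reported)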

Long Context

RULER: 87.1% (self-reported)

Qasper: 40.0% (self-reported)

GovReport: 26.4% (self-reported)

QMSum: 19.9% (self-reported)

SummScreenFD: 16.9% (self-reported)

Math

GSM8k: 88.7% (self-reported)

MATH: 59.5% (self-reported)

MGSM: 58.7% (self-reported)

Reasoning

HellaSwag: 83.8% (self-reported)

AA Evaluation Index

No AA evaluation data available

LLM Stats category scores

Categories: Psychology, Code, Finance, General, Healthcare, Language, Legal, Math, Reasoning, Creativity, Long Context, Physics, Writing, Biology, Chemistry, Summarization

Pricing

No pricing data available

Speed

No speed data available

Available providers

(LS internal units)

No provider data available

External links

LLM Stats
