DeepSeek-V2.5

DeepSeek

Description

DeepSeek-V2.5 is an upgraded version that merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, combining general and coding abilities in a single model. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following.

Release date

2024-09-06

Parameter count

—

Context length

—

Supported modalities

—

Capability radar chart

Axes: general, coding, reasoning, science (estimated), agents, multimodal

When no dedicated science evaluation is available, the Science score is estimated using reasoning ability as a proxy.
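The fallback described in the note above can be sketched as follows. This is only an illustration of the stated rule, not the site's actual implementation: the function name `radar_science_score` and the sample values are hypothetical.

```python
from typing import Optional, Tuple


def radar_science_score(
    science: Optional[float], reasoning: Optional[float]
) -> Tuple[Optional[float], bool]:
    """Return (score, is_estimated) for the Science axis.

    Hypothetical helper: uses the dedicated science score when present,
    otherwise falls back to the reasoning score and flags it as estimated.
    """
    if science is not None:
        return science, False
    # No dedicated science evaluation: use reasoning as a proxy.
    return reasoning, reasoning is not None


# Illustrative values only; they are not taken from the radar chart.
score, estimated = radar_science_score(None, 69.0)
print(score, estimated)
```

Returning the flag alongside the score lets a chart label the axis as estimated, as the "science (estimated)" axis does here.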

Leaderboard rankings

Domain	Rank	Score	Source
General capability leaderboard	506	10.0	AA
Reasoning	53	69.0	LS

Benchmark scores (LLM Stats)

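Category	Benchmark	Score	Source
Code	HumanEval	89.0%	Self-reported
Code	Aider	72.2%	Self-reported
Code	SWE-Bench Verified	16.8%	Self-reported
Communication	MT-Bench	0.90 / 100	Self-reported
Creativity	AlignBench	80.4%	Self-reported
Creativity	Arena Hard	76.2%	Self-reported
Creativity	AlpacaEval 2.0	50.5%	Self-reported
Finance	MMLU	80.4%	Self-reported
General	DS-FIM-Eval	78.3%	Self-reported
General	LiveCodeBench (01-09)	41.8%	Self-reported
Language	BBH	84.3%	Self-reported
Math	GSM8k	95.1%	Self-reported
Math	MATH	74.7%	Self-reported
Reasoning	HumanEval-Mul	73.8%	Self-reported
Reasoning	DS-Arena-Code	63.1%	Self-reported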

AA evaluation indices

Intelligence Index: 6.6

LLM Stats category scores

Categories: Roleplay, Communication, Language, Legal, Math, Finance, General, Healthcare, Reasoning, Creativity, Writing, Code, Frontend Development

Pricing

Input price: Free

Output price: Free

Blended price (3:1): Free
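A 3:1 blended price is commonly computed as a weighted average of the input and output prices, with input tokens weighted three times as heavily as output tokens. The sketch below assumes that definition. The function name `blended_price` and the non-zero sample prices are hypothetical and not taken from this page.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,
    output_weight: float = 1.0,
) -> float:
    """Weighted average of input and output prices (same unit in, same unit out).

    Assumption: the 3:1 ratio means three parts input to one part output.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# With both prices free, as listed here, the blend is also zero.
print(blended_price(0.0, 0.0))

# Hypothetical prices for illustration: (3 * 1.0 + 1 * 3.0) / 4 = 1.5
print(blended_price(1.0, 3.0))
```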

Speed

Tokens/second: 0.0

Time to first token: 0.00s

Time to first answer: 0.00s

Provider price ranking

No provider data available

External links

Artificial Analysis