DeepSeek-V2.5 (Dec '24)

DeepSeek (open weights)

Description

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It better aligns with human preferences and has been optimized in various aspects, including writing and instruction following.

Release date

2024-12-10

Parameters

236.0B

Context length

Not listed

Supported modalities

text

Capability radar chart (axes: general, coding, reasoning, science (estimated), agents, multimodal)

Science is estimated from reasoning ability as a proxy when no dedicated science benchmark is available.

Leaderboard rankings

Domain	Rank	Score	Source
General ability	505	10.0	AA
Math reasoning	104	75.0	AA

Benchmark scores (LLM Stats)

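Category	Benchmark	Score	Source
Code	HumanEval	89.0%	Self-reported
Code	Aider	72.2%	Self-reported
Code	SWE-Bench Verified	16.8%	Self-reported
Communication	MT-Bench	0.90 / 100	Self-reported
Creativity	AlignBench	80.4%	Self-reported
Creativity	Arena Hard	76.2%	Self-reported
Creativity	AlpacaEval 2.0	50.5%	Self-reported
Finance	MMLU	80.4%	Self-reported
General	DS-FIM-Eval	78.3%	Self-reported
General	LiveCodeBench (01-09)	41.8%	Self-reported
Language	BBH	84.3%	Self-reported
Math	GSM8k	95.1%	Self-reported
Math	MATH	74.7%	Self-reported
Reasoning	HumanEval-Mul	73.8%	Self-reported
Reasoning	DS-Arena-Code	63.1%	Self-reported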

AA evaluation indices

Intelligence Index

6.8

Math 500

0.8

LLM Stats category scores

Categories charted: Roleplay, Communication, Language, Legal, Math, Finance, General, Healthcare, Reasoning, Creativity, Writing, Code, Frontend Development

Pricing

Input price: Free

Output price: Free

Blended price (3:1): Free
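The page does not define how the 3:1 blended price is derived. A minimal sketch, assuming the ratio means three parts input price to one part output price in a weighted average; the function name `blended_price` and the example prices are hypothetical and not taken from this page:

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices.

    Assumption: "3:1" weights the input price three times as heavily as
    the output price. Both prices must use the same unit (for example,
    USD per million tokens).
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# A model listed as free on both sides blends to zero.
free_blend = blended_price(0.0, 0.0)

# Hypothetical prices, for illustration only.
example_blend = blended_price(1.0, 3.0)
```

Under this assumption a model that is free for both input and output also has a blended price of zero, which matches the listing.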

Speed

Tokens per second: 0.0

Time to first token: 0.00 s

Time to first answer: 0.00 s

Provider price ranking

No provider data available.

External links

LLM Stats, Artificial Analysis