
GLM-5.1 (Reasoning)

Z.AI · GLM · Open Weight · MIT · Commercial OK

Description

GLM-5.1 is Z.AI's next-generation flagship foundation model designed for long-horizon agentic engineering tasks. Built on a 754B MoE architecture (40B active parameters), it can work continuously and autonomously on a single task for up to 8 hours, completing the full loop from planning and execution to iterative optimization and delivery. GLM-5.1 achieves state-of-the-art on SWE-Bench Pro (58.4) and demonstrates strong performance across coding, reasoning, and agentic benchmarks. It supports 200K context length, 128K max output tokens, thinking mode, function calling, structured output, context caching, and MCP integration. Overall performance is aligned with Claude Opus 4.6 with particular strengths in sustained execution and complex engineering optimization.

Release Date: 2026-04-07
Parameter Scale: 754.0B
Context Length: 203K
Supported Modalities: text

Capability Radar

general: 46
coding: 43
reasoning: 87
science: 60 (estimated)
agents: 60
multimodal: 0

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.

Leaderboard Rankings

Domain | Rank | Score | Source
Agents & Tools | #21 | 67.0 | LS
Coding | #40 | 75.0 | AA
General | #9 | 90.0 | AA
Science | #33 | 76.0 | AA

Benchmark Scores (LLM Stats)

Agents

Benchmark | Score | Source
Vending-Bench 2 | 563441.0% | self-reported
BrowseComp | 79.3% | self-reported
MCP Atlas | 71.8% | self-reported
TAU3-Bench | 70.6% | self-reported
Terminal-Bench 2.0 | 69.0% | self-reported
CyberGym | 68.7% | self-reported
SWE-Bench Pro | 58.4% | self-reported
NL2Repo | 42.7% | self-reported
Toolathlon | 40.7% | self-reported

Biology

Benchmark | Score | Source
GPQA | 86.2% | self-reported

Math

Benchmark | Score | Source
AIME 2026 | 95.3% | self-reported
HMMT 2025 | 94.0% | self-reported
IMO-AnswerBench | 83.8% | self-reported
HMMT Feb 26 | 82.6% | self-reported
Humanity's Last Exam | 52.3% | self-reported

AA Evaluation Indices

Index | Score
Intelligence Index | 51.4
Coding Index | 43.4
Tau2 | 1.0
GPQA | 0.9
IFBench | 0.8
LCR | 0.6
SciCode | 0.4
Terminal-Bench Hard | 0.4
HLE | 0.3

LLM Stats Category Scores

Category | Score
Agents | 100
Reasoning | 100
Biology | 90
Chemistry | 90
General | 90
Physics | 90
Math | 80
Search | 80
Code | 70
Safety | 70
Tool Calling | 60
Vision | 50
Coding | 40

Pricing

Input Price | $1.4 / 1M tokens
Output Price | $4.4 / 1M tokens
Blended Price (3:1) | $2.15 / 1M tokens
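The blended figure can be reproduced from the listed input and output prices. A minimal sketch, assuming "3:1" means a weighted average of 3 parts input tokens to 1 part output tokens (the common convention for blended pricing; the ratio semantics are an assumption, not stated on this page):

```python
def blended_price(input_price: float, output_price: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Weighted-average price per 1M tokens for a given input:output mix."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# GLM-5.1 listed prices: $1.4 / 1M input, $4.4 / 1M output
print(blended_price(1.4, 4.4))  # (3 * 1.4 + 4.4) / 4 = 2.15
```

This matches the $2.15 / 1M tokens blended price shown above.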

Speed

Throughput | 53.8 tokens/s
Time to First Token | 1.04 s
Time to First Answer | 71.55 s

Available Providers

(LS internal pricing units)
Provider | Input Price | Output Price
ZAI | 1.4 / 1M | 4.4 / 1M

External Links