
GLM-5.1 (Reasoning)

Z.AI · GLM · Open Weight · MIT · Commercial OK

Description

GLM-5.1 is Z.AI's next-generation flagship foundation model designed for long-horizon agentic engineering tasks. Built on a 754B MoE architecture (40B active parameters), it can work continuously and autonomously on a single task for up to 8 hours, completing the full loop from planning and execution to iterative optimization and delivery. GLM-5.1 achieves state-of-the-art on SWE-Bench Pro (58.4) and demonstrates strong performance across coding, reasoning, and agentic benchmarks. It supports 200K context length, 128K max output tokens, thinking mode, function calling, structured output, context caching, and MCP integration. Overall performance is aligned with Claude Opus 4.6 with particular strengths in sustained execution and complex engineering optimization.

Release Date: 2026-04-07
Parameter Scale: 754.0B
Context Length: 203K
Supported Modalities: text

Capability Radar

general: 46
coding: 43
reasoning: 87
science: 60 (estimated)
agents: 60
multimodal: 0

When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.

Leaderboard Rankings

Domain | Rank | Score | Source
Agents & Tools | #21 | 67.0 | LS
Coding | #40 | 75.0 | AA
General | #9 | 90.0 | AA
Science | #33 | 76.0 | AA

Benchmark Scores (LLM Stats)

Agents

Benchmark | Score | Source
Vending-Bench 2 | 563441.0% | self-reported
BrowseComp | 79.3% | self-reported
MCP Atlas | 71.8% | self-reported
TAU3-Bench | 70.6% | self-reported
Terminal-Bench 2.0 | 69.0% | self-reported
CyberGym | 68.7% | self-reported
SWE-Bench Pro | 58.4% | self-reported
NL2Repo | 42.7% | self-reported
Toolathlon | 40.7% | self-reported

Biology

Benchmark | Score | Source
GPQA | 86.2% | self-reported

Math

Benchmark | Score | Source
AIME 2026 | 95.3% | self-reported
HMMT 2025 | 94.0% | self-reported
IMO-AnswerBench | 83.8% | self-reported
HMMT Feb 26 | 82.6% | self-reported
Humanity's Last Exam | 52.3% | self-reported

AA Evaluation Indices

Index | Score
Intelligence Index | 51.4
Coding Index | 43.4
Tau2 | 1.0
GPQA | 0.9
IFBench | 0.8
LCR | 0.6
SciCode | 0.4
Terminal-Bench Hard | 0.4
HLE | 0.3

LLM Stats Category Scores

Category | Score
Agents | 100
Reasoning | 100
Biology | 90
Chemistry | 90
General | 90
Physics | 90
Math | 80
Search | 80
Code | 70
Safety | 70
Tool Calling | 60
Vision | 50
Coding | 40

Pricing

Input Price | $1.4 / 1M tokens
Output Price | $4.4 / 1M tokens
Blended Price (3:1) | $2.15 / 1M tokens
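The blended figure can be reproduced from the listed input and output prices. A minimal sketch, assuming "3:1" means a weighted average of 3 parts input tokens to 1 part output tokens (the common convention for blended pricing; the ratio semantics are an assumption, not stated on this page):

```python
def blended_price(input_price: float, output_price: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Weighted-average price per 1M tokens for a given input:output mix."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# GLM-5.1 listed prices: $1.4 / 1M input, $4.4 / 1M output
print(blended_price(1.4, 4.4))  # (3 * 1.4 + 4.4) / 4 = 2.15
```

This matches the $2.15 / 1M tokens blended price shown above.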

Speed

Throughput | 53.8 tokens/s
Time to First Token | 1.04 s
Time to First Answer | 71.55 s

Available Providers

(LS internal pricing units)
Provider | Input Price | Output Price
ZAI | 1.4 / 1M | 4.4 / 1M

External Links