fix(litellm): fix thinking compatibility of the DeepSeek fallback after GLM quota exhaustion

mudssky · mudssky · commit eaefc9da6806 · 2026-05-08T10:07:11.000+08:00
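Once the GLM Coding Plan quota is exhausted, Claude Code requests are downgraded through LiteLLM to the DeepSeek Anthropic-compatible endpoint. During a cross-provider fallback, the thinking/tool blocks in the message history may not satisfy DeepSeek's validation that thinking mode receives the complete thinking history, so the fallback route proactively drops thinking and reasoning_effort and enables modify_params, prioritizing an uninterrupted request.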

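This commit also documents how DeepSeek's officially recommended CLAUDE_CODE_EFFORT_LEVEL=max relates to output_config.effort, along with a two-stage DeepSeek fallback as an alternative. Two-stage routing is not adopted for now, to avoid extra configuration complexity and the latency of a failed retry.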

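Constraint: In thinking mode, the DeepSeek Anthropic-compatible endpoint requires the complete thinking history; a GLM 429 fallback cannot guarantee that history blocks are fully compatible across providers
Rejected: Two-stage DeepSeek fallback | would reduce the impact on thinking capability, but adds routing complexity and the latency of one failed retry
Confidence: medium
Scope-risk: narrow
Directive: Do not drop thinking globally on the GLM primary route; a direct DeepSeek connection that needs DeepSeek's full official thinking capability should use a separate route
Tested: Previously completed YAML parsing, newapi/local config sync diff check, pnpm qa, git diff --check
Not-tested: The user explicitly stated that config changes need no re-check; the LiteLLM container was not restarted to verify a live 429 fallback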
diff --git a/.trellis/spec/infra/index.md b/.trellis/spec/infra/index.md
@@ -0,0 +1,12 @@
+# Infra Integration Guidelines
+
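+> This directory records actionable conventions for infrastructure integrations in this repository, in particular the local gateway, model routing, environment variables, and cross-provider compatibility boundaries.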
+
+---
+
+## Guidelines Index
+
+| Guide | Description | Status |
+|-------|-------------|--------|
+| [LiteLLM Gateway](./litellm-gateway.md) | LiteLLM routing, fallback, parameter compatibility, and validation boundaries | Active |
+
diff --git a/.trellis/spec/infra/litellm-gateway.md b/.trellis/spec/infra/litellm-gateway.md
@@ -0,0 +1,140 @@
+# LiteLLM Gateway Spec
+
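+> This spec records the routing and cross-provider compatibility conventions for `ai/gateway/litellm`. Read it before modifying the LiteLLM configuration, the Claude Code gateway entry points, or model fallbacks.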
+
+---
+
+## Scenario: Claude Code GLM 429 Fallback to DeepSeek
+
+### 1. Scope / Trigger
+
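+- Trigger: Changes in `ai/gateway/litellm/*.yaml` to the Claude Code GLM entry points, the DeepSeek fallback aliases, `router_settings.fallbacks`, `additional_drop_params`, or `litellm_settings.modify_params`.
+- Scope: The `cc-glmplan-opus` / `cc-glmplan-haiku` primary routes prefer the Zhipu GLM Coding Plan; when GLM returns 429 or LiteLLM raises `RateLimitError`, LiteLLM retries briefly and falls back to the DeepSeek Anthropic-compatible endpoint only if the retries still fail.
+- Design intent: The primary route preserves Claude Code extended thinking as far as possible; the fallback route prioritizes keeping the request uninterrupted.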
+
+### 2. Signatures
+
+- Client-facing model names:
+  - `cc-glmplan-opus`
+  - `cc-glmplan-haiku`
+  - `claude-code-deepseek-v4-pro`
+  - `claude-code-deepseek-v4-flash`
+- Provider model mapping:
+  - `cc-glmplan-*` -> `anthropic/GLM-5.1`
+  - `claude-code-deepseek-v4-pro` -> `anthropic/deepseek-v4-pro[1m]`
+  - `claude-code-deepseek-v4-flash` -> `anthropic/deepseek-v4-flash`
+- Router fallback contract:
+  - `cc-glmplan-opus` -> `claude-code-deepseek-v4-pro`
+  - `cc-glmplan-haiku` -> `claude-code-deepseek-v4-flash`
+
+### 3. Contracts
+
+- Required environment keys:
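+  - `Z_AI_ANTHROPIC_API_BASE`: Zhipu Anthropic-compatible endpoint.
+  - `Z_AI_API_KEY`: Zhipu Coding Plan key.
+  - `DEEPSEEK_ANTHROPIC_API_BASE`: DeepSeek Anthropic-compatible endpoint.
+  - `DEEPSEEK_API_KEY`: DeepSeek key.
+  - `LITELLM_MASTER_KEY`: key that LiteLLM uses to authenticate external clients.
+- Fallback-only parameter policy:
+  - The DeepSeek fallback aliases must explicitly drop `thinking` and `reasoning_effort`.
+  - DeepSeek's official Claude Code direct-connection setup recommends `CLAUDE_CODE_EFFORT_LEVEL=max`; in the Anthropic-compatible API, DeepSeek's effort semantics map to `output_config.effort`, not to the `reasoning_effort` of the OpenAI-compatible API.
+  - Do not add `output_config` to the `additional_drop_params` of the DeepSeek fallback aliases; only `thinking` and `reasoning_effort` are dropped today, so that DeepSeek's official Anthropic effort parameter is not affected.
+  - Dropping `thinking` means the DeepSeek fallback no longer explicitly requests extended thinking; this trades the quality ceiling of the fallback for an uninterrupted path after a GLM 429.
+  - The drop scope is bound to the DeepSeek fallback aliases only; do not disable Claude Code thinking globally on the GLM primary route.
+  - If `claude-code-deepseek-*` is called directly, the same drop policy applies; these aliases should therefore be treated as fallback/compatibility-only entry points.
+- LiteLLM settings:
+  - `drop_params: true` drops ordinary parameters that the upstream does not recognize.
+  - `modify_params: true` lets LiteLLM fix compatibility problems in Anthropic tool/thinking history blocks.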
+
+### 4. Validation & Error Matrix
+
+| Condition | Expected Behavior |
+|-----------|-------------------|
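+| GLM is available | `cc-glmplan-*` goes straight to GLM and keeps Claude Code thinking semantics |
+| GLM returns 429 / `RateLimitError` | LiteLLM first retries briefly according to the retry policy |
+| GLM retries are exhausted | The router falls back to the corresponding `claude-code-deepseek-*` alias |
+| DeepSeek receives `thinking` / `reasoning_effort` | The configuration must drop these parameters at the fallback alias before they reach DeepSeek |
+| DeepSeek receives `output_config.effort` | Must not be dropped via `additional_drop_params`; this is the official field through which the DeepSeek Anthropic-compatible API carries `CLAUDE_CODE_EFFORT_LEVEL=max` |
+| Message history lacks complete `thinking_blocks` | `modify_params` lets LiteLLM apply compatibility fixes so that the Anthropic-compatible endpoint does not reject the fallback |
+| Both GLM and DeepSeek fail | LiteLLM returns the final error to Claude Code and does not fake success |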
+
+### 5. Good/Base/Bad Cases
+
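+- Good: After a GLM 429 the request switches to DeepSeek, DeepSeek receives neither `thinking` nor `reasoning_effort`, and the request completes in ordinary non-thinking mode.
+- Good: The DeepSeek fallback alias does not drop `output_config.effort`; if Claude Code / LiteLLM expresses effort through DeepSeek's official Anthropic field, `CLAUDE_CODE_EFFORT_LEVEL=max` can still be passed through.
+- Base: When GLM responds normally, no fallback is triggered and the way Claude Code uses thinking on the GLM primary route is unchanged.
+- Bad: Dropping `thinking` globally, so that the GLM primary route also loses Claude Code extended thinking.
+- Bad: The DeepSeek fallback alias keeps `thinking`, and the fallback then fails with an `invalid_request_error` related to `content[].thinking` / `thinking_blocks`.
+- Bad: Seeing that DeepSeek officially recommends `CLAUDE_CODE_EFFORT_LEVEL=max` and changing the fallback alias to keep `thinking`; a direct DeepSeek connection and a cross-provider fallback differ in message-history completeness and must not be conflated.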
+
+### 6. Tests Required
+
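+- Config parse: The YAML must be readable by the project's existing parsing method.
+- Config sync: If `newapi.yaml` and `litellm.local.yaml` are meant to stay in sync, confirm after the change that they have no unexpected differences.
+- Route contract: Check that `router_settings.fallbacks` still points to the dedicated DeepSeek fallback aliases.
+- Parameter contract: Check that `additional_drop_params` appears only on the DeepSeek fallback aliases or on other explicitly compatibility-only routes.
+- Runtime note: A real 429 fallback depends on upstream quota, keys, and live responses; local config validation cannot prove that the online quota has recovered or how the provider's protocol behaves.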
+
+### 7. Wrong vs Correct
+
+#### Wrong
+
+```yaml
+litellm_settings:
+  drop_params: true
+
+model_list:
+  - model_name: "cc-glmplan-opus"
+    litellm_params:
+      model: "anthropic/GLM-5.1"
+  - model_name: "claude-code-deepseek-v4-pro"
+    litellm_params:
+      model: "anthropic/deepseek-v4-pro[1m]"
+```
+
+Problem: the DeepSeek fallback may still receive Claude Code extended thinking parameters; in a cross-provider fallback, a message history that lacks complete `thinking_blocks` triggers `invalid_request_error`.
+
+#### Correct
+
+```yaml
+model_list:
+  - model_name: "claude-code-deepseek-v4-pro"
+    litellm_params:
+      model: "anthropic/deepseek-v4-pro[1m]"
+      additional_drop_params:
+        - reasoning_effort
+        - thinking
+
+litellm_settings:
+  drop_params: true
+  modify_params: true
+```
+
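+Rationale: the DeepSeek fallback alias is an entry point dedicated to the downgrade path and prioritizes availability after a GLM 429; the GLM primary route still keeps Claude Code extended thinking.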
+
+#### DeepSeek effort vs thinking
+
+```yaml
+model_list:
+  - model_name: "claude-code-deepseek-v4-pro"
+    litellm_params:
+      model: "anthropic/deepseek-v4-pro[1m]"
+      additional_drop_params:
+        - reasoning_effort
+        - thinking
+        # Do not add output_config; the DeepSeek Anthropic-compatible API uses it to carry effort.
+```
+
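+Conclusion: `CLAUDE_CODE_EFFORT_LEVEL=max` is DeepSeek's officially recommended setting for direct Claude Code connections; in the DeepSeek Anthropic-compatible API, effort maps to `output_config.effort`. In the fallback configuration, dropping `reasoning_effort` mainly affects the OpenAI-style parameter, whereas dropping `thinking` turns off explicit extended thinking. This trade-off applies only to cross-provider fallback, because historical `thinking` blocks generated by GLM may not satisfy the complete-thinking-history validation of the DeepSeek Anthropic-compatible endpoint.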
+
+#### Deferred option: two-stage DeepSeek fallback
+
+```yaml
+router_settings:
+  fallbacks:
+    - cc-glmplan-opus:
+        - claude-code-deepseek-v4-pro
+        - claude-code-deepseek-v4-pro-safe
+```
+
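+Note: the router could first try the full DeepSeek route and then fall back to a safe route that drops `thinking` / `reasoning_effort`, preserving DeepSeek's official thinking capability where possible. However, LiteLLM YAML cannot rewrite and replay the same request based on the exact error text DeepSeek returns, and two-stage routing adds configuration complexity plus the latency of one failed retry. The current strategy sends the DeepSeek fallback route straight into safe mode, so that Claude Code is not interrupted after a GLM 429.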
diff --git a/ai/coding/claude/README.md b/ai/coding/claude/README.md
@@ -222,7 +222,7 @@ pwsh -NoProfile -File ./ai/coding/claude/Sync-ClaudeConfig.ps1
 }
 ```
 
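-The key here is the LiteLLM gateway's `LITELLM_MASTER_KEY`; the upstream Zhipu and DeepSeek keys stay in `ai/gateway/litellm/.env.local`. When the GLM quota is exhausted, LiteLLM automatically falls back to DeepSeek and retries GLM after a 1-hour cooldown.
+The key here is the LiteLLM gateway's `LITELLM_MASTER_KEY`; the upstream Zhipu and DeepSeek keys stay in `ai/gateway/litellm/.env.local`. When the GLM quota is exhausted, LiteLLM first retries briefly, then automatically falls back to DeepSeek, and retries GLM after a 1-hour cooldown. The DeepSeek fallback drops Claude Code's extended thinking parameters so that a cross-provider fallback is not rejected by the compatible endpoint because the message history lacks `thinking_blocks`. `CLAUDE_CODE_EFFORT_LEVEL=max` maps to `output_config.effort` in DeepSeek's official Anthropic API; the current fallback configuration does not drop `output_config`, but it gives up explicit `thinking` mode to keep the fallback usable. If both the GLM retries and the DeepSeek fallback fail, the final error is still returned to Claude Code.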
 
 ### Scenario 4: Switching to a new machine
 
diff --git a/ai/gateway/litellm/litellm.md b/ai/gateway/litellm/litellm.md
@@ -180,6 +180,8 @@ curl http://127.0.0.1:34000/v1/chat/completions `
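 - `ANTHROPIC_API_KEY` takes LiteLLM's `LITELLM_MASTER_KEY`; do not enter a real upstream key.
 - `cc-glmplan-opus` goes to Zhipu `GLM-5.1` first; on 429 / `RateLimitError` the LiteLLM gateway retries briefly, and if that still fails it falls back to `claude-code-deepseek-v4-pro`.
 - `cc-glmplan-haiku` goes to Zhipu `GLM-5.1` first; on 429 / `RateLimitError` the LiteLLM gateway retries briefly, and if that still fails it falls back to `claude-code-deepseek-v4-flash`.
+- The DeepSeek fallback routes drop extended thinking parameters such as `thinking` / `reasoning_effort` and enable LiteLLM's `modify_params` compatibility fixes; this prevents the Anthropic-compatible endpoint from rejecting a cross-provider fallback because the message history lacks `thinking_blocks`.
+- DeepSeek's official Claude Code direct-connection setup recommends `CLAUDE_CODE_EFFORT_LEVEL=max`; in the DeepSeek Anthropic-compatible API, effort maps to `output_config.effort`. The fallback currently drops only `thinking` / `reasoning_effort` and keeps `output_config`, so the official effort field is not affected; the cost is that the fallback no longer explicitly requests extended thinking.
 - `cooldown_time=3600` on the two GLM Claude Code entry points means a 1-hour cooldown after a failure; the first request after the cooldown retries GLM, and traffic switches back to GLM if the quota has recovered.
 - Transparent 429 handling covers only these two Claude Code GLM entry points; if both the GLM retries and the DeepSeek fallback fail, LiteLLM still returns the final error to Claude Code / the client.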
 
@@ -250,6 +252,9 @@ curl "$env:Z_AI_CODING_API_BASE/models" `
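 - `cc-glmplan-opus`: a stable primary entry point for Claude Code that prefers the Anthropic-compatible endpoint of Zhipu `GLM-5.1`.
 - `cc-glmplan-haiku`: a separate entry point for Claude Code Haiku / subagent traffic that prefers the Anthropic-compatible endpoint of Zhipu `GLM-5.1`.
 - `claude-code-deepseek-v4-pro` / `claude-code-deepseek-v4-flash`: DeepSeek fallback routes for when the GLM quota is exhausted or GLM is temporarily unavailable.
+- `additional_drop_params`: drops `thinking` / `reasoning_effort` only on the DeepSeek fallback routes so that requests can still complete after a 429; the GLM primary entry points keep the thinking semantics passed in by Claude Code.
+- DeepSeek effort compatibility: do not add `output_config` to the drop parameters of the DeepSeek fallback routes; the DeepSeek Anthropic-compatible API uses `output_config.effort` to carry Claude Code's `CLAUDE_CODE_EFFORT_LEVEL=max`.
+- `litellm_settings.modify_params`: lets LiteLLM apply compatibility fixes to Anthropic tool/thinking messages; when the message history lacks the required `thinking_blocks`, the fallback request for that turn has its unsafe thinking parameters dropped.
 - `router_settings.num_retries` / `retry_policy.RateLimitErrorRetries`: make transient 429s on the Claude Code GLM entry points retry briefly inside the gateway first, instead of passing the rate-limit error straight to the client.
 - `router_settings.fallbacks`: maps each Claude Code GLM primary entry point to its DeepSeek route, and relies on the 1-hour cooldown of the GLM deployments themselves for periodic recovery probing.
 - `GLM-*` fallback: keeps passthrough for official GLM models that exist in the Zhipu Coding Plan but are not explicitly registered, while preventing them from accidentally landing on NewAPI.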
diff --git a/ai/gateway/litellm/newapi.yaml b/ai/gateway/litellm/newapi.yaml
@@ -68,6 +68,10 @@ model_list:
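       api_key: "os.environ/DEEPSEEK_API_KEY"              # read the DeepSeek key from the environment
       # On fallback, keep the pro default model the user originally had in Claude Code.
       model: "anthropic/deepseek-v4-pro[1m]"
+      # The DeepSeek fallback prioritizes catching GLM 429s; across providers, Claude Code extended thinking parameters are not forced through.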
+      additional_drop_params:
+        - reasoning_effort
+        - thinking
   # Lightweight DeepSeek fallback for Haiku / subagent traffic, so that small tasks do not also consume pro after the primary model is downgraded.
   - model_name: "claude-code-deepseek-v4-flash"
     litellm_params:
@@ -118,6 +122,8 @@ litellm_settings:
   telemetry: false
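   # Automatically drop parameters the upstream does not recognize, reducing parameter-compatibility noise across models.
   drop_params: true
+  # Let LiteLLM fix incomplete Anthropic tool/thinking history blocks, so that the DeepSeek fallback is not blocked by compatibility errors.
+  modify_params: true
   # Retry 2 times by default, balancing recovery from transient jitter against overall response latency.
   num_retries: 2
   # Use a uniform 60-second request timeout, covering regular chat models and slower reasoning models.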
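
The diff above shows only fragments of the routing configuration. As a rough orientation, the pieces described in the notes (primary GLM routes, DeepSeek fallback aliases with their drop policy, short retries, and router fallbacks) might fit together as sketched below. This is a sketch, not the repository's actual file: the retry counts and the placement of `cooldown_time` under `litellm_params` are assumptions, while the model names, environment keys, and the 3600-second cooldown come from the text above.

```yaml
# Sketch only. Retry counts and the cooldown_time placement are assumptions.
model_list:
  # Primary routes: keep Claude Code thinking parameters untouched.
  - model_name: "cc-glmplan-opus"
    litellm_params:
      model: "anthropic/GLM-5.1"
      api_base: "os.environ/Z_AI_ANTHROPIC_API_BASE"
      api_key: "os.environ/Z_AI_API_KEY"
      cooldown_time: 3600          # 1-hour cooldown after failure (assumed placement)
  - model_name: "cc-glmplan-haiku"
    litellm_params:
      model: "anthropic/GLM-5.1"
      api_base: "os.environ/Z_AI_ANTHROPIC_API_BASE"
      api_key: "os.environ/Z_AI_API_KEY"
      cooldown_time: 3600
  # Fallback-only aliases: drop thinking parameters, keep output_config.
  - model_name: "claude-code-deepseek-v4-pro"
    litellm_params:
      model: "anthropic/deepseek-v4-pro[1m]"
      api_base: "os.environ/DEEPSEEK_ANTHROPIC_API_BASE"
      api_key: "os.environ/DEEPSEEK_API_KEY"
      additional_drop_params:
        - reasoning_effort
        - thinking
  - model_name: "claude-code-deepseek-v4-flash"
    litellm_params:
      model: "anthropic/deepseek-v4-flash"
      api_base: "os.environ/DEEPSEEK_ANTHROPIC_API_BASE"
      api_key: "os.environ/DEEPSEEK_API_KEY"
      additional_drop_params:
        - reasoning_effort
        - thinking

router_settings:
  num_retries: 2                   # assumed value
  retry_policy:
    RateLimitErrorRetries: 2       # assumed value: short retry before fallback
  fallbacks:
    - cc-glmplan-opus:
        - claude-code-deepseek-v4-pro
    - cc-glmplan-haiku:
        - claude-code-deepseek-v4-flash

litellm_settings:
  drop_params: true
  modify_params: true
```

Keeping `additional_drop_params` on the fallback aliases rather than in a global setting is what lets the GLM primary routes retain extended thinking while the downgrade path stays in the safe, non-thinking mode.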
