mudssky
diff --git a/‎.trellis/spec/infra/litellm-gateway.md‎
Lines changed: 20 additions & 9 deletions b/‎.trellis/spec/infra/litellm-gateway.md‎
Lines changed: 20 additions & 9 deletions
diff --git a/‎.trellis/tasks/05-12-glm-429-cooldown-thinking-fallback/check.jsonl‎
Lines changed: 1 addition & 0 deletions b/‎.trellis/tasks/05-12-glm-429-cooldown-thinking-fallback/check.jsonl‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎.trellis/tasks/05-12-glm-429-cooldown-thinking-fallback/implement.jsonl‎
Lines changed: 1 addition & 0 deletions b/‎.trellis/tasks/05-12-glm-429-cooldown-thinking-fallback/implement.jsonl‎
Lines changed: 1 addition & 0 deletions
@@ -8,8 +8,8 @@
 
 ### 1. Scope / Trigger
 
-- Trigger: 修改 `ai/gateway/litellm/*.yaml` 中 Claude Code GLM 入口、DeepSeek 兜底别名、`router_settings.fallbacks`、`additional_drop_params`、`litellm_settings.callbacks`、`callbacks/deepseek_thinking_sanitizer*.py` 或 `litellm_settings.modify_params`。
-- Scope: `cc-glmplan-opus` / `cc-glmplan-haiku` 主路由优先使用智谱 GLM Coding Plan；GLM 返回 429 或 LiteLLM `RateLimitError` 后短重试，仍失败才 fallback 到 DeepSeek Anthropic 兼容端点。
+- Trigger: 修改 `ai/gateway/litellm/*.yaml` 中 Claude Code GLM 入口、DeepSeek 兜底别名、`router_settings.fallbacks`、`additional_drop_params`、`litellm_settings.callbacks`、`callbacks/gateway_callback.py`、`callbacks/framework/**`、`callbacks/adapters/**` 或 `litellm_settings.modify_params`。
+- Scope: `cc-glmplan-opus` / `cc-glmplan-haiku` 主路由优先使用智谱 GLM Coding Plan；GLM 返回 429 或 LiteLLM `RateLimitError` 后短重试，仍失败才 fallback 到 DeepSeek Anthropic 兼容端点；GLM 额度 429 返回 reset 时间后，callback adapter 在 reset 后延迟恢复前会预先避让 GLM。
 - Design intent: 主路由尽量保留 Claude Code extended thinking；兜底路由优先保证请求不中断。
 
 ### 2. Signatures
@@ -37,7 +37,7 @@
   - `LITELLM_MASTER_KEY`: LiteLLM 对外鉴权密钥。
 - Fallback-only parameter policy:
   - `claude-code-deepseek-*` 是 Claude Code Anthropic messages 专用兜底入口，必须保留当前请求的 top-level `thinking`、`reasoning_effort` 与 `output_config.effort`；不得再通过 `additional_drop_params` 丢弃 `thinking` / `reasoning_effort`。
-  - Claude `/v1/messages` 原生路径必须启用 `callbacks.deepseek_thinking_sanitizer.proxy_handler_instance`，因为该路径会把历史 `messages[].content[]` 直接传给上游，`additional_drop_params` 不能移除 `content[].thinking` / `redacted_thinking` 内容块。
+  - Claude `/v1/messages` 原生路径必须启用 `callbacks.gateway_callback.proxy_handler_instance`，并由其中的 DeepSeek thinking sanitizer adapter 处理 DeepSeek 请求，因为该路径会把历史 `messages[].content[]` 直接传给上游，`additional_drop_params` 不能移除 `content[].thinking` / `redacted_thinking` 内容块。
   - DeepSeek 官方 Claude Code 直连配置推荐 `CLAUDE_CODE_EFFORT_LEVEL=max`；在 Anthropic 兼容接口里，DeepSeek 的 effort 语义对应 `output_config.effort`，不是 OpenAI 兼容接口里的 `reasoning_effort`。
   - 原生 Anthropic messages fallback 的核心问题是历史 assistant thinking 内容块有两类语义：带 `signature` 的 `thinking` 与带 `data` 的 `redacted_thinking` 是上游要求完整回传的不透明块；缺少签名/不透明数据的 thinking 块通常来自跨供应商或中间层转换，DeepSeek 无法校验。sanitizer 应保留可回传块，只清理不兼容块与 `thinking_blocks` 辅助字段。
   - 不得把 `thinking`、`reasoning_effort`、`output_config` 或 `output_config.effort` 加入 DeepSeek Claude Code 兜底别名的 `additional_drop_params`；DeepSeek Anthropic 兼容接口使用 top-level thinking 与 `output_config.effort` 承接 Claude Code effort。
@@ -46,7 +46,11 @@
 - LiteLLM settings:
   - `drop_params: true` 用于丢弃上游不识别的普通参数。
   - `modify_params: true` 用于允许 LiteLLM 修正 Anthropic tool/thinking 历史块兼容问题。
-  - `callbacks` 必须包含 DeepSeek thinking sanitizer；`compose.yaml` 必须挂载 `./callbacks:/app/callbacks:ro`，否则配置中的 Python 回调无法导入。
+  - `callbacks` 必须包含统一入口 `callbacks.gateway_callback.proxy_handler_instance`；`compose.yaml` 必须挂载 `./callbacks:/app/callbacks:ro`，否则配置中的 Python 回调无法导入。
+  - callback 顶层目录只放 LiteLLM import 入口：`gateway_callback.py`。框架基础设施放在 `callbacks/framework/**`，供应商能力放在 `callbacks/adapters/<provider>/**`，离线测试放在 `callbacks/tests/**`。
+  - `GatewayCallbackHub` 是唯一主入口，负责把 LiteLLM 生命周期 hook 分发给启用的 adapter；adapter 默认 fail-open，异常日志不得包含 prompt、API key、完整 headers 或完整 request body。
+  - `GlmCooldownAdapter` 必须只对 `cc-glmplan-opus` / `cc-glmplan-haiku` 的 GLM 额度或限流错误生效；识别到 reset 时间时按 `reset + LITELLM_GLM_RESET_BUFFER_SECONDS` 冷却，无法解析 reset 但确认是额度/限流错误时使用 `LITELLM_GLM_FALLBACK_COOLDOWN_SECONDS` 兜底。
+  - GLM cooldown adapter 在冷却期间应于 Router 选部署前把 `cc-glmplan-opus` 改写为 `claude-code-deepseek-v4-pro`，把 `cc-glmplan-haiku` 改写为 `claude-code-deepseek-v4-flash`；不得影响 GLM 非 Claude Code 路由或其它供应商。
   - DeepSeek sanitizer 修改真实请求体时必须使用 `async_pre_call_deployment_hook`。该 hook 在 Router 选中 fallback 部署后、provider 构造 Anthropic messages 请求体前运行，可以基于 `litellm_metadata.deployment` / `deployment_model_name` / `api_base` 识别 DeepSeek 兜底部署。
   - Anthropic messages pass-through 会把 `messages` 作为位置参数继续传给 handler；sanitizer 不能只给 `kwargs["messages"]` 赋一个新列表，必须原地修改原 `messages` 列表引用，否则 provider request body 仍可能使用未清理的历史。
   - DeepSeek fallback 的 sanitizer 必须递归清理 content 结构，覆盖 `messages[*].content[*]`、嵌套 tool/result content、`thinking_blocks` 与 `redacted_thinking`；真实 Claude Code 历史不保证 thinking 只出现在第一层 content 列表。
@@ -64,6 +68,9 @@
 | GLM 正常可用 | `cc-glmplan-*` 直接走 GLM，保留 Claude Code thinking 语义 |
 | GLM 返回 429 / `RateLimitError` | LiteLLM 先按 retry policy 短重试 |
 | GLM 短重试耗尽 | Router fallback 到对应 `claude-code-deepseek-*` |
+| GLM 429 错误体包含 reset 时间 | callback adapter 记录 `reset + 60 秒` 的冷却截止时间 |
+| GLM 仍在 adapter 冷却期 | 请求进入 Router 前直接改写到对应 DeepSeek fallback，避免 5 小时额度窗口内重复撞限流 |
+| GLM 429 不是额度/限流错误 | 不记录 5 小时兜底冷却，避免普通上游错误导致长时间避让 |
 | DeepSeek 收到顶层 `thinking` / `reasoning_effort` | Claude Code DeepSeek 兜底路由应保留这些当前请求参数；sanitizer 只处理历史 content thinking 块 |
 | DeepSeek 收到带签名/不透明数据的历史 `content[].thinking` / `redacted_thinking` | sanitizer 必须原样保留这些块；DeepSeek thinking mode 需要它们维持工具调用回合的推理连续性 |
 | DeepSeek 收到无签名/不完整的历史 `content[].thinking` / `redacted_thinking` | sanitizer 必须在 deployment pre-call 阶段移除这些不兼容块，否则 DeepSeek 可能返回 thinking 历史校验错误 |
@@ -78,6 +85,8 @@
 ### 5. Good/Base/Bad Cases
 
 - Good: GLM 429 后切到 DeepSeek，原生 Anthropic `/v1/messages` fallback 保留当前 top-level `thinking`，同时只移除无签名/不完整的历史 thinking content 块。
+- Good: GLM 额度 429 返回 `您的限额将在 ... 重置` 后，后续 `cc-glmplan-*` 请求在 reset + buffer 前由 callback adapter 预先切到对应 DeepSeek fallback。
+- Good: callback 目录按 `framework/`、`adapters/<provider>/`、`tests/` 分层，顶层只保留 LiteLLM 配置直接 import 的薄入口。
 - Good: sanitizer 原地修改 `messages` 列表并递归清理嵌套 content；日志显示 `top_level_thinking_before: enabled/adaptive`、`top_level_thinking_after: enabled/adaptive`、`remaining_thinking_paths: []`，同时 `preserved_thinking_blocks_after` 可大于 0。
 - Good: DeepSeek 兜底别名不丢弃 `thinking`、`reasoning_effort` 或 `output_config.effort`；如果 Claude Code / LiteLLM 以 DeepSeek Anthropic 官方字段表达 effort，`CLAUDE_CODE_EFFORT_LEVEL=max` 仍有机会透传。
 - Base: GLM 正常响应时不触发 fallback，不改变 Claude Code 对 GLM 主路由的 thinking 使用方式。
@@ -86,21 +95,23 @@
 - Bad: sanitizer 删除所有 `content[].thinking`，导致 DeepSeek 在带工具调用历史的 thinking mode 中报 `content[].thinking in the thinking mode must be passed back`。
 - Bad: sanitizer 诊断函数直接用 `value.get("type") in THINKING_BLOCK_TYPES`，真实请求里 `type` 是 dict 时会在 LiteLLM logging pre-call 阶段抛异常，反而遮蔽 fallback 的真实错误。
 - Bad: 为了兼容 Chat/Responses，把 `claude-code-deepseek-*` 继续配置成丢弃 `thinking` / `reasoning_effort`，导致 Claude Code 兜底链路失去 DeepSeek thinking / effort 能力。
+- Bad: 把所有 adapter 平铺在 `callbacks/` 顶层，导致 LiteLLM import 入口、框架抽象、供应商实现和测试混在同一目录。
 
 ### 6. Tests Required
 
 - Config parse: YAML 必须能被项目现有解析方式读取。
 - Config sync: 如果 `newapi.yaml` 与 `litellm.local.yaml` 应保持一致，修改后需要确认两者没有非预期差异。
 - Route contract: 检查 `router_settings.fallbacks` 仍指向专用 DeepSeek 兜底别名。
 - Parameter contract: 检查 `claude-code-deepseek-*` 不再配置 `additional_drop_params` 丢弃 `thinking` / `reasoning_effort`；如果出现 safe 兼容路由，其命名必须与 Claude Code 兜底路由区分。
-- Callback contract: 检查 `callbacks.deepseek_thinking_sanitizer.proxy_handler_instance` 能在 LiteLLM 镜像内导入，并实现 `async_pre_call_deployment_hook`，能在 `CallTypes.anthropic_messages` 且 deployment metadata 指向 DeepSeek 时原地清理请求参数。
+- Callback contract: 检查 `callbacks.gateway_callback.proxy_handler_instance` 能在 LiteLLM 镜像内导入，并实现 `async_pre_call_hook`、`async_pre_call_deployment_hook`、`async_log_failure_event` 与 `log_pre_api_call` 的 adapter 分发。
 - Hook-stage contract: 离线测试必须直接调用 `async_pre_call_deployment_hook`，输入包含 `litellm_metadata.deployment` / `deployment_model_name` / `api_base`、顶层 `thinking` / `reasoning_effort`、历史 `content[].thinking` / `redacted_thinking`，断言清理发生在 provider 请求体构造前。
+- GLM cooldown contract: 离线测试必须覆盖 GLM 429 中文 reset 时间解析、`reset + 60 秒` 计算、解析失败但确认限流时的固定兜底冷却、非限流错误不记录长冷却、冷却期间 `cc-glmplan-*` 请求前改写到对应 DeepSeek fallback。
 - Reference contract: 离线测试必须断言原始 `messages` 列表对象 ID 不变，且清理后 `kwargs["messages"] is messages`；这是 Anthropic messages pass-through 位置参数链路的关键行为。
 - Recursive contract: 离线测试必须包含嵌套 content 中的 `redacted_thinking`、signed `thinking`、unsigned `thinking` 和 message-level `thinking_blocks`，并断言 unsigned/incomplete thinking 被清理、signed/redacted opaque thinking 被保留、`thinking_paths(...)` 在清理后为空。
 - Diagnostic robustness contract: 离线测试必须覆盖 content block `type` 为非字符串的异常结构，断言 `thinking_paths(...)` 不抛异常且只报告真实 thinking block 路径。
 - Current thinking contract: 离线测试必须覆盖 top-level `thinking`、`reasoning_effort` 与 `output_config.effort`，断言 sanitizer 清理历史 thinking 后仍保留这些当前请求参数。
 - Runtime smoke contract: 真实验证可用 `/v1/messages?beta=true` 先直打 `claude-code-deepseek-v4-pro`，再打 `cc-glmplan-opus` 触发 429 fallback；成功样本应返回 HTTP 200，容器日志应有两个阶段的 `deepseek thinking sanitized`、`remaining_thinking_paths: []`，并允许 `preserved_thinking_blocks_after > 0`。
-- Runtime callback contract: 重启 LiteLLM 后调用 `/active/callbacks`，确认运行态 `litellm.callbacks` 包含 `callbacks.deepseek_thinking_sanitizer.DeepSeekThinkingSanitizer`；不要用 `docker exec python` 新进程里的 `litellm.callbacks` 判断服务进程状态。
+- Runtime callback contract: 重启 LiteLLM 后调用 `/active/callbacks`，确认运行态 `litellm.callbacks` 包含 `callbacks.gateway_callback.GatewayCallbackHub` 或等价统一入口；不要用 `docker exec python` 新进程里的 `litellm.callbacks` 判断服务进程状态。
 - Runtime note: 真实 429 fallback 依赖上游额度、密钥和实时响应；本地配置验证不能证明线上额度恢复或供应商端协议行为。
 
 ### 7. Wrong vs Correct
@@ -134,10 +145,10 @@ litellm_settings:
   drop_params: true
   modify_params: true
   callbacks:
-    - callbacks.deepseek_thinking_sanitizer.proxy_handler_instance
+    - callbacks.gateway_callback.proxy_handler_instance
 ```
 
-理由：DeepSeek 兜底别名是 Claude Code 降级链路专用入口，应保留当前请求的 thinking / effort 能力；sanitizer 只处理 Anthropic `/v1/messages` 历史 content thinking 块，两者不能互相替代。
+理由：DeepSeek 兜底别名是 Claude Code 降级链路专用入口，应保留当前请求的 thinking / effort 能力；gateway callback hub 分发给 DeepSeek sanitizer adapter 与 GLM cooldown adapter，sanitizer 只处理 Anthropic `/v1/messages` 历史 content thinking 块，两者不能互相替代。
 
 #### DeepSeek effort vs thinking
 
@@ -157,7 +168,7 @@ model_list:
 ```yaml
 litellm_settings:
   callbacks:
-    - callbacks.deepseek_thinking_sanitizer.proxy_handler_instance
+    - callbacks.gateway_callback.proxy_handler_instance
 ```
 
 说明：Claude Code 使用 `/v1/messages?beta=true` 时，LiteLLM 走 Anthropic 原生 messages pass-through。该路径的 `messages` 与 `thinking` 不走普通 OpenAI 参数映射，`additional_drop_params` 不能删除历史 `messages[*].content[*]` 中的 thinking 内容块。DeepSeek 返回 `content[].thinking in the thinking mode must be passed back` 时，应先确认 sanitizer 是否误删了带 `signature` 的 thinking 或带 `data` 的 redacted thinking，而不是只改 `additional_drop_params`。
 
@@ -0,0 +1 @@
+{"_example": "Fill with {\"file\": \"<path>\", \"reason\": \"<why>\"}. Put spec/research files only — no code paths. Run `python3 .trellis/scripts/get_context.py --mode packages` to list available specs. Delete this line once real entries are added."}
@@ -0,0 +1 @@
+{"_example": "Fill with {\"file\": \"<path>\", \"reason\": \"<why>\"}. Put spec/research files only — no code paths. Run `python3 .trellis/scripts/get_context.py --mode packages` to list available specs. Delete this line once real entries are added."}
Original file line number	Diff line number	Diff line change
`@@ -0,0 +1 @@`
	`1`	+{"_example": "Fill with {\"file\": \"<path>\", \"reason\": \"<why>\"}. Put spec/research files only — no code paths. Run `python3 .trellis/scripts/get_context.py --mode packages` to list available specs. Delete this line once real entries are added."}