|
| 1 | +# Coding Plan Window Warmer Spec |
| 2 | + |
| 3 | +> 本规范记录 `ai/coding/window-warmer` 的预热调度、直连上游、依赖管理和 PM2 管理约定。修改窗口预热工具、默认 TOML、PM2 配置或相关启动文档时必须先阅读。 |
| 4 | +
|
| 5 | +--- |
| 6 | + |
| 7 | +## Scenario: Direct Coding Plan Window Warmup |
| 8 | + |
| 9 | +### 1. Scope / Trigger |
| 10 | + |
| 11 | +- Trigger: 修改 `ai/coding/window-warmer/**`、窗口预热启动命令、默认 `window-warmer.toml`、PM2 ecosystem 配置,或 LiteLLM 网关文档中的窗口预热说明。 |
| 12 | +- Scope: 宿主机侧独立脚本按多个 `[[plans]]` 的 `fixed_times` 或 `interval` 调度发送轻量 completion 请求,用于把 Coding Plan 额度窗口尽量锁定到可预期的时间段。 |
| 13 | +- Design intent: 预热是独立运维工具,不属于 LiteLLM callback、LiteLLM Proxy 路由或 Docker Compose sidecar;默认请求必须直连上游 Coding Plan 服务端点,避免被 LiteLLM Proxy fallback 到 DeepSeek 或其它兜底路由。 |
| 14 | + |
| 15 | +### 2. Signatures |
| 16 | + |
| 17 | +- Direct run: |
| 18 | + - `uv run --script ai/coding/window-warmer/window_warmer.py --config ai/coding/window-warmer/window-warmer.toml` |
| 19 | + - `uv run --script ai/coding/window-warmer/window_warmer.py --config ai/coding/window-warmer/window-warmer.toml --print-next` |
| 20 | + - `uv run --script ai/coding/window-warmer/window_warmer.py --config ai/coding/window-warmer/window-warmer.toml --once --dry-run` |
| 21 | +- PM2: |
| 22 | + - `pm2 start ai/coding/window-warmer/window-warmer.pm2.config.cjs` |
| 23 | + - PM2 app name: `coding-window-warmer` |
| 24 | +- Python script: |
| 25 | + - Entry file: `ai/coding/window-warmer/window_warmer.py` |
| 26 | + - Dependency declaration: PEP 723 script metadata with `litellm>=1.81.0` |
| 27 | + - Helper package: `ai/coding/window-warmer/window_warmer_lib/` |
| 28 | + |
| 29 | +### 3. Contracts |
| 30 | + |
| 31 | +- Target config `[target]`: |
| 32 | + - `name`: log-only target name. |
| 33 | + - `base_url`: direct upstream OpenAI-compatible API base URL. Default points to `https://open.bigmodel.cn/api/coding/paas/v4`, not local LiteLLM Proxy. |
| 34 | + - `container_name`: optional local Docker readiness gate. When set to `litellm`, it only proves the local gateway container is running; it must not change the warm request destination. |
| 35 | + - `api_key_env`: optional environment variable for upstream API key. Default for Z.ai Coding Plan is `Z_AI_API_KEY`. |
| 36 | + - `env_file`: optional dotenv-style file path, resolved relative to the TOML file. |
| 37 | + - `health_path`: optional direct target health path. Default is `/models`. |
| 38 | + - `request_timeout_seconds`: timeout used by health check and LiteLLM SDK completion. |
| 39 | +- Plan config `[[plans]]`: |
| 40 | + - `model`: LiteLLM SDK model string. For direct OpenAI-compatible upstreams, use `openai/<provider-model>`, for example `openai/GLM-5.1`. |
| 41 | + - `prompt`: light warmup prompt. Logs must not print prompt text. |
| 42 | + - `schedule_mode`: `fixed_times` or `interval`. |
| 43 | + - `times`: required for `fixed_times`. |
| 44 | + - `start_time` or `start_at` plus `window`: required for `interval`. |
| 45 | + - `jitter_seconds`, `retry_count`, `retry_delay_seconds`: per-plan overrides. |
| 46 | +- Request contract: |
| 47 | + - Warm requests use `litellm.completion(model=plan.model, messages=[...], api_base=target.base_url, api_key=api_key, timeout=..., max_tokens=..., temperature=...)`. |
| 48 | + - The warmer must not call local LiteLLM Proxy `/v1/chat/completions` for default GLM warmup. |
| 49 | + - Health checks may use direct HTTP GET because they are a readiness probe, not the warmup completion. |
| 50 | + |
| 51 | +### 4. Validation & Error Matrix |
| 52 | + |
| 53 | +| Condition | Expected Behavior | |
| 54 | +|-----------|-------------------| |
| 55 | +| `scheduler.enabled=false` | Script exits successfully without scheduling warmups | |
| 56 | +| No enabled plans | `--once` / watch mode logs `没有启用的 plan` and does not send requests | |
| 57 | +| `container_name` configured but Docker missing | Warmup is skipped with `未找到 docker 命令` diagnostic | |
| 58 | +| `container_name` configured but container not running | Warmup is skipped before reading/sending completion | |
| 59 | +| `api_key_env` configured but missing from env and `env_file` | Warmup is skipped with missing key diagnostic | |
| 60 | +| `health_path` configured but direct target health check fails | Warmup is skipped before completion request | |
| 61 | +| `--dry-run` or `scheduler.dry_run=true` | Docker/API readiness checks and completion request are skipped | |
| 62 | +| LiteLLM SDK completion fails | Failure is logged without prompt/key/body; retry up to `retry_count` | |
| 63 | +| Multiple plans share the same base time | Each plan remains in the event queue and is executed independently | |
| 64 | + |
| 65 | +### 5. Good/Base/Bad Cases |
| 66 | + |
| 67 | +- Good: Default config checks optional local `litellm` container but sends `openai/GLM-5.1` to `https://open.bigmodel.cn/api/coding/paas/v4` through LiteLLM SDK. |
| 68 | +- Good: `uv run --script` handles LiteLLM SDK dependency without creating repo-level `requirements.txt`, `pyproject.toml`, or a committed virtual environment. |
| 69 | +- Good: Time calculation is pure and unit-tested separately from HTTP/LiteLLM SDK calls. |
| 70 | +- Base: `fixed_times = ["08:00", "13:00", "18:00", "23:00"]` with `jitter_seconds = 120` schedules each event within two minutes after the base time. |
| 71 | +- Bad: Pointing `[target].base_url` at `http://127.0.0.1:34000` for default GLM warmup, because the request can enter LiteLLM Proxy fallback chains. |
| 72 | +- Bad: Using model `GLM-5.1` without an explicit provider prefix for direct upstream calls, because LiteLLM SDK provider inference can be ambiguous. |
| 73 | +- Bad: Logging prompt text, API key, full headers, or full request body. |
| 74 | +- Bad: Re-merging all helper modules into a single thousand-line script. |
| 75 | + |
| 76 | +### 6. Tests Required |
| 77 | + |
| 78 | +- Unit tests for `fixed_times` next-day rollover. |
| 79 | +- Unit tests for `interval` continuous-window rollover across midnight. |
| 80 | +- Unit tests for multiple plans with simultaneous base time remaining independently executable. |
| 81 | +- Config parse tests for multiple `[[plans]]`. |
| 82 | +- SDK call test mocking the local wrapper around `litellm.completion`, asserting: |
| 83 | + - `model` keeps the configured provider-prefixed model. |
| 84 | + - `api_base` is the direct target URL. |
| 85 | + - prompt, max tokens, temperature and timeout are passed. |
| 86 | +- Dry-run test asserting readiness checks are skipped. |
| 87 | +- Smoke commands: |
| 88 | + - `uv run --script ai/coding/window-warmer/window_warmer.py --config ai/coding/window-warmer/window-warmer.toml --print-next` |
| 89 | + - `uv run --script ai/coding/window-warmer/window_warmer.py --config ai/coding/window-warmer/window-warmer.toml --once --dry-run` |
| 90 | + - `node -c ai/coding/window-warmer/window-warmer.pm2.config.cjs` |
| 91 | + |
| 92 | +### 7. Wrong vs Correct |
| 93 | + |
| 94 | +#### Wrong |
| 95 | + |
| 96 | +```toml |
| 97 | +[target] |
| 98 | +name = "local-litellm" |
| 99 | +base_url = "http://127.0.0.1:34000" |
| 100 | +api_key_env = "LITELLM_MASTER_KEY" |
| 101 | +health_path = "/health" |
| 102 | + |
| 103 | +[[plans]] |
| 104 | +model = "GLM-5.1" |
| 105 | +endpoint = "/v1/chat/completions" |
| 106 | +``` |
| 107 | + |
| 108 | +问题:这会把 warmup 请求送进 LiteLLM Proxy;如果 GLM 429 或被 callback 标记冷却,请求可能 fallback 到 DeepSeek,既不能锁定 GLM Coding Plan 窗口,也会消耗兜底额度。 |
| 109 | + |
| 110 | +#### Correct |
| 111 | + |
| 112 | +```toml |
| 113 | +[target] |
| 114 | +name = "z-ai-coding-plan" |
| 115 | +base_url = "https://open.bigmodel.cn/api/coding/paas/v4" |
| 116 | +container_name = "litellm" |
| 117 | +api_key_env = "Z_AI_API_KEY" |
| 118 | +health_path = "/models" |
| 119 | + |
| 120 | +[[plans]] |
| 121 | +name = "glm-coding-plan" |
| 122 | +model = "openai/GLM-5.1" |
| 123 | +schedule_mode = "fixed_times" |
| 124 | +times = ["08:00", "13:00", "18:00", "23:00"] |
| 125 | +``` |
| 126 | + |
| 127 | +理由:`container_name` 只是可选本机启动条件;真实 warmup completion 由 LiteLLM SDK 直连 `target.base_url`,不会进入 LiteLLM Proxy 路由/fallback。 |
0 commit comments