Commit b897dff

chore(devops): add-cron-timeout-overrides.sh for #4808 (Lane B of #4755)
Per-provider timeout shim for isolated agentTurn crons. Unblocks release-please dispatch cadence (cron ad1ab50a). Closes #4808
1 parent 79fba30 commit b897dff

4 files changed

Lines changed: 581 additions & 0 deletions

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
{
  "$schema": "https://openclaw.dev/schemas/config.json",
  "name": "aegis-cron-timeout-shim",
  "description": "Reference OpenClaw config snippet demonstrating the models.providers.<provider>.timeoutSeconds knob that #4808 uses to raise the per-provider timeout ceiling for non-trivial isolated agentTurn cron payloads. Apply with scripts/devops/add-cron-timeout-overrides.sh.",
  "models": {
    "mode": "merge",
    "providers": {
      "minimax-portal": {
        "timeoutSeconds": 600
      },
      "kimi": {
        "timeoutSeconds": 600
      },
      "zai": {
        "timeoutSeconds": 600
      }
    }
  }
}

scripts/devops/README.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# Cron Timeout Override Shim

**Issue:** [#4808](https://github.com/OneStepAt4time/aegis/issues/4808), Lane B of [#4755](https://github.com/OneStepAt4time/aegis/issues/4755).

## Problem

Non-trivial `isolated agentTurn` cron payloads time out per-provider during
the OpenClaw sequential fallback chain. Observed: on complex multi-step
workloads (release-please pre-flight), each of the 5 providers in the chain
hits its per-call timeout after ~2.5min, so the run spends about
5 × 2.5min ≈ 13min before the cron fails with
`FallbackSummaryError: All models failed (5)`.

## Why this script exists

The root fix is upstream: [openclaw/openclaw#95408](https://github.com/openclaw/openclaw/issues/95408),
per-agent `model.requestTimeoutSeconds` (Lane C, Hermes). Until that
merges, ships, and this host upgrades, we need a workaround on the Aegis
side.

The workaround: bump `models.providers.<provider>.timeoutSeconds` for the
3 unique providers used by `ag-hermes` (the agent that runs the
release-please cron). OpenClaw 2026.5.7 reads this knob at
`model-f6pqrkVH.js:348` (`applyConfiguredProviderOverrides`).

This script applies the override idempotently.
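
The idempotent behavior can be sketched as a pure function. This is an illustration only, not the shell script itself (which is not shown in this README): the type names, `applyTimeoutOverrides`, and the summary shape are hypothetical, and only the `models.providers.<provider>.timeoutSeconds` path plus the skip and missing-provider semantics listed under Tests come from this document.

```typescript
// Hypothetical types for illustration; only models.providers.<name>.timeoutSeconds
// is taken from the README.
type ProviderConfig = Record<string, unknown>;

interface OpenClawConfig {
  models: { mode?: string; providers: Record<string, ProviderConfig> };
}

interface OverrideSummary {
  updated: string[]; // providers whose timeoutSeconds was raised
  skipped: string[]; // providers already at or above the target
  missing: string[]; // target providers absent from models.providers
}

function applyTimeoutOverrides(
  config: OpenClawConfig,
  targets: string[],
  timeoutSeconds: number,
): { config: OpenClawConfig; summary: OverrideSummary } {
  if (!Number.isInteger(timeoutSeconds) || timeoutSeconds <= 0) {
    throw new Error('TIMEOUT_SECONDS must be a positive integer');
  }

  // Work on a shallow copy so the caller's object is left untouched.
  const providers: Record<string, ProviderConfig> = { ...config.models.providers };
  const summary: OverrideSummary = { updated: [], skipped: [], missing: [] };

  for (const name of targets) {
    const current = providers[name];
    if (current === undefined) {
      // A missing provider is reported but does not abort the other updates.
      summary.missing.push(name);
      continue;
    }
    const existing = current['timeoutSeconds'];
    if (typeof existing === 'number' && existing >= timeoutSeconds) {
      // Never lower an existing value; this is what makes re-runs a no-op.
      summary.skipped.push(name);
      continue;
    }
    providers[name] = { ...current, timeoutSeconds };
    summary.updated.push(name);
  }

  return { config: { ...config, models: { ...config.models, providers } }, summary };
}
```

Running the function a second time on its own output moves every previously updated provider into `skipped` and leaves the config unchanged, which is the property the idempotency test asserts against the real script.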

## Why it's safe (global per-provider, not per-agent)

The OpenClaw 2026.5.7 schema only honors `timeoutSeconds` at the
`models.providers.<provider>` level, not per-agent. Setting it raises the
ceiling for every agent that uses those providers. This is acceptable:

- **Simple-payload crons** (watchdog, qa-scan, sentinel) complete in ~30s,
  well under any reasonable `timeoutSeconds` value. The bump is invisible.
- **Outer cron-level bound** (`payload.timeoutSeconds`) is unchanged.
  Each cron still has its own outer timeout (e.g., 120s for watchdog,
  900s for release-please). Bumping the inner per-provider timeout
  doesn't extend those.
- **Cost ceiling** is the same: the LLM call still pays per token, it just
  gets more wall-clock time before giving up.

The shim is documented as a workaround. Once Lane C merges, ships, and
this host upgrades, the override can be reverted by deleting the
`timeoutSeconds` field from each provider in `~/.openclaw/openclaw.json`.
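
The revert described above amounts to dropping one key per provider. A minimal sketch, assuming the provider map has already been parsed from `~/.openclaw/openclaw.json`; the helper name `removeTimeoutOverrides` and the `ProviderMap` type are hypothetical and not part of this commit.

```typescript
// Hypothetical helper: removes the timeoutSeconds override from the given providers.
type ProviderMap = Record<string, Record<string, unknown>>;

function removeTimeoutOverrides(providers: ProviderMap, targets: string[]): ProviderMap {
  const result: ProviderMap = { ...providers };
  for (const name of targets) {
    const current = result[name];
    if (current === undefined) {
      continue; // nothing to revert for a provider that is not configured
    }
    // Copy first so the input object is not mutated, then drop the override.
    const copy: Record<string, unknown> = { ...current };
    delete copy['timeoutSeconds'];
    result[name] = copy;
  }
  return result;
}
```

Every other field on the provider (for example a `baseUrl`) is left as it was, and providers outside `targets` are not touched.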

## Usage

```bash
# DRY-RUN (default): show what would change
bash scripts/devops/add-cron-timeout-overrides.sh

# Apply the default 600s (10min) override
APPLY=1 bash scripts/devops/add-cron-timeout-overrides.sh

# Apply a custom timeout
TIMEOUT_SECONDS=900 APPLY=1 bash scripts/devops/add-cron-timeout-overrides.sh

# Apply to a subset of providers
TARGET_PROVIDERS="minimax-portal zai" APPLY=1 bash scripts/devops/add-cron-timeout-overrides.sh

# Non-default install path
OPENCLAW_CONFIG=/path/to/openclaw.json APPLY=1 bash scripts/devops/add-cron-timeout-overrides.sh
```

Default timeout: **600s (10min)**, which is 4× the observed ~2.5min per-provider
ceiling and leaves headroom for ~2× LLM round-trip variance.
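
The arithmetic behind the default can be written out as a short sketch. The figures are the ones quoted in this README; `worstCaseRunSeconds` is a hypothetical helper, and it assumes the outer `payload.timeoutSeconds` cuts off the whole run, which is how the "Why it's safe" section describes the outer bound.

```typescript
// Figures quoted in this README.
const observedPerProviderSeconds = 150; // ~2.5min observed per-provider ceiling
const chainLength = 5; // providers tried in the sequential fallback chain
const defaultTimeoutSeconds = 600; // script default (10min)

// Time spent walking the whole chain at the old ceiling: 750s = 12.5min,
// the "≈ 13min" quoted in the Problem section.
const observedChainSeconds = observedPerProviderSeconds * chainLength;

// Ratio of the new default to the observed ceiling: 4.
const headroomFactor = defaultTimeoutSeconds / observedPerProviderSeconds;

// ASSUMPTION: the outer cron-level timeout bounds the total run, so the
// worst case is whichever limit is reached first.
function worstCaseRunSeconds(
  perProviderTimeout: number,
  providers: number,
  outerTimeout: number,
): number {
  return Math.min(perProviderTimeout * providers, outerTimeout);
}

// With the 900s outer bound quoted for release-please, the outer limit
// (900s) is reached before five 600s provider timeouts (3000s) could elapse.
const releasePleaseWorstCase = worstCaseRunSeconds(defaultTimeoutSeconds, chainLength, 900);
```

Under that assumption, raising the inner timeout changes how long a single provider may run, not how long the cron as a whole may run.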

## Re-enabling the release-please cron

After applying the override, the `ad1ab50a-dba8-40e2-a3de-ca2d2d09dba5`
cron (release-please dispatch) can be re-enabled. It is currently
disabled with `sessionTarget: "session:agent:ag-hermes:..."` (a named
session, left over from Hephaestus's prior failed workaround for the
named-session lock-in bug).

The cron config update is manual, made in `~/.openclaw/cron/jobs.json`.
Three changes are required:

1. Set `enabled: true`
2. Change `sessionTarget` back to `"isolated"`
3. Update the prompt to a current release-please dispatch (the current
   one references issue #4708 and a 2026-06-16 memory file)

The cron daemon picks up the change on its next read cycle (< 60s).
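
The first two changes can be expressed as a small transformation on a single job entry. This is a sketch only: the layout of `jobs.json` is not documented here, so the `CronJob` shape below is an assumption that uses just the two field names mentioned above, and the prompt update (step 3) still has to be written by hand.

```typescript
// ASSUMPTION: a job entry exposes `enabled` and `sessionTarget` at its top level.
// Any other fields are carried through untouched.
interface CronJob {
  enabled: boolean;
  sessionTarget: string;
  [key: string]: unknown;
}

// Hypothetical helper covering steps 1 and 2 of the list above.
function reenableAsIsolated(job: CronJob): CronJob {
  return { ...job, enabled: true, sessionTarget: 'isolated' };
}
```

Returning a new object rather than mutating the input makes it easy to diff the before and after entries before writing the file back.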

## Restart the OpenClaw gateway

The new `timeoutSeconds` takes effect on the next gateway reload. To
pick it up immediately:

```bash
openclaw gateway restart
```

Then trigger one manual isolated agentTurn run on `ad1ab50a` to verify
the new ceiling holds.

## Tests

`scripts/devops/__tests__/add-cron-timeout-overrides.test.ts` covers:

1. DRY-RUN does not modify the config
2. APPLY=1 sets `timeoutSeconds` on each target provider
3. `TIMEOUT_SECONDS` env var overrides the default
4. Idempotency (re-running is a no-op)
5. Skip semantics (providers already at-or-above target)
6. Scope (`TARGET_PROVIDERS` env var)
7. Error paths (missing config, invalid timeout, malformed config)
8. Partial success (missing target provider doesn't abort other updates)

Run with:

```bash
npx vitest run scripts/devops/__tests__/add-cron-timeout-overrides.test.ts
```
scripts/devops/__tests__/add-cron-timeout-overrides.test.ts

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
/**
 * Regression tests for scripts/devops/add-cron-timeout-overrides.sh
 *
 * Covers #4808 (Lane B of #4755). The script applies a per-provider
 * `timeoutSeconds` override to the OpenClaw config so non-trivial isolated
 * agentTurn cron payloads don't time out per-provider during the
 * sequential fallback chain.
 *
 * These tests run the actual bash script against fixture OpenClaw config
 * files in a temp directory. They verify:
 *   1. DRY-RUN mode does NOT modify the config
 *   2. APPLY=1 mode sets the timeoutSeconds on each target provider
 *   3. TIMEOUT_SECONDS overrides the default timeout
 *   4. Idempotency: re-running with the same target leaves the config unchanged
 *   5. Skip semantics: providers already at-or-above target are skipped
 *   6. Scope: TARGET_PROVIDERS limits which providers are patched
 *   7. Error paths: missing config, invalid timeout value, config without models.providers
 *   8. Partial success: a missing target provider does not abort other updates
 *
 * Requires `bash` and `jq` on PATH (same as the script itself).
 */
import { execFileSync } from 'node:child_process';
import { mkdtempSync, writeFileSync, readFileSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, resolve } from 'node:path';
import { describe, it, expect, beforeEach, afterEach } from 'vitest';

const REPO_ROOT = resolve(__dirname, '../../..');
const SCRIPT_PATH = join(REPO_ROOT, 'scripts/devops/add-cron-timeout-overrides.sh');

interface OpenClawConfigFixture {
  models: {
    mode: string;
    providers: Record<string, Record<string, unknown>>;
  };
}

function makeFixture(
  overrides: Partial<Record<string, Record<string, unknown>>> = {},
): OpenClawConfigFixture {
  return {
    models: {
      mode: 'merge',
      providers: {
        'minimax-portal': { baseUrl: 'https://example.test' },
        kimi: { baseUrl: 'https://example.test' },
        zai: { baseUrl: 'https://example.test' },
        'unrelated-provider': { baseUrl: 'https://example.test' },
        ...overrides,
      },
    },
  };
}

// Runs the script with OPENCLAW_CONFIG pointed at the fixture and captures its output.
function runScript(params: {
  configPath: string;
  env?: Record<string, string>;
  apply?: boolean;
}): { stdout: string; stderr: string; status: number } {
  // process.env values may be undefined, so use the ProcessEnv type here.
  const env: NodeJS.ProcessEnv = {
    ...process.env,
    OPENCLAW_CONFIG: params.configPath,
    ...(params.apply ? { APPLY: '1' } : {}),
    ...(params.env ?? {}),
  };
  try {
    const stdout = execFileSync('bash', [SCRIPT_PATH], {
      env,
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'pipe'],
    });
    return { stdout, stderr: '', status: 0 };
  } catch (err) {
    // On a non-zero exit, execFileSync throws an error carrying stdout, stderr, and status.
    const e = err as { stdout?: string; stderr?: string; status?: number };
    return {
      stdout: e.stdout ?? '',
      stderr: e.stderr ?? '',
      status: e.status ?? 1,
    };
  }
}

describe('add-cron-timeout-overrides.sh', () => {
  let workDir: string;

  beforeEach(() => {
    workDir = mkdtempSync(join(tmpdir(), 'cron-timeout-shim-test-'));
  });

  afterEach(() => {
    rmSync(workDir, { recursive: true, force: true });
  });

  function writeFixture(config: OpenClawConfigFixture): string {
    const path = join(workDir, 'openclaw.json');
    writeFileSync(path, JSON.stringify(config, null, 2));
    return path;
  }

  function readConfig(path: string): OpenClawConfigFixture {
    return JSON.parse(readFileSync(path, 'utf8')) as OpenClawConfigFixture;
  }

  it('DRY-RUN mode does not modify the config', () => {
    const configPath = writeFixture(makeFixture());

    const { stdout, status } = runScript({ configPath });

    expect(status).toBe(0);
    expect(stdout).toContain('DRY-RUN');

    const config = readConfig(configPath);
    expect(config.models.providers['minimax-portal'].timeoutSeconds).toBeUndefined();
    expect(config.models.providers.kimi.timeoutSeconds).toBeUndefined();
    expect(config.models.providers.zai.timeoutSeconds).toBeUndefined();
  });

  it('APPLY=1 sets timeoutSeconds on each target provider', () => {
    const configPath = writeFixture(makeFixture());

    const { stdout, status } = runScript({ configPath, apply: true });

    expect(status).toBe(0);
    expect(stdout).toContain('APPLY');

    const config = readConfig(configPath);
    expect(config.models.providers['minimax-portal'].timeoutSeconds).toBe(600);
    expect(config.models.providers.kimi.timeoutSeconds).toBe(600);
    expect(config.models.providers.zai.timeoutSeconds).toBe(600);
  });

  it('APPLY=1 with TIMEOUT_SECONDS uses the override value', () => {
    const configPath = writeFixture(makeFixture());

    const { status } = runScript({
      configPath,
      apply: true,
      env: { TIMEOUT_SECONDS: '900' },
    });

    expect(status).toBe(0);
    const config = readConfig(configPath);
    expect(config.models.providers['minimax-portal'].timeoutSeconds).toBe(900);
    expect(config.models.providers.zai.timeoutSeconds).toBe(900);
  });

  it('idempotent: re-running leaves the config unchanged after first apply', () => {
    const configPath = writeFixture(makeFixture());

    const first = runScript({ configPath, apply: true });
    expect(first.status).toBe(0);

    const afterFirst = readFileSync(configPath, 'utf8');

    const second = runScript({ configPath, apply: true });
    expect(second.status).toBe(0);
    expect(second.stdout).toContain('Already at or above target (skipped): 3');

    const afterSecond = readFileSync(configPath, 'utf8');
    expect(afterSecond).toBe(afterFirst);
  });

  it('skips providers already at or above the target timeout', () => {
    const configPath = writeFixture(
      makeFixture({
        'minimax-portal': { timeoutSeconds: 900 },
      }),
    );

    const { stdout, status } = runScript({ configPath, apply: true });

    expect(status).toBe(0);
    const config = readConfig(configPath);
    expect(config.models.providers['minimax-portal'].timeoutSeconds).toBe(900);
    expect(config.models.providers.kimi.timeoutSeconds).toBe(600);
    expect(config.models.providers.zai.timeoutSeconds).toBe(600);

    expect(stdout).toContain('already has timeoutSeconds=900');
  });

  it('does not touch providers outside TARGET_PROVIDERS', () => {
    const configPath = writeFixture(makeFixture());

    const { status } = runScript({ configPath, apply: true });

    expect(status).toBe(0);
    const config = readConfig(configPath);
    expect(config.models.providers['unrelated-provider'].timeoutSeconds).toBeUndefined();
  });

  it('TARGET_PROVIDERS env var scopes the patch', () => {
    const configPath = writeFixture(makeFixture());

    const { status } = runScript({
      configPath,
      apply: true,
      env: { TARGET_PROVIDERS: 'zai' },
    });

    expect(status).toBe(0);
    const config = readConfig(configPath);
    expect(config.models.providers.zai.timeoutSeconds).toBe(600);
    expect(config.models.providers['minimax-portal'].timeoutSeconds).toBeUndefined();
    expect(config.models.providers.kimi.timeoutSeconds).toBeUndefined();
  });

  it('exits non-zero when config file is missing', () => {
    const missing = join(workDir, 'does-not-exist.json');
    const { status, stderr } = runScript({ configPath: missing });

    expect(status).not.toBe(0);
    expect(stderr).toContain('not found');
  });

  it('exits non-zero when TIMEOUT_SECONDS is invalid', () => {
    const configPath = writeFixture(makeFixture());

    const { status, stderr } = runScript({
      configPath,
      apply: true,
      env: { TIMEOUT_SECONDS: 'not-a-number' },
    });

    expect(status).not.toBe(0);
    expect(stderr).toContain('TIMEOUT_SECONDS must be a positive integer');
  });

  it('exits non-zero when config lacks models.providers object', () => {
    const bogus = join(workDir, 'bogus.json');
    writeFileSync(bogus, JSON.stringify({ meta: { foo: 'bar' } }));

    const { status, stderr } = runScript({ configPath: bogus });

    expect(status).not.toBe(0);
    expect(stderr).toContain('does not have a models.providers object');
  });

  it('reports missing target provider in summary without aborting other updates', () => {
    const fixture: OpenClawConfigFixture = {
      models: {
        mode: 'merge',
        providers: {
          'minimax-portal': { baseUrl: 'https://example.test' },
          zai: { baseUrl: 'https://example.test' },
        },
      },
    };
    const configPath = writeFixture(fixture);

    const { stdout, status } = runScript({ configPath, apply: true });

    expect(status).toBe(0);
    // The script uses an em-dash and 'not found' marker; assert on the stable parts.
    expect(stdout).toMatch(/kimi\s+\S+\s+not found in models\.providers/);
    expect(stdout).toContain('Provider not found in config: 1');

    const config = readConfig(configPath);
    expect(config.models.providers['minimax-portal'].timeoutSeconds).toBe(600);
    expect(config.models.providers.zai.timeoutSeconds).toBe(600);
  });
});
