Commit 5ad81e8
authored
feat(retention): strict bool env parsing + stale-RUNNING cleanup override (#306)
Two hardening changes to the scheduled task-retention cleanup (#272
follow-up) ahead of the EY rollout. Context: [EY data-retention
thread](https://scaleapi.slack.com/archives/C0AMZ12G2G7) + the sgp-dev
verification doc.
## 1. Strict boolean env parsing
`RETENTION_CLEANUP_DRY_RUN` was parsed with `os.environ.get(...) ==
"true"`, so **any** unrecognized value silently meant `dry_run=False` →
live deletion. `DRY_RUN=True` (capital T — exactly what YAML/Helm
tooling tends to render) was the documented gotcha in the sgp-dev test
writeup; in the EY environment this config flows through system-manager
value_overrides → Helm → env, which makes that footgun very real.
Booleans now accept `true/false/1/0` case-insensitively and **raise on
anything else**, so a misconfigured worker refuses to run instead of
deleting data. Applied to `RETENTION_CLEANUP_DRY_RUN`,
`RETENTION_CLEANUP_ENABLED`, and `ENABLE_HEALTH_CHECK_WORKFLOW` (same
pattern).
## 2. `RETENTION_CLEANUP_STALE_RUNNING_DAYS` (default `0` = disabled,
current behavior unchanged)
The sweep unconditionally refuses RUNNING tasks. Tasks abandoned mid-run
(agent crashed, workflow never reached terminal state) therefore stay in
Mongo/Postgres **forever** — the sgp-dev test surfaced 121 such tasks on
a single agent (`test-v2`: cleaned 0, skipped 121). For a
retention-compliance feature that's a structural gap: the data most
likely to be forgotten is precisely the data the policy can never touch.
When set `> 0`, a RUNNING task with no interaction for at least that
many days (same last-interaction definition as the idle check:
`max(task.updated_at, latest message)`) is treated as abandoned and
becomes cleanable. Each override emits a WARNING forensic log. The
unprocessed-events correctness guard is unchanged. Recommended setting
for OneEdge/EY: `30`.
Wiring: env → `load_cleanup_config` → sweep → child workflow →
`clean_task` activity → use case → service. The new activity parameter
defaults to `0`, so in-flight workflows started before a deploy are
unaffected.
## Tests
- env parsing: case-insensitivity matrix, garbage → `ValueError`, new
var default/parse
- service: RUNNING refused by default; stale-RUNNING cleaned with
override; idle-but-not-stale-enough refused; recent Mongo message blocks
override; preview honors override without writes
- workflow: `stale_running_days` propagation sweep → child → activity;
existing fakes updated for the new activity arg
- `uv run --group test pytest tests/unit/...retention...` → **36
passed**; ruff + ruff-format clean
cc @stas
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR hardens retention cleanup configuration and adds an override for
abandoned RUNNING tasks. The main changes are:
- Strict parsing for boolean env vars using `true/false/1/0`
case-insensitively.
- New `RETENTION_CLEANUP_STALE_RUNNING_DAYS` setting for stale RUNNING
task cleanup.
- Retention service guard updates that compute last interaction once and
scope event-guard relaxation to stale RUNNING tasks.
- Temporal activity and workflow wiring for the new stale-running
setting.
- Unit coverage for env parsing, service guard behavior, and workflow
propagation.
<details><summary><h3>Confidence Score: 5/5</h3></summary>
The retention cleanup changes appear merge-safe with targeted coverage
across configuration parsing, service guard behavior, and workflow
propagation.
The implementation is narrowly scoped, preserves the default behavior
for stale RUNNING cleanup, fails closed on invalid boolean
configuration, and includes unit tests covering the key runtime paths.
</details>
<details><summary><h3><a href="https://www.greptile.com/trex"><img
alt="T-Rex"
src="https://greptile-static-assets.s3.amazonaws.com/trex/trex_green.svg"
height="20" align="absmiddle"></a> T-Rex Logs</h3></summary>
**What T-Rex did**
- An env boolean-parsing test was run to compare pre-change and
post-change behavior: before, boolean env vars were silently coerced to
false; after, 'True' and '1' are true, 'False' and '0' are false,
'garbage' raises ValueError, and stale\_running\_days is returned as 0
by default or 14 when overridden.
- A RUNNING cleanup workflow test was executed to verify
stale\_running\_days handling, including the default 0 and an explicit
9, ensuring that RUNNING is refused in certain cases, idle cleanup runs
with delete/update counters, zero writes are produced in dry-run or
recent-update scenarios, and that stale\_running\_days is included in
child/activity arguments.
<a
href="https://app.greptile.com/trex/runs/12291157/artifacts"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://greptile-static-assets.s3.amazonaws.com/badges/ViewAllArtifactsDark.svg?v=1"><source
media="(prefers-color-scheme: light)"
srcset="https://greptile-static-assets.s3.amazonaws.com/badges/ViewAllArtifacts.svg?v=1"><img
alt="View all artifacts"
src="https://greptile-static-assets.s3.amazonaws.com/badges/ViewAllArtifacts.svg?v=1"
height="32"></picture></a>
<sub><a href="https://www.greptile.com/trex"><img alt="T-Rex"
src="https://greptile-static-assets.s3.amazonaws.com/trex/trex_green.svg"
height="14" align="absmiddle"></a> Ran code and verified through
T-Rex</sub>
</details>
<sub>Reviews (5): Last reviewed commit: ["Merge branch 'main'
into
retention/stale..."](5c0abc4)
| [Re-trigger
Greptile](https://app.greptile.com/api/retrigger?id=37220328)</sub>
<!-- /greptile_comment -->1 parent b43e8c3 commit 5ad81e8
10 files changed
Lines changed: 507 additions & 42 deletions
File tree
- agentex
- src
- config
- domain
- services
- use_cases
- temporal
- activities
- workflows
- tests/unit
- config
- services
- temporal
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
177 | 177 | | |
178 | 178 | | |
179 | 179 | | |
| 180 | + | |
180 | 181 | | |
181 | 182 | | |
182 | 183 | | |
| |||
242 | 243 | | |
243 | 244 | | |
244 | 245 | | |
| 246 | + | |
245 | 247 | | |
246 | 248 | | |
247 | 249 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
65 | 65 | | |
66 | 66 | | |
67 | 67 | | |
| 68 | + | |
68 | 69 | | |
69 | 70 | | |
70 | 71 | | |
| |||
76 | 77 | | |
77 | 78 | | |
78 | 79 | | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
79 | 104 | | |
80 | 105 | | |
81 | 106 | | |
| |||
128 | 153 | | |
129 | 154 | | |
130 | 155 | | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
131 | 160 | | |
132 | 161 | | |
133 | 162 | | |
| |||
210 | 239 | | |
211 | 240 | | |
212 | 241 | | |
213 | | - | |
214 | | - | |
215 | | - | |
| 242 | + | |
| 243 | + | |
216 | 244 | | |
217 | 245 | | |
218 | 246 | | |
219 | 247 | | |
220 | | - | |
221 | | - | |
| 248 | + | |
| 249 | + | |
222 | 250 | | |
223 | 251 | | |
224 | 252 | | |
| |||
239 | 267 | | |
240 | 268 | | |
241 | 269 | | |
242 | | - | |
243 | | - | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
244 | 275 | | |
245 | 276 | | |
246 | 277 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
177 | 177 | | |
178 | 178 | | |
179 | 179 | | |
| 180 | + | |
180 | 181 | | |
181 | 182 | | |
182 | 183 | | |
| |||
189 | 190 | | |
190 | 191 | | |
191 | 192 | | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
192 | 201 | | |
193 | 202 | | |
194 | | - | |
| 203 | + | |
| 204 | + | |
195 | 205 | | |
196 | 206 | | |
197 | 207 | | |
| |||
235 | 245 | | |
236 | 246 | | |
237 | 247 | | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
238 | 257 | | |
239 | | - | |
240 | | - | |
241 | | - | |
242 | | - | |
243 | | - | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
244 | 264 | | |
245 | 265 | | |
246 | 266 | | |
247 | 267 | | |
248 | 268 | | |
249 | | - | |
250 | | - | |
251 | | - | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
252 | 273 | | |
253 | 274 | | |
254 | 275 | | |
| |||
293 | 314 | | |
294 | 315 | | |
295 | 316 | | |
| 317 | + | |
296 | 318 | | |
297 | 319 | | |
298 | 320 | | |
| |||
306 | 328 | | |
307 | 329 | | |
308 | 330 | | |
309 | | - | |
310 | | - | |
311 | | - | |
312 | | - | |
313 | | - | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
| 339 | + | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
314 | 343 | | |
315 | 344 | | |
316 | 345 | | |
317 | 346 | | |
318 | | - | |
319 | | - | |
320 | | - | |
321 | | - | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
322 | 350 | | |
323 | 351 | | |
324 | 352 | | |
| |||
452 | 480 | | |
453 | 481 | | |
454 | 482 | | |
455 | | - | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
456 | 534 | | |
457 | | - | |
| 535 | + | |
458 | 536 | | |
459 | 537 | | |
460 | | - | |
461 | | - | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
462 | 542 | | |
463 | | - | |
464 | 543 | | |
465 | 544 | | |
466 | 545 | | |
| |||
479 | 558 | | |
480 | 559 | | |
481 | 560 | | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
482 | 566 | | |
483 | 567 | | |
484 | | - | |
| 568 | + | |
485 | 569 | | |
486 | 570 | | |
487 | 571 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
| 49 | + | |
49 | 50 | | |
50 | 51 | | |
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
54 | 59 | | |
55 | 60 | | |
56 | 61 | | |
57 | 62 | | |
58 | 63 | | |
| 64 | + | |
59 | 65 | | |
60 | 66 | | |
61 | 67 | | |
62 | 68 | | |
63 | 69 | | |
64 | 70 | | |
65 | 71 | | |
| 72 | + | |
66 | 73 | | |
67 | 74 | | |
68 | 75 | | |
| |||
72 | 79 | | |
73 | 80 | | |
74 | 81 | | |
| 82 | + | |
75 | 83 | | |
76 | 84 | | |
77 | 85 | | |
| |||
Lines changed: 16 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
73 | 73 | | |
74 | 74 | | |
75 | 75 | | |
| 76 | + | |
76 | 77 | | |
77 | 78 | | |
78 | 79 | | |
| |||
129 | 130 | | |
130 | 131 | | |
131 | 132 | | |
132 | | - | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
133 | 138 | | |
134 | 139 | | |
135 | 140 | | |
| |||
139 | 144 | | |
140 | 145 | | |
141 | 146 | | |
| 147 | + | |
| 148 | + | |
142 | 149 | | |
143 | 150 | | |
144 | 151 | | |
| |||
149 | 156 | | |
150 | 157 | | |
151 | 158 | | |
152 | | - | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
153 | 163 | | |
154 | 164 | | |
155 | 165 | | |
| |||
164 | 174 | | |
165 | 175 | | |
166 | 176 | | |
167 | | - | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
168 | 181 | | |
169 | 182 | | |
170 | 183 | | |
| |||
0 commit comments