Design-review follow-up to #235. Three hardenings to the atomic env merge:
1. In-tx teardown guard (closes a TOCTOU). The "stack is mid-teardown"
check was a pre-read in the handler (stack.Status == deleting before
MergeStackEnvVars) — racy: the teardown worker can flip status between
GetStackBySlug and the merge, committing an env change onto a deleting
stack. The SELECT now fetches status under the FOR UPDATE lock and
returns models.ErrStackDeleting (→ 409 stack_deleting); the handler
pre-read is removed so the in-tx check is authoritative. Same posture as
the resource-DELETE / SetTTL hardenings this session.
2. SET LOCAL lock_timeout = '3s' as the first tx statement, so a PATCH that
races a long-held row lock fails fast to a retryable 503 instead of
hanging the request goroutine indefinitely.
3. keys_set audit count now counts only actual upserts (v != ""). The old
`len(body.Env) - deletes` over-counted a no-op delete (empty value for an
absent key: not a delete, not a set), making the rule-12 audit lie.
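For illustration, the three hardenings above can be sketched together in Go. This is a minimal sketch under stated assumptions, not the code in this commit: the `stacks` table, the `slug` column, the JSON encoding of env_vars, and the helper names `applyPatch` and `mergeStackEnvVars` are hypothetical, and `ErrStackDeleting` stands in for models.ErrStackDeleting. Only the statement order (SET LOCAL first, then a status,env_vars SELECT under FOR UPDATE) and the v != "" counting rule come from the description above.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"errors"
	"fmt"
)

// ErrStackDeleting stands in for models.ErrStackDeleting (hypothetical copy).
var ErrStackDeleting = errors.New("stack is deleting")

// applyPatch merges patch into current: a non-empty value upserts the key,
// an empty value deletes it. keysSet counts only real upserts (v != "").
func applyPatch(current, patch map[string]string) (merged map[string]string, keysSet, keysDeleted int) {
	merged = make(map[string]string, len(current)+len(patch))
	for k, v := range current {
		merged[k] = v
	}
	for k, v := range patch {
		if v != "" {
			merged[k] = v
			keysSet++
			continue
		}
		if _, ok := merged[k]; ok {
			delete(merged, k)
			keysDeleted++
		}
		// Empty value for an absent key: neither a set nor a delete.
	}
	return merged, keysSet, keysDeleted
}

// mergeStackEnvVars sketches the in-tx ordering. The table name, the slug
// column, and the JSON encoding of env_vars are assumptions.
func mergeStackEnvVars(ctx context.Context, db *sql.DB, slug string, patch map[string]string) (int, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback() // returns sql.ErrTxDone after a successful Commit

	// First statement: bound the wait on a long-held row lock. On expiry
	// Postgres reports SQLSTATE 55P03 (lock_not_available).
	if _, err := tx.ExecContext(ctx, `SET LOCAL lock_timeout = '3s'`); err != nil {
		return 0, err
	}

	var status string
	var raw []byte
	err = tx.QueryRowContext(ctx,
		`SELECT status, env_vars FROM stacks WHERE slug = $1 FOR UPDATE`, slug,
	).Scan(&status, &raw)
	if err != nil {
		return 0, err
	}
	// status is read under the row lock, so a concurrent status update
	// on this row has to wait until this transaction ends.
	if status == "deleting" {
		return 0, ErrStackDeleting
	}

	current := map[string]string{}
	if len(raw) > 0 {
		if err := json.Unmarshal(raw, &current); err != nil {
			return 0, err
		}
	}
	merged, keysSet, _ := applyPatch(current, patch)
	out, err := json.Marshal(merged)
	if err != nil {
		return 0, err
	}
	if _, err := tx.ExecContext(ctx,
		`UPDATE stacks SET env_vars = $1 WHERE slug = $2`, out, slug); err != nil {
		return 0, err
	}
	return keysSet, tx.Commit()
}

func main() {
	current := map[string]string{"A": "1", "B": "2"}
	patch := map[string]string{"A": "9", "B": "", "C": ""} // C is a no-op delete
	_, set, del := applyPatch(current, patch)
	// new keys_set, real deletes, old len(patch)-deletes count
	fmt.Println(set, del, len(patch)-del) // prints "1 1 2"
}
```

In the example patch, "C" is an empty value for an absent key, so the old formula reports 2 keys set while only "A" was actually upserted.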
Tests: TestMergeStackEnvVars_Branches is rewritten for the new SQL (SET
LOCAL exec + status,env_vars SELECT) and adds the lock_timeout-error and
status='deleting'→ErrStackDeleting cases; MergeStackEnvVars stays at 100%
coverage. The existing real-DB deleting handler tests now exercise the
in-tx path deterministically. The two stack-env fault tests' failAfter is
bumped by 1 for the added in-tx SET LOCAL exec (which the fault harness
counts). Verified against real Postgres+Redis; all changed lines covered.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>