test: make flaky deadline retry tests robust to CI speed by pjb157 · Pull Request #277 · doublewordai/fusillade

pjb157 · 2026-05-29T09:32:18Z

Summary

Fixes the flaky fusillade deadline integration tests (test_deadline_aware_retry_stops_before_deadline, test_retry_stops_at_deadline_when_no_limits_set).

Root cause: they asserted the number of retries that fit in a fixed completion window, but that count is wall-clock dependent — backoff not_before, the can_retry deadline check, and the SQL claim filter all use chrono::Utc::now(). On a slow/loaded CI runner fewer retries fit, so a fixed range flakes (a recent run fit ~8 and tripped the (9..12) lower bound).

Fix: assert the invariant the deadline logic guarantees — independent of CPU speed — instead of a magic count range:

buffered (stop_before_deadline_ms = 500): the request stops before the batch expires — failed_at < batch_expires_at.
no buffer: retries run up to the deadline — failed_at is within one backoff step (≤200ms) of batch_expires_at (500ms slack for jitter).
both: retried ≥ 1 time, exactly one HTTP call per attempt (call_count == retry_count + 1), and the terminal failure carries an error reason.

tokio::time::pause() was considered but doesn't apply — the path uses chrono/SQL wall-clock (not tokio::time) and runs against real Postgres under #[sqlx::test]; true determinism would require injecting a mock clock into production + SQL, out of scope here.

Test-only; no library behavior change. Standalone, unrelated to #276.

Test plan

cargo test --test integration → 18/18 pass
Two deadline tests stable under single-core CPU starvation (8/8 runs, ~2x wall-time) — the new failed_at invariants hold under load
cargo fmt --check clean; cargo clippy -- -D warnings (CI's command) clean

🤖 Generated with Claude Code

The two deadline integration tests asserted a narrow retry-count range, but the count is wall-clock dependent: the backoff `not_before`, the deadline check in can_retry, and the SQL claim filter all use chrono::Utc::now(). On a slow/loaded CI runner each retry's DB + CPU overhead grows, so fewer retries fit in the fixed completion window; a recent run fit only ~8 and tripped the (9..12) lower bound. Re-anchor each assertion to the real invariant: a tight upper bound equal to the exponential-backoff ceiling (retries cannot run past the deadline) plus a generous lower bound that absorbs slow-runner slack. Also fixes the half-open (9..12) range that contradicted its own "9-12" comment. Test-only; no library behavior change. tokio paused-time was rejected -- the path uses chrono/SQL wall-clock, not tokio::time, and runs against real Postgres under #[sqlx::test]. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Loosens retry-count assertions in two deadline-related integration tests so they no longer flake under slow/loaded CI, by anchoring to the exponential-backoff ceiling (tight upper bound) and a generous lower bound, and fixes a half-open range that contradicted its own comment.

Changes:

test_deadline_aware_retry_stops_before_deadline: (7..=9) → (4..=8), with an updated explanatory comment.
test_retry_stops_at_deadline_when_no_limits_set: (9..12) → (5..=11), fixing the half-open range vs. "9-12" comment mismatch.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

doubleword-code

Summary

This PR fixes flaky assertions in two deadline-aware retry integration tests by widening the acceptable retry-count ranges and correcting the underlying rationale. The changes are test-only and do not modify library behavior.

Verdict: Approved — the changes are correct, well-motivated, and address a real CI flakiness issue without masking bugs.

Research notes

I reviewed:

The can_retry logic in src/request/transitions.rs:272-333 which calculates not_before = now + backoff_duration and denies retries when not_before >= effective_deadline
The backoff calculation: backoff_ms * (backoff_factor ^ retry_attempt), capped at max_backoff_ms
Both test functions (test_deadline_aware_retry_stops_before_deadline and test_retry_stops_at_deadline_when_no_limits_set) to verify the timeline math

Key findings from the code:

Test 1 uses backoff_ms=50, backoff_factor=2, max_backoff_ms=200, effective deadline = 2000ms - 500ms buffer = 1500ms
Test 2 uses the same backoff config but no buffer, so effective deadline = 2000ms
With zero overhead, Test 1 can fit at most 8 retries (next would be at t=1550ms > 1500ms), Test 2 can fit at most 11 retries (next would be at t=2150ms > 2000ms)

Suggested next steps

No action required — this PR is ready to merge. Consider running the tests a few times on CI to confirm the flakiness is resolved.

General findings

None — the changes are narrowly scoped to test assertions and comments, and the rationale is sound.

The retry-count range assertions were wall-clock dependent: how many retries fit the fixed completion window depends on real DB/CPU cycle time, so any fixed range is CI-speed dependent (the original flake). Assert what the deadline logic actually guarantees, independent of speed: - buffered: stops before batch expiry (failed_at < batch_expires_at), exercising the 500ms stop_before_deadline_ms buffer; - no buffer: retries run up to the deadline (failed_at within one backoff step of batch_expires_at); - both: retried >= 1 time, one HTTP call per attempt, error reason set. Verified stable under single-core CPU starvation (8/8 runs). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

doubleword-code

Summary

This PR fixes flaky integration tests by replacing brittle retry-count assertions with semantic deadline invariants. The core insight is correct: retry counts are wall-clock dependent (how many DB/CPU cycles fit in the deadline window), so fixed ranges inevitably flake on slow CI.

The new assertions verify what the deadline logic actually guarantees:

Buffered case (stop_before_deadline_ms: Some(500)): Retries stop before batch expiry
No-buffer case: Retries run up to the deadline (within 500ms slack)

Both tests retain the call_count == retry_count + 1 invariant (one HTTP call per attempt) and error-reason checks.

Verdict: Approved — the changes are semantically correct and address the root cause of flakiness without weakening test coverage meaningfully.

Research notes

I reviewed the can_retry implementation in src/request/transitions.rs:272-333 to understand the deadline logic:

Effective deadline = batch_expires_at - stop_before_deadline_ms (or just batch_expires_at if no buffer)
Retry denied when not_before >= effective_deadline
not_before = Utc::now() + backoff_duration

The daemon flow in src/daemon/mod.rs:1064-1143 shows that failed_at is set when each HTTP attempt fails, and the terminal Failed state preserves the last attempt's timestamp.

Suggested next steps

Merge as-is — the assertions are correct and will eliminate the flakiness.
Optional polish: Consider strengthening the buffered-case assertion slightly (see inline comment on line 870).

General findings

None — all issues are anchored to specific changed lines below.

doubleword-code · 2026-05-29T10:30:40Z


-        // 2. Verify HTTP call count matches retry attempts (1 initial + N retries)
+        // The 500ms buffer must stop the request *before* the batch expires.
+        assert!(


Non-blocking: Consider strengthening this assertion to more directly verify the buffer was effective.

Why it matters: The current assertion failed.state.failed_at < failed.state.batch_expires_at only verifies we didn't pass expiry. Combined with retry_count >= 1, it provides meaningful coverage, but doesn't strongly verify the 500ms buffer specifically. A request that failed after 1 retry at t=100ms would pass, even though the buffer should have allowed retries until ~t=1500ms.

A stronger invariant would be: failed.state.failed_at <= failed.state.batch_expires_at - chrono::Duration::milliseconds(400) (allowing 100ms slack for the final attempt's execution time). This would verify we actually stopped due to the buffer, not some other factor.

That said, the current assertion is correct and does distinguish the buffered behavior from the no-buffer case when read alongside the second test. The trade-off between assertion strength and flakiness risk is reasonable here.

Suggested fix: If you want stronger verification without reintroducing flakiness, consider:

assert!( failed.state.failed_at <= failed.state.batch_expires_at - chrono::Duration::milliseconds(400), "buffered retry should stop ~500ms before batch expiry: failed_at={}, expires_at={}", failed.state.failed_at, failed.state.batch_expires_at );

Or leave as-is if the current assertion proves stable in CI.

doubleword-code · 2026-05-29T10:30:40Z

        );

-        // 2. Verify HTTP call count matches retry attempts (1 initial + N retries)
+        // With no buffer, retries run right up to the deadline: the final attempt


Nit: Minor clarification on the comment.

Why it matters: The comment says "fails within one backoff step (≤200ms) of batch expiry" but the assertion uses 500ms slack. This is intentional (as noted: "500ms slack absorbs scheduling jitter"), but could be slightly clearer about why the slack is larger than the backoff step.

Suggested fix: Consider rewording to:

// With no buffer, retries run right up to the deadline: the final attempt // fails near batch expiry (the max backoff step is 200ms, but we use 500ms // slack to absorb scheduling jitter and DB overhead on slow CI).

Copilot AI review requested due to automatic review settings May 29, 2026 09:32

Copilot started reviewing on behalf of pjb157 May 29, 2026 09:32 View session

Copilot AI reviewed May 29, 2026

View reviewed changes

doubleword-code Bot reviewed May 29, 2026

View reviewed changes

pjb157 changed the title ~~test: make deadline retry-count assertions robust to slow CI~~ test: make flaky deadline retry tests robust to CI speed May 29, 2026

doubleword-code Bot reviewed May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: make flaky deadline retry tests robust to CI speed#277

test: make flaky deadline retry tests robust to CI speed#277
pjb157 wants to merge 2 commits into
mainfrom
fix/deadline-retry-test-flakiness

pjb157 commented May 29, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

doubleword-code Bot left a comment

Uh oh!

doubleword-code Bot left a comment

Uh oh!

doubleword-code Bot May 29, 2026

Uh oh!

doubleword-code Bot May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pjb157 commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

doubleword-code Bot left a comment

Choose a reason for hiding this comment

Summary

Research notes

Suggested next steps

General findings

Uh oh!

doubleword-code Bot left a comment

Choose a reason for hiding this comment

Summary

Research notes

Suggested next steps

General findings

Uh oh!

doubleword-code Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

doubleword-code Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pjb157 commented May 29, 2026 •

edited

Loading