| 1 | +--- |
| 2 | +ddx: |
| 3 | + id: TP-021 |
| 4 | + depends_on: |
| 5 | + - FEAT-004 |
| 6 | + - FEAT-006 |
| 7 | + - FEAT-010 |
| 8 | + - FEAT-012 |
| 9 | + - API-001 |
| 10 | + - TD-010 |
| 11 | +--- |
| 12 | +# Test Plan: Multi-Worker `ddx try` Reliability |
| 13 | + |
| 14 | +## Scope |
| 15 | + |
| 16 | +Validate that concurrent `ddx try` and `ddx work` executions in multiple |
| 17 | +worktrees make progress without turning local git or tracker coordination into |
| 18 | +the bottleneck. The suite is intentionally local-only: it uses fixture |
| 19 | +repositories, the deterministic `script` harness, local clone/worktree attempt |
| 20 | +backends, and subprocesses with isolated `HOME`/`XDG_DATA_HOME`. It must not |
| 21 | +depend on network access, external model providers, Docker image pulls, hosted |
| 22 | +git remotes, or developer-specific agent CLIs. |
| 23 | + |
| 24 | +This plan covers host-local concurrency for one project root. Multi-machine |
| 25 | +coordination remains in API-001/SD-020 scope; native toolchain cache contention |
| 26 | +remains a project-policy matter per TD-010. |
| 27 | + |
| 28 | +## Contract |
| 29 | + |
| 30 | +Concurrent workers may serialize short parent-repo mutation windows. They must |
| 31 | +not serialize the harness wait or the agent's work inside the attempt worktree. |
| 32 | + |
| 33 | +The bounded mutation windows are: |
| 34 | + |
| 35 | +1. pre-dispatch tracker commit, dirty-checkpoint, base resolution, and attempt |
| 36 | + workspace registration; slow clone/docker setup must not monopolize the |
| 37 | + main-git lock; |
| 38 | +2. durable audit and evidence publication writes; |
| 39 | +3. landing, preserve-ref creation, target ref update, and main-worktree index |
| 40 | + sync; post-land hooks and other arbitrary commands must not execute while the |
| 41 | + main-git lock is held; |
| 42 | +4. startup cleanup mutation of stale execution worktrees and worker state. |
| 43 | + |
| 44 | +`index.lock` and `.ddx/.git-tracker.lock` hold times are performance contracts, |
| 45 | +not best-effort diagnostics. The default caps remain 10 s for `index.lock` and |
| 46 | +30 s for `.ddx/.git-tracker.lock`; fast tests should assert much smaller local |
| 47 | +budgets where the fixture is deterministic. |
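The default caps above can be read as a simple per-lock check. The following is a minimal Go sketch, not DDx code: `defaultCaps` and `violatesCap` are hypothetical names introduced for illustration, and only the 10 s and 30 s values come from this plan. It assumes a hold violates the contract when it is strictly longer than the cap.

```go
package main

import "time"

// defaultCaps mirrors the default hold-time caps stated in this plan.
// The map is illustrative; DDx may store its caps differently.
var defaultCaps = map[string]time.Duration{
	"index.lock":             10 * time.Second,
	".ddx/.git-tracker.lock": 30 * time.Second,
}

// violatesCap reports whether any observed hold on the named lock is
// longer than its cap. Locks with no configured cap never violate.
func violatesCap(caps map[string]time.Duration, lock string, holds ...time.Duration) bool {
	limit, ok := caps[lock]
	if !ok {
		return false
	}
	for _, h := range holds {
		if h > limit {
			return true
		}
	}
	return false
}
```

A fast test would pass a much smaller cap map for the deterministic fixture, for example `map[string]time.Duration{"index.lock": 500 * time.Millisecond}`, and keep the same check.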
| 48 | + |
| 49 | +## Existing Coverage |
| 50 | + |
| 51 | +- `cli/internal/integration/lock_contention_test.go` proves 5 |
| 52 | + `ddx work --watch` workers and 20 operator bead commands can overlap without |
| 53 | + operator tracker-lock timeouts, and asserts p99 lock holds stay below the |
| 54 | + configured caps. |
| 55 | +- `cli/internal/agent/execute_bead_lock_scope_test.go` proves the git index lock |
| 56 | + and DDx tracker lock are not held across the harness subprocess wait. |
| 57 | +- `cli/internal/lockmetrics/lockcap_test.go` proves default caps, cap override, |
| 58 | + violation logging, and `lock-violation.json` evidence. |
| 59 | +- `cli/internal/agent/tracker_lock_test.go` covers main-git lock sharing across |
| 60 | + linked worktrees, stale-lock recovery, malformed lock diagnostics, and retry |
| 61 | + policy. |
| 62 | +- `cli/internal/bead/chaos_test.go` and `cli/internal/bead/store_test.go` cover |
| 63 | + JSONL tracker concurrent append, update, claim, and close invariants. |
| 64 | + |
| 65 | +## Gaps And Required Tests |
| 66 | + |
| 67 | +### Fast Chaos |
| 68 | + |
| 69 | +- `TestChaos_PreDispatchMutationWindowDoesNotHoldLockAcrossHarnessWait`: |
| 70 | + instrument the pre-dispatch path with a script harness that sleeps after |
| 71 | + workspace preparation. Assert that the tracker lock is released before the |
| 72 | + subprocess-running interval begins and that workspace creation is the last |
| 73 | + operation inside the lock. |
| 74 | +- `TestChaos_DurableAuditCommitUnderWorkerAndOperatorContention`: run several |
| 75 | + local-clone `ddx try` attempts that all publish evidence while concurrent |
| 76 | + `ddx bead create/update/close` commands run. Assert no tracker-lock timeout, |
| 77 | + no missing `prompt.md`/`manifest.json`/`result.json`, and no `index.lock` |
| 78 | + failures in worker output. |
| 79 | +- `TestChaos_StartupCleanupSkipsWhenAnotherWorkerOwnsCleanupLock`: start N |
| 80 | + `ddx work --once` processes against a fixture with stale worktree metadata. |
| 81 | + Assert exactly one cleanup pass mutates the stale worktree and the others emit |
| 82 | + `cleanup.skipped` without blocking claim. |
| 83 | +- `TestChaos_PostLandCommandDoesNotHoldMainGitLock`: use a blocking post-land |
| 84 | + command runner or local script, then assert another goroutine can acquire the |
| 85 | + main-git lock while the post-land command is blocked. |
| 86 | +- `TestChaos_AttemptPrepareDoesNotHoldMainGitLockForSlowClone`: inject a slow |
| 87 | + attempt backend or slow clone setup and assert another worker/operator can |
| 88 | + acquire `.ddx/.git-tracker.lock` before the slow preparation unblocks. |
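The "lock is free while something else is blocked" assertions in this list share one shape: block the slow step on a channel, probe the lock without waiting, then unblock. The sketch below illustrates that shape in Go under stated assumptions: an in-process `sync.Mutex` stands in for the main-git lock, and `landThenHook`, `landHoldingLock`, and `lockFreeWhileHookBlocked` are hypothetical names, not DDx functions. The deliberately wrong variant is included as a negative control so the probe is not vacuous.

```go
package main

import "sync"

// landThenHook holds mu only for the land step and releases it before
// the hook runs. This is the behavior the plan requires.
func landThenHook(mu *sync.Mutex, land, hook func()) {
	mu.Lock()
	land()
	mu.Unlock()
	hook()
}

// landHoldingLock is a deliberately wrong variant that keeps the lock
// across the hook. It exists only as a negative control for the probe.
func landHoldingLock(mu *sync.Mutex, land, hook func()) {
	mu.Lock()
	defer mu.Unlock()
	land()
	hook()
}

// lockFreeWhileHookBlocked runs impl with a hook that blocks, then reports
// whether a second party could take the lock while the hook was blocked.
func lockFreeWhileHookBlocked(impl func(mu *sync.Mutex, land, hook func())) bool {
	var mu sync.Mutex
	entered := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		impl(&mu, func() {}, func() {
			close(entered) // signal that the hook has started
			<-release      // stay blocked until the probe has finished
		})
	}()

	<-entered
	acquired := mu.TryLock() // non-blocking probe; requires Go 1.18 or newer
	if acquired {
		mu.Unlock()
	}
	close(release)
	<-done
	return acquired
}
```

Because the hook signals on a channel rather than sleeping, the probe does not depend on timing. A real test would replace the mutex with the file-based lock and the hook with the blocking post-land command or slow attempt backend.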
| 89 | + |
| 90 | +### Integration |
| 91 | + |
| 92 | +- `TestIntegration_ConcurrentTryDistinctBeads_LocalClone`: seed 8 independent |
| 93 | + beads, run 4 concurrent `ddx try <id>` subprocesses with the `script` harness |
| 94 | + and `--attempt-backend local-clone`, and assert all attempts either land or |
| 95 | + preserve cleanly with unique attempt IDs, unique worktree/clone paths, and no |
| 96 | + lingering attempt directories after cleanup. |
| 97 | +- `TestIntegration_ConcurrentTrySameBead_OneClaimWins`: run 3 concurrent |
| 98 | + `ddx try <same-id>` subprocesses. Assert at most one attempt claims and runs; |
| 99 | + losing attempts exit through the existing not-claimable path without creating |
| 100 | + durable evidence bundles that look terminal. |
| 101 | +- `TestIntegration_ConcurrentTryPreserveRefsUnique`: force non-landed attempts |
| 102 | + with `--no-merge` or a failing gate and assert hidden refs under |
| 103 | + `refs/ddx/iterations/<bead-id>/` are unique when attempts start within the |
| 104 | + same second. |
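The uniqueness assertions above (attempt IDs, worktree or clone paths, and preserve refs) reduce to a duplicate check over collected strings. A minimal Go sketch follows; `duplicates` is a hypothetical helper name, and how the strings are collected from worker output is left to the test.

```go
package main

import "sort"

// duplicates returns every value that appears more than once in values,
// sorted so that a failing assertion prints a stable message.
func duplicates(values []string) []string {
	seen := make(map[string]int, len(values))
	for _, v := range values {
		seen[v]++
	}
	var dups []string
	for v, n := range seen {
		if n > 1 {
			dups = append(dups, v)
		}
	}
	sort.Strings(dups)
	return dups
}
```

For the preserve-ref test, the input could be the ref names printed by `git for-each-ref --format='%(refname)'` for the bead's ref prefix. The test should also assert that the list has the expected length, since an empty list is trivially free of duplicates.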
| 105 | + |
| 106 | +### Performance |
| 107 | + |
| 108 | +- `TestPerformance_PreDispatchMutationWindowP95UnderBudget`: measure |
| 109 | + tracker-lock hold duration for pre-dispatch with a warm local fixture. Target |
| 110 | + p95 < 2 s and max < 5 s for linked-worktree; record local-clone separately |
| 111 | + because clone checkout may be filesystem-sensitive. |
| 112 | +- `TestPerformance_WorktreePrepareAndCleanupUnderBudget`: measure |
| 113 | + `git worktree add`/remove and local-clone prepare/cleanup for a small fixture. |
| 114 | + Target linked-worktree p95 < 2 s and cleanup p95 < 1 s; fail only on the |
| 115 | + deterministic fixture, not on large real repos. |
| 116 | +- `TestPerformance_LockMetricsScenarioRunsUnderWallClockBudget`: keep the |
| 117 | + multi-worker contention scenario usable in CI by asserting the one-shot |
| 118 | + scenario completes within a bounded wall-clock budget, with `go test -short` |
| 119 | + still skipping the subprocess-heavy variant. |
| 120 | +- `TestPerformance_CheckoutSyncIndexRetryBudget`: exercise checkout sync under |
| 121 | + artificial `.git/index.lock` contention with a fakeable retry/backoff seam and |
| 122 | + assert the main-git lock hold stays below the deterministic budget. |
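The p95 and max targets above need a percentile definition that behaves predictably on small sample counts. The sketch below uses the nearest-rank method, which is one reasonable choice and not necessarily what the existing lock-metrics code uses. `seconds`, `percentile`, and `withinBudget` are hypothetical helper names; the 2 s and 5 s thresholds are the linked-worktree targets from this plan.

```go
package main

import (
	"math"
	"sort"
	"time"
)

// seconds converts float seconds into durations; a convenience for fixtures.
func seconds(vals ...float64) []time.Duration {
	out := make([]time.Duration, len(vals))
	for i, v := range vals {
		out[i] = time.Duration(v * float64(time.Second))
	}
	return out
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
// It returns false for empty input so callers can fail on vacuous data.
func percentile(samples []time.Duration, p float64) (time.Duration, bool) {
	if len(samples) == 0 || p <= 0 || p > 100 {
		return 0, false
	}
	sorted := append([]time.Duration(nil), samples...) // do not mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	return sorted[rank-1], true
}

// withinBudget applies the linked-worktree targets: p95 < 2 s and max < 5 s.
// Empty input is treated as a failure, not a pass.
func withinBudget(samples []time.Duration) bool {
	p95, ok := percentile(samples, 95)
	if !ok {
		return false
	}
	worst, _ := percentile(samples, 100)
	return p95 < 2*time.Second && worst < 5*time.Second
}
```

Returning `false` on empty input keeps the budget check from passing when no lock events were recorded, which is the kind of non-vacuity guard the fixture rules ask for.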
| 123 | + |
| 124 | +### Static/Contract Guards |
| 125 | + |
| 126 | +- `TestWorkerPathDoesNotUseFetchOriginAncestryCheck`: fail if `ddx try`, |
| 127 | + `ddx work`, or pre-claim worker paths wire `FetchOriginAncestryCheck` instead |
| 128 | + of the network-free local ancestry check. |
| 129 | +- `TestManagedTrackerPathListsStayInSync`: assert the durable-audit managed |
| 130 | + path list, pre-claim tracker metadata list, and staged-path exemption helper |
| 131 | + classify `.ddx/beads.jsonl`, `.ddx/beads-archive.jsonl`, |
| 132 | + `.ddx/metrics/attempts.jsonl`, and `.ddx/attachments/...` identically. |
| 133 | +- `TestWorkerFailurePathsReleaseClaimAtomically`: inject a heartbeat-removal |
| 134 | + failure and assert worker failure paths use `Release` or otherwise avoid a |
| 135 | + fresh sidecar lease that keeps an open bead invisible to `ReadyExecution`. |
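The path-list guard above amounts to asking several classifiers the same question and failing on any split answer. A minimal Go sketch of that comparison follows. The `classifier` type, `managedPaths`, and `disagreements` are hypothetical names; the real helpers would be wrapped to fit this signature. Only the three fully spelled-out paths from this plan are listed; the attachments path is elided in the plan and is therefore left out here.

```go
package main

import "sort"

// classifier reports whether a repo-relative path is tracker-managed.
type classifier func(path string) bool

// managedPaths lists the fully specified paths named in this plan.
var managedPaths = []string{
	".ddx/beads.jsonl",
	".ddx/beads-archive.jsonl",
	".ddx/metrics/attempts.jsonl",
}

// disagreements returns the paths on which the classifiers do not all
// give the same answer, sorted for stable assertion output.
func disagreements(paths []string, classifiers map[string]classifier) []string {
	var out []string
	for _, p := range paths {
		yes, no := 0, 0
		for _, c := range classifiers {
			if c(p) {
				yes++
			} else {
				no++
			}
		}
		if yes > 0 && no > 0 {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}
```

The guard should additionally assert that every classifier answers `true` for each managed path, since three classifiers that all wrongly return `false` would agree with each other.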
| 136 | + |
| 137 | +## Fixture Rules |
| 138 | + |
| 139 | +- Use `testutils.BuildDDxBinary` for subprocess tests so spawned workers execute |
| 140 | + the code under test. |
| 141 | +- Use the `script` harness only; directive files may `sleep-ms`, create files, |
| 142 | + write no-change rationale, or commit. |
| 143 | +- Restrict subprocess environment to isolated `HOME`, isolated `XDG_DATA_HOME`, |
| 144 | + `GIT_CONFIG_SYSTEM=/dev/null`, `GIT_TERMINAL_PROMPT=0`, and a minimal `PATH` |
| 145 | + containing git and POSIX shell tools. |
| 146 | +- Prefer a small fixture repo with 5-10 beads. Large-repo performance belongs in |
| 147 | + an optional benchmark, not a required guard. |
| 148 | +- Record lock metrics from `.ddx/metrics/lock-events.jsonl`; assertions should |
| 149 | + use p95/p99 and max hold durations, plus explicit non-vacuity checks. |
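The environment rule above can be enforced in one place so that no subprocess test inherits the developer's environment by accident. The following Go sketch shows one way to do that; `isolatedEnv` and `workerCommand` are hypothetical helper names, and the `home` and `xdg` subdirectory names are arbitrary choices for illustration.

```go
package main

import (
	"os/exec"
	"path/filepath"
)

// isolatedEnv builds the restricted environment described in the fixture
// rules. root is a per-test scratch directory; path is the minimal PATH.
func isolatedEnv(root, path string) []string {
	return []string{
		"HOME=" + filepath.Join(root, "home"),
		"XDG_DATA_HOME=" + filepath.Join(root, "xdg"),
		"GIT_CONFIG_SYSTEM=/dev/null",
		"GIT_TERMINAL_PROMPT=0",
		"PATH=" + path,
	}
}

// workerCommand prepares, but does not start, a subprocess that sees only
// the isolated environment. bin is the binary under test.
func workerCommand(bin, dir, root, path string, args ...string) *exec.Cmd {
	cmd := exec.Command(bin, args...)
	cmd.Dir = dir
	cmd.Env = isolatedEnv(root, path) // a non-nil Env replaces the parent environment
	return cmd
}
```

In a test, `root` would typically come from `t.TempDir()` and `bin` from the binary that `testutils.BuildDDxBinary` produces. Setting `cmd.Env` explicitly matters because a nil `Env` makes `os/exec` pass the parent process environment through unchanged.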
| 150 | + |
| 151 | +## Exit Criteria |
| 152 | + |
| 153 | +- The fast chaos and performance tests run without network access and do not |
| 154 | + invoke external agent CLIs. |
| 155 | +- Concurrent worker tests prove no lock is held across harness wait, no operator |
| 156 | + bead command fails with tracker-lock timeout, and no terminal attempt evidence |
| 157 | + is missing required bundle files. |
| 158 | +- Worktree prepare/cleanup and pre-dispatch lock windows have numeric budgets, |
| 159 | + and the tests enforcing them fail deterministically on the local fixture before |
| 160 | + a regression becomes a user-facing multi-worker stall. |