
Commit ccd1883

ddx-checkpoint committed
docs: add multi-worker try reliability test plan
1 parent a3c97e8 commit ccd1883

2 files changed

Lines changed: 162 additions & 1 deletion

docs/helix/01-frame/features/FEAT-012-git-awareness.md

Lines changed: 2 additions & 1 deletion
@@ -413,7 +413,8 @@ agents and developers
 - Given an iteration is merge-eligible, when DDx prepares a fast-forward landing, then the only rebase performed is a rebase of the execution branch onto the latest target branch tip; `git log --merges` shows no merge commit and history remains linear.
 - Given an iteration is not merged (required execution failed, ratchet regression, or `--no-merge` set), when DDx preserves the iteration, then a ref matching `refs/ddx/iterations/<bead-id>/<timestamp>-<base-shortsha>` is created and the target branch is not updated.
 - Given `ddx try` left an orphan worktree due to a crash, when the next `ddx try` invocation starts, then DDx detects and removes orphaned worktrees matching the attempt path pattern before proceeding.
-- Given two `ddx try` invocations on the same bead run concurrently or in rapid succession from the same base, then each produces a distinct hidden ref because the `YYYYMMDDTHHMMSSZ-<12charsha>` combination is unique per invocation; DDx does not serialize or lock across concurrent invocations.
+- Given two `ddx try` invocations run concurrently or in rapid succession from the same base, then DDx may serialize only the short parent-repo mutation windows (tracker commit, dirty checkpoint, base resolution, attempt workspace creation, durable audit, landing, preserve-ref creation, and main-worktree index sync); the harness wait and agent work inside the isolated attempt worktree are never serialized by DDx's main-git lock.
+- Given two non-landed `ddx try` invocations preserve iterations from the same base within the same second, then each produces a distinct hidden ref because the `YYYYMMDDTHHMMSSZ-<12charsha>` combination is unique per invocation.

 ## Dependencies

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
---
ddx:
  id: TP-021
  depends_on:
    - FEAT-004
    - FEAT-006
    - FEAT-010
    - FEAT-012
    - API-001
    - TD-010
---
# Test Plan: Multi-Worker `ddx try` Reliability

## Scope

Validate that concurrent `ddx try` and `ddx work` executions in multiple
worktrees make progress without turning local git or tracker coordination into
the bottleneck. The suite is intentionally local-only: it uses fixture
repositories, the deterministic `script` harness, local clone/worktree attempt
backends, and subprocesses with isolated `HOME`/`XDG_DATA_HOME`. It must not
depend on network access, external model providers, Docker image pulls, hosted
git remotes, or developer-specific agent CLIs.

This plan covers host-local concurrency for one project root. Multi-machine
coordination remains API-001/SD-020 scope; native toolchain cache contention
remains project policy per TD-010.
28+
## Contract
29+
30+
Concurrent workers may serialize short parent-repo mutation windows. They must
31+
not serialize the harness wait or the agent's work inside the attempt worktree.
32+
33+
The bounded mutation windows are:
34+
35+
1. pre-dispatch tracker commit, dirty-checkpoint, base resolution, and attempt
36+
workspace registration; slow clone/docker setup must not monopolize the
37+
main-git lock;
38+
2. durable audit and evidence publication writes;
39+
3. landing, preserve-ref creation, target ref update, and main-worktree index
40+
sync; post-land hooks and other arbitrary commands must not execute while the
41+
main-git lock is held;
42+
4. startup cleanup mutation of stale execution worktrees and worker state.
43+
44+
`index.lock` and `.ddx/.git-tracker.lock` hold times are performance contracts,
45+
not best-effort diagnostics. The default caps remain 10 s for `index.lock` and
46+
30 s for `.ddx/.git-tracker.lock`; fast tests should assert much smaller local
47+
budgets where the fixture is deterministic.

## Existing Coverage

- `cli/internal/integration/lock_contention_test.go` proves 5
  `ddx work --watch` workers and 20 operator bead commands can overlap without
  operator tracker-lock timeouts, and asserts p99 lock holds stay below the
  configured caps.
- `cli/internal/agent/execute_bead_lock_scope_test.go` proves the git index lock
  and DDx tracker lock are not held across the harness subprocess wait.
- `cli/internal/lockmetrics/lockcap_test.go` proves default caps, cap override,
  violation logging, and `lock-violation.json` evidence.
- `cli/internal/agent/tracker_lock_test.go` covers main-git lock sharing across
  linked worktrees, stale-lock recovery, malformed lock diagnostics, and retry
  policy.
- `cli/internal/bead/chaos_test.go` and `cli/internal/bead/store_test.go` cover
  JSONL tracker concurrent append, update, claim, and close invariants.
65+
## Gaps And Required Tests
66+
67+
### Fast Chaos
68+
69+
- `TestChaos_PreDispatchMutationWindowDoesNotHoldLockAcrossHarnessWait`:
70+
instrument the pre-dispatch path with a script harness that sleeps after
71+
workspace preparation. Assert tracker-lock release occurs before the
72+
subprocess-running interval and that workspace creation is the last operation
73+
inside the lock.
74+
- `TestChaos_DurableAuditCommitUnderWorkerAndOperatorContention`: run several
75+
local-clone `ddx try` attempts that all publish evidence while concurrent
76+
`ddx bead create/update/close` commands run. Assert no tracker-lock timeout,
77+
no missing `prompt.md`/`manifest.json`/`result.json`, and no `index.lock`
78+
failures in worker output.
79+
- `TestChaos_StartupCleanupSkipsWhenAnotherWorkerOwnsCleanupLock`: start N
80+
`ddx work --once` processes against a fixture with stale worktree metadata.
81+
Assert exactly one cleanup pass mutates the stale worktree and the others emit
82+
`cleanup.skipped` without blocking claim.
83+
- `TestChaos_PostLandCommandDoesNotHoldMainGitLock`: use a blocking post-land
84+
command runner or local script, then assert another goroutine can acquire the
85+
main-git lock while the post-land command is blocked.
86+
- `TestChaos_AttemptPrepareDoesNotHoldMainGitLockForSlowClone`: inject a slow
87+
attempt backend or slow clone setup and assert another worker/operator can
88+
acquire `.ddx/.git-tracker.lock` before the slow preparation unblocks.
89+
90+
### Integration
91+
92+
- `TestIntegration_ConcurrentTryDistinctBeads_LocalClone`: seed 8 independent
93+
beads, run 4 concurrent `ddx try <id>` subprocesses with the `script` harness
94+
and `--attempt-backend local-clone`, and assert all attempts either land or
95+
preserve cleanly with unique attempt IDs, unique worktree/clone paths, and no
96+
lingering attempt directories after cleanup.
97+
- `TestIntegration_ConcurrentTrySameBead_OneClaimWins`: run 3 concurrent
98+
`ddx try <same-id>` subprocesses. Assert at most one attempt claims and runs;
99+
losing attempts exit through the existing not-claimable path without creating
100+
durable evidence bundles that look terminal.
101+
- `TestIntegration_ConcurrentTryPreserveRefsUnique`: force non-landed attempts
102+
with `--no-merge` or a failing gate and assert hidden refs under
103+
`refs/ddx/iterations/<bead-id>/` are unique when attempts start within the
104+
same second.
105+
106+
### Performance
107+
108+
- `TestPerformance_PreDispatchMutationWindowP95UnderBudget`: measure
109+
tracker-lock hold duration for pre-dispatch with a warm local fixture. Target
110+
p95 < 2 s and max < 5 s for linked-worktree; record local-clone separately
111+
because clone checkout may be filesystem-sensitive.
112+
- `TestPerformance_WorktreePrepareAndCleanupUnderBudget`: measure
113+
`git worktree add`/remove and local-clone prepare/cleanup for a small fixture.
114+
Target linked-worktree p95 < 2 s and cleanup p95 < 1 s; fail only on the
115+
deterministic fixture, not on large real repos.
116+
- `TestPerformance_LockMetricsScenarioRunsUnderWallClockBudget`: keep the
117+
multi-worker contention scenario usable in CI by asserting the one-shot
118+
scenario completes under a bounded wall clock, with `go test -short` still
119+
skipping the subprocess-heavy variant.
120+
- `TestPerformance_CheckoutSyncIndexRetryBudget`: exercise checkout sync under
121+
artificial `.git/index.lock` contention with a fakeable retry/backoff seam and
122+
assert the main-git lock hold stays below the deterministic budget.
123+
124+
### Static/Contract Guards
125+
126+
- `TestWorkerPathDoesNotUseFetchOriginAncestryCheck`: fail if `ddx try`,
127+
`ddx work`, or pre-claim worker paths wire `FetchOriginAncestryCheck` instead
128+
of the network-free local ancestry check.
129+
- `TestManagedTrackerPathListsStayInSync`: assert the durable-audit managed
130+
path list, pre-claim tracker metadata list, and staged-path exemption helper
131+
classify `.ddx/beads.jsonl`, `.ddx/beads-archive.jsonl`,
132+
`.ddx/metrics/attempts.jsonl`, and `.ddx/attachments/...` identically.
133+
- `TestWorkerFailurePathsReleaseClaimAtomically`: inject a heartbeat-removal
134+
failure and assert worker failure paths use `Release` or otherwise avoid a
135+
fresh sidecar lease that keeps an open bead invisible to `ReadyExecution`.
136+
137+
## Fixture Rules
138+
139+
- Use `testutils.BuildDDxBinary` for subprocess tests so spawned workers execute
140+
the code under test.
141+
- Use the `script` harness only; directive files may `sleep-ms`, create files,
142+
write no-change rationale, or commit.
143+
- Restrict subprocess environment to isolated `HOME`, isolated `XDG_DATA_HOME`,
144+
`GIT_CONFIG_SYSTEM=/dev/null`, `GIT_TERMINAL_PROMPT=0`, and a minimal `PATH`
145+
containing git and POSIX shell tools.
146+
- Prefer a small fixture repo with 5-10 beads. Large-repo performance belongs in
147+
an optional benchmark, not a required guard.
148+
- Record lock metrics from `.ddx/metrics/lock-events.jsonl`; assertions should
149+
use p95/p99 and max hold durations, plus explicit non-vacuity checks.
150+
151+
## Exit Criteria
152+
153+
- The fast chaos and performance tests run without network access and do not
154+
invoke external agent CLIs.
155+
- Concurrent worker tests prove no lock is held across harness wait, no operator
156+
bead command fails with tracker-lock timeout, and no terminal attempt evidence
157+
is missing required bundle files.
158+
- Worktree prepare/cleanup and pre-dispatch lock windows have numeric budgets
159+
that fail deterministically on the local fixture before they become user-facing
160+
multi-worker stalls.
