Skip to content

Commit 04a65a3

Browse files
scottschreckengaustclaudekrokoko
authored
docs: ADR-005 through ADR-013 (feedback, flags, knowledge, DoD, security, recovery, conflicts, knowledge stack, validation pyramid) (aws-samples#144)
* docs: ADR-005 feedback loop, ADR-008 definition of done, ADR-009 security posture ADR-005: PR review feedback propagates upstream to issues and ADRs. Classification (nit/bug/design/architecture), pause-assess-propagate- resolve-resume protocol, stacked PR chain recovery. ADR-008: Progressive definition of done (Level 1-4). Default levels per issue type. Verification responsibility scales with risk. ADR-009: Development-time agent security. Role separation (planner/ implementor/reviewer/admin), blast radius classification, 2P review for high-risk changes, no self-approval. Refs aws-samples#136, aws-samples#139, aws-samples#140 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: ADR-006 feature flags, ADR-007 knowledge acquisition, ADR-010 error recovery, ADR-011 conflict resolution ADR-006: Feature flags for concurrent development. When to use, lifecycle (proposed→introduced→active→verified→permanent), ownership, maximum lifetime enforcement. ADR-007: Knowledge acquisition through progressive failure. Zero-context execution attempts, failure capture protocol, maturity model (L0-L3), self-improvement loop. ADR-010: Error recovery and rollback. Decision tree (revert vs fix- forward), stacked PR chain recovery, things agents must never do. ADR-011: Conflict resolution. Escalation ladder (4 levels), decision criteria, merge conflict ownership, human vs agent disagreements. Refs aws-samples#137, aws-samples#138, aws-samples#141, aws-samples#142 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(ADR-003): GraphQL dependency graph as authoritative, assignments rework - Predecessor validation now uses GitHub GraphQL blockedBy/subIssues as the machine-enforceable source of truth (hard gate) - Rename "Self-assignment" to "Assignments" — supports self-assignment, directed assignment, and priority-based pickup - Add dependency graph maintenance rules (addBlockedBy, addSubIssue) - Sync rule: graph is authoritative; prose explains rationale - Folds comment feedback from issue aws-samples#134 discussion Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(AGENTS.md): add governance directive — issue required before implementation Adds two bullets to top of "Common mistakes": 1. Conversational approval ≠ governance approval — create issue first 2. Branch naming must include issue number Implements the "AGENTS.md directive" row from ADR-003's enforcement mechanisms table. Fixes aws-samples#150 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: recover ADR-012, ADR-013, and ADR-003 enforcement (lost in rebase) ADR-012: Operational knowledge stack (Decision → Guide → Skill) — three-layer pattern for executable governance. Documents observed failure mode where prose alone was insufficient. ADR-013: Tiered validation pyramid — addresses the missing Tier 2 (local sandbox) gap that causes agents to waste cycles on slow remote CI feedback loops. ADR-003: Adds "No branches without an issue" rule, "Conversational approval is NOT issue approval" section, and enforcement mechanisms table. These were lost during the rebase that consolidated the branch after PR aws-samples#143 merged. Recovered from orphaned commit 8bb9407. Refs aws-samples#148 Refs aws-samples#149 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address PR aws-samples#144 review feedback (ADR-006, -007, -009) ADR-006: Add lifecycle metadata table (merge date, max lifetime, expected removal) and mechanism constraint (synth-time vs runtime). ADR-007: Add retrospective concept as first-class knowledge artifact. ADR-009: Expand role table (Planner adds vision, Reviewer adds suggest code, Implementor clarifies no CI/security config) and add "not the last committer" constraint to no-self-approval. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address reviewer feedback — status, decoupling, transition clause - Change ADR-005 through ADR-011 status from "accepted" to "proposed" (ADR-012/013 were already "proposed") - ADR-003 enforcement table: add Status column (Implemented/Planned), rename section to "Enforcement mechanisms (planned)", add transition clause for branch naming rules - ADR-012→013: make reference informational (013 depends on 012, not bidirectional) - ADR-013→012: mark as prerequisite dependency - ADR-003 references: mark 012/013 as "(proposed)" Addresses review from krokoko: critical items 1+2 (aspirational content labeled, transition clause added) and important item 4 (circular dependency broken with clear direction: 013 depends on 012). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: bgagent <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
1 parent ec6209c commit 04a65a3

21 files changed

Lines changed: 2124 additions & 6 deletions

AGENTS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,8 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.
3939

4040
### Common mistakes
4141

42+
- **Starting implementation without an approved GitHub issue** — Conversational approval ("yes, do it", "go ahead", "start with X") is NOT governance approval. The correct sequence is: create a GitHub issue with acceptance criteria → get the `approved` label from an admin → self-assign → comment "Starting implementation" → then begin work. Even if the user explicitly directs the work in conversation, create the durable artifact (issue) first. See [ADR-003](./docs/decisions/003-contribution-governance.md).
43+
- **Creating branches without an issue reference** — Branch names must follow the pattern `(feat|fix|chore|docs)/<issue-number>-short-description`. A branch without an issue number is unauthorized work. Example: `feat/148-operational-knowledge-stack`.
4244
- Editing **`docs/src/content/docs/`** instead of **`docs/guides/`** or **`docs/design/`** — content is generated; sync from sources.
4345
- Adding or editing files in **`docs/design/`** or **`docs/guides/`** without running **`cd docs && node scripts/sync-starlight.mjs`** — CI will reject ("Fail build on mutation") because the Starlight mirror files in `docs/src/content/docs/` are stale. Always commit the regenerated mirrors alongside source changes.
4446
- Changing **`cdk/.../types.ts`** without updating **`cli/src/types.ts`** — CLI and API drift.

docs/decisions/003-contribution-governance.md

Lines changed: 48 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,10 @@ The rules below define how any contributor — human or AI — picks up, owns, a
1111

1212
## Decision
1313

14+
### No branches without an Issue
15+
16+
Every feature branch references an issue in its name (e.g., `feat/123-short-description` or `fix/456-bug-name`). A branch without an issue reference is unauthorized work. This prevents the failure mode where work is started "just to explore" and then snowballs into a PR without governance.
17+
1418
### No PRs without an Issue
1519

1620
Every PR references an issue. The issue provides rationale, sufficient context for the solution to be obvious, and verifiable acceptance criteria.
@@ -27,9 +31,9 @@ Issues align to the [product roadmap](https://github.com/aws-samples/sample-auto
2731

2832
Only permitted users can mark an issue `approved` — a GitHub Actions workflow validates that the label applicant is authorized. An issue is not workable until it is both approved and assigned. After approval, the issue is considered scope-frozen: further revisions that change deliverables require re-approval.
2933

30-
### Self-assignment on start
34+
### Assignments
3135

32-
Unassigned means available. On starting work, self-assign. Multiple assignees (>1) require intentionality verification.
36+
Unassigned means available. Assignment may happen via self-assignment, directed assignment by another agent/human, or priority-based pickup (inspect open tasks for highest priority + earliest predecessor). Multiple assignees (>1) require intentionality verification.
3337

3438
### Issue body as primary directive
3539

@@ -47,10 +51,16 @@ Before implementation, the assigned contributor must:
4751

4852
**Priority evaluation:** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?"
4953

50-
**Predecessor validation:** If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
54+
**Predecessor validation (GraphQL dependency graph is authoritative):**
55+
- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate)
56+
- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight
57+
- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale
58+
- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
5159

5260
**Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph. Forward-look into downstream actions to ensure alignment.
5361

62+
**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically).
63+
5464
**Final gate:** If all checks pass, comment "Starting implementation."
5565

5666
### Identity and attribution
@@ -65,20 +75,55 @@ Provide progress signals at checkpoints. If blocked or abandoning, comment and u
6575

6676
CI passes before requesting review. After merge, verify acceptance criteria and close. Create follow-up issues for discovered work before closing.
6777

78+
### Conversational approval is NOT issue approval
79+
80+
A user saying "yes, do it" or "go ahead" in a conversation does NOT satisfy the governance gate. The correct response to conversational approval is:
81+
82+
1. Create an issue with acceptance criteria
83+
2. Request the `approved` label from an admin
84+
3. Self-assign once approved
85+
4. Then begin implementation
86+
87+
**Known failure mode:** Agents interpret conversational momentum ("Yes start with X") as authorization to skip issue creation. This is the most common governance bypass — it feels like permission because the user explicitly directed the work, but the governance requires a *durable, reviewable artifact* (the issue), not a transient conversation.
88+
89+
**Why this matters:** Conversations are ephemeral. Issues are auditable. If an agent creates work based on a conversation and that conversation is lost (context compaction, session end), no record exists of what was authorized, what the acceptance criteria were, or why the work was started.
90+
91+
### Enforcement mechanisms (planned)
92+
93+
Prose governance is necessary but insufficient. The following enforcement points are planned to prevent bypass progressively. Mechanisms are deployed incrementally — see #186 for implementation tracking.
94+
95+
| Mechanism | Layer | What it catches | Status |
96+
|-----------|-------|-----------------|--------|
97+
| AGENTS.md directive | Agent prompt | Explicit instruction: "Do NOT begin implementation without an approved issue, even if the user says 'go ahead' in conversation" | Implemented |
98+
| Branch name convention | Git workflow | Branch must match `(feat|fix|chore|docs)/<issue-number>-*` — rejects branches without issue reference | Planned |
99+
| Commit-msg hook (Tier 0) | Pre-commit | Rejects commits without `Refs #N` or `Fixes #N` | Planned |
100+
| Pre-push hook (Tier 1) | Pre-push | Validates referenced issue exists and has `approved` label via `gh` API | Planned |
101+
| Claude Code hook (`PreToolUse: Write`) | Agent runtime | Blocks file creation in governed paths without declared issue context | Planned |
102+
| Skill gate: `pickup-issue` | Agent workflow | Agent must invoke before implementation — hard-fails without valid issue | Planned |
103+
104+
**Transition:** Branch naming and commit-msg rules apply to branches created after the corresponding hooks are deployed. Existing branches (including this PR's) pre-date enforcement.
105+
106+
**Progressive enforcement:** Start with the commit-msg hook (cheapest, catches all contributors). Add pre-push validation next. Skill gates enforce at the agent-workflow level (see ADR-012, proposed, for the skill model).
107+
68108
## Consequences
69109

70110
- (+) Prevents duplicate effort — assignment signals ownership
71111
- (+) Prevents priority inversion — agents challenge low-priority requests
72112
- (+) Prevents rework — predecessor validation catches out-of-order work
73113
- (+) Issue body stays current — threads are folded back
74114
- (+) Cross-reference audit catches duplicates early
115+
- (+) Enforcement mechanisms catch bypass at multiple points
75116
- (-) Pre-start overhead for small tasks
76117
- (-) Requires discipline to fold threads into body
118+
- (-) Commit-msg hook adds friction for rapid iteration on approved work
77119
- (!) Assumes priority labels exist and are maintained
78120
- (!) Self-assignment is not atomic — concurrent agents may race; mitigate by verifying assignment after claiming via refresh
121+
- (!) Conversational approval bypass is the most common failure — enforcement must be structural, not behavioral
79122

80123
## References
81124

82125
- Issue #134 — full RFC with open questions and automation requirements
83126
- Roadmap: Scale and collaboration (Agent swarm, Multi-user and teams)
84127
- ADR-001 — delivery methodology referenced by completion rules
128+
- ADR-012 (proposed) — operational knowledge stack; planned enforcement via skill gates
129+
- ADR-013 (proposed) — tiered validation; planned enforcement hooks at Tier 0 and Tier 1
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
# ADR-005: Feedback loop — PR reviews propagate to issues and ADRs
2+
3+
**Status:** proposed
4+
**Date:** 2026-05-19
5+
6+
## Context
7+
8+
PR review comments are addressed locally (fix the code) but systemic issues they reveal are not propagated upstream. A reviewer says "this approach is wrong" but the issue still says "use this approach." ADRs are treated as immutable when they should be living decisions that evolve with implementation experience.
9+
10+
Without a feedback protocol, review insights are lost, issue bodies rot, and architectural mistakes persist across stacked PR chains.
11+
12+
## Decision
13+
14+
### Review comment classification
15+
16+
| Type | Action | Propagates to |
17+
|------|--------|---------------|
18+
| Nit (style, naming) | Fix in PR | Nothing |
19+
| Bug (logic error) | Fix in PR | Nothing (unless systemic) |
20+
| Design concern | Pause PR; evaluate | Issue body |
21+
| Architecture challenge | Pause PR; escalate | ADR (supersede? amend?) |
22+
| Scope question | Clarify | Issue body |
23+
| Blocker (won't approve as-is) | Pause PR | Issue body |
24+
25+
### Upstream propagation
26+
27+
When a review surfaces a design concern or architecture challenge:
28+
29+
1. **Pause** — Do not force-merge. Do not continue stacked PRs above this one.
30+
2. **Assess** — Does this invalidate the issue's approach? The ADR's decision?
31+
3. **Propagate** — Update the relevant upstream document (issue body, ADR, stacked PR dependents).
32+
4. **Resolve** — Revise the approach, defend with evidence, or cancel the work.
33+
5. **Resume** — Once resolved, unblock the PR and dependents.
34+
35+
### ADR evolution
36+
37+
| Trigger | Response |
38+
|---------|----------|
39+
| Implementation reveals the decision doesn't work | New RFC proposing a successor ADR |
40+
| Reviewer challenges the architectural premise | `**UNRESOLVED**` on the issue; pause |
41+
| New information makes the decision obsolete | Successor ADR with `Supersedes: ADR-NNN` |
42+
| Decision works but needs refinement | Amend via PR (minor, no new ADR) |
43+
44+
Never silently ignore a challenged decision.
45+
46+
### Stacked PR chain revision
47+
48+
When feedback on PR N invalidates PRs N+1 through N+M:
49+
1. Comment on all affected PRs
50+
2. Do not rebase dependent PRs until the base is stable
51+
3. If architectural: re-evaluate whether the remaining stack is valid
52+
4. If redesign needed: close dependent PRs, revise issue, re-plan
53+
54+
## Consequences
55+
56+
- (+) Review insights propagate to architectural decisions
57+
- (+) Issue bodies stay current with implementation learnings
58+
- (+) ADRs evolve rather than silently becoming outdated
59+
- (+) Stacked PR chains have a defined recovery protocol
60+
- (-) Adds process overhead to reviews (classification step)
61+
- (-) Pausing stacked chains delays delivery
62+
- (!) Requires discipline to actually propagate feedback upstream
63+
64+
## References
65+
66+
- Issue #136 — full RFC with open questions
67+
- ADR-003 — governance (issue body as source of truth)
68+
- ADR-001 — stacked PRs (chain revision protocol)
Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
# ADR-006: Feature flags for concurrent development
2+
3+
**Status:** proposed
4+
**Date:** 2026-05-19
5+
6+
## Context
7+
8+
Multiple agents working on related features in the same area must serialize — one waits for the other to merge. Incomplete features either block the main branch or require long-lived branches that diverge. SRE needs kill switches without reverting commits.
9+
10+
Feature flags enable trunk-based development where incomplete work merges safely behind toggles, and concurrent contributors avoid blocking each other.
11+
12+
## Decision
13+
14+
### When to use flags
15+
16+
| Situation | Use a flag? |
17+
|-----------|-------------|
18+
| Feature spans multiple PRs, incomplete state is unsafe | Yes |
19+
| Two contributors touch the same module for different purposes | Yes |
20+
| SRE needs a kill switch for a new capability | Yes |
21+
| Simple refactor with no behavioral change | No |
22+
| Bug fix | No |
23+
| One-PR feature, complete on merge | No |
24+
25+
### Flag ownership
26+
27+
- Every flag has an owner (the issue that introduced it)
28+
- Every flag has an expiration (the issue/PR that removes it)
29+
- Flags without a removal plan are rejected in review
30+
31+
### Separation of concerns
32+
33+
- **Planners** decide which features get flags (issue/RFC level)
34+
- **Implementors** add/use flags in code (PR level)
35+
- **SRE/operators** toggle flags in production (runtime level)
36+
- **No self-approval** — the person who introduces a flag cannot approve its removal
37+
38+
### Flag lifecycle
39+
40+
1. **Proposed** — issue identifies the need for a flag
41+
2. **Introduced** — PR adds the flag (default: off)
42+
3. **Active** — feature behind flag is in development
43+
4. **Verified** — feature complete, flag toggled on in testing
44+
5. **Permanent** — flag removed, feature is always-on (or removed entirely)
45+
46+
### Lifecycle metadata
47+
48+
Each flag must track:
49+
50+
| Field | Required | Source |
51+
|-------|----------|--------|
52+
| Flag name | Yes | Code constant |
53+
| Purpose / linked issue | Yes | Issue reference |
54+
| First merge date | Yes | Auto from git log |
55+
| Max lifetime | Yes | Declared at creation (default: 4 weeks) |
56+
| Expected removal date | Yes | first_merge + max_lifetime |
57+
| Actual removal date || Auto when flag deleted |
58+
| Days active || Computed |
59+
60+
### Maximum lifetime
61+
62+
Flags must be removed within the declared max lifetime (default: 4 weeks) of the feature being verified. The max lifetime can be overridden per-flag with justification in the issue. Stale flags are treated as technical debt and surfaced in periodic reviews.
63+
64+
### Mechanism constraint
65+
66+
Flags MUST be resolvable at synth time for infrastructure flags and at runtime for behavior flags. The specific storage mechanism (CDK context, DynamoDB, SSM Parameter Store, env vars) is context-dependent and follows from this split — it is not prescribed by this ADR.
67+
68+
## Consequences
69+
70+
- (+) Concurrent work proceeds without blocking
71+
- (+) Trunk-based development: main stays deployable
72+
- (+) SRE can disable features without code changes
73+
- (+) Partial features merge safely
74+
- (-) Flag management overhead
75+
- (-) Combinatorial testing complexity if many flags exist simultaneously
76+
- (!) Maximum lifetime must be enforced or flags accumulate indefinitely
77+
78+
## References
79+
80+
- Issue #137 — full RFC with open questions on mechanism (CDK context vs. DynamoDB vs. env vars)
81+
- ADR-003 — governance (flag introduction requires approval)
82+
- ADR-005 — feedback loop (reviewer may flag-gate a feature during review)
Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
1+
# ADR-007: Knowledge acquisition through progressive failure
2+
3+
**Status:** proposed
4+
**Date:** 2026-05-19
5+
6+
## Context
7+
8+
Agents with fresh context (tabula rasa) attempt to follow documentation and hit gaps they cannot resolve. These gaps are silently worked around (agent asks a human) rather than systematically fixed. The system cannot self-improve its onboarding because failures are not captured.
9+
10+
Knowledge acquisition starts from zero. Each iteration creates the roadmap to better knowledge by discovering gaps through actual failures.
11+
12+
## Decision
13+
14+
### Zero-context execution attempts
15+
16+
Periodically, an agent with no project memory attempts to follow guides end-to-end. The agent follows ONLY what is written — no inference, no training data knowledge, no asking colleagues.
17+
18+
### Failure capture protocol
19+
20+
At each failure point, the agent:
21+
1. **Stops** — does not attempt to work around or guess
22+
2. **Documents** — creates an issue: which document, which step, what was missing
23+
3. **Continues** — attempts the next step (if possible) to find additional gaps
24+
25+
### Retrospectives
26+
27+
After completing a task, project milestone, or sprint, agents produce a retrospective artifact:
28+
- What worked well (patterns to repeat)
29+
- What failed or caused friction (patterns to avoid)
30+
- Actionable experiments for future workflows
31+
32+
Retrospectives are a first-class knowledge artifact — they feed into documentation improvements, inform ADR amendments, and surface systemic issues that individual task failures cannot.
33+
34+
### Knowledge artifacts (interim)
35+
36+
Until documentation meets ADR-004, agents may create ephemeral artifacts:
37+
- Semantic indices of the codebase (call graphs, dependency maps)
38+
- Annotated walkthroughs of successful executions
39+
- "What I learned" summaries after completing a task
40+
- Retrospectives (see above)
41+
42+
These are scaffolding that informs documentation improvements, not documentation themselves.
43+
44+
### Maturity model
45+
46+
| Level | State | Agent capability |
47+
|-------|-------|-----------------|
48+
| 0 | No docs | Cannot start; files issue for missing docs |
49+
| 1 | Partial docs | Follows docs, stops at gaps, files issues |
50+
| 2 | Complete docs (ADR-004) | Completes end-to-end without help |
51+
| 3 | Self-improving | Detects drift between docs and code, auto-files issues |
52+
53+
### The self-improvement loop
54+
55+
```
56+
Agent starts fresh → follows docs → hits failure →
57+
files issue → issue gets fixed → next agent goes further →
58+
hits next failure → files issue → ...
59+
until end-to-end works from zero context
60+
```
61+
62+
This runs continuously because code changes outpace documentation and different agent implementations fail at different points.
63+
64+
## Consequences
65+
66+
- (+) Documentation gaps become bugs with reproduction steps
67+
- (+) Priority ordering emerges naturally (most common failures surface first)
68+
- (+) The system self-improves without human identification of gaps
69+
- (+) Creates a natural definition of "docs are done" (Level 2 achieved)
70+
- (-) Generates issue volume that needs triage
71+
- (-) Requires periodic investment in zero-context test runs
72+
- (!) The gap between Level 1 and Level 2 may be large — patience required
73+
74+
## References
75+
76+
- Issue #138 — full RFC with open questions
77+
- ADR-004 — defines the quality target (tabula rasa test)
78+
- ADR-003 — governance for issues filed by failing agents
79+
- ADR-008 — Level 4 Definition of Done depends on this protocol

0 commit comments

Comments
 (0)