Skip to content

Commit 6d6f1e9

Browse files
author
Benjamin Pace
committed
fix(cli-internal): tighten DeletionPolicy filter and update ADR with E2E findings
CR follow-up: - isExpectedLockDrift now verifies CausingEntity matches *Table.Arn or *Table.StreamArn (prevents false positives from non-DDB dynamic re-evals) - IAM Policy path requires Scope exactly ['Properties'] - ADR-005 updated: empirical findings #7 (Template format error) and #8 (cascading IAM), isRecoverableFailure pseudocode, recursive hasRealDrift, R3/R4 risk assessments expanded, console URL fix noted
1 parent c4510b2 commit 6d6f1e9

File tree

1 file changed

+107
-29
lines changed

1 file changed

+107
-29
lines changed

packages/amplify-cli/adr/005-template-drift-include-nested-stacks.md

Lines changed: 107 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -142,6 +142,28 @@ Testing against the live discussions app (amplify-discussions-main-c39a5,
142142
Code must poll each nested changeset to terminal status before
143143
describing it.
144144

145+
7. **Template format errors are also recoverable.** Amplify AppSync API
146+
stacks contain model sub-stacks that export names via `Fn::Join` with
147+
parameter references. With `IncludeNestedStacks: true`, CFN cannot
148+
resolve intrinsic functions at validation time — all exports appear as
149+
`{{IntrinsicFunction://Fn::Join}}`, triggering `"Template format
150+
error: duplicate Export names"`. Despite the failure, CFN populates
151+
Changes before the export validation step. The function was broadened
152+
from `isEarlyValidationFailure` to `isRecoverableFailure` to handle
153+
both EarlyValidation and Template format error failures. When a
154+
recoverable failure produces zero Changes, the stack is treated as
155+
incomplete (not clean).
156+
157+
8. **Cascading IAM Policy changes from DeletionPolicy modifications.**
158+
When `DeletionPolicy: Retain` is added to a DynamoDB table, CFN
159+
flags every IAM Policy whose `PolicyDocument` references that table's
160+
attributes (e.g., `TodoTable.Arn`) as a Dynamic ResourceAttribute
161+
re-evaluation. These appear as `Modify` changes on `AWS::IAM::Policy`
162+
with `Scope: ['Properties']`, `ChangeSource: ResourceAttribute`,
163+
`Evaluation: Dynamic`, `RequiresRecreation: Never`, and
164+
`CausingEntity: <TableLogicalId>.Arn`. These are harmless and must
165+
be filtered alongside direct DeletionPolicy drift.
166+
145167
### Comparison
146168

147169
| Dimension | IncludeNestedStacks: true | Per-nested (Method B) |
@@ -209,8 +231,11 @@ patterns:
209231
Any other StatusReason is a genuine error — skip that stack.
210232

211233
```typescript
212-
function isEarlyValidationFailure(reason?: string): boolean {
213-
return !!reason?.includes('EarlyValidation');
234+
function isRecoverableFailure(reason?: string): boolean {
235+
if (!reason) return false;
236+
if (reason.includes('EarlyValidation')) return true;
237+
if (reason.includes('Template format error')) return true;
238+
return false;
214239
}
215240

216241
// In analyzeChangeSet:
@@ -219,9 +244,14 @@ if (changeSet.Status === 'FAILED') {
219244
|| changeSet.StatusReason?.includes('No updates')) {
220245
return result; // genuinely no changes
221246
}
222-
if (isEarlyValidationFailure(changeSet.StatusReason)) {
247+
if (isRecoverableFailure(changeSet.StatusReason)) {
248+
if (!changeSet.Changes?.length) {
249+
// Recoverable failure but 0 Changes — treat as incomplete
250+
return { changes: [], skipped: true, skipReason: ... };
251+
}
223252
// Changes are populated despite FAILED status — fall through
224-
print.warn(`Nested changeset FAILED (EarlyValidation): ${changeSet.StatusReason}`);
253+
} else if (isRoot) {
254+
// Root always falls through — classification happens per-nested-stack
225255
} else {
226256
// Unknown failure — treat as error, skip this stack
227257
return { changes: [], skipped: true, skipReason: ... };
@@ -256,25 +286,59 @@ changes must be filtered out.
256286
The filter applies after changeset analysis and before the rollback
257287
safety decision:
258288

289+
Two types of expected changes:
290+
291+
1. **Direct DeletionPolicy changes**`Action: Modify`,
292+
`Scope: ['DeletionPolicy']`. CFN reports DeletionPolicy as a
293+
first-class Scope value.
294+
295+
2. **Cascading IAM Policy changes** — When DeletionPolicy changes on
296+
a DynamoDB table, CFN flags IAM policies referencing `TableName.Arn`
297+
as Dynamic ResourceAttribute re-evaluations. These have
298+
`Action: Modify`, `Scope: ['Properties']`,
299+
`ChangeSource: ResourceAttribute`, `Evaluation: Dynamic`,
300+
`RequiresRecreation: Never`, and `CausingEntity` matching
301+
`*Table.Arn` or `*Table.StreamArn`.
302+
259303
```typescript
260304
function isExpectedLockDrift(change: ResourceChangeWithNested): boolean {
261-
// A change is expected lock drift if:
262-
// 1. Action is 'Modify'
263-
// 2. Scope is exactly ['DeletionPolicy'] (confirmed empirically —
264-
// CFN uses 'DeletionPolicy' as a first-class Scope value)
265-
// 3. Details show DirectModification of DeletionPolicy attribute
266-
// with RequiresRecreation: 'Never'
267-
return change.Action === 'Modify'
268-
&& change.Scope?.length === 1
269-
&& change.Scope[0] === 'DeletionPolicy';
305+
if (change.Action !== 'Modify') return false;
306+
307+
// Direct DeletionPolicy change
308+
if (change.Scope?.length === 1 && change.Scope[0] === 'DeletionPolicy') return true;
309+
310+
// Cascading IAM Policy change from DeletionPolicy on a DDB table
311+
if (change.ResourceType === 'AWS::IAM::Policy'
312+
&& change.Scope?.length === 1 && change.Scope[0] === 'Properties'
313+
&& change.Details?.length) {
314+
return change.Details.every(d =>
315+
d.ChangeSource === 'ResourceAttribute'
316+
&& d.Evaluation === 'Dynamic'
317+
&& d.Target?.RequiresRecreation === 'Never'
318+
&& /Table\.(Arn|StreamArn)$/.test(d.CausingEntity ?? '')
319+
);
320+
}
321+
322+
return false;
270323
}
271324

272-
const realDrift = changes.filter(c => !isExpectedLockDrift(c));
325+
// Recursive tree walk — CloudFormation::Stack entries are structural
326+
// wrappers; real drift is at the leaf level.
327+
function hasRealDrift(changes: ResourceChangeWithNested[]): boolean {
328+
for (const change of changes) {
329+
if (change.nestedChanges?.length) {
330+
if (hasRealDrift(change.nestedChanges)) return true;
331+
} else if (change.ResourceType !== 'AWS::CloudFormation::Stack') {
332+
if (!isExpectedLockDrift(change)) return true;
333+
}
334+
}
335+
return false;
336+
}
273337
```
274338

275-
If `realDrift` is empty after filtering, lock rollback can proceed
276-
safely. If there is any real drift, lock rollback must abort — the
277-
environment is in an inconsistent state.
339+
If `hasRealDrift` returns false after filtering, lock rollback can
340+
proceed safely. If there is any real drift, lock rollback must abort
341+
— the environment is in an inconsistent state.
278342

279343
### What is NOT changed
280344

@@ -310,16 +374,19 @@ update the Gen1 templates to reflect the post-migration state
310374
step. Once templates are updated, EarlyValidation passes and this risk
311375
disappears.
312376

313-
### R3 — Non-EarlyValidation failures conflated with EarlyValidation
377+
### R3 — Recoverable failure classification is string-based
314378

315-
If CFN introduces new failure modes whose StatusReason strings don't
316-
match our `isEarlyValidationFailure` check, we'd incorrectly treat
317-
them as hard errors (skip the stack). This is the safe direction —
318-
false negatives (missed drift) are worse than false positives
319-
(unnecessary skip) for the lock rollback use case.
379+
The `isRecoverableFailure` function matches `"EarlyValidation"` and
380+
`"Template format error"` in StatusReason strings. If CFN introduces
381+
new failure modes or changes wording, we may either miss recoverable
382+
failures (treating them as hard errors — safe direction) or
383+
incorrectly treat genuine failures as recoverable (reading incomplete
384+
Changes — unsafe direction). The latter risk is mitigated: recoverable
385+
failures with 0 Changes are treated as incomplete, not clean.
320386

321387
*Mitigation*: Log all failure reasons. Expand the pattern match as
322-
new failure types are observed.
388+
new failure types are observed. Empty-Changes guard ensures
389+
false-recoverable classification fails closed.
323390

324391
### R4 — DeletionPolicy filter accuracy
325392

@@ -329,10 +396,21 @@ classified as expected) would allow lock rollback to proceed when the
329396
environment is inconsistent. A false negative (expected drift
330397
classified as real) would block lock rollback unnecessarily.
331398

332-
*Mitigation*: The lock step knows exactly which resources it modified
333-
and what it changed. The filter can be informed by the lock step's
334-
output (list of resources + properties changed) rather than relying on
335-
heuristic pattern matching on changeset details.
399+
The filter has two paths: direct DeletionPolicy scope matching (tight)
400+
and cascading IAM Policy matching (broader — checks `ChangeSource`,
401+
`Evaluation`, `RequiresRecreation`, and `CausingEntity` pattern
402+
`*Table.Arn|StreamArn`). The IAM path could theoretically false-positive
403+
if a non-DeletionPolicy change triggers an identical Dynamic
404+
ResourceAttribute re-evaluation pattern on an IAM policy referencing
405+
a resource whose logical ID ends in "Table". In practice this is
406+
extremely unlikely — the pattern only matches during lock rollback when
407+
DeletionPolicy is the only template modification.
408+
409+
*Mitigation*: The `CausingEntity` regex anchors to `Table.Arn` or
410+
`Table.StreamArn`. Scope is required to be exactly `['Properties']`.
411+
All Details must be Dynamic/ResourceAttribute/Never — any static or
412+
direct modification fails the filter. 28 unit tests cover positive
413+
and negative cases.
336414

337415
## Consequences
338416

@@ -356,7 +434,7 @@ heuristic pattern matching on changeset details.
356434
- The `CreateChangeSetCommand` call and its parameters
357435
- The recursive traversal of nested changesets via `ChangeSetId`
358436
- The "no changes" detection path
359-
- The drift formatter and its console URL generation
437+
- The drift formatter (console URL format was fixed separately)
360438
- The Phase 1 (CFN drift detection) and Phase 3 (local drift) paths
361439

362440
### What gets removed

0 commit comments

Comments
 (0)