fix(cli): CLI-only enhancement. On deploy failure (TUI catch useDep... (#243)#44
Draft
aidandaly24 wants to merge 1 commit into
…loy (aws#243) CLI-only diagnosability enhancement. Surgical: one new helper (the testable core), minimal wiring into the two existing deploy failure catch blocks (CLI + TUI). Deliberately omitted the optional remediation hints (isAccessDeniedError/isQuotaExceededError) since they are not required by the pass_condition and have no asserting test. No version bump, no reformatting, parseResourceMessage left untouched (the I5502 stream genuinely lacks ResourceStatusReason). Validation: new test passes (4/4); full cloudformation + TUI deploy + deploy command suites green (466 + 14); tsc --noEmit clean; eslint clean on all changed files; npm run build green. The two deploy --help tests only fail when dist/cli/index.mjs is absent (pre-existing env condition) and pass after npm run build with my changes present.
Refs aws#243
Issues
When agentcore deploy fails because of a CloudFormation/CDK resource error, the CLI shows only a generic top-level message (the error string plus a log path in non-TUI mode; a colored "ResourceType STATUS" line with no reason in the TUI). It never surfaces which resource failed, why it failed, or a console link, so users must manually open the CloudFormation console and dig through stack events to diagnose. This is a diagnosability/DX gap, not a functional failure.
Root cause
Neither deploy failure path extracts per-resource CloudFormation failure detail: actions.ts:923-926 returns only toError(err)+logPath; command.tsx:60 prints only error.message; useDeployFlow.ts:881-912 uses getFailureMessage only; DeployStatus.tsx parseResourceMessage:104-113 strips everything except resourceType+status. No DescribeStackEvents call exists anywhere (grep for ResourceStatusReason in src/ is empty); isFailureEvent in cloudformation/types.ts:28 exists but is never fed real events.
The fix
CLI-only enhancement. On deploy failure (TUI catch useDeployFlow.ts:881-912 and CLI catch actions.ts:923-926), call CloudFormation DescribeStackEvents for the target stack, filter with the existing isFailureEvent (cloudformation/types.ts:28), take the root failure(s) while skipping generic "Resource creation cancelled" cascades, and surface them inline as "<LogicalId> (<ResourceType>) failed: <ResourceStatusReason>". Add a deep link to the stack-events console built via consoleDomain(region) (aws/partition.ts:26). Add remediation hints by reusing the classifiers in errors.ts (isAccessDeniedError:13 -> "Check IAM permissions for "; isQuotaExceededError:113 -> "Request a service quota increase"). The import flow (commands/import/phase2-import.ts:158,187) already reads StatusReason/StackStatusReason and is the working precedent.
Design decision: DescribeStackEvents-on-failure (authoritative root-cause ordering + console link) vs only enriching the I5502 stream. Recommend the DescribeStackEvents fetch. CORRECTION to the brief's "cheapest variant": the per-resource reason is NOT already flowing through onRawMessage in a usable form. The I5502 text is "StackName | X/Y | timestamp | STATUS | ..." and, although structured msg.data is passed to onRawMessage (cdk/toolkit-lib/types.ts:108), nothing reads ResourceStatusReason from it today. Both variants are net-new wiring, so the I5502 path is not the freebie described and the DescribeStackEvents approach is clearly preferred.
Files touched: src/cli/tui/screens/deploy/useDeployFlow.ts (catch ~881-912); src/cli/tui/components/DeployStatus.tsx (parseResourceMessage:90-116 and failure panel:173-179); src/cli/commands/deploy/actions.ts (catch ~923-926) and src/cli/commands/deploy/command.tsx (error render :56-66); new helper alongside src/cli/cloudformation/ reusing isFailureEvent in types.ts:28 (DescribeStackEvents fetch is net-new; no existing caller); src/cli/aws/partition.ts consoleDomain():26; reuse classifiers in src/cli/errors.ts (isAccessDeniedError:13, isQuotaExceededError:113); precedent in src/cli/commands/import/phase2-import.ts:158,187
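For reviewers, the root-failure selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in stack-failure.ts: StackEventLike, isFailureEventSketch, isCascade and formatRootFailures are hypothetical names, the "_FAILED" suffix check is only an assumed stand-in for the real isFailureEvent, and the sample events are invented for the example (the SiblingRole type and the stack-level event are assumptions).

```typescript
// Hypothetical minimal event shape. The real helper would presumably consume
// the StackEvent type from @aws-sdk/client-cloudformation.
interface StackEventLike {
  LogicalResourceId?: string;
  ResourceType?: string;
  ResourceStatus?: string;
  ResourceStatusReason?: string;
}

// Cascade-noise patterns: the two messages named in this PR's validation notes.
const CASCADE_PATTERNS: readonly string[] = [
  "Resource creation cancelled",
  "The following resource(s) failed to create",
];

// Assumed stand-in for isFailureEvent (cloudformation/types.ts:28); the real
// predicate is not shown in this PR text.
function isFailureEventSketch(event: StackEventLike): boolean {
  return /_FAILED$/.test(event.ResourceStatus ?? "");
}

// True when the event's reason only reports a knock-on failure.
function isCascade(event: StackEventLike): boolean {
  const reason = event.ResourceStatusReason ?? "";
  return CASCADE_PATTERNS.some((pattern) => reason.indexOf(pattern) !== -1);
}

// Keep failure events that are not cascades and render one line per root failure.
function formatRootFailures(events: readonly StackEventLike[]): string[] {
  return events
    .filter((event) => isFailureEventSketch(event) && !isCascade(event))
    .map((event) => {
      const id = event.LogicalResourceId ?? "unknown";
      const type = event.ResourceType ?? "unknown";
      const reason = event.ResourceStatusReason ?? "no reason reported";
      return `${id} (${type}) failed: ${reason}`;
    });
}

// Illustrative events: one root failure, two cascades, one success.
const sampleEvents: StackEventLike[] = [
  {
    LogicalResourceId: "acfix-243-stack",
    ResourceType: "AWS::CloudFormation::Stack",
    ResourceStatus: "CREATE_FAILED",
    ResourceStatusReason: "The following resource(s) failed to create: [AgentRuntimeFunction]",
  },
  {
    LogicalResourceId: "SiblingRole",
    ResourceType: "AWS::IAM::Role",
    ResourceStatus: "CREATE_FAILED",
    ResourceStatusReason: "Resource creation cancelled",
  },
  {
    LogicalResourceId: "AgentRuntimeFunction",
    ResourceType: "AWS::Lambda::Function",
    ResourceStatus: "CREATE_FAILED",
    ResourceStatusReason: 'Resource handler returned message: "Role arn is invalid"',
  },
  {
    LogicalResourceId: "LogBucket",
    ResourceType: "AWS::S3::Bucket",
    ResourceStatus: "CREATE_COMPLETE",
  },
];

const rootFailureLines = formatRootFailures(sampleEvents);
```

With these sample events, rootFailureLines holds a single line for AgentRuntimeFunction; the SiblingRole and stack-level events are dropped as cascades and the completed bucket is never a failure. Matching on substrings keeps the filter cheap, at the cost of depending on CloudFormation's message wording.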
Validation evidence
The fix was verified by reproducing the original symptom and re-running after the change:
AFTER (branch fix/243, build OK -> dist/cli/index.mjs): New helper src/cli/cloudformation/stack-failure.ts adds formatStackFailureDetail / stackEventsConsoleUrl / describeStackFailureDetail (DescribeStackEvents). I fed a synthetic StackEvent[] (root AWS::Lambda::Function CREATE_FAILED with ResourceStatusReason 'Resource handler returned message: "Role arn is invalid"' + two cascade events: "Resource creation cancelled" and "The following resource(s) failed to create") through the REAL source helper via tsx. Rendered output:
AgentRuntimeFunction (AWS::Lambda::Function) failed: Resource handler returned message: "Role arn is invalid"
See stack events: https://us-gov-west-1.console.amazonaws-us-gov.com/cloudformation/home?region=us-gov-west-1#/stacks/events?stackId=acfix-243-stack
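A partition-aware link like the one above could be assembled along these lines. This is a sketch under assumptions: consoleDomainSketch is a hypothetical stand-in for consoleDomain(region) in aws/partition.ts (whose real mapping is not shown here), it only distinguishes the commercial, GovCloud and China partitions by region prefix, and the URL shape is inferred from the rendered output.

```typescript
// Hypothetical stand-in for consoleDomain(region) in aws/partition.ts.
// Assumption: only three partitions are handled, keyed on the region prefix.
function consoleDomainSketch(region: string): string {
  if (region.indexOf("us-gov-") === 0) return "console.amazonaws-us-gov.com";
  if (region.indexOf("cn-") === 0) return "console.amazonaws.cn";
  return "console.aws.amazon.com";
}

// Builds a deep link to the stack-events view. The stack identifier is
// URL-encoded because a full stack ARN contains ':' and '/' characters.
function stackEventsConsoleUrlSketch(region: string, stackId: string): string {
  const domain = consoleDomainSketch(region);
  const encodedStackId = encodeURIComponent(stackId);
  return `https://${region}.${domain}/cloudformation/home?region=${region}#/stacks/events?stackId=${encodedStackId}`;
}

const govCloudEventsUrl = stackEventsConsoleUrlSketch("us-gov-west-1", "acfix-243-stack");
```

Called with "us-gov-west-1" and "acfix-243-stack", the sketch reproduces the "See stack events" URL shown above. Whether the real helper encodes the stack identifier the same way is not confirmed by this PR text.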
All 12 adversarial assertions PASS: (1) names root LogicalId + ResourceType + ResourceStatusReason in "<LogicalId> (<ResourceType>) failed: <ResourceStatusReason>" form; (2) skips BOTH cascade-noise patterns (no "Resource creation cancelled", no "SiblingRole", no "The following resource(s) failed to create"); (3) console URL uses partition-aware consoleD
Test suite: green.
Staged on the fork as a draft for human review. Promote to aws/agentcore-cli after vetting.