Skip to content

fix(cli): CLI-only enhancement. On deploy failure (TUI catch useDep... (#243)#44

Draft
aidandaly24 wants to merge 1 commit into
mainfrom
fix/243
Draft

fix(cli): CLI-only enhancement. On deploy failure (TUI catch useDep... (#243)#44
aidandaly24 wants to merge 1 commit into
mainfrom
fix/243

Conversation

@aidandaly24

Copy link
Copy Markdown
Owner

Refs aws#243

Issues

  • Surface actionable errors when agentcore deploy fails aws/agentcore-cli#243 — When agentcore deploy fails because of a CloudFormation/CDK resource error, the CLI shows only a generic top-level message (the error string plus a log path in non-TUI; a colored "ResourceType STATUS" line with no reason in the TUI). It never surfaces which resource failed, why, or a console link, so users must manually open the CloudFormation console and dig through stack events to diagnose. This is a diagnosability/DX gap, not a functional failure.

Root cause

Neither deploy failure path extracts per-resource CloudFormation failure detail: actions.ts:923-926 returns only toError(err)+logPath; command.tsx:60 prints only error.message; useDeployFlow.ts:881-912 uses getFailureMessage only; DeployStatus.tsx parseResourceMessage:104-113 strips everything except resourceType+status. No DescribeStackEvents call exists anywhere (grep for ResourceStatusReason in src/ is empty); isFailureEvent in cloudformation/types.ts:28 exists but is never fed real events.

The fix

CLI-only enhancement. On deploy failure (TUI catch useDeployFlow.ts:881-912 and CLI catch actions.ts:923-926), call CloudFormation DescribeStackEvents for the target stack, filter with the existing isFailureEvent (cloudformation/types.ts:28), take the root failure(s) skipping generic "Resource creation cancelled" cascades, and surface inline as " () failed: ". Add a deep link to the stack-events console built via consoleDomain(region) (aws/partition.ts:26). Add remediation hints by reusing classifiers in errors.ts (isAccessDeniedError:13 -> "Check IAM permissions for "; isQuotaExceededError:113 -> "Request a service quota increase"). The import flow (commands/import/phase2-import.ts:158,187) already reads StatusReason/StackStatusReason and is the working precedent. Design decision: DescribeStackEvents-on-failure (authoritative root-cause ordering + console link) vs only enriching the I5502 stream. Recommend the DescribeStackEvents fetch. CORRECTION to the brief's "cheapest variant": the per-resource reason is NOT already flowing through onRawMessage in a usable form — I5502 text is "StackName | X/Y | timestamp | STATUS | ..." and although structured msg.data is passed to onRawMessage (cdk/toolkit-lib/types.ts:108), nothing reads ResourceStatusReason from it today; both variants are net-new wiring, so the I5502 path is not the freebie described and the DescribeStackEvents approach is clearly preferred.

Files touched: src/cli/tui/screens/deploy/useDeployFlow.ts (catch ~881-912); src/cli/tui/components/DeployStatus.tsx (parseResourceMessage:90-116 and failure panel:173-179); src/cli/commands/deploy/actions.ts (catch ~923-926) and src/cli/commands/deploy/command.tsx (error render :56-66); new helper alongside src/cli/cloudformation/ reusing isFailureEvent in types.ts:28 (DescribeStackEvents fetch is net-new — no existing caller); src/cli/aws/partition.ts consoleDomain():26; reuse classifiers in src/cli/errors.ts (isAccessDeniedError:13, isQuotaExceededError:113); precedent in src/cli/commands/import/phase2-import.ts:158,187

Validation evidence

The fix was verified by reproducing the original symptom and re-running after the change:

BEFORE (committed HEAD baseline, fix stashed): grep -rln "DescribeStackEvents|ResourceStatusReason" src/cli returned EMPTY -> a deploy failure could only surface the generic top-level error.message + log path (non-TUI command.tsx) or a colored "ResourceType STATUS" line with no reason (TUI DeployStatus.tsx); user had to open the CFN console manually. Symptom confirmed real.

AFTER (branch fix/243, build OK -> dist/cli/index.mjs): New helper src/cli/cloudformation/stack-failure.ts adds formatStackFailureDetail / stackEventsConsoleUrl / describeStackFailureDetail (DescribeStackEvents). I fed a synthetic StackEvent[] (root AWS::Lambda::Function CREATE_FAILED with ResourceStatusReason 'Resource handler returned message: "Role arn is invalid"' + two cascade events: "Resource creation cancelled" and "The following resource(s) failed to create") through the REAL source helper via tsx. Rendered output:
AgentRuntimeFunction (AWS::Lambda::Function) failed: Resource handler returned message: "Role arn is invalid"
See stack events: https://us-gov-west-1.console.amazonaws-us-gov.com/cloudformation/home?region=us-gov-west-1#/stacks/events?stackId=acfix-243-stack
All 12 adversarial assertions PASS: (1) names root LogicalId + ResourceType + ResourceStatusReason in " () failed: " form; (2) skips BOTH cascade-noise patterns (no "Resource creation cancelled", no "SiblingRole", no "The following resource(s) failed to create"); (3) console URL uses partition-aware consoleD

Test suite: green.


Staged on the fork as a draft for human review. Promote to aws/agentcore-cli after vetting.

…loy (aws#243)

CLI-only diagnosability enhancement. Surgical: one new helper (the
testable core), minimal wiring into the two existing deploy failure
catch blocks (CLI + TUI). Deliberately omitted the optional remediation
hints (isAccessDeniedError/isQuotaExceededError) since they are not
required by the pass_condition and have no asserting test. No version
bump, no reformatting, parseResourceMessage left untouched (the I5502
stream genuinely lacks ResourceStatusReason). Validation: new test
passes (4/4); full cloudformation + TUI deploy + deploy command suites
green (466 + 14); tsc --noEmit clean; eslint clean on all changed files;
npm run build green. The two deploy --help tests only fail when
dist/cli/index.mjs is absent (pre-existing env condition) and pass after
npm run build with my changes present.
@github-actions github-actions Bot added size/m PR size: M agentcore-harness-reviewing AgentCore Harness review in progress labels Jun 25, 2026
@github-actions

Copy link
Copy Markdown

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 37.18% 13613 / 36609
🔵 Statements 36.45% 14474 / 39702
🔵 Functions 31.87% 2341 / 7344
🔵 Branches 31.11% 9011 / 28958
Generated in workflow #98 for commit c29ef03 by the Vitest Coverage Report Action

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant