Skip to content

feat: auto-trigger debugger in CI#2158

Draft
lionello wants to merge 2 commits into
lio/cleanup-aifrom
lio/trigger-debugger
Draft

feat: auto-trigger debugger in CI#2158
lionello wants to merge 2 commits into
lio/cleanup-aifrom
lio/trigger-debugger

Conversation

@lionello

@lionello lionello commented Jun 30, 2026

Copy link
Copy Markdown
Member

Description

Trigger the AI debugger on failure in CI.

Linked Issues

Checklist

  • I have performed a self-review of my code
  • I have added appropriate tests
  • I have updated the Defang CLI docs and/or README to reflect my changes, if necessary

Summary by CodeRabbit

  • New Features

    • Added a cleanup workflow to help identify and remove leftover cloud resources after teardown.
    • Expanded cloud cleanup support for common AWS resources, including load balancers, databases, DNS records, and container images.
    • Improved debug flows to better handle interactive and non-interactive environments.
  • Bug Fixes

    • Made compose commands more reliable when debugging is unavailable or auto-approval is enabled.
    • Refined confirmation prompts and non-interactive behavior to avoid unnecessary prompts or failures.

@lionello lionello requested a review from jordanstephens as a code owner June 30, 2026 17:03
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds an OrphanCleaner interface and AWS implementations (ALB, RDS, ECR, Route53) for discovering and remediating resources left behind after defang down. A new cleanup_resources AI agent tool invokes this flow. Simultaneously, the debug agent and Debugger gain a non-interactive mode with auto-approval based on subscription tier, and all NewDebugger call sites are updated to pass the interactive flag.

Orphan Cleanup + Non-Interactive Debug Agent

Layer / File(s) Summary
OrphanCleaner interface and AWS helper primitives
src/pkg/cli/client/cleanup.go, src/pkg/clouds/aws/ecr.go, src/pkg/clouds/aws/ecr_test.go, src/pkg/clouds/aws/elbv2.go, src/pkg/clouds/aws/elbv2_test.go, src/pkg/clouds/aws/rds.go, src/pkg/clouds/aws/rds_test.go, src/pkg/clouds/aws/route53.go, src/pkg/clouds/aws/route53_cleanup_test.go, src/go.mod
Defines OrphanResource and OrphanCleaner; adds paginated find/modify helpers for ECR, ELBv2, RDS, and Route53 with unit tests; bumps AWS SDK dependencies.
ByocAws DiscoverOrphans and CleanupOrphan
src/pkg/cli/client/byoc/aws/byoc.go, src/pkg/cli/client/byoc/aws/cleanup.go, src/pkg/cli/client/byoc/aws/domain_test.go
Adds orphans map field to ByocAws, implements DiscoverOrphans (ALB, RDS, ECR, Route53) and CleanupOrphan with per-category dispatch, and stubs ChangeResourceRecordSets in the domain test mock.
Agent non-interactive mode and elicitations sentinel
src/pkg/agent/agent.go, src/pkg/elicitations/elicitations.go, src/pkg/utils.go
Adds Option/WithNonInteractive to agent.New, gates REPL behavior on interactive flag, adds ErrNotSupported to the elicitations controller, and adds IsPaidTier helper.
Debugger non-interactive mode and AutoApprove
src/pkg/debug/debug.go, src/pkg/debug/debug_test.go
NewDebugger gains interactive parameter, derives defaultPermission via isPaidAccount/IsPaidTier, exposes AutoApprove(), bypasses survey prompts when non-interactive, and adds TestDebugDeploymentNonInteractive.
cleanup_resources AI tool
src/pkg/agent/tools/cleanup.go, src/pkg/agent/tools/cleanup_test.go, src/pkg/agent/tools/tools.go
Implements HandleCleanupTool with interactive/non-interactive orphan reporting and per-orphan confirmation, registers it as cleanup_resources in CollectDefangTools, and covers it with subtests.
Compose and CLI wiring
src/cmd/cli/command/compose.go, src/cmd/cli/command/debug.go, src/cmd/cli/command/session.go, src/pkg/cli/stacks.go, src/pkg/cli/stacks_test.go
Passes !global.NonInteractive to all NewDebugger call sites; compose down invokes handleTailAndMonitorErr on tail failure; AutoApprove() gates non-interactive debug paths; fixes RequestEnum option order in stack removal confirmation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 A bunny hopped through leftover clouds,
Found ALBs and RDS hiding in shrouds,
"No browser? No problem! I'll auto-approve,"
Cleaned ECR repos with one tidy move.
Non-interactive rabbits still get the job done—
Orphans begone, and cleanup is fun! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.73% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly describes the CI debugger automation, which is a real and important part of the changeset.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch lio/trigger-debugger

Comment @coderabbitai help to get the list of available commands.

@lionello lionello changed the base branch from main to lio/cleanup-ai June 30, 2026 17:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
src/pkg/agent/tools/cleanup_test.go (1)

85-121: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a test case exercising CleanupOrphan failure.

mockCleanerProvider.cleanupErr is wired into the mock (Line 31) but no table case sets it, so the failed++ branch and the "... failed: %v" / "N cleaned, N skipped, N failed" reporting in HandleCleanupTool are never covered. A case with cleanupErr set and confirm: "yes" would close this gap.

🧪 Suggested case
+		{
+			name:           "cleanup failure is reported",
+			provider:       &mockCleanerProvider{orphans: twoOrphans, cleanupErr: errors.New("boom")},
+			confirm:        "yes",
+			expectContains: []string{"0 cleaned, 0 skipped, 2 failed", "failed: boom"},
+			expectCleaned:  nil,
+		},

As per path instructions: "Add tests for new behavior and important failure modes: ... missing returns, ... failed".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/agent/tools/cleanup_test.go` around lines 85 - 121, Add a
table-driven test case in cleanup_test.go that sets
mockCleanerProvider.cleanupErr and confirms with "yes" so CleanupOrphan failure
is exercised. Use the existing HandleCleanupTool test table and a case alongside
the current "confirm yes cleans every orphan" scenario, asserting the failed
branch is reported with the "N cleaned, N skipped, N failed" summary and the
"... failed: %v" message while still checking the cleaned/skipped counts.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pkg/agent/tools/cleanup.go`:
- Around line 32-36: The cleanup flow is discarding the error from
loader.ProjectWorkingDir(ctx), which can hide real filesystem or loader failures
before stacks.NewManager is called. Update the cleanup logic to handle that
error explicitly in the same path that sets workingDir, and return a wrapped
error instead of continuing with an empty directory unless there is a clearly
documented reason it is safe to ignore. Keep the fix localized around
ProjectWorkingDir and the subsequent stacks.NewManager call so the root cause is
preserved.

In `@src/pkg/cli/client/byoc/aws/cleanup.go`:
- Around line 70-102: The ALB/RDS discovery in cleanup uses broad prefix
matching that can catch resources from the wrong project, so tighten the
matching to the full generated name shape instead of a bare prefix. Update the
lookup logic around b.resourceBaseName, FindLoadBalancersByPrefix, and
FindDBInstancesByPrefix so it validates the expected suffix/boundary for the
stack name before calling add, and only offers resources whose names exactly
match the intended defang-generated pattern.

In `@src/pkg/cli/client/byoc/aws/domain_test.go`:
- Around line 40-43: The Route53 mock mutation method is currently a silent
no-op, which can hide unexpected calls from prepareDomainDelegation. Update
r53Mock.ChangeResourceRecordSets in the domain test mock to fail fast by
panicking or otherwise triggering an explicit test failure instead of returning
nil, nil, so any unexpected record mutation is surfaced immediately during
tests.

In `@src/pkg/clouds/aws/rds_test.go`:
- Around line 26-38: Add a paginated discovery test for FindDBInstancesByPrefix
to cover the out.Marker loop in rds.go, since the current single-page mockRDS
case won’t catch broken page traversal. Extend the rds_test.go coverage with a
two-page mock similar to the ELB pagination test, using mockRDS and
FindDBInstancesByPrefix to verify results are accumulated across pages and the
marker handoff is handled correctly.

In `@src/pkg/clouds/aws/route53_cleanup_test.go`:
- Around line 33-37: The delete test currently verifies record deletion but not
the new hosted-zone ID trimming behavior. Update
fakeR53.ChangeResourceRecordSets to capture the incoming HostedZoneId from the
Route53 request, then assert in TestDeleteResourceRecordSet that the value
passed is the trimmed form "Z123" rather than the original "/hostedzone/..."
string. Use the existing fakeR53 and TestDeleteResourceRecordSet symbols to keep
the check focused on the new behavior.

In `@src/pkg/clouds/aws/route53.go`:
- Around line 191-216: Wrap the raw Route53 SDK errors in the helper functions
that list and delete records so callers get both AWS detail and operation
context. In the record-listing logic that calls ListResourceRecordSets, return a
wrapped error that includes the zone identifier and that listing failed; in
DeleteResourceRecordSet, wrap the ChangeResourceRecordSets error with the
trimmed hosted zone ID and record identity (name/type/set info from the
ResourceRecordSet). Use the existing function names ListResourceRecordSets and
DeleteResourceRecordSet to locate the changes and preserve the original error
with %w.

---

Nitpick comments:
In `@src/pkg/agent/tools/cleanup_test.go`:
- Around line 85-121: Add a table-driven test case in cleanup_test.go that sets
mockCleanerProvider.cleanupErr and confirms with "yes" so CleanupOrphan failure
is exercised. Use the existing HandleCleanupTool test table and a case alongside
the current "confirm yes cleans every orphan" scenario, asserting the failed
branch is reported with the "N cleaned, N skipped, N failed" summary and the
"... failed: %v" message while still checking the cleaned/skipped counts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c9f2a0f-8f5a-4601-b506-aad846df4855

📥 Commits

Reviewing files that changed from the base of the PR and between a6ec064 and aaaf454.

⛔ Files ignored due to path filters (1)
  • src/go.sum is excluded by !**/*.sum
📒 Files selected for processing (26)
  • src/cmd/cli/command/compose.go
  • src/cmd/cli/command/debug.go
  • src/cmd/cli/command/session.go
  • src/go.mod
  • src/pkg/agent/agent.go
  • src/pkg/agent/tools/cleanup.go
  • src/pkg/agent/tools/cleanup_test.go
  • src/pkg/agent/tools/tools.go
  • src/pkg/cli/client/byoc/aws/byoc.go
  • src/pkg/cli/client/byoc/aws/cleanup.go
  • src/pkg/cli/client/byoc/aws/domain_test.go
  • src/pkg/cli/client/cleanup.go
  • src/pkg/cli/stacks.go
  • src/pkg/cli/stacks_test.go
  • src/pkg/clouds/aws/ecr.go
  • src/pkg/clouds/aws/ecr_test.go
  • src/pkg/clouds/aws/elbv2.go
  • src/pkg/clouds/aws/elbv2_test.go
  • src/pkg/clouds/aws/rds.go
  • src/pkg/clouds/aws/rds_test.go
  • src/pkg/clouds/aws/route53.go
  • src/pkg/clouds/aws/route53_cleanup_test.go
  • src/pkg/debug/debug.go
  • src/pkg/debug/debug_test.go
  • src/pkg/elicitations/elicitations.go
  • src/pkg/utils.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 6

🧹 Nitpick comments (1)
src/pkg/agent/tools/cleanup_test.go (1)

85-121: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a test case exercising CleanupOrphan failure.

mockCleanerProvider.cleanupErr is wired into the mock (Line 31) but no table case sets it, so the failed++ branch and the "... failed: %v" / "N cleaned, N skipped, N failed" reporting in HandleCleanupTool are never covered. A case with cleanupErr set and confirm: "yes" would close this gap.

🧪 Suggested case
+		{
+			name:           "cleanup failure is reported",
+			provider:       &mockCleanerProvider{orphans: twoOrphans, cleanupErr: errors.New("boom")},
+			confirm:        "yes",
+			expectContains: []string{"0 cleaned, 0 skipped, 2 failed", "failed: boom"},
+			expectCleaned:  nil,
+		},

As per path instructions: "Add tests for new behavior and important failure modes: ... missing returns, ... failed".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/agent/tools/cleanup_test.go` around lines 85 - 121, Add a
table-driven test case in cleanup_test.go that sets
mockCleanerProvider.cleanupErr and confirms with "yes" so CleanupOrphan failure
is exercised. Use the existing HandleCleanupTool test table and a case alongside
the current "confirm yes cleans every orphan" scenario, asserting the failed
branch is reported with the "N cleaned, N skipped, N failed" summary and the
"... failed: %v" message while still checking the cleaned/skipped counts.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pkg/agent/tools/cleanup.go`:
- Around line 32-36: The cleanup flow is discarding the error from
loader.ProjectWorkingDir(ctx), which can hide real filesystem or loader failures
before stacks.NewManager is called. Update the cleanup logic to handle that
error explicitly in the same path that sets workingDir, and return a wrapped
error instead of continuing with an empty directory unless there is a clearly
documented reason it is safe to ignore. Keep the fix localized around
ProjectWorkingDir and the subsequent stacks.NewManager call so the root cause is
preserved.

In `@src/pkg/cli/client/byoc/aws/cleanup.go`:
- Around line 70-102: The ALB/RDS discovery in cleanup uses broad prefix
matching that can catch resources from the wrong project, so tighten the
matching to the full generated name shape instead of a bare prefix. Update the
lookup logic around b.resourceBaseName, FindLoadBalancersByPrefix, and
FindDBInstancesByPrefix so it validates the expected suffix/boundary for the
stack name before calling add, and only offers resources whose names exactly
match the intended defang-generated pattern.

In `@src/pkg/cli/client/byoc/aws/domain_test.go`:
- Around line 40-43: The Route53 mock mutation method is currently a silent
no-op, which can hide unexpected calls from prepareDomainDelegation. Update
r53Mock.ChangeResourceRecordSets in the domain test mock to fail fast by
panicking or otherwise triggering an explicit test failure instead of returning
nil, nil, so any unexpected record mutation is surfaced immediately during
tests.

In `@src/pkg/clouds/aws/rds_test.go`:
- Around line 26-38: Add a paginated discovery test for FindDBInstancesByPrefix
to cover the out.Marker loop in rds.go, since the current single-page mockRDS
case won’t catch broken page traversal. Extend the rds_test.go coverage with a
two-page mock similar to the ELB pagination test, using mockRDS and
FindDBInstancesByPrefix to verify results are accumulated across pages and the
marker handoff is handled correctly.

In `@src/pkg/clouds/aws/route53_cleanup_test.go`:
- Around line 33-37: The delete test currently verifies record deletion but not
the new hosted-zone ID trimming behavior. Update
fakeR53.ChangeResourceRecordSets to capture the incoming HostedZoneId from the
Route53 request, then assert in TestDeleteResourceRecordSet that the value
passed is the trimmed form "Z123" rather than the original "/hostedzone/..."
string. Use the existing fakeR53 and TestDeleteResourceRecordSet symbols to keep
the check focused on the new behavior.

In `@src/pkg/clouds/aws/route53.go`:
- Around line 191-216: Wrap the raw Route53 SDK errors in the helper functions
that list and delete records so callers get both AWS detail and operation
context. In the record-listing logic that calls ListResourceRecordSets, return a
wrapped error that includes the zone identifier and that listing failed; in
DeleteResourceRecordSet, wrap the ChangeResourceRecordSets error with the
trimmed hosted zone ID and record identity (name/type/set info from the
ResourceRecordSet). Use the existing function names ListResourceRecordSets and
DeleteResourceRecordSet to locate the changes and preserve the original error
with %w.

---

Nitpick comments:
In `@src/pkg/agent/tools/cleanup_test.go`:
- Around line 85-121: Add a table-driven test case in cleanup_test.go that sets
mockCleanerProvider.cleanupErr and confirms with "yes" so CleanupOrphan failure
is exercised. Use the existing HandleCleanupTool test table and a case alongside
the current "confirm yes cleans every orphan" scenario, asserting the failed
branch is reported with the "N cleaned, N skipped, N failed" summary and the
"... failed: %v" message while still checking the cleaned/skipped counts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c9f2a0f-8f5a-4601-b506-aad846df4855

📥 Commits

Reviewing files that changed from the base of the PR and between a6ec064 and aaaf454.

⛔ Files ignored due to path filters (1)
  • src/go.sum is excluded by !**/*.sum
📒 Files selected for processing (26)
  • src/cmd/cli/command/compose.go
  • src/cmd/cli/command/debug.go
  • src/cmd/cli/command/session.go
  • src/go.mod
  • src/pkg/agent/agent.go
  • src/pkg/agent/tools/cleanup.go
  • src/pkg/agent/tools/cleanup_test.go
  • src/pkg/agent/tools/tools.go
  • src/pkg/cli/client/byoc/aws/byoc.go
  • src/pkg/cli/client/byoc/aws/cleanup.go
  • src/pkg/cli/client/byoc/aws/domain_test.go
  • src/pkg/cli/client/cleanup.go
  • src/pkg/cli/stacks.go
  • src/pkg/cli/stacks_test.go
  • src/pkg/clouds/aws/ecr.go
  • src/pkg/clouds/aws/ecr_test.go
  • src/pkg/clouds/aws/elbv2.go
  • src/pkg/clouds/aws/elbv2_test.go
  • src/pkg/clouds/aws/rds.go
  • src/pkg/clouds/aws/rds_test.go
  • src/pkg/clouds/aws/route53.go
  • src/pkg/clouds/aws/route53_cleanup_test.go
  • src/pkg/debug/debug.go
  • src/pkg/debug/debug_test.go
  • src/pkg/elicitations/elicitations.go
  • src/pkg/utils.go
🛑 Comments failed to post (6)
src/pkg/agent/tools/cleanup.go (1)

32-36: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Don't silently swallow the ProjectWorkingDir error.

The error from loader.ProjectWorkingDir(ctx) is discarded, so a real filesystem/loader failure silently yields an empty workingDir that is then passed into stacks.NewManager, masking the root cause. Either propagate it or document why it's safe to ignore here.

🛠️ Proposed fix
-	workingDir, _ := loader.ProjectWorkingDir(ctx)
+	workingDir, err := loader.ProjectWorkingDir(ctx)
+	if err != nil {
+		return "", fmt.Errorf("failed to determine project working directory: %w", err)
+	}

As per coding guidelines: "Do not silently swallow real errors; return wrapped errors for filesystem, parsing, network, and provider failures."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

	workingDir, err := loader.ProjectWorkingDir(ctx)
	if err != nil {
		return "", fmt.Errorf("failed to determine project working directory: %w", err)
	}
	sm, err := stacks.NewManager(fabric, workingDir, params.ProjectName, ec)
	if err != nil {
		return "", fmt.Errorf("failed to create stack manager: %w", err)
	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/agent/tools/cleanup.go` around lines 32 - 36, The cleanup flow is
discarding the error from loader.ProjectWorkingDir(ctx), which can hide real
filesystem or loader failures before stacks.NewManager is called. Update the
cleanup logic to handle that error explicitly in the same path that sets
workingDir, and return a wrapped error instead of continuing with an empty
directory unless there is a clearly documented reason it is safe to ignore. Keep
the fix localized around ProjectWorkingDir and the subsequent stacks.NewManager
call so the root cause is preserved.

Source: Coding guidelines

src/pkg/cli/client/byoc/aws/cleanup.go (1)

70-102: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect how CD names ALBs/RDS to confirm a separator boundary after the base.
rg -nP -C3 'resourceBaseName|LoadBalancerName|DBInstanceIdentifier' src/pkg/clouds/aws src/pkg/cli/client/byoc/aws
rg -nP -C3 'alb_logs|maxALBNameLen|Prefix' src/pkg/clouds/aws

Repository: DefangLabs/defang

Length of output: 28162


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Map the relevant files and inspect the naming/selection logic.
ast-grep outline src/pkg/cli/client/byoc/aws/cleanup.go --view expanded
ast-grep outline src/pkg/clouds/aws/elbv2.go --view expanded
ast-grep outline src/pkg/clouds/aws/rds.go --view expanded

# Read the cleanup logic and the matching helpers in context.
sed -n '1,180p' src/pkg/cli/client/byoc/aws/cleanup.go
printf '\n--- elbv2 ---\n'
sed -n '1,120p' src/pkg/clouds/aws/elbv2.go
printf '\n--- rds ---\n'
sed -n '1,120p' src/pkg/clouds/aws/rds.go

# Look for any tests or code that establish the expected name shape.
rg -n -C3 'LoadBalancerName|DBInstanceIdentifier|resourceBaseName|Prefix' src/pkg/clouds/aws src/pkg/cli/client/byoc/aws

Repository: DefangLabs/defang

Length of output: 50373


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the resource naming logic that produces ALB/RDS identifiers.
rg -n -C3 'alb_logs|LoadBalancerName|DBInstanceIdentifier|resourceBaseName|trunc|31' src/pkg/clouds/aws src/pkg/cli/client/byoc/aws

# Read the specific ALB naming source if present.
fd -a 'alb_logs.go|*alb*logs*.go' src/pkg/clouds/aws src/pkg/cli/client/byoc/aws

Repository: DefangLabs/defang

Length of output: 50374


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read only the small slices that define ALB/RDS naming and the cleanup matcher.
sed -n '1,220p' src/pkg/cli/client/byoc/aws/alb_logs.go
printf '\n--- cleanup ---\n'
sed -n '1,130p' src/pkg/cli/client/byoc/aws/cleanup.go
printf '\n--- cloudformation template references ---\n'
rg -n -C2 'LoadBalancerName|DBInstanceIdentifier|DeletionProtection|resourceBaseName|CdTaskPrefix|Prefix' src/pkg/clouds/aws src/pkg/cli/client/byoc/aws | head -n 120

Repository: DefangLabs/defang

Length of output: 20667


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact ALB/RDS name construction and any fixed suffixes.
rg -n -C4 'ALB|load balancer|DBInstanceIdentifier|db|suffix|hash|name.*31|resourceBaseName' src/pkg/clouds/aws src/pkg/cli/client/byoc/aws

# Read the remainder of the ALB log helper if naming is defined there.
sed -n '220,420p' src/pkg/cli/client/byoc/aws/alb_logs.go

# Read any cloudformation/template code that sets the ALB/RDS names.
fd -a 'template.go|*.yaml|*.yml' src/pkg/clouds/aws src/pkg/cli/client/byoc/aws | head -n 50

Repository: DefangLabs/defang

Length of output: 50374


Anchor ALB/RDS discovery to the full generated name shape.
FindLoadBalancersByPrefix uses a 31-char truncated ALB prefix, and FindDBInstancesByPrefix uses the raw base prefix, so any other resource with the same leading characters can be offered for cleanup. That can disable deletion protection on the wrong project/stack. Match the exact expected suffix/boundary instead of a bare prefix.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/cli/client/byoc/aws/cleanup.go` around lines 70 - 102, The ALB/RDS
discovery in cleanup uses broad prefix matching that can catch resources from
the wrong project, so tighten the matching to the full generated name shape
instead of a bare prefix. Update the lookup logic around b.resourceBaseName,
FindLoadBalancersByPrefix, and FindDBInstancesByPrefix so it validates the
expected suffix/boundary for the stack name before calling add, and only offers
resources whose names exactly match the intended defang-generated pattern.
src/pkg/cli/client/byoc/aws/domain_test.go (1)

40-43: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make the unimplemented Route53 mutation mock fail fast.

Returning nil, nil here silently accepts unexpected record mutations, so these tests could mask a regression if prepareDomainDelegation ever starts calling ChangeResourceRecordSets. Panic or return an explicit test failure instead of succeeding. Based on learnings, "In Go test files (any _test.go under the Defang codebase), it's acceptable for mocks to panic to surface issues quickly during tests. Do not add defensive error handling in mocks within tests, since panics will fail fast and highlight problems."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/cli/client/byoc/aws/domain_test.go` around lines 40 - 43, The Route53
mock mutation method is currently a silent no-op, which can hide unexpected
calls from prepareDomainDelegation. Update r53Mock.ChangeResourceRecordSets in
the domain test mock to fail fast by panicking or otherwise triggering an
explicit test failure instead of returning nil, nil, so any unexpected record
mutation is surfaced immediately during tests.

Source: Learnings

src/pkg/clouds/aws/rds_test.go (1)

26-38: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Add a paginated RDS discovery case.

FindDBInstancesByPrefix in src/pkg/clouds/aws/rds.go loops on out.Marker, but this test double never returns a second page, so a broken marker handoff would still pass here. Add a two-page case like the ELB test to lock down the new traversal behavior. As per coding guidelines, "Add tests for new behavior and important failure modes" and "Test behavior boundaries, not implementation mechanics."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/clouds/aws/rds_test.go` around lines 26 - 38, Add a paginated
discovery test for FindDBInstancesByPrefix to cover the out.Marker loop in
rds.go, since the current single-page mockRDS case won’t catch broken page
traversal. Extend the rds_test.go coverage with a two-page mock similar to the
ELB pagination test, using mockRDS and FindDBInstancesByPrefix to verify results
are accumulated across pages and the marker handoff is handled correctly.

Source: Coding guidelines

src/pkg/clouds/aws/route53_cleanup_test.go (1)

33-37: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the trimmed hosted-zone ID in the delete test.

The new behavior here is not just “delete this record” but “delete it with /hostedzone/ stripped first.” fakeR53.ChangeResourceRecordSets does not capture HostedZoneId, so TestDeleteResourceRecordSet would still pass if that trim regressed. Record the incoming zone ID in the fake and assert it is "Z123". As per coding guidelines, "Add tests for new behavior and important failure modes" and "Test behavior boundaries, not implementation mechanics."

Also applies to: 58-67

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/clouds/aws/route53_cleanup_test.go` around lines 33 - 37, The delete
test currently verifies record deletion but not the new hosted-zone ID trimming
behavior. Update fakeR53.ChangeResourceRecordSets to capture the incoming
HostedZoneId from the Route53 request, then assert in
TestDeleteResourceRecordSet that the value passed is the trimmed form "Z123"
rather than the original "/hostedzone/..." string. Use the existing fakeR53 and
TestDeleteResourceRecordSet symbols to keep the check focused on the new
behavior.

Source: Coding guidelines

src/pkg/clouds/aws/route53.go (1)

191-216: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Wrap Route53 SDK errors with operation context.

Both helpers currently return raw SDK errors. When cleanup fails, that loses which zone or record operation failed. Wrap with %w and include the zone ID / record identity so callers keep AWS detail plus actionable context. As per coding guidelines, "Preserve cloud-provider error detail while wrapping errors with operation context in cloud SDK wrappers."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/clouds/aws/route53.go` around lines 191 - 216, Wrap the raw Route53
SDK errors in the helper functions that list and delete records so callers get
both AWS detail and operation context. In the record-listing logic that calls
ListResourceRecordSets, return a wrapped error that includes the zone identifier
and that listing failed; in DeleteResourceRecordSet, wrap the
ChangeResourceRecordSets error with the trimmed hosted zone ID and record
identity (name/type/set info from the ResourceRecordSet). Use the existing
function names ListResourceRecordSets and DeleteResourceRecordSet to locate the
changes and preserve the original error with %w.

Source: Coding guidelines

@lionello lionello changed the title feat(aws): add cleanup tool to AI feat: auto-trigger debugger in CI Jun 30, 2026
@lionello lionello marked this pull request as draft June 30, 2026 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant