Skip to content

feat: destroy command — requirements and design spec#128

Merged
Shnekit merged 14 commits into
mainfrom
feat/destroy-command
Apr 15, 2026
Merged

feat: destroy command — requirements and design spec#128
Shnekit merged 14 commits into
mainfrom
feat/destroy-command

Conversation

@Shnekit
Copy link
Copy Markdown
Contributor

@Shnekit Shnekit commented Mar 31, 2026

Overview

This PR adds the requirements and design spec for the new destroy command. No implementation code yet — for team review before implementation begins.

What's included

  • .kiro/specs/destroy-command/requirements.md — 12 requirements covering the full feature scope
  • .kiro/specs/destroy-command/design.md — technical design with architecture, data models, correctness properties, and testing strategy

Feature summary

The destroy command is the inverse of deploy. It reads a manifest file and deletes all resources created by deploy, in a safe dependency order:

  1. Stop active Airflow workflow runs (re-checked at destruction time)
  2. Delete workflow-created resources (Glue jobs) via operator registry
  3. Delete Airflow serverless workflows
  4. Delete QuickSight dashboards, datasets, data sources (by prefix scan)
  5. Delete S3 objects at declared targetDirectory prefixes
  6. Delete DataZone project (only if project.create: true)

Key design decisions

  • Two-phase model: full validation across all stages before any destructive action
  • Collect-all-errors: validation never aborts early — all issues reported together before aborting
  • Operator registry: workflow-created resources (Glue jobs) discovered by parsing workflow YAML and matching operator class names against a registry dict — extensible without if/elif chains
  • Prefix-based QuickSight enumeration: scans by Resource_Prefix rather than reconstructing IDs
  • No --async mode: dependency ordering requires synchronous execution
  • Collision detection: QuickSight prefix scan and Airflow workflow name lookup both abort if more resources found than declared — even with --force

Testing plan

  • Unit tests covering all correctness properties (mocked AWS calls)
  • Integration test: extend existing test_dashboard_glue_quick_workflow.py with destroy + verify steps after the existing deploy + test steps

Review focus

Please review requirements and design for correctness, completeness, and any edge cases missed before implementation begins.

@Shnekit Shnekit requested a review from vasu2856 April 1, 2026 14:57
@Shnekit Shnekit self-assigned this Apr 1, 2026
@Shnekit Shnekit force-pushed the feat/destroy-command branch from 940267d to 22490bd Compare April 2, 2026 17:35
@Shnekit Shnekit had a problem deploying to test-aws-account April 3, 2026 15:58 — with GitHub Actions Failure
Comment thread .kiro/specs/destroy-command/requirements.md Outdated
Comment thread .kiro/specs/destroy-command/requirements.md Outdated
Comment thread .kiro/specs/destroy-command/requirements.md
Comment thread .kiro/specs/destroy-command/requirements.md Outdated

#### Acceptance Criteria

1. THE Destroy_Command SHALL run a full validation cycle across ALL targeted stages before initiating any destructive action or prompting for confirmation. Resource discovery includes live AWS API calls to resolve S3 connection details, enumerate QuickSight resources by prefix, and check for active Airflow workflow runs.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the validations include missing permission errors?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See, if we can add them. Otherwise, we can run destroy multiple times

Copy link
Copy Markdown
Contributor Author

@Shnekit Shnekit Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am gonna wait for your dry-run CR to be out and see, if I can use your existing IAM validation logic to destroy resources.

Comment thread .kiro/specs/destroy-command/requirements.md Outdated
Comment thread src/smus_cicd/commands/destroy.py Outdated
Comment thread src/smus_cicd/commands/destroy.py Outdated
)
continue

s3_helper.delete_objects(bucket, object_keys, region=effective_region)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this delete the bundle as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only deletes the deploy S3 target resources. I think bundle is stored somewhere else, so it is not deleted

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deleting everything that deploy command creates. I believe bundle is saved somewhere outside of this path, so it will stay

Comment thread src/smus_cicd/commands/destroy.py
@@ -0,0 +1,51 @@
"""
Operator registry for the destroy command.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping for now, can remove this later, if customers do not want it

)


def list_dashboards(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to list the dashboards? can't we just directly fetch the ones mentioned in the manifest?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the manifest we are only listing the dashboard name. Any account can have any number of identical dashboard names and they can also be changed after creation. So, doing a prefix search is a little safer, though still not entirely deterministic

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if i follow the reasoning. We should use the same logic to identify the resource as the deploy. If deploy looks up using name, destroy should do the same as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deploy command is actually using dashboard_id look up as well. Dashboard name is used as the name of the exported .qs file (which QuickSight generates itself I think).

@Shnekit Shnekit force-pushed the feat/destroy-command branch from 4dc08b1 to 9a314ff Compare April 13, 2026 18:47
@Shnekit Shnekit requested a review from vasu2856 April 13, 2026 19:32
Shnekit added 14 commits April 15, 2026 12:02
- Requirements: 12 requirements covering CLI interface, validation phase,
  destruction ordering, Airflow workflow deletion, QuickSight prefix-based
  cleanup, S3 object deletion, conditional DataZone project deletion,
  operator registry for workflow-created resources (Glue jobs), idempotency,
  output formatting, and scope boundaries
- Design: two-phase architecture (validate-all-then-destroy), operator registry
  pattern, data models, correctness properties, error handling tables, unit
  test strategy, integration test extension plan, and design decision rationale
  for operator registry vs reverse DAG approach
- Add Bootstrap_Connection deletion (datazone.create_connection bootstrap actions)
- Add catalog resource deletion (glossaries, terms, form types, asset types, assets, data products)
- Refactor S3 deletion to use existing s3_helper instead of inline boto3 calls
- Add resource_type field to OPERATOR_REGISTRY entries (remove hardcoding)
- Derive registry resource types dynamically in _destroy_stage Step b
- Fix Rich markup mismatch in WARNING message (red→yellow)
- Update requirements (Req 13-15), design, and tasks for connections and catalog
- 86 unit tests passing
- Add destroy command section to docs/cli-commands.md (options, examples,
  TEXT/JSON output, two-phase behavior, use cases, notes)
- Update command count from 8 to 9 in cli-commands.md
- Add destroy to command overview table and pipeline commands list
- Expand Clean Up Resources walkthrough to cover both destroy and delete
- Add Rollback Workflow to Common Workflows section in cli-commands.md
- Add destroy cleanup step to README.md Quick Start
- Add Rollback Guide link to README.md documentation section
- Add destroy cleanup step to all 6 translated READMEs (pt, fr, it, ja, zh, he)
- Update examples/full-pipeline-lifecycle.sh Step 8 to use destroy
- Update examples/cleanup-demo-resources.sh to use destroy with explanation
- Add new docs/rollback-guide.md covering manual rollback strategies,
  destroy+redeploy clean slate approach, decision tree, catalog/project
  preservation, GitHub Actions automated rollback workflows, and
  bundle-based rollback setup
… --targets

- Remove delete command entirely (code, tests, docs, CLI registration)
- Replace delete with destroy in integration test cleanup steps
- Make --targets required for destroy (no implicit all-stages)
- Add WARNING banner: destroy deletes all matching resources including
  user-created ones, not only CI/CD-created resources
- Add irreversibility warning in CLI docs
- Remove explicit workflow run stopping from destruction phase;
  MWAA Serverless auto-terminates runs on workflow deletion
- Fetch workflow definitions from MWAA API (get_workflow) instead
  of scanning S3 for YAML files
- Add deploy/destroy drift detection: DEPLOY_RESOURCE_TYPES registry
  in resource_types.py with unit test asserting parity with
  DESTROY_SUPPORTED_RESOURCE_TYPES
- Rename --stages to --targets across all commands (including create)
  and all documentation for consistency
- Fix --targets typo in AWS EventBridge put-targets doc examples
- Consolidate cli-commands.md into single 1-10 command list, remove
  duplicate sections, fix numbering, remove stray content
- Split destroy.py (1656 lines) into four modules:
  destroy.py (282), destroy_models.py (68), destroy_validator.py (568),
  destroy_executor.py (382)
- Update specs (requirements, design, tasks) to reflect all changes
Content from main was lost during cli-commands.md consolidation.
Restores: pre-deployment validation step, --dry-run/--skip-validation
options, dry run mode documentation, catalog asset access details.
- Remove unused imports from destroy.py (F401)
- Apply black formatting to all destroy helper modules
- Update drift test to import directly from destroy_models
- Restore dry-run example reports in cli-commands.md
…, cleanup

- Strip numbered prefixes from all command help strings in cli.py
- Remove chat command (agent.chat_agent module does not exist)
- Remove last delete command reference from cleanup-demo-resources.sh
- Apply black formatting to airflow_serverless.py
- Move resolve_resource_prefix to quicksight.py (shared by deploy+destroy)
- Add resolve_resource_ids_from_overrides to quicksight.py for exact ID
  resolution from manifest overrideParameters (Dashboards/DataSets/DataSources)
- Add find_resources_by_prefix to quicksight.py for paginated prefix scan
- Update deploy to use shared helpers instead of inline boto3 calls
  (fixes missing pagination in deploy's post-import resource discovery)
- Update destroy validator: try exact IDs from overrides first, fall back
  to prefix scan for uncovered types
- Add unit tests for all three new quicksight helper functions
- Remove duplicate resolve_resource_prefix tests from test_destroy.py
- Remove dead chat command, cleanup-demo-resources.sh delete reference
- Apply black formatting to all changed files
Update cli-commands.md with main's dry-run best-effort language:
pre-deployment validation, --dry-run option, dry run mode section,
and pre-deployment validation section all now include caveats about
best-effort nature and conditions that may cause deployment failure
despite clean validation.
@Shnekit Shnekit force-pushed the feat/destroy-command branch from 2d698dc to 14da19c Compare April 15, 2026 16:10
Copy link
Copy Markdown
Contributor

@vasu2856 vasu2856 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Shnekit Shnekit merged commit eef7c05 into main Apr 15, 2026
6 checks passed
@Shnekit Shnekit deleted the feat/destroy-command branch April 15, 2026 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants