
fix: workflows use 'onExit: stop' handler to catch all errors during create #1543

Merged: tommartensen merged 13 commits into master from tm/onExit-workflows on Apr 24, 2025



@tommartensen (Contributor) commented Mar 28, 2025

Major changes:

  • change all workflows to run their destroy template in the onExit handler, so that destroy also runs when create fails, reducing resource leaks
  • enhance the e2e simulate workflow to upload a file to GCS in create and delete it in destroy, verifying that destroy runs as intended in the onExit handler.
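The onExit behavior can be pictured with a shell analogy. The sketch below is not the project's workflow code: it uses a bash EXIT trap to play the role that the 'onExit: stop' handler plays for an Argo workflow, so the destroy step runs whether create succeeds or fails. The functions create_cluster, destroy_cluster and run_lifecycle are hypothetical.

```bash
#!/usr/bin/env bash
# Analogy only: a bash EXIT trap stands in for the workflow's onExit handler.
# create_cluster, destroy_cluster and run_lifecycle are hypothetical names.

# Hypothetical destroy step: invoked whenever the lifecycle exits.
destroy_cluster() {
  echo "destroy: releasing resources"
}

# Hypothetical create step; pass "fail" to simulate an error midway.
create_cluster() {
  echo "create: provisioning resources"
  if [[ "${1:-}" == "fail" ]]; then
    echo "create: error" >&2
    return 1
  fi
  echo "create: done"
}

# Runs create in a subshell whose EXIT trap calls destroy, so cleanup
# happens on success and on failure alike. The subshell's exit status
# still reports whether create failed (the trap does not mask it).
run_lifecycle() {
  (
    trap destroy_cluster EXIT
    create_cluster "${1:-}" || exit 1
  )
}
```

For example, run_lifecycle fail prints the create line, writes the error to stderr, then prints "destroy: releasing resources" and returns a non-zero status; run_lifecycle without an argument completes create and still runs destroy.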

Other changes:

  • use a bigger cluster in PRs, because testing with GCS revealed performance bottlenecks for workflows :(
  • clean up the smoke test list
  • new default: skip create-delay-seconds and delete-delay-seconds in the e2e simulate workflow, saving 20 seconds in many e2e tests
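Skipping the delays by default can be read as treating the delay parameters as zero unless a caller sets them. The sketch below assumes that reading: maybe_delay is a hypothetical helper, and only the parameter names create-delay-seconds and delete-delay-seconds come from this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sleep only when a positive delay is requested.
# Assumes a default of 0, i.e. "skip the delay" unless a caller opts in
# via create-delay-seconds / delete-delay-seconds.
maybe_delay() {
  local seconds="${1:-0}"
  # Reject anything that is not a non-negative integer.
  if [[ ! "$seconds" =~ ^[0-9]+$ ]]; then
    echo "invalid delay: $seconds" >&2
    return 2
  fi
  # 10# forces base 10 so values like "08" are not parsed as octal.
  if (( 10#$seconds > 0 )); then
    echo "sleeping ${seconds}s"
    sleep "$seconds"
  fi
}
```

With this default, an e2e test that does not set either parameter pays no delay at all, while a test that needs the old timing can still pass an explicit value.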

@tommartensen self-assigned this on Mar 28, 2025
@rhacs-bot (Contributor) commented Mar 28, 2025

A single-node development cluster (infra-pr-1543) was allocated in production infra for this PR.

CI will attempt to deploy quay.io/rhacs-eng/infra-server:0.11.17-13-g22d1263c55 to it.

🔌 You can connect to this cluster with:

gcloud container clusters get-credentials infra-pr-1543 --zone us-central1-a --project acs-team-temp-dev

🛠️ And pull infractl from the deployed dev infra-server with:

nohup kubectl -n infra port-forward svc/infra-server-service 8443:8443 &
make pull-infractl-from-dev-server

🚲 You can then use the dev infra instance e.g.:

bin/infractl -k -e localhost:8443 whoami

⚠️ Any clusters that you start using your dev infra instance should have a lifespan shorter than that of the development cluster. Otherwise, they will not be destroyed, because the dev infra instance ceases to exist when the development cluster is deleted. ⚠️

Further Development

☕ If you make changes, you can commit and push, and CI will take care of updating the development cluster.

🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with:

make helm-deploy

Logs

Logs for the development infra depending on your @redhat.com authuser:

Or:

kubectl -n infra logs -l app=infra-server --tail=1 -f

@tommartensen marked this pull request as ready for review April 16, 2025 14:30
@tommartensen requested a review from a team as a code owner April 16, 2025 14:30
@tommartensen enabled auto-merge (squash) April 16, 2025 14:30
@tommartensen disabled auto-merge April 16, 2025 14:31
@davdhacs (Contributor) left a comment


+1

```bash
}

if [[ "{{ "{{" }}inputs.parameters.test-gcs{{ "}}" }}" == "true" ]]; then
  upload_or_delete_gcs_object
fi
```

I think using an external semaphore, in gcs, is a great idea. 👍
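The external-semaphore idea can be sketched as follows. This is not the PR's implementation of upload_or_delete_gcs_object: the mode argument and the GCS_MARKER_URI variable are assumptions for illustration, and only gcloud storage cp and gcloud storage rm are real CLI calls. Create writes a marker object and destroy removes it, so a marker left behind after a workflow finishes signals that destroy did not run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a create/destroy GCS marker ("external semaphore").
# The mode argument and GCS_MARKER_URI are assumptions, not the PR's API.
upload_or_delete_gcs_object() {
  local mode="${1:-}"
  local uri="${GCS_MARKER_URI:-}"
  local tmp rc

  if [[ -z "$uri" ]]; then
    echo "GCS_MARKER_URI must be set, e.g. gs://BUCKET/OBJECT" >&2
    return 2
  fi

  case "$mode" in
    create)
      # Write a small local file and upload it as the marker object.
      tmp="$(mktemp)" || return 1
      echo "created at $(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$tmp"
      gcloud storage cp "$tmp" "$uri"
      rc=$?
      rm -f "$tmp"
      return "$rc"
      ;;
    destroy)
      # Remove the marker; if this never runs, the object is left behind.
      gcloud storage rm "$uri"
      ;;
    *)
      echo "unknown mode: '$mode' (expected create or destroy)" >&2
      return 2
      ;;
  esac
}
```

In a workflow following this pattern, the create step would call the function with create and the onExit destroy step with destroy; listing the URI afterwards (for example with gcloud storage ls) would reveal a leaked marker.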

@tommartensen (Contributor, Author)

I have smoke-tested all flavors manually; going to merge now.

@tommartensen merged commit 646d19c into master on Apr 24, 2025
10 checks passed
@tommartensen deleted the tm/onExit-workflows branch April 24, 2025 09:29

Labels: None yet
Projects: None yet


3 participants