test: log ns deletion timestamp in e2e to debug resource cleanup flakiness #1085
jwtty wants to merge 6 commits into Azure:main from
  workResourcesRemovedActual := workNamespaceRemovedFromClusterActual(memberCluster)
- Eventually(workResourcesRemovedActual, eventuallyDuration, eventuallyInterval).Should(Succeed(), "Failed to remove work resources from member cluster %s", memberCluster.ClusterName)
+ Eventually(workResourcesRemovedActual, 3*eventuallyDuration, eventuallyInterval).Should(Succeed(), "Failed to remove work resources from member cluster %s", memberCluster.ClusterName)
This seems to be a regression in the cleanup. We should not blindly relax the timeout. Instead, we should fix the root cause, which is that the member agent's GC is slow.
Another case that is probably easy to repro locally; we need to see what actually happened in the GC.
Force-pushed from a6d0e64 to 0e03798
This one should be related to the bug fixed in #1096: the test didn't delete the namespace on the hub cluster at all. The timeout discussed here is different; it happens when we do try to delete the namespace but time out waiting for GC.
Title (Describe updated until commit ada18dc)
test: log ns deletion timestamp in e2e to debug resource cleanup flakiness
User description
Description of your changes
Resource cleanup relies on garbage collection of child resources when their owner (AppliedWork) is deleted. The controllers return without waiting for the cleanup to fully complete. Our e2e test is flaky when it checks that the resources are completely deleted, because the current 10s wait time is not enough. Sample failures: https://github.com/Azure/fleet/actions/runs/13911855578/job/38927583698
This PR logs the ns deletion timestamp to help debug.
Fixes #
I have:
How has this code been tested
Special notes for your reviewer
PR Type
Enhancement
Description
Changes walkthrough 📝
PR Reviewer Guide 🔍 (Review updated until commit ada18dc)
Here are some key observations to aid the review process:
Persistent review updated to latest commit e23caca
Persistent review updated to latest commit ada18dc
Hi Wantong! I am closing this PR as part of the CNCF repo migration process; please consider moving (re-creating) this PR in the new repo once the sync PR is merged. If you have any questions or concerns, please let me know. Thanks 🙏
Description of your changes
Resource cleanup relies on garbage collection of child resources when their owner (AppliedWork) is deleted. The controllers return without waiting for the cleanup to fully complete. Our e2e test is flaky when it checks that the resources are completely deleted, because the current 10s wait time is not enough. Sample failures:
https://github.com/Azure/fleet/actions/runs/13911855578/job/38927583698
https://github.com/Azure/fleet/actions/runs/13840189622/job/38725462003?pr=1079
This PR logs the ns deletion timestamp to help debug.
Fixes #
I have:
Run make reviewable to ensure this PR is ready for review.
How has this code been tested
Special notes for your reviewer