test: Drain tool tests #1104

Closed
Arvindthiru wants to merge 10 commits into Azure:main from Arvindthiru:drain-tool-test

Conversation

@Arvindthiru commented Apr 2, 2025

Description of your changes

Fixes #

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

Special notes for your reviewer

Arvind Thirumurugan added 2 commits April 10, 2025 15:20
@Arvindthiru Arvindthiru marked this pull request as ready for review April 14, 2025 21:54

@michaelawyu left a comment

Added some comments, PTAL

Comment thread: test/e2e/setup_test.go
}

// Build drain binary.
buildCmd := exec.Command("go", "build", "-o", drainBinaryPath, filepath.Join("../../", "tools", "draincluster"))

Hi Arvind! We have a setup.sh file; it might make a lot more sense to just call the build commands there rather than shell out in this Go source code.
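For context, the suggestion amounts to something like the following fragment in test/e2e/setup.sh; the output path, variable name, and package path here are assumptions for illustration, not the repo's actual layout:

```shell
# Hypothetical fragment for test/e2e/setup.sh: build the drain tool once during
# environment setup instead of shelling out to `go build` from setup_test.go.
# DRAIN_BINARY_PATH and the package path are assumptions, not the repo layout.
DRAIN_BINARY_PATH="${DRAIN_BINARY_PATH:-./bin/draincluster}"
mkdir -p "$(dirname "${DRAIN_BINARY_PATH}")"
go build -o "${DRAIN_BINARY_PATH}" ./tools/draincluster
```

The test code would then only need the binary path, keeping the Go sources free of build orchestration.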

In fact these test specs might warrant an environment of their own, but I guess we could make that move if our test specs continue to grow.

const (
	uuidLength = 8
-	drainEvictionNameFormat = "drain-eviction-%s-%s-%s"
+	drainEvictionNameFormat = "drain-eviction-%s"

Hi Arvind! Collision chances aside (which I assume are minimal), I do think that the previous format might be clearer to end users.


It("drain cluster using binary, should succeed", func() { runDrainClusterBinary(hubClusterName, memberCluster1EastProdName, true) })

It("should ensure no resources exist on drained clusters", func() {

Hi Arvind! I suppose we also need a check on the taints (here or as a separate test spec).

And if we would like to check this a bit closer, an inspection on the Eviction object would also be nice.

"--clusterName", memberClusterName)
output, err := cmd.CombinedOutput()
Expect(err).ToNot(HaveOccurred(), "Uncordon command failed with error: %v\nOutput: %s", err, string(output))
Expect(strings.Contains(string(output), "uncordoned member cluster")).To(BeTrue(), "Expected uncordon to be successful")

Hi Arvind! IIRC the error check already guarantees that the exit code is 0; using a string match on the logs to verify that it has run successfully is a bit error-prone, I'm afraid.

"--clusterName", memberClusterName)
output, err := cmd.CombinedOutput()
Expect(err).ToNot(HaveOccurred(), "Drain command failed with error: %v\nOutput: %s", err, string(output))
if isSuccess {

Same issue here; I wonder if there is a better way to validate this than relying on log string matches.


AfterAll(func() {
// Uncordon member cluster 1 again to guarantee clean up of cordon taint on test failure.
runUncordonClusterBinary(hubClusterName, memberCluster1EastProdName)

Hi Arvind! Just a nit: since the cordon binary itself is a test target, it might be best not to rely on it for cleanup, just in case.

Same applies for a few other test specs I assume.

Expect(hubClient.Create(ctx, &crpdb)).To(Succeed(), "Failed to create CRP Disruption Budget %s", crpName)
})

It("drain cluster using binary, should fail due to CRPDB", func() { runDrainClusterBinary(hubClusterName, memberCluster1EastProdName, false) })

Hi Arvind! Some of the issues mentioned earlier apply to this part as well.

It probably would be better if we do a closer inspection (taints, eviction objects) to make sure that the tool is functioning normally rather than just check the logs.


It("uncordon cluster using binary", func() { runUncordonClusterBinary(hubClusterName, memberCluster1EastProdName) })

It("should update cluster resource placement status as expected", func() {

Hi Arvind! Is this step necessary? We can't really tell whether uncordoning works even with this step passing.


It("should place resources on all available member clusters", checkIfPlacedWorkResourcesOnAllMemberClusters)
})

Just my $0.02 but it would be interesting to see how this tool (and Fleet) can actually help DR in an actual workflow of migrating workloads between clusters. It's a good showcase too.

But obviously we could always add more later.

gotMap, gotErr := h.fetchClusterResourcePlacementNamesToEvict(context.Background())
if tc.wantErr == nil {
if gotErr != nil {
t.Errorf("fetchClusterResourcePlacementNamesToEvict test %s failed, got error %v, want error %v", tc.name, gotErr, tc.wantErr)

Hi Arvind! Style concerns aside, there are a few branches that really do not need to continue once a failure is found (even though they can proceed).

Something like

if wantErr != nil
    if gotErr == nil -> Errorf and return
    compareErrMessages -> Errorf if needed and return

if gotErr != nil -> Errorf and return

cmpMap

cmpMap

would be clearer I assume.

This would also apply to other cases I guess.

@michaelawyu

Hi Arvind! I am closing this PR as part of the CNCF repo migration process; please consider moving (re-creating) this PR in the new repo once the sync PR is merged. If there's any question/concern, please let me know. Thanks 🙏
