Skip to content

chore(e2e-next): Migrate e2e_certs to e2e-next framework#3698

Closed
pascalbreuninger wants to merge 1 commit into
mainfrom
migrate/e2e-certs-to-e2e-next
Closed

chore(e2e-next): Migrate e2e_certs to e2e-next framework#3698
pascalbreuninger wants to merge 1 commit into
mainfrom
migrate/e2e-certs-to-e2e-next

Conversation

@pascalbreuninger
Copy link
Copy Markdown
Member

Summary

  • Migrates test/e2e_certs/ (certificate rotation tests) to the e2e-next/test_certs/ framework
  • Consolidates three old Describe blocks into a single Ordered Describe with four Context blocks
  • Adds CertsVCluster cluster definition with dedicated values YAML
  • Registers suite in e2e_suite_test.go with proper setup/cleanup wiring

Test plan

  • go build ./e2e-next/... compiles cleanly
  • go vet ./e2e-next/... passes
  • Run certs e2e suite against a kind cluster
  • Verify old test/e2e_certs/ files are fully removed

🤖 Generated with Claude Code


📋 E2E Migration Validator Notes

Overall: ⚠️ Changes requested

🔴 Must Fix

  • Background proxy container leak in connectVCluster (lines 342-387):
    cli.ConnectHelm with BackgroundProxy: true starts a Docker container.
    The cleanupFn only calls os.Remove(tmpPath) on the kubeconfig file — it never stops
    the Docker container. This function is called 4 times, potentially leaking 4 containers
    per test run. Fix: add container cleanup to the cleanup function, or use BackgroundProxy: false.

🟡 Should Fix

  • Dropped "certs check" coverage: The old test called certs check <name> to verify the check subcommand works both before and after rotation. Add at least one It that calls the check command and verifies it succeeds.

  • labels.PR on a slow, destructive test (line 40): This suite performs multiple cert rotations with pod-readiness waits (2+ minutes). Consider removing labels.PR and running only with labels.Certs in a dedicated job.

🟢 Nice to Have

  • os.Setenv defer scope (lines 161-166): The defer inside the By() closure fires at the end of that anonymous function, not at the end of the It(). A comment would prevent future confusion.
  • vclusterNS and vclusterName could be const instead of var.
  • The connectVCluster helper (45 lines) could be moved to a shared helper package if reused.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pascalbreuninger pascalbreuninger requested review from a team as code owners March 16, 2026 05:56
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ccdc1d518

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +349 to +353
os.Remove(tmpPath)
}

options := &cli.ConnectOptions{
BackgroundProxy: true,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop background proxy in connectVCluster cleanup

connectVCluster always turns on BackgroundProxy before calling cli.ConnectHelm, which starts a detached Docker proxy container on hosts where Docker is available, but the returned cleanup only removes the temp kubeconfig file. Because no corresponding proxy teardown is performed, this test leaves a stale vcluster connect proxy container behind after execution, which can consume CI runner resources and interfere with later runs; cleanup should also stop the background proxy (or avoid using BackgroundProxy here).

Useful? React with 👍 / 👎.

caCertBefore *x509.Certificate
)

BeforeAll(func(ctx context.Context) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BeforeAll does not return context.Context. Inside an Ordered Describe, BeforeAll should return the enriched context so that It blocks receive any context modifications. In this case the hostClient is a captured var so it works, but the signature is inconsistent with the framework convention.

},
)

// waitForVClusterReady polls until all vCluster pods are running with all containers ready.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing GinkgoHelper() - waitForVClusterReady is a shared assertion helper that should call GinkgoHelper() so failure stack traces point to the call site.
Also it can be moved to some common global helpers so other tests can use it.


// connectVCluster establishes a fresh connection to the vCluster using a background proxy.
// Returns a rest.Config and a cleanup function that removes the temp kubeconfig.
func connectVCluster(ctx context.Context, name, namespace string) (*rest.Config, func()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing GinkgoHelper(), also can be moved to some Helper file

tmpFile.Close()

cleanupFn := func() {
os.Remove(tmpPath)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Background proxy container leak - cleanupFn only removes the temp kubeconfig file via os.Remove(tmpPath). cli.ConnectHelm with BackgroundProxy: true starts a Docker container. The cleanup
never stops it. This function is called up to 4 times per run, potentially leaking 4 Docker containers.
Fix (option A - stop the container):

  func connectVCluster(ctx context.Context, name, namespace string) (*rest.Config, func(context.Context)) {
      // ... setup ...
      err = cli.ConnectHelm(ctx, options, globalFlags, name, nil, loftlog.Discard)
      Expect(err).NotTo(HaveOccurred())


      cleanupFn := func(ctx context.Context) {
          // Stop the background proxy container
          disconnectOptions := &cli.DisconnectOptions{}
          _ = cli.DisconnectHelm(ctx, disconnectOptions, name)
          os.Remove(tmpPath)
      }
      // ...
      return cfg, cleanupFn
  }

or avoid containers entirely:

options := &cli.ConnectOptions{
     BackgroundProxy: false,
     KubeConfig:      tmpPath,
 }

@@ -0,0 +1,411 @@
package test_certs
Copy link
Copy Markdown
Contributor

@adriankabala adriankabala Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped "certs check" subcommand coverage - old test called certs check twice

Explanation:

The old test in test/e2e_certs/certs/rotate.go called vcluster certs check twice via the CLI subcommand:

  1. Before rotation (line 703): certsCmd.SetArgs([]string{"check", f.VClusterName}) - inside the It("checking current validity date of CA cert of vCluster") block
  2. After rotation (line 797): certsCmd.SetArgs([]string{"check", f.VClusterName}) - inside the It("priniting new expiry date and time of vCluster CA cert") block (note the typo "priniting" in the original)

Both of these instantiate certscmd.NewCertsCmd(...), set args to ["check", vclusterName], call Execute(), and assert no error. This exercises the vcluster certs check CLI subcommand - verifying it can inspect a vCluster's certificates and
report them as valid.

The new migrated test in e2e-next/test_certs/test_rotate.go dropped both of these calls entirely. It validates certificate fingerprints and expiry dates directly by reading Kubernetes secrets, but it never invokes the certs check CLI
subcommand.

// contexts depend on the state left by prior ones.
var _ = Describe("Certificate Rotation",
Ordered,
labels.PR,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labels.PR on a slow/destructive suite (multiple rotations + pod restarts) - maybe this should be in nightly build only?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants