Skip to content

✨ e2e: isolate tests with per-scenario dynamic catalogs#2651

Open
joelanford wants to merge 7 commits intooperator-framework:mainfrom
joelanford:e2e-test-isolation-pr
Open

✨ e2e: isolate tests with per-scenario dynamic catalogs#2651
joelanford wants to merge 7 commits intooperator-framework:mainfrom
joelanford:e2e-test-isolation-pr

Conversation

@joelanford
Copy link
Copy Markdown
Member

@joelanford joelanford commented Apr 15, 2026

Description

Replace shared, static test catalogs with dynamically generated per-scenario catalogs and bundles. Each scenario now builds and pushes its own OCI images at runtime, parameterized by scenario ID, making all cluster-scoped resource names unique. This eliminates cascading failures and enables future parallel test execution (~1,950 lines deleted net).

Commit 1 — Design spec: Add design spec documenting the test isolation architecture, builder API, feature file conventions, and naming scheme.

Commit 2 — Move test helpers: Relocate bundlefs and csv builders from internal/operator-framework/rukpak/util/testing/ to internal/testing/bundle/.

Commit 3 — Add new libraries: Composable catalog/bundle builder API (test/internal/catalog/) using go-containerregistry and Go-based registry deployment (test/internal/registry/). Adds test/internal/... to unit test coverage.

Commit 4 — Convert all tests: Update all feature files, step definitions, hooks, and README to use dynamic per-scenario catalogs with ${PACKAGE:...} and ${CATALOG:...} variable substitution. Remove image-registry dependency from test-e2e and test-experimental-e2e targets.

Commit 5 — Extension-developer-e2e: Add TestMain to deploy registry and run setup.sh from Go. Constants in Go are the single source of truth for catalog tag and package name (passed to setup.sh via env vars). Clean up Makefile exports.

Commit 6 — Remove old infrastructure: Delete static bundles, catalogs, templates, push tooling, dead code (test/helpers/, testdata/build-test-registry.sh), and the image-registry Makefile target.

Commit 7 — Port-forward for registry access: Replace localhost:30000 hostPort-based registry access with Kubernetes port-forwarding via SPDY. Remove NodePort service (now ClusterIP), kind extraPortMappings for port 30000, containerd hosts.toml, and LOCAL_REGISTRY_HOST env var. In extension-developer-e2e, images are exported from the container runtime and pushed via crane through the port-forward, avoiding Docker daemon network context issues.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

Generated with Claude Code

Copilot AI review requested due to automatic review settings April 15, 2026 01:55
@openshift-ci openshift-ci bot requested review from grokspawn and oceanc80 April 15, 2026 01:55
@netlify
Copy link
Copy Markdown

netlify bot commented Apr 15, 2026

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit b7a6f66
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/69e1484041363c0009bc3f09
😎 Deploy Preview https://deploy-preview-2651--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelanford for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reworks the E2E testing approach to eliminate shared, static test catalogs by generating and pushing per-scenario bundles/catalogs at runtime (with scenario-specific naming), enabling safer isolation and paving the way for parallel execution. It also removes the legacy “push static fixtures into a registry” tooling and relocates bundle/CSV testing helpers into a dedicated internal testing package.

Changes:

  • Add per-scenario catalog/bundle builder (test/internal/catalog) and in-cluster registry deploy helper (test/internal/registry) to support dynamic OCI image generation/push.
  • Convert Gherkin features and step implementations to use dynamic catalogs and ${SCENARIO_ID} / ${PACKAGE:...} / ${CATALOG:...} substitution.
  • Remove legacy static catalog/bundle fixtures and push tooling under testdata/, and update Makefile targets accordingly.

Reviewed changes

Copilot reviewed 77 out of 79 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
testdata/push/push.go Removed legacy image-building/pushing tool for static test fixtures.
testdata/push/go.sum Removed legacy module deps for the deleted push tool.
testdata/push/go.mod Removed legacy module for the deleted push tool.
testdata/push/README.md Removed documentation for the deleted push tool.
testdata/images/catalogs/test-catalog/v2/configs/catalog.yaml Removed static catalog fixture content.
testdata/images/catalogs/test-catalog/v2/configs/.indexignore Removed static catalog fixture ignore file.
testdata/images/catalogs/test-catalog/v1/configs/catalog.yaml Removed static catalog fixture content.
testdata/images/catalogs/test-catalog/v1/configs/.indexignore Removed static catalog fixture ignore file.
testdata/images/bundles/test-operator/v1.2.0/metadata/properties.yaml Removed static bundle fixture properties.
testdata/images/bundles/test-operator/v1.2.0/metadata/annotations.yaml Removed static bundle fixture annotations.
testdata/images/bundles/test-operator/v1.2.0/manifests/testoperator.networkpolicy.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.2.0/manifests/testoperator.clusterserviceversion.yaml Removed static bundle fixture CSV.
testdata/images/bundles/test-operator/v1.2.0/manifests/script.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.2.0/manifests/olm.operatorframework.com_olme2etest.yaml Removed static bundle fixture CRD.
testdata/images/bundles/test-operator/v1.2.0/manifests/bundle.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.0.3/metadata/annotations.yaml Removed static bundle fixture annotations.
testdata/images/bundles/test-operator/v1.0.3/manifests/testoperator.clusterserviceversion.yaml Removed static bundle fixture CSV.
testdata/images/bundles/test-operator/v1.0.3/manifests/bundle.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.0.2/metadata/annotations.yaml Removed static bundle fixture annotations.
testdata/images/bundles/test-operator/v1.0.2/manifests/testoperator.networkpolicy.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.0.2/manifests/testoperator.clusterserviceversion.yaml Removed static bundle fixture CSV.
testdata/images/bundles/test-operator/v1.0.2/manifests/olm.operatorframework.com_olme2etest.yaml Removed static bundle fixture CRD.
testdata/images/bundles/test-operator/v1.0.2/manifests/bundle.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.0.0/metadata/annotations.yaml Removed static bundle fixture annotations.
testdata/images/bundles/test-operator/v1.0.0/manifests/testoperator.networkpolicy.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.0.0/manifests/testoperator.clusterserviceversion.yaml Removed static bundle fixture CSV.
testdata/images/bundles/test-operator/v1.0.0/manifests/script.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.0.0/manifests/olm.operatorframework.com_olme2etest.yaml Removed static bundle fixture CRD.
testdata/images/bundles/test-operator/v1.0.0/manifests/dummy.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/test-operator/v1.0.0/manifests/bundle.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/single-namespace-operator/v1.0.0/metadata/annotations.yaml Removed static bundle fixture annotations.
testdata/images/bundles/single-namespace-operator/v1.0.0/manifests/singlenamespaceoperator.clusterserviceversion.yaml Removed static bundle fixture CSV.
testdata/images/bundles/single-namespace-operator/v1.0.0/manifests/olm.operatorframework.com_singlenamespaces.yaml Removed static bundle fixture CRD.
testdata/images/bundles/own-namespace-operator/v1.0.0/metadata/annotations.yaml Removed static bundle fixture annotations.
testdata/images/bundles/own-namespace-operator/v1.0.0/manifests/ownnamespaceoperator.clusterserviceversion.yaml Removed static bundle fixture CSV.
testdata/images/bundles/own-namespace-operator/v1.0.0/manifests/olm.operatorframework.com_ownnamespaces.yaml Removed static bundle fixture CRD.
testdata/images/bundles/large-crd-operator/v1.0.0/metadata/annotations.yaml Removed static bundle fixture annotations.
testdata/images/bundles/large-crd-operator/v1.0.0/manifests/script.configmap.yaml Removed static bundle fixture manifest.
testdata/images/bundles/large-crd-operator/v1.0.0/manifests/largecrdoperator.clusterserviceversion.yaml Removed static bundle fixture CSV.
testdata/build-test-registry.sh Removed legacy script that deployed registry + push job for static fixtures.
testdata/Dockerfile Removed legacy image build that packaged static fixtures and push binary.
test/upgrade-e2e/features/operator-upgrade.feature Updated upgrade E2E feature to use dynamic per-scenario catalogs/packages.
test/internal/registry/registry.go Added Go helper to deploy an in-cluster TLS-enabled registry via SSA for tests.
test/internal/catalog/catalog_test.go Added unit tests for bundle/catalog generation logic and parameterization.
test/internal/catalog/catalog.go Added catalog builder that generates/pushes per-scenario bundle + catalog OCI images.
test/internal/catalog/bundle.go Added bundle builder producing parameterized manifests (CRD/CSV/Deploy/etc).
test/helpers/helpers.go Removed legacy E2E helper package (static catalog assumptions).
test/helpers/feature_gates.go Removed legacy feature-gate detection helper package.
test/extension-developer-e2e/setup.sh Switched setup to env-var driven config; adjusted build/push commands.
test/extension-developer-e2e/extension_developer_test.go Added TestMain orchestration for registry + setup; updated catalog ref wiring.
test/e2e/steps/upgrade_steps.go Updated/added upgrade steps for per-scenario catalogs and reconciliation checks.
test/e2e/steps/testdata/test-catalog-template.yaml Removed static catalog template.
test/e2e/steps/testdata/extra-catalog-template.yaml Removed static catalog template.
test/e2e/steps/hooks.go Updated scenario context to track per-scenario catalogs/packages; cleanup changes.
test/e2e/features_test.go Increased Godog scenario concurrency (parallel execution).
test/e2e/features/user-managed-fields.feature Updated feature to use per-scenario names and dynamic catalogs.
test/e2e/features/update.feature Updated feature to use per-scenario names and dynamic catalogs/catalog updates.
test/e2e/features/uninstall.feature Updated feature to use per-scenario names and dynamic catalogs.
test/e2e/features/status.feature Updated feature to use per-scenario names and dynamic catalogs.
test/e2e/features/recover.feature Updated feature to use per-scenario names and dynamic catalogs; updated messages.
test/e2e/features/install.feature Updated feature to use per-scenario names and dynamic catalogs; revised scenarios.
test/e2e/README.md Updated E2E docs for new variable substitution and dynamic catalog architecture.
internal/testing/bundle/fs/bundlefs_test.go Updated tests to new internal testing bundle/fs package path/name.
internal/testing/bundle/fs/bundlefs.go Renamed package to fs (new location/namespace for bundle fs builder).
internal/testing/bundle/csv/builder_test.go Updated tests to new internal testing bundle/csv package path/name.
internal/testing/bundle/csv/builder.go Renamed package to csv (new location/namespace for CSV builder).
internal/operator-controller/rukpak/render/render_test.go Updated imports to use new internal testing bundle/csv builder.
internal/operator-controller/rukpak/render/registryv1/validators/validator_test.go Updated imports to use new internal testing bundle/csv builder.
internal/operator-controller/rukpak/render/registryv1/registryv1_test.go Updated imports to use new internal testing bundle/csv builder.
internal/operator-controller/rukpak/render/registryv1/generators/generators_test.go Updated imports to use new internal testing bundle/csv builder.
internal/operator-controller/rukpak/bundle/source/source_test.go Updated imports to use new internal testing bundle/fs and bundle/csv builders.
internal/operator-controller/config/error_formatting_test.go Updated imports to use new internal testing bundle/csv builder.
internal/operator-controller/config/config_test.go Updated imports to use new internal testing bundle/csv builder.
internal/operator-controller/applier/provider_test.go Updated imports to use new internal testing bundle/fs and bundle/csv builders.
internal/operator-controller/applier/boxcutter_test.go Updated imports to use new internal testing bundle/fs and bundle/csv builders.
docs/superpowers/specs/2026-04-13-e2e-test-isolation-design.md Added design spec documenting the per-scenario dynamic catalog architecture.
Makefile Removed legacy image-registry target; updated env exports and unit-test dirs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +138 to +141
if err := wait.PollUntilContextTimeout(ctx, time.Second, 60*time.Second, true, func(ctx context.Context) (bool, error) {
if err := c.Get(ctx, deployKey, d); err != nil {
return false, nil
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the deployment availability wait loop, c.Get errors are swallowed (return false, nil). This can mask real failures (RBAC/connection issues) and only surface as a timeout. Consider returning the error unless it’s a NotFound, so the caller sees the real root cause quickly.

Copilot uses AI. Check for mistakes.
@joelanford joelanford force-pushed the e2e-test-isolation-pr branch from 5077134 to 6597bfd Compare April 15, 2026 02:03
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 51.98556% with 266 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.06%. Comparing base (dd57c28) to head (b7a6f66).
⚠️ Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
test/internal/registry/registry.go 0.00% 137 Missing ⚠️
test/internal/catalog/catalog.go 28.44% 78 Missing ⚠️
test/internal/catalog/bundle.go 83.44% 47 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2651      +/-   ##
==========================================
- Coverage   68.92%   68.06%   -0.86%     
==========================================
  Files         140      144       +4     
  Lines        9905    10563     +658     
==========================================
+ Hits         6827     7190     +363     
- Misses       2566     2855     +289     
- Partials      512      518       +6     
Flag Coverage Δ
e2e 37.36% <ø> (-0.46%) ⬇️
experimental-e2e 52.60% <ø> (+0.21%) ⬆️
unit 53.54% <51.98%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI review requested due to automatic review settings April 15, 2026 05:22
@joelanford joelanford force-pushed the e2e-test-isolation-pr branch from 6597bfd to 9d85b1b Compare April 15, 2026 05:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 77 out of 79 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dtfranz
Copy link
Copy Markdown
Contributor

dtfranz commented Apr 15, 2026

Have you been able to check that this setup will also work in the downstream environment?

@joelanford
Copy link
Copy Markdown
Member Author

@dtfranz That's a good point. I have not checked that. I'll make a PR downstream as well to see if I broke anything there.

@@ -0,0 +1,103 @@
# E2E Test Isolation: Per-Scenario Catalogs via Dynamic OCI Image Building
Copy link
Copy Markdown
Contributor

@camilamacedo86 camilamacedo86 Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford should we get it merged?
It might be nice have the design of the tests but should this dir be the correct one/
I think it was generated by CLAUDE to address the need.

Copy link
Copy Markdown
Member Author

@joelanford joelanford Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a directory for this kind of thing? I do think we should merge this kind of doc, and I was definitely intending to have this exact discussion.

```
Scenario starts
-> Generate parameterized bundle manifests (CRD names, deployments, etc. include scenario ID)
-> Build + push bundle OCI images to e2e registry via go-containerregistry
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmshort WDYT? Can it work on downstream?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on that! #2651 (comment)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got this PR updated to keep around the essentials needed to avoid (hopefully) avoid breaking downstream. I've also got a new downstream PR that contains no downstream changes (just cherry-pick from upstream, resolve conflicts, and vendor): openshift/operator-framework-operator-controller#700

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I’m okay with the proposed changes.

We should sync with @tmshort to confirm whether there’s any reason this might not work downstream. Additionally, I suggest holding off on merging this until we can coordinate an upstream/downstream sync. That way, we can have a dedicated PR downstream containing only this change, which will make it easier to isolate and address any issues if they arise. PS> I saw either: openshift/operator-framework-operator-controller#699 to evaluate it in downstream which is a great idea 🚀

It also looks like the following doc may have been added unintentionally and should be reviewed/removed if not needed:
docs/superpowers/specs/2026-04-13-e2e-test-isolation-design.md

Lastly, looping in @pedjak for review as well, since this area is now primarily owned by him too.

@joelanford joelanford force-pushed the e2e-test-isolation-pr branch from 9d85b1b to cb8d120 Compare April 16, 2026 16:18
@joelanford
Copy link
Copy Markdown
Member Author

I've got a downstream equivalent PR here to make sure I don't break downstream: openshift/operator-framework-operator-controller#699

This commit has the downstream changes that are required: openshift/operator-framework-operator-controller@1964d6b

We can probably step this in by:

  1. Leaving the old registry setup in place in this PR
  2. Syncing this PR downstream
  3. Update openshift/release top drop the registry image build
  4. Delete the old registry upstream
  5. Sync that PR downstream.

@joelanford joelanford force-pushed the e2e-test-isolation-pr branch from cb8d120 to 1b1ef51 Compare April 16, 2026 17:35
Copilot AI review requested due to automatic review settings April 16, 2026 17:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 82 out of 85 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +63 to +109
// Apply TLS certificate
cert := cmapplyconfig.Certificate(secretName, namespace).
WithSpec(cmapplyconfig.CertificateSpec().
WithSecretName(secretName).
WithIsCA(true).
WithDNSNames(
fmt.Sprintf("%s.%s.svc", name, namespace),
fmt.Sprintf("%s.%s.svc.cluster.local", name, namespace),
).
WithPrivateKey(cmapplyconfig.CertificatePrivateKey().
WithRotationPolicy(certmanagerv1.RotationPolicyAlways).
WithAlgorithm(certmanagerv1.ECDSAKeyAlgorithm).
WithSize(256),
).
WithIssuerRef(cmmetaapplyconfig.IssuerReference().
WithName("olmv1-ca").
WithKind("ClusterIssuer").
WithGroup("cert-manager.io"),
),
)
if err := c.Apply(ctx, cert, fieldOwner, client.ForceOwnership); err != nil {
return fmt.Errorf("failed to apply certificate: %w", err)
}

// Apply deployment
deploy := appsv1ac.Deployment(name, namespace).
WithLabels(map[string]string{"app": "registry"}).
WithSpec(appsv1ac.DeploymentSpec().
WithReplicas(1).
WithSelector(metav1ac.LabelSelector().
WithMatchLabels(map[string]string{"app": "registry"}),
).
WithTemplate(corev1ac.PodTemplateSpec().
WithLabels(map[string]string{"app": "registry"}).
WithSpec(corev1ac.PodSpec().
WithContainers(corev1ac.Container().
WithName("registry").
WithImage("registry:3").
WithImagePullPolicy(corev1.PullIfNotPresent).
WithVolumeMounts(corev1ac.VolumeMount().
WithName("certs-vol").
WithMountPath("/certs"),
).
WithEnv(
corev1ac.EnvVar().WithName("REGISTRY_HTTP_TLS_CERTIFICATE").WithValue("/certs/tls.crt"),
corev1ac.EnvVar().WithName("REGISTRY_HTTP_TLS_KEY").WithValue("/certs/tls.key"),
),
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deployed registry is configured for TLS via REGISTRY_HTTP_TLS_CERTIFICATE/KEY, but the service port is named "http" and the rest of the test code pushes with crane.Insecure (HTTP). With TLS enabled, HTTP pushes will fail (and in-cluster pulls may also fail unless trust/insecure config is wired up). Either deploy the registry without TLS, or update the push/pull clients to use HTTPS with an appropriate trust policy/CA.

Copilot uses AI. Check for mistakes.
Comment on lines +154 to +157
tag := fmt.Sprintf("%s/bundles/%s:v%s", localRegistry, paramPkgName, bd.version)
if err := crane.Push(img, tag, crane.Insecure); err != nil {
return nil, fmt.Errorf("failed to push bundle image %s: %w", tag, err)
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crane.Push is called with crane.Insecure, which forces plain HTTP. However the in-cluster registry deployment is configured with TLS, so these pushes are likely to fail. Either push over HTTPS with a transport that trusts the registry cert (or skips verification), or deploy the registry in HTTP mode to match crane.Insecure.

Copilot uses AI. Check for mistakes.
Comment on lines +222 to +224
catalogTag := fmt.Sprintf("%s/e2e/%s-catalog-%s:%s", localRegistry, c.name, c.scenarioID, tag)
if err := crane.Push(catalogImg, catalogTag, crane.Insecure); err != nil {
return nil, fmt.Errorf("failed to push catalog image %s: %w", catalogTag, err)
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catalog image push also uses crane.Insecure (HTTP). If the registry is serving HTTPS (as configured in test/internal/registry/registry.go), this will fail. Align registry deployment and crane push options (either HTTP everywhere, or HTTPS with trust configuration).

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +135
if cfg.staticBundleDir != "" {
files, err := readBundleDir(cfg.staticBundleDir)
if err != nil {
panic(fmt.Errorf("failed to read static bundle dir %s: %w", cfg.staticBundleDir, err))
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildBundle panics if StaticBundleDir cannot be read. Since this is test support code used from steps, a panic will abort the whole test run and skip cleanup, making failures harder to debug. Prefer returning an error from buildBundle (and threading it through callers) so the scenario fails cleanly with context.

Copilot uses AI. Check for mistakes.
Comment thread test/e2e/features_test.go
Comment on lines 17 to 22
var opts = godog.Options{
Format: "pretty",
Paths: []string{"features"},
Output: colors.Colored(os.Stdout),
Concurrency: 1,
Concurrency: 100,
NoColors: true,
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godog Concurrency is hard-coded to 100. This can easily overwhelm the API server and shared in-cluster components (registry, catalogd/operator-controller) and may cause non-deterministic flakes in CI. Consider defaulting to a conservative value and making higher concurrency opt-in via env/flag (e.g. GODOG_CONCURRENCY) once the suite is proven stable.

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +93
// Push images via crane through the port-forward.
// setup.sh tags images with CLUSTER_REGISTRY_HOST. We export them
// from the container runtime via "save" and push through the
// port-forward with crane.
containerRuntime := os.Getenv("CONTAINER_RUNTIME")
bundlePath := "bundles/registry-v1/registry-bundle:v0.0.1"
for _, path := range []string{bundlePath, catalogTag} {
srcRef := fmt.Sprintf("%s/%s", clusterRegistryHost, path)
saveCmd := exec.Command(containerRuntime, "save", srcRef) //nolint:gosec
tarData, err := saveCmd.Output()
if err != nil {
panic(fmt.Sprintf("failed to save image %s: %v", srcRef, err))
}
tag, err := name.NewTag(srcRef)
if err != nil {
panic(fmt.Sprintf("failed to parse tag %s: %v", srcRef, err))
}
img, err := tarball.Image(func() (io.ReadCloser, error) {
return io.NopCloser(bytes.NewReader(tarData)), nil
}, &tag)
if err != nil {
panic(fmt.Sprintf("failed to load image %s from tarball: %v", srcRef, err))
}
pushRef := fmt.Sprintf("%s/%s", localAddr, path)
if err := crane.Push(img, pushRef, crane.Insecure); err != nil {
panic(fmt.Sprintf("failed to push image %s: %v", pushRef, err))
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This push uses crane.Insecure (HTTP). If the in-cluster registry is configured with TLS (as in test/internal/registry/registry.go), the push will fail. Align the registry deployment and crane push options (HTTP registry, or HTTPS push with trust/skip-verify transport).

Copilot uses AI. Check for mistakes.
joelanford and others added 5 commits April 16, 2026 13:55
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joelanford joelanford force-pushed the e2e-test-isolation-pr branch from 1b1ef51 to 742aae9 Compare April 16, 2026 19:45
Replace the localhost:30000 hostPort-based registry access with
Kubernetes port-forwarding. This makes the test runner work regardless
of the cluster network topology (not just kind with extraPortMappings).

- Remove port 30000 extraPortMappings from kind configs
- Add PortForward() to the registry package using SPDY
- Use port-forward in e2e steps and extension-developer-e2e
- Remove LOCAL_REGISTRY_HOST env var (no longer needed)
- Keep NodePort service for containerd on kind nodes (hosts.toml)
Copilot AI review requested due to automatic review settings April 16, 2026 20:36
@joelanford joelanford force-pushed the e2e-test-isolation-pr branch from 742aae9 to b7a6f66 Compare April 16, 2026 20:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 82 out of 85 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/e2e/steps/hooks.go
Comment on lines 200 to +214
@@ -205,5 +211,6 @@ func ScenarioCleanup(ctx context.Context, _ *godog.Scenario, err error) (context
}
}(r)
}
wg.Wait()
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScenarioCleanup spawns one goroutine (and one external kubectl process) per resource deletion, with no concurrency limit. With high scenario concurrency this can create a large burst of parallel kubectl calls and lead to throttling/flakiness. Consider deleting sequentially or using a bounded worker pool/semaphore for deletions.

Copilot uses AI. Check for mistakes.
Comment on lines +161 to +177
func PortForward(ctx context.Context, cfg *rest.Config, namespace, name string) (string, func(), error) {
clientset, err := kubernetes.NewForConfig(cfg)
if err != nil {
return "", nil, fmt.Errorf("failed to create kubernetes client: %w", err)
}

pods, err := clientset.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
LabelSelector: "app=registry",
FieldSelector: "status.phase=Running",
})
if err != nil {
return "", nil, fmt.Errorf("failed to list registry pods: %w", err)
}
if len(pods.Items) == 0 {
return "", nil, fmt.Errorf("no running registry pods found in %s", namespace)
}
podName := pods.Items[0].Name
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PortForward ignores the name parameter and selects the first Running pod with label app=registry in the namespace. If multiple registries (or pods sharing that label) exist, this can port-forward to the wrong pod. Consider labeling the Deployment/Pod template with an additional selector including name (or using a label like app.kubernetes.io/name) and using that label selector here.

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +140
// Static bundle: read files from disk without parameterization
if cfg.staticBundleDir != "" {
files, err := readBundleDir(cfg.staticBundleDir)
if err != nil {
panic(fmt.Errorf("failed to read static bundle dir %s: %w", cfg.staticBundleDir, err))
}
return bundleSpec{
version: version,
files: files,
clusterRegistryOverride: cfg.clusterRegistryOverride,
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildBundle panics if reading a StaticBundleDir fails. This makes a missing/mis-typed path crash the entire test process instead of returning a normal error from Catalog.Build. Consider changing buildBundle/readBundleDir to return an error and propagate it up through Catalog.Build.

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +76
// Push images via crane through the port-forward.
// setup.sh tags images with CLUSTER_REGISTRY_HOST. We export them
// from the container runtime via "save" and push through the
// port-forward with crane.
containerRuntime := os.Getenv("CONTAINER_RUNTIME")
bundlePath := "bundles/registry-v1/registry-bundle:v0.0.1"
for _, path := range []string{bundlePath, catalogTag} {
srcRef := fmt.Sprintf("%s/%s", clusterRegistryHost, path)
saveCmd := exec.Command(containerRuntime, "save", srcRef) //nolint:gosec
tarData, err := saveCmd.Output()
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMain relies on CONTAINER_RUNTIME but doesn't validate it is set before calling exec.Command(containerRuntime, ...). If it's empty/misconfigured, the failure mode will be unclear. Consider validating CONTAINER_RUNTIME (and other setup.sh-required env vars) early with a clear error message.

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +89
for _, path := range []string{bundlePath, catalogTag} {
srcRef := fmt.Sprintf("%s/%s", clusterRegistryHost, path)
saveCmd := exec.Command(containerRuntime, "save", srcRef) //nolint:gosec
tarData, err := saveCmd.Output()
if err != nil {
panic(fmt.Sprintf("failed to save image %s: %v", srcRef, err))
}
tag, err := name.NewTag(srcRef)
if err != nil {
panic(fmt.Sprintf("failed to parse tag %s: %v", srcRef, err))
}
img, err := tarball.Image(func() (io.ReadCloser, error) {
return io.NopCloser(bytes.NewReader(tarData)), nil
}, &tag)
if err != nil {
panic(fmt.Sprintf("failed to load image %s from tarball: %v", srcRef, err))
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMain loads the entire output of containerRuntime save into memory (tarData) before parsing it with tarball.Image. Image tarballs can be large and this can cause high memory usage or OOM in CI. Prefer streaming (StdoutPipe) into tarball.Image / tarball.Load, or writing to a temp file and reading it back.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants