|
| 1 | +# E2E Test Isolation: Per-Scenario Catalogs via Dynamic OCI Image Building |
| 2 | + |
| 3 | +## Problem |
| 4 | + |
| 5 | +E2E test scenarios previously shared cluster-scoped resources (ClusterCatalogs, CRDs, packages), |
| 6 | +causing cascading failures when one scenario left state behind. Parallelism was impossible because |
| 7 | +scenarios conflicted on shared resource names. |
| 8 | + |
| 9 | +## Solution |
| 10 | + |
| 11 | +Each scenario dynamically builds and pushes its own bundle and catalog OCI images at test time, |
| 12 | +parameterized by scenario ID. All cluster-scoped resource names include the scenario ID, making |
| 13 | +conflicts structurally impossible. |
| 14 | + |
| 15 | +``` |
| 16 | +Scenario starts |
| 17 | + -> Generate parameterized bundle manifests (CRD names, deployments, etc. include scenario ID) |
| 18 | + -> Build + push bundle OCI images to e2e registry via go-containerregistry |
| 19 | + -> Generate FBC catalog config referencing those bundle image refs |
| 20 | + -> Build + push catalog OCI image to e2e registry |
| 21 | + -> Create ClusterCatalog pointing at the catalog image |
| 22 | + -> Run scenario steps |
| 23 | + -> Clean up all resources (including catalog) |
| 24 | +``` |
| 25 | + |
| 26 | +### Key Properties |
| 27 | + |
| 28 | +- Every cluster-scoped resource name includes the scenario ID -- no conflicts by construction. |
| 29 | +- Failed scenario state is preserved for debugging without affecting other scenarios. |
| 30 | +- Parallelism (`Concurrency > 1`) is safe without further changes. |
| 31 | +- Adding new scenarios requires zero coordination with existing ones. |
| 32 | + |
| 33 | +## Builder API (`test/e2e/catalog/`) |
| 34 | + |
| 35 | +Bundles are defined as components of a catalog. A single `Build()` call builds and pushes |
| 36 | +all bundle images, generates the FBC, and pushes the catalog image: |
| 37 | + |
| 38 | +```go |
| 39 | +cat := catalog.NewCatalog("test", scenarioID, |
| 40 | + catalog.WithPackage("test", |
| 41 | + catalog.Bundle("1.0.0", catalog.WithCRD(), catalog.WithDeployment(), catalog.WithConfigMap()), |
| 42 | + catalog.Bundle("1.2.0", catalog.WithCRD(), catalog.WithDeployment()), |
| 43 | + catalog.Channel("beta", catalog.Entry("1.0.0"), catalog.Entry("1.2.0")), |
| 44 | + ), |
| 45 | +) |
| 46 | +result, err := cat.Build(ctx, "v1", localRegistry, clusterRegistry) |
| 47 | +// result.CatalogName = "test-catalog-{scenarioID}" |
| 48 | +// result.CatalogImageRef = "{clusterRegistry}/e2e/test-catalog-{scenarioID}:v1" |
| 49 | +// result.PackageNames = {"test": "test-{scenarioID}"} |
| 50 | +``` |
| 51 | + |
| 52 | +### Bundle Options |
| 53 | + |
| 54 | +- `WithCRD()` -- CRD with group `e2e-{id}.e2e.operatorframework.io` |
| 55 | +- `WithDeployment()` -- Deployment named `test-operator-{id}` (includes CSV, script ConfigMap, NetworkPolicy) |
| 56 | +- `WithConfigMap()` -- additional test ConfigMap |
| 57 | +- `WithInstallMode(modes...)` -- sets supported install modes on the CSV |
| 58 | +- `WithLargeCRD(fieldCount)` -- CRD with many fields for large bundle testing |
| 59 | +- `WithClusterRegistry(host)` -- overrides the cluster-side registry host (for mirror testing) |
| 60 | +- `StaticBundleDir(dir)` -- reads pre-built bundle manifests without parameterization (e.g. webhook-operator) |
| 61 | +- `BadImage()` -- uses an invalid container image to trigger ImagePullBackOff |
| 62 | +- `WithBundleProperty(type, value)` -- adds a property to bundle metadata |
| 63 | + |
| 64 | +## Feature File Conventions |
| 65 | + |
| 66 | +Feature files define catalogs inline via data tables: |
| 67 | + |
| 68 | +```gherkin |
| 69 | +Background: |
| 70 | + Given OLM is available |
| 71 | + And an image registry is available |
| 72 | + And a catalog "test" with packages: |
| 73 | + | package | version | channel | replaces | contents | |
| 74 | + | test | 1.0.0 | alpha | | CRD, Deployment, ConfigMap | |
| 75 | + | test | 1.0.1 | alpha | 1.0.0 | CRD, Deployment, ConfigMap | |
| 76 | + | test | 1.2.0 | beta | | CRD, Deployment | |
| 77 | +``` |
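One way to picture how the data table maps onto the builder's package/bundle/channel model is a small grouping pass over the rows. The sketch below is only an illustration: `Row`, `PackageSpec`, `BundleSpec`, `ChannelEntry`, and `GroupRows` are hypothetical names, and the real step definitions may structure this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Row mirrors one line of the feature-file data table (hypothetical type).
type Row struct{ Package, Version, Channel, Replaces, Contents string }

// BundleSpec is one bundle version and the content kinds it should include.
type BundleSpec struct {
	Version  string
	Contents []string
}

// ChannelEntry is one channel entry with its optional "replaces" edge.
type ChannelEntry struct{ Version, Replaces string }

// PackageSpec groups the bundles and channels that belong to one package.
type PackageSpec struct {
	Bundles  []BundleSpec
	Channels map[string][]ChannelEntry
}

// GroupRows folds table rows into per-package specs. A version that appears
// in several channels yields one bundle but one entry per channel.
func GroupRows(rows []Row) map[string]*PackageSpec {
	out := map[string]*PackageSpec{}
	seen := map[string]bool{}
	for _, r := range rows {
		p, ok := out[r.Package]
		if !ok {
			p = &PackageSpec{Channels: map[string][]ChannelEntry{}}
			out[r.Package] = p
		}
		if key := r.Package + "@" + r.Version; !seen[key] {
			seen[key] = true
			var contents []string
			for _, c := range strings.Split(r.Contents, ",") {
				if c = strings.TrimSpace(c); c != "" {
					contents = append(contents, c)
				}
			}
			p.Bundles = append(p.Bundles, BundleSpec{Version: r.Version, Contents: contents})
		}
		p.Channels[r.Channel] = append(p.Channels[r.Channel],
			ChannelEntry{Version: r.Version, Replaces: r.Replaces})
	}
	return out
}

func main() {
	specs := GroupRows([]Row{
		{"test", "1.0.0", "alpha", "", "CRD, Deployment, ConfigMap"},
		{"test", "1.0.1", "alpha", "1.0.0", "CRD, Deployment, ConfigMap"},
		{"test", "1.2.0", "beta", "", "CRD, Deployment"},
	})
	fmt.Println(len(specs["test"].Bundles), "bundles,", len(specs["test"].Channels), "channels")
}
```

Each `Contents` entry would then presumably select a bundle option such as `WithCRD()` or `WithDeployment()`; that mapping is left out here because the source does not spell it out.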
| 78 | + |
| 79 | +### Variable Substitution |
| 80 | + |
| 81 | +Templates in feature file YAML use these variables: |
| 82 | + |
| 83 | +| Variable | Expansion | Example | |
| 84 | +|----------|-----------|---------| |
| 85 | +| `${NAME}` | ClusterExtension name | `ce-abc123` | |
| 86 | +| `${TEST_NAMESPACE}` | Scenario namespace | `ns-abc123` | |
| 87 | +| `${SCENARIO_ID}` | Unique scenario identifier | `abc123` | |
| 88 | +| `${PACKAGE:<name>}` | Parameterized package name | `test-abc123` | |
| 89 | +| `${CATALOG:<name>}` | ClusterCatalog resource name | `test-catalog-abc123` | |
| 90 | +| `${COS_NAME}` | ClusterObjectSet name | `cos-abc123` | |
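The expansions in the table can be sketched as a single regular-expression pass. This is a minimal illustration under assumptions, not the project's actual substitution code: the `Scenario` type and its `Expand` method are hypothetical, and the `ce-`, `ns-`, and `cos-` prefixes are inferred from the example column.

```go
package main

import (
	"fmt"
	"regexp"
)

// varPattern matches ${KEY} and ${KEY:arg}.
var varPattern = regexp.MustCompile(`\$\{([A-Z_]+)(?::([^}]+))?\}`)

// Scenario holds the values needed for substitution (hypothetical type).
type Scenario struct {
	ID       string
	Packages map[string]string // logical package name -> parameterized name
	Catalogs map[string]string // logical catalog name -> ClusterCatalog name
}

// Expand replaces known variables and leaves unknown ones untouched, so a
// typo in a feature file stays visible instead of silently becoming "".
func (s Scenario) Expand(tmpl string) string {
	return varPattern.ReplaceAllStringFunc(tmpl, func(m string) string {
		parts := varPattern.FindStringSubmatch(m)
		key, arg := parts[1], parts[2]
		switch key {
		case "NAME":
			return "ce-" + s.ID
		case "TEST_NAMESPACE":
			return "ns-" + s.ID
		case "SCENARIO_ID":
			return s.ID
		case "COS_NAME":
			return "cos-" + s.ID
		case "PACKAGE":
			if v, ok := s.Packages[arg]; ok {
				return v
			}
		case "CATALOG":
			if v, ok := s.Catalogs[arg]; ok {
				return v
			}
		}
		return m // unknown variable or unknown name: keep the literal text
	})
}

func main() {
	s := Scenario{
		ID:       "abc123",
		Packages: map[string]string{"test": "test-abc123"},
		Catalogs: map[string]string{"test": "test-catalog-abc123"},
	}
	fmt.Println(s.Expand("name: ${NAME}, package: ${PACKAGE:test}, catalog: ${CATALOG:test}"))
}
```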
| 91 | + |
| 92 | +### Naming Conventions |
| 93 | + |
| 94 | +| Resource | Pattern | |
| 95 | +|----------|---------| |
| 96 | +| CRD group | `e2e-{id}.e2e.operatorframework.io` | |
| 97 | +| Deployment | `test-operator-{id}` | |
| 98 | +| Package name (FBC) | `{package}-{id}` | |
| 99 | +| Bundle image | `{registry}/bundles/{package}-{id}:v{version}` | |
| 100 | +| Catalog image | `{registry}/e2e/{name}-catalog-{id}:{tag}` | |
| 101 | +| ClusterCatalog | `{name}-catalog-{id}` | |
| 102 | +| Namespace | `ns-{id}` | |
| 103 | +| ClusterExtension | `ce-{id}` | |
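The patterns in the table can be read as pure functions of the scenario ID. The following sketch derives all of them in one place; `Names` and `DeriveNames` are hypothetical and not part of the `catalog` package, and the registry host in `main` is a placeholder.

```go
package main

import "fmt"

// Names collects the per-scenario resource names listed in the table.
type Names struct {
	CRDGroup         string
	Deployment       string
	Package          string
	BundleImage      string
	CatalogImage     string
	ClusterCatalog   string
	Namespace        string
	ClusterExtension string
}

// DeriveNames applies the naming patterns for one package version in one
// catalog. Because every result embeds id, two scenarios with different IDs
// cannot produce the same cluster-scoped name.
func DeriveNames(registry, catalog, pkg, version, tag, id string) Names {
	return Names{
		CRDGroup:         fmt.Sprintf("e2e-%s.e2e.operatorframework.io", id),
		Deployment:       fmt.Sprintf("test-operator-%s", id),
		Package:          fmt.Sprintf("%s-%s", pkg, id),
		BundleImage:      fmt.Sprintf("%s/bundles/%s-%s:v%s", registry, pkg, id, version),
		CatalogImage:     fmt.Sprintf("%s/e2e/%s-catalog-%s:%s", registry, catalog, id, tag),
		ClusterCatalog:   fmt.Sprintf("%s-catalog-%s", catalog, id),
		Namespace:        fmt.Sprintf("ns-%s", id),
		ClusterExtension: fmt.Sprintf("ce-%s", id),
	}
}

func main() {
	// "registry.example:5000" is a placeholder host for illustration only.
	n := DeriveNames("registry.example:5000", "test", "test", "1.0.0", "v1", "abc123")
	fmt.Printf("%+v\n", n)
}
```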
| 104 | + |
| 105 | +## Registry Access |
| 106 | + |
| 107 | +An in-cluster OCI registry (`test/internal/registry/`) stores bundle and catalog images. |
| 108 | +The registry runs as a ClusterIP Service; there is no NodePort or kind `extraPortMappings`. |
| 109 | + |
| 110 | +The test runner reaches the registry via **Kubernetes port-forward** (SPDY through the API |
| 111 | +server), which works regardless of the cluster's network topology. A `sync.OnceValues` in the |
| 112 | +step definitions starts the port-forward once and returns the dynamically assigned |
| 113 | +`localhost:<port>` address used for all `crane.Push` / `crane.Tag` calls. |
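The start-once behavior described above can be illustrated with `sync.OnceValues` (available since Go 1.21). In this sketch `startPortForward` is a stand-in that returns a fixed address; the real function would open the SPDY port-forward and return the dynamically assigned port, and the names `registryAddr` and `forwardStarts` are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// forwardStarts counts how often the port-forward was started (demo only).
var forwardStarts int

// startPortForward is a placeholder for the real port-forward setup. The
// fixed address below is an assumption made for illustration.
func startPortForward() (string, error) {
	forwardStarts++
	return "localhost:40123", nil
}

// registryAddr runs startPortForward on first use and returns the cached
// address and error on every later call, including concurrent ones.
var registryAddr = sync.OnceValues(startPortForward)

func main() {
	for i := 0; i < 3; i++ {
		addr, err := registryAddr()
		if err != nil {
			fmt.Println("port-forward failed:", err)
			return
		}
		fmt.Println("push target:", addr)
	}
	fmt.Println("port-forward started", forwardStarts, "time(s)")
}
```

One consequence of this design is that a failed first attempt is cached as well: every later caller receives the same error rather than triggering a retry.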
| 114 | + |
| 115 | +In-cluster components (e.g. the catalog unpacker) pull images using the Service DNS name |
| 116 | +(`docker-registry.operator-controller-e2e.svc.cluster.local:5000`), resolved by CoreDNS. |
| 117 | +Containerd on the node is never involved because the registry only holds OCI artifacts |
| 118 | +consumed by Go code, not container images for pods. |