Commit a5c8d8d

Foo Bar and Copilot committed
docs: add CI E2E parallel execution design spec
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 39e1c91 commit a5c8d8d

1 file changed

Lines changed: 186 additions & 0 deletions

@@ -0,0 +1,186 @@
# CI E2E Parallel Execution Design

## Problem

The `API7EE E2E Test` CI workflow (`e2e-test.yml`) takes ~1 hour, far exceeding the 30-minute target. The bottleneck is test case execution: ~199 E2E tests are split across 3 matrix jobs (apisix.apache.org ~89 tests, networking.k8s.io ~81 tests, webhook ~28 tests), and each job runs its tests serially.

## Approach

Enable ginkgo parallel execution (`--nodes=2`) within the two larger matrix jobs by converting `BeforeSuite`/`AfterSuite` to ginkgo's `SynchronizedBeforeSuite`/`SynchronizedAfterSuite`. The webhook job stays serial. This ensures the API7EE control plane is deployed only once per job while each ginkgo node independently manages its own dashboard connection.

Scope: `e2e-test.yml` only. Other workflows are not changed.

## Architecture

### Current (serial)

```
Kind Cluster
├── api7-ee-e2e namespace (BeforeSuite, once per job)
│   ├── api7ee3-dashboard (NodePort 7080/7443)
│   ├── api7ee3-dp-manager (7943)
│   └── api7-postgresql
└── ingress-apisix-e2e-tests-default-{ns} (each It() BeforeEach)
    ├── api7ee3-apisix-gateway-mtls (data plane pod)
    ├── api7-ingress-controller
    └── httpbin

Test process → kubectl port-forward → fixed local:{NodePort} → dashboard:7080
```

### Target (parallel, --nodes=2)

```
Kind Cluster
├── api7-ee-e2e namespace (SynchronizedBeforeSuite node 1, once per job)
│   └── api7ee3-dashboard / dp-manager / postgresql
├── ingress-apisix-e2e-tests-default-{nsA} (node 1 BeforeEach)
│   ├── gateway pod, ingress controller, httpbin
│   └── API7EE Gateway Group A (UUID)
└── ingress-apisix-e2e-tests-default-{nsB} (node 2 BeforeEach, concurrent)
    ├── gateway pod, ingress controller, httpbin
    └── API7EE Gateway Group B (UUID)

ginkgo node 1 process → kubectl port-forward → local:{autoPort1} → dashboard:7080
ginkgo node 2 process → kubectl port-forward → local:{autoPort2} → dashboard:7080
```

## Concurrency Safety Analysis

| Resource | Isolation mechanism | Safe? |
|---|---|---|
| k8s namespace | `ingress-apisix-e2e-tests-{name}-{nanosecond}` | yes |
| GatewayClass | name = namespace name (unique) | yes |
| API7EE Gateway Group | `uuid.NewString()` | yes |
| Ingress controller | controllerName contains namespace | yes |
| Dashboard API calls | idempotent (UploadLicense, GetAdminKey) | yes |
| Dashboard tunnel | fixed NodePort used as local port → conflict | **needs fix** |
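
The dashboard tunnel row can be reproduced outside the test suite. The following is a minimal sketch in plain Go, assuming two listeners in one process behave like two ginkgo processes on the same machine; `secondBindFails` and `distinctAutoPorts` are hypothetical helper names, not part of the framework.

```go
package main

import (
	"fmt"
	"net"
)

// secondBindFails simulates two processes trying to bind the same fixed
// local port: the first listener holds the port, the second bind is
// attempted on exactly the same address.
func secondBindFails() bool {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return false
	}
	return true
}

// distinctAutoPorts binds two listeners to port 0 at the same time and
// returns the ports the OS assigned to each.
func distinctAutoPorts() (int, int) {
	a, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer a.Close()

	b, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer b.Close()

	return a.Addr().(*net.TCPAddr).Port, b.Addr().(*net.TCPAddr).Port
}

func main() {
	fmt.Println("fixed port conflicts:", secondBindFails())
	p1, p2 := distinctAutoPorts()
	fmt.Println("auto ports differ:", p1 != p2)
}
```

The fixed-port case fails on the second bind, which is the failure mode expected from two port-forwards sharing the NodePort value, while OS-assigned ports do not collide while both listeners are open.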

## Code Changes

### 1. `test/e2e/e2e_test.go`

Replace `BeforeSuite`/`AfterSuite` with the synchronized variants:

```go
SynchronizedBeforeSuite(f.DeployAPI7EE, f.InitNodeConnections)
SynchronizedAfterSuite(f.CloseNodeConnections, f.TeardownInfrastructure)
```

- `DeployAPI7EE` (node 1 only): deploys the API7EE control plane once.
- `InitNodeConnections` (all nodes): each node creates its own dashboard port-forward tunnel.
- `CloseNodeConnections` (all nodes): each node closes its own tunnel.
- `TeardownInfrastructure` (node 1 only): no-op for now (cluster is torn down by CI).

### 2. `test/e2e/framework/api7_framework.go`

Split `BeforeSuite` into:

**`DeployAPI7EE() []byte`** (node 1 only):
- Init `API7EELicense` and `dashboardVersion` from env
- Delete and recreate `api7-ee-e2e` namespace
- Helm install `api7ee3` chart
- Wait for pods to be ready (fixed `time.Sleep(1 * time.Minute)`)
- Create a temporary tunnel (with `findFreePort()`)
- Call `UploadLicense()` and `setDpManagerEndpoints()`
- Close the temporary tunnel
- Return `[]byte("ready")`

**`InitNodeConnections(_ []byte)`** (all nodes):
- Init `API7EELicense` from env (needed for per-test `UploadLicense` calls)
- Call `f.newDashboardTunnel()` to create a per-node tunnel

**`CloseNodeConnections()`** (all nodes):
- Call `f.shutdownDashboardTunnel()`

**`TeardownInfrastructure()`** (node 1 only):
- No-op (Kind cluster is deleted by `make kind-down` or CI teardown)
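
The ordering that this split relies on can be illustrated without ginkgo. The sketch below is a plain-Go simulation, not ginkgo's implementation: goroutines stand in for ginkgo's separate processes, and `simulate` is a hypothetical name. It shows the one property that matters here: the deploy step runs once and finishes before any per-node init step receives its payload.

```go
package main

import (
	"fmt"
	"sync"
)

// simulate mimics the lifecycle for n parallel nodes: a deploy step that
// runs exactly once, followed by an init step on every node that receives
// the bytes returned by deploy. It returns how many times deploy ran and
// what each node saw.
func simulate(n int) (deploys int, seen map[int]string) {
	seen = make(map[int]string)

	// "Node 1 only" phase: runs once, before any node initializes.
	deploy := func() []byte {
		deploys++
		return []byte("ready")
	}
	payload := deploy()

	// "All nodes" phase: every node gets the same payload concurrently.
	var mu sync.Mutex
	var wg sync.WaitGroup
	for node := 1; node <= n; node++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			mu.Lock()
			seen[node] = string(payload)
			mu.Unlock()
		}(node)
	}
	wg.Wait()
	return deploys, seen
}

func main() {
	deploys, seen := simulate(2)
	fmt.Println("deploys:", deploys, "nodes initialized:", len(seen))
}
```

In the real suite the payload crosses a process boundary, which is why `DeployAPI7EE` returns `[]byte` rather than sharing in-memory state with the other nodes.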

### 3. Fix dashboard tunnel port conflict

`newDashboardTunnel()` currently uses the k8s NodePort value as the local bind port. With parallel processes on the same machine, this causes `address already in use` errors.

Replace fixed-port logic with a `findFreePort()` helper:

```go
// findFreePort asks the OS for an unused TCP port by binding to port 0,
// then releases it so the tunnel can bind it. Needs "fmt" and "net".
func findFreePort() int {
	ln, err := net.Listen("tcp", ":0")
	if err != nil {
		panic(fmt.Sprintf("finding free port: %v", err))
	}
	port := ln.Addr().(*net.TCPAddr).Port
	_ = ln.Close()
	return port
}
```

Use `findFreePort()` for both HTTP and HTTPS tunnels:

```go
localHTTPPort := findFreePort()
localHTTPSPort := findFreePort()
_dashboardHTTPTunnel = k8s.NewTunnel(..., localHTTPPort, httpPort)
_dashboardHTTPSTunnel = k8s.NewTunnel(..., localHTTPSPort, httpsPort)
```

Note: there is a small TOCTOU window between `ln.Close()` and `kubectl port-forward` binding the port, during which another process could take it. On a dedicated CI machine a collision is unlikely in practice. If it becomes an issue, retry logic can be added.

### 4. `Makefile`

Add `ginkgo-api7ee-e2e-test` target:

```makefile
.PHONY: ginkgo-api7ee-e2e-test
ginkgo-api7ee-e2e-test: adc
	@ginkgo -cover -coverprofile=coverage.txt -r --randomize-all --randomize-suites \
		--trace --nodes=$(E2E_NODES) --label-filter="$(TEST_LABEL)" ./test/e2e/
```

### 5. `.github/workflows/e2e-test.yml`

Add an `Install ginkgo` step. Replace `make e2e-test` with the ginkgo parallel invocation:

```yaml
- name: Install ginkgo
  run: make install-ginkgo

- name: Run E2E test suite
  env:
    API7_EE_LICENSE: ${{ secrets.API7_EE_LICENSE }}
    PROVIDER_TYPE: api7ee
    TEST_LABEL: ${{ matrix.cases_subset }}
    TEST_ENV: CI
  run: |
    if [[ "${{ matrix.cases_subset }}" == "webhook" ]]; then
      E2E_NODES=1 make ginkgo-api7ee-e2e-test
    else
      E2E_NODES=2 make ginkgo-api7ee-e2e-test
    fi
```

## Expected Outcome

| Job | Tests | Before | After (N=2) |
|---|---|---|---|
| apisix.apache.org | ~89 | ~45 min | ~22 min |
| networking.k8s.io | ~81 | ~40 min | ~20 min |
| webhook | ~28 | ~10 min | ~10 min (serial) |
| **Total (longest)** | | **~60 min** | **~22-25 min** |

With the longest job estimated at ~22-25 minutes, the workflow should finish within the 30-minute target.
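
The "After" column is consistent with a simple model in which a job's test time divides evenly across ginkgo nodes. The sketch below works through that arithmetic using the table's "Before" figures; `estimate` and `longest` are hypothetical helpers, and the model assumes a perfect split and ignores fixed setup time.

```go
package main

import "fmt"

// estimate returns the modelled test time in minutes for a job whose
// serial test time is serialMin when split evenly across procs nodes.
func estimate(serialMin float64, procs int) float64 {
	return serialMin / float64(procs)
}

// longest returns the largest value, since matrix jobs run side by side
// and the slowest one determines the workflow duration.
func longest(mins ...float64) float64 {
	max := 0.0
	for _, m := range mins {
		if m > max {
			max = m
		}
	}
	return max
}

func main() {
	apisix := estimate(45, 2)     // apisix.apache.org, 2 nodes
	networking := estimate(40, 2) // networking.k8s.io, 2 nodes
	webhook := estimate(10, 1)    // webhook stays serial
	fmt.Println(apisix, networking, webhook, longest(apisix, networking, webhook))
}
```

Under these assumptions the three jobs come out at 22.5, 20 and 10 minutes, so the longest job is 22.5 minutes. Real runs will likely be somewhat slower because the two nodes share the runner's CPU and memory.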

## Risk and Rollback

**Risk**: Resource contention on GitHub-hosted runners (2 CPUs, 7 GB RAM). With 2 parallel test stacks (2 gateway pods + 2 ingress controllers + 2 httpbins) plus the shared API7EE control plane, memory usage may approach limits.

**Mitigation**: Start with `E2E_NODES=2`. Monitor CI run times and failure rates. Roll back to `E2E_NODES=1` (equivalent to current behavior) if instability is observed.

**Rollback**: A small change in the workflow: set `E2E_NODES=1` for all matrix jobs, which is functionally equivalent to the current serial `make e2e-test` run.

## Testing

After implementation, verify:
1. All 3 matrix jobs pass with the new ginkgo invocation
2. Each parallel node creates an independent namespace and gateway group
3. No port conflicts in dashboard tunnel creation
4. Serial mode (`E2E_NODES=1`) still works for webhook tests
