test(e2e): prewarm environment pool to hide per-spec deploy latency by AlinsRan · Pull Request #425 · api7/api7-ingress-controller

AlinsRan · 2026-06-15T06:20:22Z

Each APISIX-standalone spec deploys a full environment (data plane, controller, httpbin, port-forwards) synchronously in BeforeEach, so the ~30-80s deploy/readiness latency sits on every spec's critical path.

What this does

Adds a per-process prewarm pool: a small double-buffer of ready environments is provisioned by background workers, so while one spec runs, the next environment is built concurrently. BeforeEach then picks up a ready environment instead of deploying inline.

Per-spec namespace isolation is preserved.
The Deployer interface is untouched.
Only the default profile is pooled; webhook/custom-keyed environments and any provisioning failure fall back to synchronous deploy.
Stream-route specs (Gateway API TCP/TLS/UDP and ApisixRoute stream routes) also fall back to synchronous deploy: a stream data plane served from a prewarmed environment is sensitive to the background provisioning contention.
Tunnels are (re)created at acquire time against the fully-provisioned data plane (reusing createDataplaneTunnels/createAdminTunnel), so a port-forward is never opened before its listener is serving and then left idle in the buffer.

Knobs

E2E_PREWARM — on by default; set false to disable (kill switch).
E2E_PREWARM_DEPTH — pool depth per profile per process, default 1 (double buffer).

The cluster footprint is governed by E2E_NODES × (1 + depth); at the default depth this adds only one in-flight environment per process.

Also included

Pod-readiness polling now uses a fixed 2s interval instead of an exponential backoff that polled sparsely (7.5/15.5/31.5s) exactly during the 10-30s window pods become ready — worth up to ~15s per wait on its own.

Files

scaffold/envpool.go (new) — generic, provider-agnostic pool.
scaffold/apisix_prewarm.go (new) — default-profile provisioning + load onto scaffold.
scaffold/apisix_deployer.go — BeforeEach prewarm fast path + sync fallback.
scaffold/scaffold.go, apisix/e2e_test.go — const + AfterSuite pool teardown.
framework/k8s.go — readiness polling fix + EnsureServiceReadyE.

Validation

go build ./test/..., go vet, gofmt all clean. Live-cluster behavior and peak resource usage need this PR's CI run to validate; E2E_PREWARM=false is the kill switch.

Summary by CodeRabbit

Tests
- Implemented background prewarming pools to accelerate e2e test startup and reduce wait times
- Added automatic cleanup of test environments and resources upon suite completion
- Enhanced service readiness monitoring with improved error handling and configurable retry strategies
- Optimized test setup with environment pooling support and configurable pool depth
- Strengthened test isolation and infrastructure resource management

Each APISIX-standalone spec currently deploys a full environment (data plane, controller, httpbin, port-forwards) synchronously in BeforeEach, so the ~30-80s deploy/readiness latency sits on every spec's critical path. Add a per-process prewarm pool: a small double-buffer of ready environments is provisioned by background workers, so while one spec runs the next environment is built concurrently. BeforeEach picks up a ready environment instead of deploying inline. Per-spec namespace isolation is preserved and the Deployer interface is untouched. Details: - envpool.go: generic, provider-agnostic pool (channel + workers, panic-safe background provisioning via a minimal terratest TestingT, AfterSuite cleanup registered by the suite root rather than the shared scaffold package). - apisix_prewarm.go: error-style provisioning of the default profile and loading it onto the scaffold. Tunnels are (re)created at acquire time against the fully-provisioned data plane, reusing createDataplaneTunnels/ createAdminTunnel, so a port-forward is never opened before its listener is serving and then left idle in the buffer. - Only the default profile is pooled; webhook/custom-keyed environments and any provisioning failure fall back to synchronous deploy. Stream-route specs (TCP/TLS/UDP and ApisixRoute stream routes) also fall back, since a stream data plane served from a prewarmed environment is sensitive to the background provisioning contention. - Knobs: E2E_PREWARM (default on), E2E_PREWARM_DEPTH (default 1). - k8s.go: poll pod readiness at a fixed 2s interval instead of an exponential backoff that polled sparsely during the 10-30s window pods become ready.

CLAassistant · 2026-06-15T06:20:32Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Foo Bar seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

coderabbitai · 2026-06-15T06:20:37Z

📝 Walkthrough

Walkthrough

Introduces a background environment prewarming pool for e2e tests. A new envpool.go manages worker goroutines that provision complete APISIX Kubernetes environments ahead of time. apisix_prewarm.go adds eligibility checks, provisioning functions, and deployer wiring. BeforeEach gains a fast path to acquire pooled environments. A suite-level AfterSuite hook calls ShutdownAllPools for cleanup.

Changes

E2E Environment Prewarming Pool

Layer / File(s)	Summary
Pool infrastructure: envPool, workers, and lifecycle `test/e2e/scaffold/envpool.go`	Defines `envPool` with background provisioning workers, panic-to-error wrapping via `safeProvision`, namespace-deletion cleanup in `destroyPooledEnv`, a global profile-keyed pool registry via `getOrStartPool`, exported `ShutdownAllPools`, and `E2E_PREWARM`/`E2E_PREWARM_DEPTH` env-var configuration helpers.
Prewarm eligibility checks and environment provisioning `test/e2e/scaffold/apisix_prewarm.go`	Adds `specPrewarmable` (excludes stream-route/Gateway TCP/TLS/UDP specs by label and container hierarchy), `isPoolable`/`profileKey`/`formatRegistry` helpers, and the full provisioning chain: `provisionAPISIXEnv` → `provisionDataplane` (with optional etcd apply) + `provisionIngress` (webhook disabled, controller-manager pod wait) + `provisionHTTPBIN` (deployment apply, endpoint readiness check).
Deployer BeforeEach fast path and loadPooledEnv `test/e2e/scaffold/apisix_deployer.go`, `test/e2e/scaffold/apisix_prewarm.go`	`BeforeEach` attempts pool acquisition and calls `loadPooledEnv` on success, falling back to `beforeEachSync` on failure. `loadPooledEnv` copies controller/admin/dataplane/httpbin settings into the deployer, clears finalizers, and creates tunnels.
Framework support, defaultProfileName, and suite teardown `test/e2e/framework/k8s.go`, `test/e2e/scaffold/scaffold.go`, `test/e2e/apisix/e2e_test.go`	Adds `EnsureServiceReadyE` (error-returning service readiness used by `provisionHTTPBIN`), changes `waitExponentialBackoff` to a fixed 2s/90-step schedule, updates `NewScaffold` to use the `defaultProfileName` constant, and registers the `AfterSuite` hook calling `ShutdownAllPools`.

Sequence Diagram(s)

sequenceDiagram
  participant BeforeEach as APISIXDeployer.BeforeEach
  participant getOrStartPool
  participant envPool
  participant provisionAPISIXEnv
  participant loadPooledEnv
  participant AfterSuite

  rect rgba(100, 149, 237, 0.5)
    Note over BeforeEach,loadPooledEnv: Fast path (prewarm enabled, spec eligible)
    BeforeEach->>getOrStartPool: isPoolable + specPrewarmable check
    getOrStartPool->>envPool: start workers if new profile
    envPool->>provisionAPISIXEnv: background goroutine provisions env
    provisionAPISIXEnv-->>envPool: pooledEnv buffered in channel
    BeforeEach->>envPool: acquire() → pooledEnv
    BeforeEach->>loadPooledEnv: wire env into deployer (tunnels, settings)
    loadPooledEnv-->>BeforeEach: return (test proceeds)
  end

  rect rgba(255, 160, 122, 0.5)
    Note over AfterSuite,envPool: Suite teardown
    AfterSuite->>ShutdownAllPools: registered via Ginkgo AfterSuite
    ShutdownAllPools->>envPool: shutdown() for each pool
    envPool->>destroyPooledEnv: delete namespace of buffered envs
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: introducing a prewarm environment pool for e2e tests to hide per-spec deployment latency, which matches the core objective of the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
E2e Test Quality Review	✅ Passed	PR introduces test infrastructure optimization (e2e pool prewarming), not new tests. Code has comprehensive error handling, proper concurrency safety (mutex/waitgroup/channels), correct fallback lo...
Security Check	✅ Passed	Comprehensive security review of all 7 categories completed. No sensitive data exposure, secrets handling is appropriate for test infrastructure, proper resource isolation with unique namespaces, a...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/e2e/scaffold/apisix_deployer.go (1)
111-111: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Admin API keys logged in plaintext at multiple locations. Both locations log the APISIX admin API key without redaction. Per coding guidelines, API keys should not be logged without redaction, even in test infrastructure, as logs may be accessible in CI systems.

test/e2e/scaffold/apisix_deployer.go#L111-L111: Redact or remove the admin key log in beforeEachSync

test/e2e/scaffold/apisix_deployer.go#L395-L395: Redact or remove the admin key log in CreateAdditionalGatewayWithOptions
Suggested fix
-	s.Logf("apisix admin api key: %s", s.runtimeOpts.APISIXAdminAPIKey)
+	// Redact middle portion for debugging while protecting the full key
+	key := s.runtimeOpts.APISIXAdminAPIKey
+	if len(key) > 8 {
+		s.Logf("apisix admin api key: %s...%s", key[:4], key[len(key)-4:])
+	}
As per coding guidelines: "Do not log, serialize, or return API keys, tokens... without redaction".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/scaffold/apisix_deployer.go` at line 111, API keys are being logged
in plaintext at two locations in test/e2e/scaffold/apisix_deployer.go, violating
coding guidelines. In the `beforeEachSync` function at line 111-111, redact or
remove the s.runtimeOpts.APISIXAdminAPIKey from the Logf call. Similarly, in the
`CreateAdditionalGatewayWithOptions` function at line 395-395, redact or remove
the admin API key from the corresponding log statement. Either remove these log
lines entirely or redact the sensitive key value (for example, by logging only a
partial hash or masked value) before logging to ensure API keys are never
exposed in plaintext in test logs.
Source: Coding guidelines

🧹 Nitpick comments (2)

test/e2e/framework/k8s.go (1)

299-310: 💤 Low value

Function name waitExponentialBackoff no longer matches behavior.

The function now uses a fixed 2s interval rather than exponential backoff. Consider renaming to waitWithBackoff or waitFixedBackoff to avoid confusion.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/framework/k8s.go` around lines 299 - 310, The function
`waitExponentialBackoff` is misleadingly named since it no longer implements
exponential backoff behavior. The backoff configuration uses Factor: 1, which
means the duration stays fixed at 2 seconds throughout all 90 steps, rather than
growing exponentially. Rename the function `waitExponentialBackoff` to a name
that accurately reflects its fixed-interval behavior, such as `waitWithBackoff`
or `waitFixedBackoff`. Make sure to update the function name definition and any
call sites that reference this function.

test/e2e/scaffold/envpool.go (1)

152-159: 💤 Low value

Consider logging namespace deletion failures for debugging.

destroyPooledEnv discards the error from RunKubectlE. While this is cleanup code where failures aren't actionable, logging the error would help diagnose orphaned namespace issues in CI.

 func destroyPooledEnv(env *pooledEnv) {
 	if env == nil {
 		return
 	}
 	if env.namespace != "" && env.kubectlOptions != nil {
-		_ = k8s.RunKubectlE(&bgTestingT{}, env.kubectlOptions, "delete", "namespace", env.namespace, "--wait=false")
+		if err := k8s.RunKubectlE(&bgTestingT{}, env.kubectlOptions, "delete", "namespace", env.namespace, "--wait=false"); err != nil {
+			fmt.Printf("prewarm: failed to delete namespace %s: %v\n", env.namespace, err)
+		}
 	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/scaffold/envpool.go` around lines 152 - 159, The `destroyPooledEnv`
function discards the error returned by `k8s.RunKubectlE` using the blank
identifier, which prevents visibility into namespace deletion failures. Instead
of ignoring this error, capture the error return value from the `RunKubectlE`
call and log it when deletion fails. This will help diagnose orphaned namespace
issues during CI debugging, even though the error may not be actionable during
cleanup.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@test/e2e/scaffold/apisix_deployer.go`:
- Line 111: API keys are being logged in plaintext at two locations in
test/e2e/scaffold/apisix_deployer.go, violating coding guidelines. In the
`beforeEachSync` function at line 111-111, redact or remove the
s.runtimeOpts.APISIXAdminAPIKey from the Logf call. Similarly, in the
`CreateAdditionalGatewayWithOptions` function at line 395-395, redact or remove
the admin API key from the corresponding log statement. Either remove these log
lines entirely or redact the sensitive key value (for example, by logging only a
partial hash or masked value) before logging to ensure API keys are never
exposed in plaintext in test logs.

---

Nitpick comments:
In `@test/e2e/framework/k8s.go`:
- Around line 299-310: The function `waitExponentialBackoff` is misleadingly
named since it no longer implements exponential backoff behavior. The backoff
configuration uses Factor: 1, which means the duration stays fixed at 2 seconds
throughout all 90 steps, rather than growing exponentially. Rename the function
`waitExponentialBackoff` to a name that accurately reflects its fixed-interval
behavior, such as `waitWithBackoff` or `waitFixedBackoff`. Make sure to update
the function name definition and any call sites that reference this function.

In `@test/e2e/scaffold/envpool.go`:
- Around line 152-159: The `destroyPooledEnv` function discards the error
returned by `k8s.RunKubectlE` using the blank identifier, which prevents
visibility into namespace deletion failures. Instead of ignoring this error,
capture the error return value from the `RunKubectlE` call and log it when
deletion fails. This will help diagnose orphaned namespace issues during CI
debugging, even though the error may not be actionable during cleanup.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 81665ee1-05a8-4dab-9236-a9367709a5d3

📥 Commits

Reviewing files that changed from the base of the PR and between ecbb5d5 and dc63f19.

📒 Files selected for processing (6)

test/e2e/apisix/e2e_test.go
test/e2e/framework/k8s.go
test/e2e/scaffold/apisix_deployer.go
test/e2e/scaffold/apisix_prewarm.go
test/e2e/scaffold/envpool.go
test/e2e/scaffold/scaffold.go

AlinsRan · 2026-06-15T07:46:23Z

Reopening as a same-repository PR: fork-based PRs cannot access the registry login secret, so every e2e job fails at the "Login to Registry" setup step before any test runs. Superseded by a branch pushed directly to this repo.

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

AlinsRan closed this Jun 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e): prewarm environment pool to hide per-spec deploy latency#425

test(e2e): prewarm environment pool to hide per-spec deploy latency#425
AlinsRan wants to merge 1 commit into
api7:masterfrom
AlinsRan:feat/e2e-prewarm-pool

AlinsRan commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

CLAassistant commented Jun 15, 2026

Uh oh!

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

AlinsRan commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AlinsRan commented Jun 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this does

Knobs

Also included

Files

Validation

Summary by CodeRabbit

Uh oh!

CLAassistant commented Jun 15, 2026

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

AlinsRan commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AlinsRan commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading