test(e2e): prewarm environment pool to hide per-spec deploy latency#425
test(e2e): prewarm environment pool to hide per-spec deploy latency#425AlinsRan wants to merge 1 commit into
Conversation
Each APISIX-standalone spec currently deploys a full environment (data plane, controller, httpbin, port-forwards) synchronously in BeforeEach, so the ~30-80s deploy/readiness latency sits on every spec's critical path. Add a per-process prewarm pool: a small double-buffer of ready environments is provisioned by background workers, so while one spec runs the next environment is built concurrently. BeforeEach picks up a ready environment instead of deploying inline. Per-spec namespace isolation is preserved and the Deployer interface is untouched. Details: - envpool.go: generic, provider-agnostic pool (channel + workers, panic-safe background provisioning via a minimal terratest TestingT, AfterSuite cleanup registered by the suite root rather than the shared scaffold package). - apisix_prewarm.go: error-style provisioning of the default profile and loading it onto the scaffold. Tunnels are (re)created at acquire time against the fully-provisioned data plane, reusing createDataplaneTunnels/ createAdminTunnel, so a port-forward is never opened before its listener is serving and then left idle in the buffer. - Only the default profile is pooled; webhook/custom-keyed environments and any provisioning failure fall back to synchronous deploy. Stream-route specs (TCP/TLS/UDP and ApisixRoute stream routes) also fall back, since a stream data plane served from a prewarmed environment is sensitive to the background provisioning contention. - Knobs: E2E_PREWARM (default on), E2E_PREWARM_DEPTH (default 1). - k8s.go: poll pod readiness at a fixed 2s interval instead of an exponential backoff that polled sparsely during the 10-30s window pods become ready.
|
Foo Bar seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
📝 WalkthroughWalkthroughIntroduces a background environment prewarming pool for e2e tests. A new ChangesE2E Environment Prewarming Pool
Sequence Diagram(s)sequenceDiagram
participant BeforeEach as APISIXDeployer.BeforeEach
participant getOrStartPool
participant envPool
participant provisionAPISIXEnv
participant loadPooledEnv
participant AfterSuite
rect rgba(100, 149, 237, 0.5)
Note over BeforeEach,loadPooledEnv: Fast path (prewarm enabled, spec eligible)
BeforeEach->>getOrStartPool: isPoolable + specPrewarmable check
getOrStartPool->>envPool: start workers if new profile
envPool->>provisionAPISIXEnv: background goroutine provisions env
provisionAPISIXEnv-->>envPool: pooledEnv buffered in channel
BeforeEach->>envPool: acquire() → pooledEnv
BeforeEach->>loadPooledEnv: wire env into deployer (tunnels, settings)
loadPooledEnv-->>BeforeEach: return (test proceeds)
end
rect rgba(255, 160, 122, 0.5)
Note over AfterSuite,envPool: Suite teardown
AfterSuite->>ShutdownAllPools: registered via Ginkgo AfterSuite
ShutdownAllPools->>envPool: shutdown() for each pool
envPool->>destroyPooledEnv: delete namespace of buffered envs
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/scaffold/apisix_deployer.go (1)
111-111:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdmin API keys logged in plaintext at multiple locations. Both locations log the APISIX admin API key without redaction. Per coding guidelines, API keys should not be logged without redaction, even in test infrastructure, as logs may be accessible in CI systems.
test/e2e/scaffold/apisix_deployer.go#L111-L111: Redact or remove the admin key log inbeforeEachSynctest/e2e/scaffold/apisix_deployer.go#L395-L395: Redact or remove the admin key log inCreateAdditionalGatewayWithOptionsSuggested fix
- s.Logf("apisix admin api key: %s", s.runtimeOpts.APISIXAdminAPIKey) + // Redact middle portion for debugging while protecting the full key + key := s.runtimeOpts.APISIXAdminAPIKey + if len(key) > 8 { + s.Logf("apisix admin api key: %s...%s", key[:4], key[len(key)-4:]) + }As per coding guidelines: "Do not log, serialize, or return API keys, tokens... without redaction".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scaffold/apisix_deployer.go` at line 111, API keys are being logged in plaintext at two locations in test/e2e/scaffold/apisix_deployer.go, violating coding guidelines. In the `beforeEachSync` function at line 111-111, redact or remove the s.runtimeOpts.APISIXAdminAPIKey from the Logf call. Similarly, in the `CreateAdditionalGatewayWithOptions` function at line 395-395, redact or remove the admin API key from the corresponding log statement. Either remove these log lines entirely or redact the sensitive key value (for example, by logging only a partial hash or masked value) before logging to ensure API keys are never exposed in plaintext in test logs.Source: Coding guidelines
🧹 Nitpick comments (2)
test/e2e/framework/k8s.go (1)
299-310: 💤 Low valueFunction name
waitExponentialBackoffno longer matches behavior.The function now uses a fixed 2s interval rather than exponential backoff. Consider renaming to
waitWithBackofforwaitFixedBackoffto avoid confusion.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/framework/k8s.go` around lines 299 - 310, The function `waitExponentialBackoff` is misleadingly named since it no longer implements exponential backoff behavior. The backoff configuration uses Factor: 1, which means the duration stays fixed at 2 seconds throughout all 90 steps, rather than growing exponentially. Rename the function `waitExponentialBackoff` to a name that accurately reflects its fixed-interval behavior, such as `waitWithBackoff` or `waitFixedBackoff`. Make sure to update the function name definition and any call sites that reference this function.test/e2e/scaffold/envpool.go (1)
152-159: 💤 Low valueConsider logging namespace deletion failures for debugging.
destroyPooledEnvdiscards the error fromRunKubectlE. While this is cleanup code where failures aren't actionable, logging the error would help diagnose orphaned namespace issues in CI.func destroyPooledEnv(env *pooledEnv) { if env == nil { return } if env.namespace != "" && env.kubectlOptions != nil { - _ = k8s.RunKubectlE(&bgTestingT{}, env.kubectlOptions, "delete", "namespace", env.namespace, "--wait=false") + if err := k8s.RunKubectlE(&bgTestingT{}, env.kubectlOptions, "delete", "namespace", env.namespace, "--wait=false"); err != nil { + fmt.Printf("prewarm: failed to delete namespace %s: %v\n", env.namespace, err) + } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scaffold/envpool.go` around lines 152 - 159, The `destroyPooledEnv` function discards the error returned by `k8s.RunKubectlE` using the blank identifier, which prevents visibility into namespace deletion failures. Instead of ignoring this error, capture the error return value from the `RunKubectlE` call and log it when deletion fails. This will help diagnose orphaned namespace issues during CI debugging, even though the error may not be actionable during cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/e2e/scaffold/apisix_deployer.go`:
- Line 111: API keys are being logged in plaintext at two locations in
test/e2e/scaffold/apisix_deployer.go, violating coding guidelines. In the
`beforeEachSync` function at line 111-111, redact or remove the
s.runtimeOpts.APISIXAdminAPIKey from the Logf call. Similarly, in the
`CreateAdditionalGatewayWithOptions` function at line 395-395, redact or remove
the admin API key from the corresponding log statement. Either remove these log
lines entirely or redact the sensitive key value (for example, by logging only a
partial hash or masked value) before logging to ensure API keys are never
exposed in plaintext in test logs.
---
Nitpick comments:
In `@test/e2e/framework/k8s.go`:
- Around line 299-310: The function `waitExponentialBackoff` is misleadingly
named since it no longer implements exponential backoff behavior. The backoff
configuration uses Factor: 1, which means the duration stays fixed at 2 seconds
throughout all 90 steps, rather than growing exponentially. Rename the function
`waitExponentialBackoff` to a name that accurately reflects its fixed-interval
behavior, such as `waitWithBackoff` or `waitFixedBackoff`. Make sure to update
the function name definition and any call sites that reference this function.
In `@test/e2e/scaffold/envpool.go`:
- Around line 152-159: The `destroyPooledEnv` function discards the error
returned by `k8s.RunKubectlE` using the blank identifier, which prevents
visibility into namespace deletion failures. Instead of ignoring this error,
capture the error return value from the `RunKubectlE` call and log it when
deletion fails. This will help diagnose orphaned namespace issues during CI
debugging, even though the error may not be actionable during cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 81665ee1-05a8-4dab-9236-a9367709a5d3
📒 Files selected for processing (6)
test/e2e/apisix/e2e_test.gotest/e2e/framework/k8s.gotest/e2e/scaffold/apisix_deployer.gotest/e2e/scaffold/apisix_prewarm.gotest/e2e/scaffold/envpool.gotest/e2e/scaffold/scaffold.go
|
Reopening as a same-repository PR: fork-based PRs cannot access the registry login secret, so every e2e job fails at the "Login to Registry" setup step before any test runs. Superseded by a branch pushed directly to this repo. |
Each APISIX-standalone spec deploys a full environment (data plane, controller, httpbin, port-forwards) synchronously in
BeforeEach, so the ~30-80s deploy/readiness latency sits on every spec's critical path.What this does
Adds a per-process prewarm pool: a small double-buffer of ready environments is provisioned by background workers, so while one spec runs, the next environment is built concurrently.
BeforeEachthen picks up a ready environment instead of deploying inline.Deployerinterface is untouched.createDataplaneTunnels/createAdminTunnel), so a port-forward is never opened before its listener is serving and then left idle in the buffer.Knobs
E2E_PREWARM— on by default; setfalseto disable (kill switch).E2E_PREWARM_DEPTH— pool depth per profile per process, default1(double buffer).The cluster footprint is governed by
E2E_NODES × (1 + depth); at the default depth this adds only one in-flight environment per process.Also included
Files
scaffold/envpool.go(new) — generic, provider-agnostic pool.scaffold/apisix_prewarm.go(new) — default-profile provisioning + load onto scaffold.scaffold/apisix_deployer.go—BeforeEachprewarm fast path + sync fallback.scaffold/scaffold.go,apisix/e2e_test.go— const +AfterSuitepool teardown.framework/k8s.go— readiness polling fix +EnsureServiceReadyE.Validation
go build ./test/...,go vet,gofmtall clean. Live-cluster behavior and peak resource usage need this PR's CI run to validate;E2E_PREWARM=falseis the kill switch.Summary by CodeRabbit