Skip to content

vault e2e: 11% test speedup by replacing fixed allowlist sleeps with on-demand retry#21707

Merged
prashantkumar1982 merged 1 commit intodevelopfrom
vault-perf-tuning
Mar 25, 2026
Merged

vault e2e: 11% test speedup by replacing fixed allowlist sleeps with on-demand retry#21707
prashantkumar1982 merged 1 commit intodevelopfrom
vault-perf-tuning

Conversation

@prashantkumar1982
Copy link
Copy Markdown
Contributor

@prashantkumar1982 prashantkumar1982 commented Mar 25, 2026

Summary

Replace the fixed time.Sleep(6s) after every allowlistRequest() call with on-demand retry logic in sendVaultRequestToGateway(). The test suite calls allowlistRequest() 14 times per run, totaling 84 seconds of unconditional sleep. This PR eliminates that by retrying only when the gateway hasn't yet synced the allowlist.

Changes

  • Removed time.Sleep(6 * time.Second) from allowlistRequest() in v2_vault_don_test.go
  • Added retry logic in sendVaultRequestToGateway() (v2_vault_don_test_helpers.go): if the gateway returns a "request not allowlisted" error, retry every 2s up to 7 times
  • Added isGatewayNotAllowlistedError() helper that correctly distinguishes gateway-level rejections (retryable: method="", code -32600) from node-level rejections (not retryable: method is set, code -32603, gateway already consumed the request)

Performance Results

Both runs used workflow-gateway-capabilities topology on local CRE, same branch (develop), same hardware.

Metric Before After Delta
Total test time 500.11s 445.52s -54.59s (-10.9%)
basic_crud subtest 146.89s 108.29s -38.60s (-26.3%)
cross_namespace subtest 274.25s 243.24s -31.01s (-11.3%)
Allowlist calls 14 14
Forced sleep 84s (14 × 6s) 0s -84s
Actual retry wait 0s ~40s (20 retries × 2s) +40s
Net allowlist saving ~44s

How the retry works

Most requests need only 1–2 retries (2–4s wait) before the allowlist syncer propagates the on-chain data. A few need 3 retries (6s). The worst case (7 retries = 14s) is still better than the unconditional 6s sleep that was paid on every call.

Why gateway vs node distinction matters

When the allowlist syncer on the gateway has caught up but individual nodes haven't, the gateway accepts and forwards the request. If nodes reject it, the gateway has already "consumed" the authorization. Retrying would get "request already authorized previously". The isGatewayNotAllowlistedError() function detects this by checking for the empty method field in the JSON-RPC error response (gateway-level rejection) vs a populated method field (node-level rejection).

Remove the 6-second time.Sleep after every allowlistRequest() call
(14 calls × 6s = 84s of forced sleep per test run). Instead, add
retry logic in sendVaultRequestToGateway() that detects gateway-level
"request not allowlisted" rejections and retries every 2s up to 7
times. Node-level rejections (where the gateway already consumed the
request) are correctly distinguished and not retried.

Measured improvement: total test time 500s → 446s (-11%).
@github-actions
Copy link
Copy Markdown
Contributor

👋 prashantkumar1982, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

@github-actions
Copy link
Copy Markdown
Contributor

✅ No conflicts with other open PRs targeting develop

@prashantkumar1982 prashantkumar1982 changed the title vault e2e: replace fixed allowlist sleep with on-demand retry vault e2e: 11% test speedup by replacing fixed allowlist sleeps with on-demand retry Mar 25, 2026
@cl-sonarqube-production
Copy link
Copy Markdown

@trunk-io
Copy link
Copy Markdown

trunk-io Bot commented Mar 25, 2026

Static BadgeStatic BadgeStatic BadgeStatic Badge

View Full Report ↗︎Docs

@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Mar 25, 2026
Merged via the queue into develop with commit b778c77 Mar 25, 2026
175 of 177 checks passed
@prashantkumar1982 prashantkumar1982 deleted the vault-perf-tuning branch March 25, 2026 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants