WIP: OCPBUGS-77056: add external certs benchmark scripts#1374

Draft
bentito wants to merge 1 commit into
openshift:masterfrom
bentito:perf-benchmark-scripts

@bentito bentito commented Mar 3, 2026

This PR adds an orchestrator (00-run-all.sh) and supporting scripts to benchmark the OpenShift router's startup time when handling spec.tls.externalCertificate routes, aimed at validating fixes for OCPBUGS-77056.

The benchmark harness measures how quickly the router can load N external certificates, comparing the current router image, which has the bug (a global write lock forces registrations to run serially), against a patched image that allows concurrent processing.

Signed-off-by: Brett Tofel <btofel@redhat.com>
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 3, 2026

openshift-ci Bot commented Mar 3, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


coderabbitai Bot commented Mar 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


openshift-ci Bot commented Mar 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rikatz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bentito bentito changed the title WIP: hack: add external certs benchmark scripts WIP: OCPBUGS-77056: add external certs benchmark scripts Mar 3, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Mar 3, 2026

@bentito: This pull request references Jira Issue OCPBUGS-77056, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Mar 3, 2026
@openshift-ci openshift-ci Bot requested a review from lihongan March 3, 2026 23:37

bentito commented Mar 3, 2026

The fix for OCPBUGS-77056 that enables faster startup behavior benchmarked in this PR is implemented and proposed in the following linked pulls:


bentito commented Mar 3, 2026

This PR is primarily to tie together changes in router and library-go and to illustrate a benchmarking scheme for the Perf/Scale team.

@SachinNinganure please let me know if the benchmarking scripts help test the changes to the router made in the linked PRs.

We are hoping you can fold testing with these PRs in place into some of your work on https://issues.redhat.com/browse/CORENET-6842

Primarily, we want to make sure that parallelizing these calls, which speeds up the router's handling of routes with external certs, is not simply shifting the problem to the API server or elsewhere. In other words, when we save time in the router pod, are we causing CPU or memory problems elsewhere on the cluster? Thanks!

@SachinNinganure

I have started putting some notes in this doc: https://docs.google.com/document/d/1tEZBNEaKLvXI8l1KIcHKCQgbAzkv0_YQSzkXcrlq56Y/edit?tab=t.0

I have just read through the benchmark scripts to understand them, and yes, they directly test both PRs. The flow is:

1. Create 200 routes with external certificates (the count is configurable).
2. Restart the router with the current code and measure startup time.
3. Deploy the patched router with the async changes and measure startup time.
4. Compare the performance.

I will additionally monitor the resources and their utilization.

For 6842 I will be running our ingress perf test; you may take a look at the doc for more info.

I will run the benchmark scripts you created to verify the linked PRs and, in addition, create the perf-scale-grafana boards for resource utilization and other verification. Thank you.
