fix(ci): bootstrap virtualization print nodes for debug by universal-itengineer · Pull Request #2317 · deckhouse/virtualization

universal-itengineer · 2026-05-05T08:57:00Z

Description

Make the nested E2E bootstrap diagnostics more tolerant of temporary Kubernetes API failures.

This change keeps debug output commands from failing the workflow when the nested cluster API temporarily returns errors, and updates the virt-handler readiness loop to recalculate worker nodes on every attempt. If worker nodes cannot be listed, the loop now keeps waiting instead of treating 0/0 as a successful virt-handler readiness state.

Why do we need it, and what problem does it solve?

Nested E2E clusters can briefly lose API availability while the control plane is recovering, for example when etcd or kube-apiserver is under load or restarting. In that window, diagnostic kubectl get node calls may return errors such as etcdserver: request timed out or 502 Bad Gateway and fail the CI job even though the cluster can recover shortly afterward.
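One way to keep a diagnostic call from failing the job is a small wrapper that logs a warning and always returns success. This is a minimal sketch and not the code from this PR: the `debug_run` helper name is hypothetical, and the `::warning::` prefix assumes the script runs under GitHub Actions.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not taken from the PR): run a diagnostic command and
# never let its failure abort the job, even under `set -e`.
debug_run() {
  local output
  if output="$("$@" 2>&1)"; then
    printf '%s\n' "$output"
  else
    # `::warning::` is the GitHub Actions workflow command for a warning annotation.
    echo "::warning::diagnostic command failed (ignored): $*"
    # Still print whatever the command produced, since it may help debugging.
    printf '%s\n' "$output"
  fi
  return 0
}

# Example call in a bootstrap step (assumes kubectl points at the nested cluster):
# debug_run kubectl get nodes -o wide
```

Because the command runs inside an `if` condition, a non-zero exit status does not trigger `set -e`, so the step continues with whatever output was collected.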

Example error

E0505 03:17:44.603556    2966 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: an error on the server (\"<html>\\r\\n<head><title>502 Bad Gateway</title></head>\\r\\n<body>\\r\\n<center><h1>502 Bad Gateway</h1></center>\\r\\n<hr><center>nginx</center>\\r\\n</body>\\r\\n</html>\") has prevented the request from succeeding"
  .... has prevented the request from succeeding (get nodes)

The virt-handler readiness check also previously calculated the worker count only once before the retry loop. If that initial request failed and produced 0 workers, the check could incorrectly succeed with 0/0 ready handlers.
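The corrected behavior can be sketched as a loop that recounts workers on every attempt and treats a failed or empty listing as "not ready yet". This is an illustration under assumptions, not the workflow's actual script: the function names, the `node-role.kubernetes.io/worker` label, the `d8-virtualization` namespace, and the `virt-handler` DaemonSet name are all assumed here.

```shell
#!/usr/bin/env bash

# Assumption: worker nodes carry the node-role.kubernetes.io/worker label.
# Returns non-zero when the API call fails, so the caller can tell
# "listing failed" apart from a real count.
count_workers() {
  local out
  out="$(kubectl get nodes -l node-role.kubernetes.io/worker -o name 2>/dev/null)" || return 1
  printf '%s\n' "$out" | grep -c . || true
}

# Assumption: virt-handler is a DaemonSet in the d8-virtualization namespace.
count_ready_handlers() {
  kubectl -n d8-virtualization get daemonset virt-handler \
    -o jsonpath='{.status.numberReady}' 2>/dev/null
}

# wait_for_handlers ATTEMPTS DELAY_SECONDS
wait_for_handlers() {
  local attempts="$1" delay="$2" workers ready i
  for ((i = 1; i <= attempts; i++)); do
    # Recalculate the worker count on every attempt; 0 workers or a failed
    # listing means "keep waiting", never success.
    if workers="$(count_workers)" && [ "${workers:-0}" -gt 0 ]; then
      if ready="$(count_ready_handlers)" && [ "${ready:-0}" -eq "$workers" ]; then
        echo "virt-handler ready: ${ready}/${workers}"
        return 0
      fi
      echo "virt-handler not ready yet: ${ready:-?}/${workers} (attempt ${i}/${attempts})"
    else
      echo "cannot list worker nodes yet (attempt ${i}/${attempts})"
    fi
    sleep "$delay"
  done
  return 1
}

# Example call: up to 30 attempts, 10 seconds apart.
# wait_for_handlers 30 10
```

The key design choice is that the success branch is only reachable with a positive worker count obtained in the same attempt, which rules out the 0/0 case.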

What is the expected result?

Run the nested E2E bootstrap workflow.
If the nested cluster API temporarily fails during diagnostic output, the workflow logs a warning and continues collecting available information.
If worker nodes cannot be listed during virt-handler readiness checks, the workflow waits for the next retry instead of reporting success with 0/0 handlers.

Checklist

The code is covered by unit tests.
e2e tests passed.
Documentation updated according to the changes.
Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: ci
type: fix
summary: Make nested E2E bootstrap diagnostics tolerate temporary Kubernetes API failures.
impact_level: low

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>

github-actions Bot assigned universal-itengineer May 5, 2026

universal-itengineer marked this pull request as ready for review May 5, 2026 09:09

universal-itengineer requested a review from nevermarine as a code owner May 5, 2026 09:09

universal-itengineer force-pushed the fix/ci/nested-e2e-bootstrap branch 2 times, most recently from aa93115 to cebef57 Compare May 5, 2026 09:12

fix(ci): bootstrap virtualization print nodes for debug

3295cbd

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>

universal-itengineer force-pushed the fix/ci/nested-e2e-bootstrap branch from cebef57 to 3295cbd Compare May 5, 2026 09:13

nevermarine approved these changes May 5, 2026


universal-itengineer added this to the v1.9.0 milestone May 5, 2026

universal-itengineer merged commit 077b199 into main May 5, 2026
26 of 29 checks passed

universal-itengineer deleted the fix/ci/nested-e2e-bootstrap branch May 5, 2026 09:25

universal-itengineer merged 1 commit into main from fix/ci/nested-e2e-bootstrap
