test(e2e): enforce rebuild-revision parity across GPU OS variants #8660

Open
surajssd wants to merge 1 commit into main from suraj/dcgm-validate-components-consider-build-version
@surajssd surajssd commented Jun 8, 2026

What this PR does / why we need it:

Strengthens the version-consistency job in the "Validate Components" check so it catches partial-OS GPU package updates that previously slipped through.

Problem: Renovate PRs sometimes bump a package for only one OS. For example, #8659 moved dcgm-exporter to 4.8.2-ubuntu24.04u2 / 4.8.2-ubuntu22.04u2 for Ubuntu while leaving Azure Linux at 4.8.2-1.azl3. The check passed anyway because Test_Version_Consistency_GPU_Managed_Components (in e2e/scenario_gpu_managed_experience_test.go) compared only the major.minor.patch part via extractMajorMinorPatchVersion, which strips the trailing rebuild suffix — so all three variants collapsed to 4.8.2 and matched. The rebuild-revision skew (u2 vs azl3's 1) — the only thing that differed — was invisible to the test.
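A minimal sketch of why the old comparison could not see the skew. The function `majorMinorPatch` below is a hypothetical stand-in written for illustration, not the repo's actual `extractMajorMinorPatchVersion`; it assumes the version starts with an optional epoch followed by `major.minor.patch`.

```go
package main

import (
	"fmt"
	"regexp"
)

// semverPrefix matches an optional epoch ("1:") followed by major.minor.patch.
// Assumption: this approximates what the existing helper keeps; everything
// after the patch number (the distro rebuild suffix) is discarded.
var semverPrefix = regexp.MustCompile(`^(?:\d+:)?(\d+\.\d+\.\d+)`)

// majorMinorPatch is a hypothetical stand-in for extractMajorMinorPatchVersion.
func majorMinorPatch(version string) string {
	m := semverPrefix.FindStringSubmatch(version)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	versions := []string{
		"4.8.2-ubuntu24.04u2",
		"4.8.2-ubuntu22.04u2",
		"4.8.2-1.azl3",
	}
	// Every variant reduces to the same string, so an equality check on the
	// reduced value passes even though the rebuild suffixes differ.
	for _, v := range versions {
		fmt.Printf("%-22s -> %s\n", v, majorMinorPatch(v))
	}
}
```

Because the suffix is dropped before the comparison, any check built only on this reduced value is blind to a one-OS rebuild bump.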

Fix: assert that the trailing distro rebuild revision also stays in lockstep across the Ubuntu and Azure Linux variants of each GPU package (nvidia-device-plugin, datacenter-gpu-manager-4-core, datacenter-gpu-manager-4-proprietary, dcgm-exporter). This is a strict gate — a one-OS bump now fails CI until all OS entries in parts/common/components.json are aligned (or the partial bump is reverted). This matches the repo's "no partial OS updates" philosophy and is backed by git history, where the rebuild revision has always moved together across OS variants.
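The gate described above can be sketched as a parity check over revisions that have already been extracted. The `variant` type, the `checkRevisionParity` function, and the Ubuntu labels are hypothetical names chosen for this example; only `azurelinux.v3.0` and the "Partial OS update detected" wording come from this PR, and the real test uses `require.Equalf` rather than returning an error.

```go
package main

import "fmt"

// variant pairs an OS/release label with an already-extracted rebuild revision.
// Hypothetical type for illustration only.
type variant struct {
	osRelease string
	revision  string
}

// checkRevisionParity treats the first variant's revision as the expected
// value and returns an error naming the first variant that disagrees with it.
func checkRevisionParity(pkg string, variants []variant) error {
	expected := ""
	for i, v := range variants {
		if i == 0 {
			expected = v.revision
			continue
		}
		if v.revision != expected {
			return fmt.Errorf("Partial OS update detected for %s: %s has revision %q, expected %q",
				pkg, v.osRelease, v.revision, expected)
		}
	}
	return nil
}

func main() {
	// Simulates the skew from #8659: Ubuntu at revision 2, Azure Linux at 1.
	// The Ubuntu labels are placeholders, not the keys used in components.json.
	err := checkRevisionParity("dcgm-exporter", []variant{
		{osRelease: "ubuntu-24.04", revision: "2"},
		{osRelease: "ubuntu-22.04", revision: "2"},
		{osRelease: "azurelinux.v3.0", revision: "1"},
	})
	fmt.Println(err)
}
```

Using the first variant as the reference keeps the check independent of which OS was bumped; the trade-off is that the error names whichever variant comes later in the iteration order, not necessarily the one that is stale.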

Changes, all within e2e/scenario_gpu_managed_experience_test.go:

  • Add extractPackageRevision, a helper that parses the rebuild-revision counter from the trailing token after the last -, handling both the Ubuntu scheme (4.8.2-ubuntu24.04u2 → 2) and the Azure Linux / plain scheme (4.8.2-1.azl3 → 1, 1:4.5.3-1 → 1), including epoch prefixes.
  • Extend Test_Version_Consistency_GPU_Managed_Components with an expectedRevision accumulator and a require.Equalf parity assertion that emits an actionable "Partial OS update detected" message pointing at the offending os.release.
  • Add Test_extractPackageRevision, a table-driven unit test covering both suffix schemes, epoch prefixes, multi-digit counters (...u10 → 10), and edge cases (empty string, no revision).
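One way the parsing rules in the list above could look, as a rough sketch and not the code in this PR. The name `packageRevision`, the string return type, and the two regular expressions are assumptions made for the example; the only behavior taken from the description is "take the token after the last `-`, then read the counter from either the `...uN` or the `N.azl3` / plain `N` form".

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Ubuntu scheme: "ubuntu24.04u2" -> counter after the final "u".
	ubuntuRevision = regexp.MustCompile(`^ubuntu[0-9.]+u([0-9]+)$`)
	// Azure Linux / plain scheme: "1.azl3" or "1" -> leading digits.
	leadingDigits = regexp.MustCompile(`^([0-9]+)`)
)

// packageRevision is a hypothetical sketch of a rebuild-revision parser.
// It returns "" when the version carries no recognizable revision.
func packageRevision(version string) string {
	idx := strings.LastIndex(version, "-")
	if idx < 0 {
		return "" // no "-" means no rebuild suffix at all
	}
	token := version[idx+1:]
	if m := ubuntuRevision.FindStringSubmatch(token); m != nil {
		return m[1]
	}
	if m := leadingDigits.FindStringSubmatch(token); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	// Table of inputs in the spirit of a table-driven test.
	cases := []string{
		"4.8.2-ubuntu24.04u2",
		"4.8.2-ubuntu24.04u10",
		"4.8.2-1.azl3",
		"1:4.5.3-1",
		"4.8.2",
		"",
	}
	for _, in := range cases {
		fmt.Printf("%q -> %q\n", in, packageRevision(in))
	}
}
```

Splitting on the last `-` means an epoch prefix such as `1:` never reaches the revision logic, which is why the epoch case needs no special handling in this sketch.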

Scope / impact: test-only change; no production code, no components.json, no renovate.json, and no workflow files are touched. Verified locally: the test passes on the current main and fails when #8659's partial update is simulated, with the expected message naming azurelinux.v3.0.

Note

Renovate grouping (nvidia-dcgm in .github/renovate.json) already exists and was in place when #8659 was raised one-OS-only — grouping only batches updates that are simultaneously available on the package feeds, so it cannot prevent this class of skew. The test is the real gate; no Renovate change is included.

`Test_Version_Consistency_GPU_Managed_Components` only compared
`major.minor.patch`, so a Renovate bump touching a single OS (e.g.
`dcgm-exporter` `4.8.2-ubuntu24.04u2` while Azure Linux stayed at
`4.8.2-1.azl3`) slipped through the check.

- add `extractPackageRevision` helper parsing the trailing rebuild
  counter from both Ubuntu (`...uN`) and Azure Linux (`-N.azl3`) schemes
- assert the rebuild revision stays in lockstep across all OS variants,
  failing the build on partial-OS package updates
- add `Test_extractPackageRevision` unit test covering both schemes and
  epoch prefixes

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

Copilot AI left a comment

Pull request overview

This PR strengthens the E2E “version-consistency” coverage for GPU managed components by ensuring OS-specific package rebuild revisions (e.g., Ubuntu ...u2 vs Azure Linux ...-1.azl3) stay aligned, preventing partial-OS Renovate bumps from passing CI unnoticed.

Changes:

  • Added extractPackageRevision to parse the rebuild-revision counter from package version strings (including epoch-prefixed versions).
  • Extended Test_Version_Consistency_GPU_Managed_Components to assert rebuild-revision parity across Ubuntu 22.04/24.04 and Azure Linux 3.0 variants for the GPU package set.
  • Added a table-driven unit test Test_extractPackageRevision to validate parsing across supported version formats and edge cases.
