test(e2e): enforce rebuild-revision parity across GPU OS variants #8660
Open
surajssd wants to merge 1 commit into
`Test_Version_Consistency_GPU_Managed_Components` only compared `major.minor.patch`, so a Renovate bump touching a single OS (e.g. `dcgm-exporter` `4.8.2-ubuntu24.04u2` while Azure Linux stayed at `4.8.2-1.azl3`) slipped through the check.

- add `extractPackageRevision` helper parsing the trailing rebuild counter from both Ubuntu (`...uN`) and Azure Linux (`-N.azl3`) schemes
- assert the rebuild revision stays in lockstep across all OS variants, failing the build on partial-OS package updates
- add `Test_extractPackageRevision` unit test covering both schemes and epoch prefixes

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
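The parsing rule described in the commit message can be sketched roughly as follows. This is not the PR's implementation: the function name `parseRebuildRevision`, its `(int, bool)` return shape, and both regular expressions are assumptions made for illustration, based only on the example version strings quoted above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Ubuntu-style trailing token, e.g. "ubuntu24.04u2": the counter follows the final "u".
var ubuntuToken = regexp.MustCompile(`^ubuntu[0-9.]+u([0-9]+)$`)

// Azure Linux / plain trailing token, e.g. "1.azl3" or "1": the leading digits are the counter.
var plainToken = regexp.MustCompile(`^([0-9]+)(\.[A-Za-z0-9]+)*$`)

// parseRebuildRevision is a hypothetical stand-in for the PR's helper.
// It looks only at the token after the last "-", so an epoch prefix such
// as "1:" never affects the result.
func parseRebuildRevision(version string) (int, bool) {
	idx := strings.LastIndex(version, "-")
	if idx < 0 || idx == len(version)-1 {
		return 0, false // no trailing token, so no revision
	}
	token := version[idx+1:]
	for _, re := range []*regexp.Regexp{ubuntuToken, plainToken} {
		if m := re.FindStringSubmatch(token); m != nil {
			n, err := strconv.Atoi(m[1])
			if err != nil {
				return 0, false
			}
			return n, true
		}
	}
	return 0, false
}

func main() {
	for _, v := range []string{"4.8.2-ubuntu24.04u2", "4.8.2-1.azl3", "1:4.5.3-1", "4.8.2", ""} {
		rev, ok := parseRebuildRevision(v)
		fmt.Printf("%q -> revision=%d found=%t\n", v, rev, ok)
	}
}
```

Returning a `found` flag alongside the counter is one way to keep "no revision" distinct from a revision of `0`; the actual helper may signal that case differently.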
Pull request overview
This PR strengthens the E2E “version-consistency” coverage for GPU managed components by ensuring OS-specific package rebuild revisions (e.g., Ubuntu ...u2 vs Azure Linux ...-1.azl3) stay aligned, preventing partial-OS Renovate bumps from passing CI unnoticed.
Changes:
- Added `extractPackageRevision` to parse the rebuild-revision counter from package version strings (including epoch-prefixed versions).
- Extended `Test_Version_Consistency_GPU_Managed_Components` to assert rebuild-revision parity across Ubuntu 22.04/24.04 and Azure Linux 3.0 variants for the GPU package set.
- Added a table-driven unit test `Test_extractPackageRevision` to validate parsing across supported version formats and edge cases.
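The parity assertion summarized above might look something like the sketch below. It is a stdlib-only approximation, not the PR's test code: `rebuildRevision`, `checkRevisionParity`, the error wording, and the Ubuntu map keys are all hypothetical, and the real test is described as using `require.Equalf` rather than returned errors.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rebuildRevision is a simplified, hypothetical parser: it returns the digits
// of the rebuild counter found in the token after the last "-".
func rebuildRevision(version string) (string, bool) {
	i := strings.LastIndex(version, "-")
	if i < 0 {
		return "", false
	}
	token := version[i+1:]
	if strings.HasPrefix(token, "ubuntu") {
		u := strings.LastIndex(token, "u")
		if u < len("ubuntu") {
			return "", false // no "uN" rebuild suffix after the release number
		}
		token = token[u+1:]
	}
	end := 0
	for end < len(token) && token[end] >= '0' && token[end] <= '9' {
		end++
	}
	if end == 0 {
		return "", false
	}
	return token[:end], true
}

// checkRevisionParity walks the OS variants in a stable order, remembers the
// first revision it sees, and reports the first variant that disagrees.
func checkRevisionParity(pkg string, versionsByOS map[string]string) error {
	keys := make([]string, 0, len(versionsByOS))
	for k := range versionsByOS {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	expected, expectedOS := "", ""
	for _, osRelease := range keys {
		rev, ok := rebuildRevision(versionsByOS[osRelease])
		if !ok {
			return fmt.Errorf("%s: no rebuild revision in %q (%s)", pkg, versionsByOS[osRelease], osRelease)
		}
		if expected == "" {
			expected, expectedOS = rev, osRelease
			continue
		}
		if rev != expected {
			return fmt.Errorf("partial OS update detected for %s: %s has revision %s but %s has revision %s",
				pkg, osRelease, rev, expectedOS, expected)
		}
	}
	return nil
}

func main() {
	// Keys are illustrative; only "azurelinux.v3.0" appears in the PR text.
	skewed := map[string]string{
		"ubuntu.r2404":    "4.8.2-ubuntu24.04u2",
		"ubuntu.r2204":    "4.8.2-ubuntu22.04u2",
		"azurelinux.v3.0": "4.8.2-1.azl3",
	}
	fmt.Println(checkRevisionParity("dcgm-exporter", skewed))
}
```

Sorting the keys keeps the failure message deterministic, since Go map iteration order is randomized.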
ganeshkumarashok approved these changes Jun 8, 2026
What this PR does / why we need it:

Strengthens the `version-consistency` job in the "Validate Components" check so it catches partial-OS GPU package updates that previously slipped through.

Problem: Renovate PRs sometimes bump a package for only one OS. For example, #8659 moved `dcgm-exporter` to `4.8.2-ubuntu24.04u2` / `4.8.2-ubuntu22.04u2` for Ubuntu while leaving Azure Linux at `4.8.2-1.azl3`. The check passed anyway because `Test_Version_Consistency_GPU_Managed_Components` (in `e2e/scenario_gpu_managed_experience_test.go`) compared only the `major.minor.patch` part via `extractMajorMinorPatchVersion`, which strips the trailing rebuild suffix, so all three variants collapsed to `4.8.2` and matched. The rebuild-revision skew (`u2` vs `azl3`'s `1`), the only thing that differed, was invisible to the test.

Fix: assert that the trailing distro rebuild revision also stays in lockstep across the Ubuntu and Azure Linux variants of each GPU package (`nvidia-device-plugin`, `datacenter-gpu-manager-4-core`, `datacenter-gpu-manager-4-proprietary`, `dcgm-exporter`). This is a strict gate: a one-OS bump now fails CI until all OS entries in `parts/common/components.json` are aligned (or the partial bump is reverted). This matches the repo's "no partial OS updates" philosophy and is backed by git history, where the rebuild revision has always moved together across OS variants.

Changes, all within `e2e/scenario_gpu_managed_experience_test.go`:

- Added `extractPackageRevision`, a helper that parses the rebuild-revision counter from the trailing token after the last `-`, handling both the Ubuntu scheme (`4.8.2-ubuntu24.04u2` → `2`) and the Azure Linux / plain scheme (`4.8.2-1.azl3` → `1`, `1:4.5.3-1` → `1`, including epoch prefixes).
- Extended `Test_Version_Consistency_GPU_Managed_Components` with an `expectedRevision` accumulator and a `require.Equalf` parity assertion that emits an actionable "Partial OS update detected" message pointing at the offending `os.release`.
- Added `Test_extractPackageRevision`, a table-driven unit test covering both suffix schemes, epoch prefixes, multi-digit counters (`...u10` → `10`), and edge cases (empty string, no revision).

Scope / impact: test-only change; no production code, no `components.json`, no `renovate.json`, and no workflow files are touched. Verified locally: the test passes on the current `main` and fails when #8659's partial update is simulated, with the expected message naming `azurelinux.v3.0`.

Note: Renovate grouping (`nvidia-dcgm` in `.github/renovate.json`) already exists and was in place when #8659 was raised one-OS-only. Grouping only batches updates that are simultaneously available on the package feeds, so it cannot prevent this class of skew. The test is the real gate; no Renovate change is included.
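To make the original gap concrete, the short sketch below shows how a `major.minor.patch`-only comparison treats the three `dcgm-exporter` strings from #8659 as identical. `coreVersion` and its regular expression are hypothetical stand-ins; the repository's `extractMajorMinorPatchVersion` is not shown in this PR text and may be implemented differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matches the first dotted triple of digits, e.g. "4.8.2".
var majorMinorPatch = regexp.MustCompile(`[0-9]+\.[0-9]+\.[0-9]+`)

// coreVersion is a hypothetical major.minor.patch extractor: it keeps the
// first x.y.z triple and drops any epoch prefix and distro rebuild suffix.
func coreVersion(version string) string {
	return majorMinorPatch.FindString(version)
}

func main() {
	variants := []string{
		"4.8.2-ubuntu24.04u2",
		"4.8.2-ubuntu22.04u2",
		"4.8.2-1.azl3",
	}
	seen := map[string]bool{}
	for _, v := range variants {
		core := coreVersion(v)
		seen[core] = true
		fmt.Printf("%-22s -> %s\n", v, core)
	}
	// Only one distinct core version remains, so a comparison at this
	// granularity cannot see the u2 vs 1 rebuild-revision skew.
	fmt.Println("distinct core versions:", len(seen))
}
```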