mig faker #205
Open
iris-shain-runai wants to merge 4 commits into
Description
When `migStrategy: mixed` was configured, `mig-faker` faked the MIG hardware metadata (UUIDs, `nvidia.com/mig.config.state: success`), but no component ever advertised `nvidia.com/mig-*` resources to kubelet: the `device-plugin` was hardcoded to register only `nvidia.com/gpu`. As a result, MIG slices never appeared in `node.status.allocatable` and MIG-based scheduling was impossible.

This PR wires the MIG path end-to-end:
- `GpuDetails` gains `MigEnabled` + `MigInstances` (`internal/common/topology/types.go`), and a new `topology.AdvertisedResources()` computes the resources a node should expose for the `none`/`single`/`mixed` strategies. It is the single place that decides "what does this node advertise," now reused by `device-plugin`, the fake-node plugin, and the KWOK config-map handler.
- `device-plugin` advertises MIG. It builds one plugin per advertised resource (each on its own socket), and `CleanupStaleSockets()` removes orphaned sockets on startup so a profile change can't leave stale resources behind.
- `mig-faker` publishes MIG state. It writes `MigInstances` into the per-node topology ConfigMap (via `topology.UpdateNodeTopologyCM`, using server-side apply, following the repo convention) and restarts the device-plugin pod so kubelet picks up the new allocatable resources. The `devices: [all]` parsing crash is also fixed.
- The `mig-faker` ClusterRole now allows `pods: [list, delete]` (for the restart) and `configmaps: patch` (required by server-side apply).
- `migStrategy`, the `node-role.kubernetes.io/runai-dynamic-mig` label, and the `run.ai/mig.config` annotation format, all previously undocumented (called out in #177).

MIG strategies
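
A minimal sketch of how the per-strategy resource computation could look, assuming simplified types. Only `GpuDetails`, `MigEnabled`, and `MigInstances` are named in this PR; the `Profile` field, the lowercase `advertisedResources` helper, its signature, and the handling of a MIG-enabled GPU with no slices are hypothetical and are not the actual `topology.AdvertisedResources()` implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// MigInstance and GpuDetails are simplified, hypothetical stand-ins for the
// types in internal/common/topology/types.go.
type MigInstance struct {
	Profile string // e.g. "1g.5gb"; the field name is an assumption
}

type GpuDetails struct {
	MigEnabled   bool
	MigInstances []MigInstance
}

// advertisedResources returns resource name -> count for one node.
// Assumption: a GPU with MIG enabled but no slices is counted as a plain GPU.
func advertisedResources(strategy string, gpus []GpuDetails) map[string]int {
	out := map[string]int{}
	for _, g := range gpus {
		migActive := g.MigEnabled && len(g.MigInstances) > 0
		switch {
		case strategy == "single" && migActive:
			// Slices are counted as plain GPUs.
			out["nvidia.com/gpu"] += len(g.MigInstances)
		case strategy == "mixed" && migActive:
			// One resource per profile; the GPU itself is not counted.
			for _, m := range g.MigInstances {
				out["nvidia.com/mig-"+m.Profile]++
			}
		default:
			// "none", or a GPU without active MIG: one plain GPU.
			out["nvidia.com/gpu"]++
		}
	}
	return out
}

func main() {
	gpus := []GpuDetails{
		{MigEnabled: true, MigInstances: []MigInstance{{Profile: "1g.5gb"}, {Profile: "1g.5gb"}}},
		{}, // plain GPU, MIG disabled
	}
	for _, strategy := range []string{"none", "single", "mixed"} {
		res := advertisedResources(strategy, gpus)
		names := make([]string, 0, len(res))
		for name := range res {
			names = append(names, name)
		}
		sort.Strings(names) // stable output order
		fmt.Printf("%s:", strategy)
		for _, name := range names {
			fmt.Printf(" %s=%d", name, res[name])
		}
		fmt.Println()
	}
}
```

For the two-GPU node in `main` (one GPU split into two `1g.5gb` slices, one plain GPU), this sketch yields `nvidia.com/gpu=2` for `none`, `nvidia.com/gpu=3` for `single`, and `nvidia.com/gpu=1` plus `nvidia.com/mig-1g.5gb=2` for `mixed`.
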
`AdvertisedResources` implements all three strategies:

| `migStrategy` | Advertised resources |
| --- | --- |
| `none` (default) | `nvidia.com/gpu` per physical GPU (MIG ignored) |
| `single` | `nvidia.com/gpu: N` (slices counted as plain GPUs) |
| `mixed` | `nvidia.com/mig-<profile>` per slice; the GPU drops out of `nvidia.com/gpu` |

Call graph
`mig-faker` (triggered when the `run.ai/mig.config` node annotation changes). The numbered nodes are `FakeMapping`'s steps in execution order; dotted edges are `buildMigState`'s internal helper calls:

[diagram: `mig-faker` call graph]

`device-plugin` (on every (re)start; the restart in step 6 above is what makes it re-read MIG state). Numbered nodes are `main`'s steps in order; dotted edges are `NewDevicePlugins`'s internal calls:

[diagram: `device-plugin` call graph]

The two processes are decoupled through the per-node topology ConfigMap: `mig-faker` writes MIG state (step 5) and bounces the pod (step 6); the restarted `device-plugin` re-reads the CM (step 1), and `AdvertisedResources` now yields the MIG resources.

Related Issues
Fixes #177
Checklist
- `CHANGELOG.md` under `## [Unreleased]`

Testing
- `AdvertisedResources` strategy matrix (`none`/`single`/`mixed`), device-plugin construction per MIG profile, `resourceSocket`, and `FakeMapping` (including `all` expansion + topology CM update + device-plugin pod restart).
- `test/e2e/device-plugin/` (`make e2e-device-plugin`, wired into CI): on a real kind worker it enables the device-plugin + mig-faker, applies a `run.ai/mig.config`, and asserts that `nvidia.com/mig-1g.5gb` reaches allocatable, the topology CM is updated, a pod requesting the MIG resource runs, and reconfiguration to a different profile replaces the old resources. Ran locally: 6/6 specs pass. This suite caught the missing `configmaps: patch` RBAC verb.

Breaking Changes
None.
Additional Notes
- `device-plugin` and `draPlugin` remain mutually exclusive; the MIG path is device-plugin only, so the new e2e suite is isolated (its own kind cluster) like `e2e-mock`.
- `status-updater` yet (scheduling works; per-slice metrics/`nvidia-smi` fidelity is a follow-up).