Add KEP-4680 device health reporting with message field support #127

harche wants to merge 1 commit
Conversation
Welcome @harche!

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harche
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/hold for the implementation PRs to merge. I will update once the 1.36 beta is cut.

/hold cancel
/hold I am adding a feature to trigger the health status update in this example driver.

As mentioned before, I don't like that DRA drivers have to interact with the gRPC interface directly (currently drahealthv1alpha1, despite the beta graduation of the feature?!). Can we first improve the kubeletplugin support for this feature (abstract away the underlying gRPC interface, similar to how it's done for the main DRA kubelet interface, and provide better documentation) and only then make the DRA example driver use that improved support?
Summary
Adds KEP-4680 device health reporting to the example driver. The driver implements the `DRAResourceHealthServer` gRPC interface (`dra-health/v1alpha1`), which lets kubelet watch a stream of device health updates from the driver; those updates appear in `pod.status.containerStatuses[].allocatedResourcesStatus`.

Verification
```json
[
  {
    "name": "claim:gpu",
    "resources": [
      {
        "health": "Healthy",
        "message": "Device gpu-0 operating normally, temperature: 46°C",
        "resourceID": "k8s.gpu.example.com/gpu=common"
      }
    ]
  }
]
```

Cluster requirements
The `ResourceHealthStatus` and `ResourceHealthStatusMessage` feature gates are Beta and enabled by default.
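To script the verification step instead of reading the JSON by eye, a minimal Go sketch can decode the `allocatedResourcesStatus` array and list every resource that is not `Healthy`. It assumes the array has already been extracted from the pod status (how it is fetched is not shown here). The struct and function names (`resourceStatus`, `resourceHealth`, `unhealthy`) are illustrative, and the JSON tags simply follow the field names in the sample output under Verification.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// resourceHealth mirrors one entry of "resources" in the sample output.
type resourceHealth struct {
	ResourceID string `json:"resourceID"`
	Health     string `json:"health"`
	Message    string `json:"message,omitempty"`
}

// resourceStatus mirrors one element of allocatedResourcesStatus.
type resourceStatus struct {
	Name      string           `json:"name"`
	Resources []resourceHealth `json:"resources"`
}

// unhealthy returns one line per resource whose health is not "Healthy".
func unhealthy(raw []byte) ([]string, error) {
	var statuses []resourceStatus
	if err := json.Unmarshal(raw, &statuses); err != nil {
		return nil, err
	}
	var bad []string
	for _, s := range statuses {
		for _, r := range s.Resources {
			if r.Health != "Healthy" {
				bad = append(bad, fmt.Sprintf("%s %s: %s (%s)", s.Name, r.ResourceID, r.Health, r.Message))
			}
		}
	}
	return bad, nil
}

func main() {
	// Sample copied from the verification output in this PR.
	sample := []byte(`[{"name":"claim:gpu","resources":[{"health":"Healthy","message":"Device gpu-0 operating normally, temperature: 46°C","resourceID":"k8s.gpu.example.com/gpu=common"}]}]`)
	bad, err := unhealthy(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(bad) == 0 {
		fmt.Println("all allocated resources report Healthy")
	}
	for _, line := range bad {
		fmt.Println(line)
	}
}
```

Treating anything other than `Healthy` as a failure means `Unknown` is flagged too, which is usually what a verification script wants.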