Initial cut of external plugin support for NPD #1281
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: dims
Tests should be fixed by #1294.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The GCE cluster e2e jobs now deploy NPD via test/e2e/manifests/deploy-npd.sh, which kustomize-builds the production deployment/ manifests and waits for the node-problem-detector DaemonSet to roll out.

This PR's deployment/node-problem-detector.yaml wires in the example GPU monitor sidecar (gpu-monitor:latest image, NVIDIA host devices, a gRPC socket) and adds --config.external-monitor on the main container. None of that is available on the GPU-less GCE e2e nodes: the sidecar image cannot be pulled, so the DaemonSet never becomes Ready and deploy-npd.sh's rollout wait fails.

Patch the e2e overlay to drop the GPU sidecar, its NVIDIA volumes, and the --config.external-monitor flag so e2e exercises only the standard system-log monitors. The production deployment manifest is unchanged.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
xref: #833
As a first step toward GPU support, we want to avoid running scripts repeatedly and instead have a long-running process that NPD can talk to over gRPC. I did some experiments in:
https://github.com/dims/npd-ext
Filing this PR as a first step. I've included a sample, `examples/external-plugins/gpu-monitor`, that just runs `nvidia-smi`.

NOTE: the protos are based on the same handshake that external plugins currently use, so this mechanism is NOT specific to GPUs in any way.
Please see `docs/external-plugins.md` for more info.
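
To illustrate the motivation above (a long-running process instead of a script that is re-run for every check), here is a minimal Go sketch. It is not the code in this PR: `Status`, `CheckFunc`, and `Monitor` are hypothetical names, the gRPC transport is left out entirely, and the probe is a stub standing in for something like a call to `nvidia-smi`. The idea is only that the probe runs on the monitor's own schedule and that queries read a cached result.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Status is a hypothetical health result; it is not a type from NPD or its protos.
type Status struct {
	Healthy bool
	Message string
}

// CheckFunc runs one probe. A real monitor might wrap `nvidia-smi` here.
type CheckFunc func() Status

// Monitor keeps the latest probe result in memory so that answering a
// query never has to start a new process.
type Monitor struct {
	mu     sync.RWMutex
	check  CheckFunc
	latest Status
}

// NewMonitor runs the probe once so the cache is populated before any query.
func NewMonitor(check CheckFunc) *Monitor {
	m := &Monitor{check: check}
	m.Refresh()
	return m
}

// Refresh runs the probe once and stores the result.
func (m *Monitor) Refresh() {
	s := m.check()
	m.mu.Lock()
	m.latest = s
	m.mu.Unlock()
}

// Latest returns the cached result without running the probe.
func (m *Monitor) Latest() Status {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.latest
}

// Run refreshes the cache every interval until stop is closed.
func (m *Monitor) Run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			m.Refresh()
		case <-stop:
			return
		}
	}
}

func main() {
	calls := 0
	// Stub probe (assumption): counts invocations instead of running a command.
	m := NewMonitor(func() Status {
		calls++
		return Status{Healthy: true, Message: "stub probe ok"}
	})
	// Several queries are served from the cache; the probe is not re-run.
	for i := 0; i < 3; i++ {
		_ = m.Latest()
	}
	fmt.Println(m.Latest().Message, "probe runs:", calls)
}
```

In a real external monitor, `Latest` would presumably sit behind the gRPC service that NPD connects to, and `Run` would be started in a goroutine at startup; both of those wiring details are assumptions here.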