This repository was archived by the owner on Oct 22, 2024. It is now read-only.

test flake: spontaneous node reboot #1055

@pohly

A worker node spontaneously rebooted, causing container restarts and thus test failures.

Seen in https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-1054/4/

https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-1054/4/artifact/joblog-jenkins-pmem-csi-PR-1054-4-test-1.19.log:

Dec  7 00:33:17.367: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
[AfterEach] direct-production
  /mnt/workspace/pmem-csi_PR-1054/test/e2e/deploy/deploy.go:1112
STEP: checking for test "direct-production Deployment Kata Containers [Testpattern: CSI Ephemeral-volume (ext4)] dax should support MAP_SYNC" in namespace default, test success
pmem-csi-intel-com-controller-d875b774-r6shd/pmem-driver@pmem..ker1: ==== end of pod log ====
WARNING: pod log: pmem-csi-intel-com-controller-d875b774-r6shd/pmem-driver: Get "https://172.17.0.5:10250/containerLogs/default/pmem-csi-intel-com-controller-d875b774-r6shd/pmem-driver?follow=true": dial tcp 172.17.0.5:10250: connect: connection refused
...
Dec  7 00:34:37.493: INFO: Done with waiting, PMEM-CSI driver v1.0.0-48-g858d2ca0 is ready.
Dec  7 00:34:37.514: FAIL: container "pmem-driver" in pod "pmem-csi-intel-com-controller-d875b774-r6shd" restarted 1 times, last state: {Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Unknown,Message:,StartedAt:2021-12-06 23:23:32 +0000 UTC,FinishedAt:2021-12-07 00:33:58 +0000 UTC,ContainerID:containerd://c103cca52585e83c30b0afa64ce57b8048fb90998e36bf7a96bafeafaec4ecb3,}}

https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-1054/4/artifact/joblog-jenkins-pmem-csi-PR-1054-4-kubeletlogs-1.19.log:

Dec 07 00:30:52 pmem-csi-govm-worker1 kubelet[855]: E1207 00:30:52.749878     855 upgradeaware.go:387] Error proxying data from backend to client: tls: use of closed connection
-- Boot 2a48549ebe844612bb074c64784b43f9 --
Dec 07 00:33:59 pmem-csi-govm-worker1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Dec 07 00:34:01 pmem-csi-govm-worker1 kubelet[636]: I1207 00:34:01.312285     636 server.go:411] Version: v1.19.11
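
To spot this kind of reboot in other kubelet logs without reading them by hand, something along these lines might help. It is only a sketch: it assumes the log is journalctl output in the format shown above, where a `-- Boot <id> --` line separates boots, and it assumes the year (2021, taken from the `StartedAt` field in the failure message) because the timestamps do not carry one. The function name `find_reboots` is made up for illustration and is not part of the PMEM-CSI test suite.

```python
import re
from datetime import datetime

# Separator line that journalctl prints between boots, as seen in the log above.
BOOT_MARKER = re.compile(r"^-- Boot (?P<boot_id>[0-9a-f]{32}) --$")
# Syslog-style prefix: "Dec 07 00:30:52 hostname ..."
ENTRY = re.compile(r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ ")


def find_reboots(lines, year=2021):
    """Return (boot_id, last_ts_before, first_ts_after) for each mid-log boot marker.

    The year is an assumption because the log timestamps omit it.
    Month names are parsed with %b, so this expects an English/C locale.
    """
    reboots = []
    last_ts = None   # timestamp of the most recent regular log entry
    pending = None   # (boot_id, last_ts) waiting for the first entry after the marker
    for raw in lines:
        line = raw.rstrip("\n")
        marker = BOOT_MARKER.match(line)
        if marker:
            # A marker before any entry is just the start of the log, not a reboot.
            if last_ts is not None:
                pending = (marker.group("boot_id"), last_ts)
            continue
        entry = ENTRY.match(line)
        if not entry:
            continue
        ts = datetime.strptime(f"{year} {entry.group('ts')}", "%Y %b %d %H:%M:%S")
        if pending is not None:
            reboots.append((pending[0], pending[1], ts))
            pending = None
        last_ts = ts
    return reboots


if __name__ == "__main__":
    import sys

    for boot_id, before, after in find_reboots(sys.stdin):
        gap = (after - before).total_seconds()
        print(f"boot {boot_id}: last entry {before}, first entry {after}, gap {gap:.0f}s")
```

Fed the kubelet log excerpt above on stdin, this should report one boot with a gap of 187 seconds between the last entry before the marker (00:30:52) and the first entry after it (00:33:59). The gap is only an upper bound on the downtime, since the node may have kept running silently for a while after the last logged line.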
