Describe the bug
When we create a subsystem with the AutoListener feature, the ListListener gRPC call that is issued right after it returns an empty listener list.
But if we query the same subsystem via the GW CLI, it returns the correct answer.
Environment details
- Image/version of Ceph CSI driver: canary
- Kubernetes cluster version: not relevant
- Ceph cluster version: ceph main (which has AutoListener support)
Steps to reproduce
Steps to reproduce the behavior:
- create rook env and deploy the main ceph version
- deploy nvmeof csi provisioner
- create pvc
- check the provisioner log: the list of listeners returned from the GW is empty
- deploy the node plugin
- create test-pod: you will get an error because the list of listeners is empty
Actual results
There is no error handling on the provisioner side, because the provisioner cannot know which listeners (GWs) really exist in the cluster.
The error only surfaces later, when you create the test-pod.
Expected behavior
The ListListener gRPC call issued right after subsystem creation should return the auto-created listeners, matching what the GW CLI reports.
Logs
I0208 15:25:22.975469 1 utils.go:350] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f GRPC call: /csi.v1.Controller/CreateVolume
I0208 15:25:22.975594 1 utils.go:351] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f GRPC request: {"capacity_range":{"required_bytes":134217728},"name":"pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f","csi.storage.k8s.io/pvc/name":"nvmeof-external-volume","csi.storage.k8s.io/pvc/namespace":"default","imageFeatures":"layering,deep-flatten,exclusive-lock,object-map,fast-diff","imageFormat":"2","networkMask":"10.128.0.0/14","nvmeofGatewayAddress":"172.30.113.58","nvmeofGatewayPort":"5500","pool":"nvmeof","subsystemNQN":"nqn.2016-06.io.spdk:cnode1.rook-ceph"},"secrets":"***stripped***","volume_capabilities":[{"access_mode":{"mode":"SINGLE_NODE_WRITER"},"mount":{"fs_type":"ext4"}}]}
..
..
I0208 15:25:23.174353 1 nvmeof.go:293] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Subsystem created successfully: nqn.2016-06.io.spdk:cnode1.rook-ceph
..
..
I0208 15:25:24.384533 1 nvmeof.go:504] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Listing listeners in subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph
I0208 15:25:24.407404 1 nvmeof.go:518] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Listed listeners in subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph successfully
I0208 15:25:24.407431 1 controllerserver.go:804] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Retrieved 0 auto-created listeners
Additional context
from script:
kubectl -n rook-ceph exec -it nvmeof-cli -- /tmp/nvmeof-cli subsystem add --subsystem nqn.2016-06.io.ceph:subsystem.test-integration --network-mask "10.128.0.0/14"
kubectl -n rook-ceph exec -it nvmeof-cli -- /tmp/nvmeof-cli listener list -n nqn.2016-06.io.ceph:subsystem.test-integration.group-a
log:
Listeners for nqn.2016-06.io.ceph:subsystem.test-integration.group-a:
╒═══════════════════════════╤═════════════╤══════════════════╤══════════════════╤══════════╤══════════╤══════════╕
│ Host │ Transport │ Address Family │ Address │ Secure │ Active │ Manual │
╞═══════════════════════════╪═════════════╪══════════════════╪══════════════════╪══════════╪══════════╪══════════╡
│ rook-ceph-nvmeof-nvmeof-a │ TCP │ IPv4 │ 10.131.0.31:4420 │ No │ Yes │ No │
╘═══════════════════════════╧═════════════╧══════════════════╧══════════════════╧══════════╧══════════╧══════════╛
Solution
I added a very short sleep right after subsystem creation and right before the ListListener gRPC call, and it worked.
Probably there is some delay before the nvmeof GW updates its omap file.
I am going to add a retry mechanism with a few attempts.