nvmeof: ListListener grpc call returns empty #6037

@gadididi

Describe the bug

When a subsystem is created with the AutoListener feature, the ListListener gRPC call issued right after it returns an empty listener list.
Querying the same subsystem through the GW CLI returns the correct listeners.

Environment details

  • Image/version of Ceph CSI driver : canary
  • Kubernetes cluster version : not relevant
  • Ceph cluster version : ceph main (which has AutoListener support)

Steps to reproduce

Steps to reproduce the behavior:

  1. Create a Rook environment and deploy the Ceph main version.
  2. Deploy the nvmeof CSI provisioner.
  3. Create a PVC.
  4. Check the provisioner log: the list of listeners returned from the GW is empty.
  5. Deploy the node plugin.
  6. Create test-pod. It fails with an error because the list of listeners is empty.

Actual results

The provisioner does not treat the empty list as an error, because it cannot know which listeners (GWs) really exist in the cluster.
The failure only surfaces later, when test-pod is created.

Expected behavior

The ListListener gRPC call made right after subsystem creation should return the auto-created listeners, matching what the GW CLI reports.

Logs

  • csi-provisioner:
I0208 15:25:22.975469       1 utils.go:350] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f GRPC call: /csi.v1.Controller/CreateVolume
I0208 15:25:22.975594       1 utils.go:351] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f GRPC request: {"capacity_range":{"required_bytes":134217728},"name":"pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f","csi.storage.k8s.io/pvc/name":"nvmeof-external-volume","csi.storage.k8s.io/pvc/namespace":"default","imageFeatures":"layering,deep-flatten,exclusive-lock,object-map,fast-diff","imageFormat":"2","networkMask":"10.128.0.0/14","nvmeofGatewayAddress":"172.30.113.58","nvmeofGatewayPort":"5500","pool":"nvmeof","subsystemNQN":"nqn.2016-06.io.spdk:cnode1.rook-ceph"},"secrets":"***stripped***","volume_capabilities":[{"access_mode":{"mode":"SINGLE_NODE_WRITER"},"mount":{"fs_type":"ext4"}}]}
..
..
I0208 15:25:23.174353       1 nvmeof.go:293] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Subsystem created successfully: nqn.2016-06.io.spdk:cnode1.rook-ceph
..
..
I0208 15:25:24.384533       1 nvmeof.go:504] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Listing listeners in subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph
I0208 15:25:24.407404       1 nvmeof.go:518] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Listed listeners in subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph successfully
I0208 15:25:24.407431       1 controllerserver.go:804] ID: 15 Req-ID: pvc-6ad5a7a7-3c82-4afd-87a2-e0be3651968f Retrieved 0 auto-created listeners

Additional context

from script:

kubectl -n rook-ceph exec -it nvmeof-cli -- /tmp/nvmeof-cli subsystem add --subsystem nqn.2016-06.io.ceph:subsystem.test-integration --network-mask "10.128.0.0/14"
kubectl -n rook-ceph exec -it nvmeof-cli -- /tmp/nvmeof-cli listener list -n nqn.2016-06.io.ceph:subsystem.test-integration.group-a

log:

Listeners for nqn.2016-06.io.ceph:subsystem.test-integration.group-a:
╒═══════════════════════════╤═════════════╤══════════════════╤══════════════════╤══════════╤══════════╤══════════╕
│ Host                      │ Transport   │ Address Family   │ Address          │ Secure   │ Active   │ Manual   │
╞═══════════════════════════╪═════════════╪══════════════════╪══════════════════╪══════════╪══════════╪══════════╡
│ rook-ceph-nvmeof-nvmeof-a │ TCP         │ IPv4             │ 10.131.0.31:4420 │ No       │ Yes      │ No       │
╘═══════════════════════════╧═════════════╧══════════════════╧══════════════════╧══════════╧══════════╧══════════╛

Solution

I added a very short sleep right after subsystem creation and right before the ListListener gRPC call, and it worked.
There is probably some delay before the nvmeof GW updates its omap file.
I am going to add a retry mechanism with a few attempts.

Labels: bug, component/nvme-of, keepalive, wontfix
