DNM: Reproduce placement http endpoint race #1457
With the delay introduced to the placement TLS reconciliation we can reproduce the following sequence of events:

* A PlacementAPI CR is created with the non-tlse endpoints and is actively reconciled by the placement-operator.
* A Nova CR is created and is actively reconciled by the nova-operator.
* placement-operator deploys the service and exposes it in keystone via a KeystoneEndpoint CR with the http URL.
* nova-operator deploys nova-cell0-conductor, and that service creates a placement client that discovers the http URL for placement.
* openstack-operator finally updates the PlacementAPI with the tlse config, and therefore placement-operator updates the KeystoneEndpoint CR with the https endpoints. This does not trigger a restart of the nova-cell0-conductor deployment, as that deployment only depends on KeystoneEndpoint/keystone.
* Two EDPM compute nodes are deployed, a nova instance is created on one of them, and it is then requested to be migrated to the other node.
* nova-cell0-conductor-0 uses its placement client to get the instance allocations from placement. The client uses the http URL and fails, as placement now only speaks https.

Pod status while nova is being deployed:

    while true ; do date ; oc get pod | grep nova ; sleep 5 ; done
    ...
    Tue May 27 10:54:25 AM CEST 2025
    nova-api-0                            0/2   ContainerCreating   0   1s
    nova-api-8967-account-create-nqf8c    0/1   Completed           0   34s
    nova-api-db-create-cj4ls              0/1   Completed           0   44s
    nova-cell0-cell-mapping-r4mcr         1/1   Running             0   1s
    nova-cell0-conductor-0                1/1   Running             0   12s
    nova-cell0-conductor-db-sync-htqxh    0/1   Completed           0   29s
    nova-cell0-db-create-7pqqx            0/1   Completed           0   44s
    nova-cell0-e647-account-create-f8nz4  0/1   Completed           0   34s
    nova-cell1-9d60-account-create-r47q8  0/1   Completed           0   34s
    nova-cell1-conductor-db-sync-sr2qp    1/1   Running             0   1s
    nova-cell1-db-create-7kl84            0/1   Completed           0   44s
    nova-cell1-novncproxy-0               0/1   ContainerCreating   0   1s
    nova-metadata-0                       0/2   ContainerCreating   0   1s
    nova-scheduler-0                      0/1   ContainerCreating   0   1s

Placement endpoints in keystone over the same period. They only switch to https at 10:54:47, after nova-cell0-conductor-0 is already running:

    while true ; do date ; oc get KeystoneEndpoint/placement -o yaml | yq ".spec.endpoints" ; sleep 5 ; done
    ...
    Tue May 27 10:54:21 AM CEST 2025
    internal: http://placement-internal.openstack.svc:8778
    public: http://placement-public.openstack.svc:8778
    Tue May 27 10:54:26 AM CEST 2025
    internal: http://placement-internal.openstack.svc:8778
    public: http://placement-public.openstack.svc:8778
    Tue May 27 10:54:31 AM CEST 2025
    internal: http://placement-internal.openstack.svc:8778
    public: http://placement-public.openstack.svc:8778
    Tue May 27 10:54:36 AM CEST 2025
    internal: http://placement-internal.openstack.svc:8778
    public: http://placement-public.openstack.svc:8778
    Tue May 27 10:54:42 AM CEST 2025
    internal: http://placement-internal.openstack.svc:8778
    public: http://placement-public.openstack.svc:8778
    Tue May 27 10:54:47 AM CEST 2025
    internal: https://placement-internal.openstack.svc:8778
    public: https://placement-public-openstack.apps-crc.testing

Triggering the migration and checking the conductor log:

    ❯ openstack server migrate test_0 --wait

    oc logs nova-cell0-conductor-0
    ...
    2025-05-27 09:15:26.911 1 WARNING nova.scheduler.utils [None req-180d4caa-8cdb-4e64-99ce-b1fae03e7ccf 1225f304b197465c9b7861edb324e225 bb30f64d028c4b93816803c9f9476c36 - - default default] Failed to compute_task_migrate_server: Failed to retrieve allocations for consumer af38a891-59d9-4aa1-9568-0d57a31a8b37: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>400 Bad Request</title>
    </head><body>
    <h1>Bad Request</h1>
    <p>Your browser sent a request that this server could not understand.<br />
    Reason: You're speaking plain HTTP to an SSL-enabled server port.<br />
    Instead use the HTTPS scheme to access this URL, please.<br />
    </p>
    </body></html>
    : nova.exception.ConsumerAllocationRetrievalFailed: Failed to retrieve allocations for consumer af38a891-59d9-4aa1-9568-0d57a31a8b37: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
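When re-running this reproduction it can help to know exactly when the placement endpoints in keystone have flipped to https. The following is a minimal sketch, not something taken from any of the operators: `endpoints_all_https` is a hypothetical helper name, and it assumes the `name: URL` line format that the `yq ".spec.endpoints"` command prints in the description.

```shell
# Hypothetical helper (not part of any operator or CLI).
# Reads "name: URL" lines from stdin, in the format printed by
#   oc get KeystoneEndpoint/placement -o yaml | yq ".spec.endpoints"
# Exits 0 only if at least one URL was seen and every URL uses https.
endpoints_all_https() {
    local seen=0 name url
    # The "|| [ -n "$name" ]" part handles a last line without a trailing newline.
    while read -r name url || [ -n "$name" ]; do
        if [ -z "$url" ]; then
            continue                   # skip blank or malformed lines
        fi
        seen=1
        case "$url" in
            https://*) ;;              # TLS endpoint, keep checking the rest
            *) return 1 ;;             # http (or anything else): not switched yet
        esac
    done
    [ "$seen" -eq 1 ]
}

# Usage, assuming the same oc and yq commands as in the description:
#   oc get KeystoneEndpoint/placement -o yaml | yq ".spec.endpoints" | endpoints_all_https \
#       && echo "placement endpoints are https"
```

This only reports the state of the KeystoneEndpoint CR. It cannot tell which URL an already running nova-cell0-conductor-0 discovered earlier, which is the stale state behind the failure shown in the log.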
/hold not intended to merge this
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: gibizer
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2916fae854964768ad9e5ebb696c4e47
✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 24m 59s
closing this as the fixes merged