Skip to content

Use shared singleton vhost/user CRs with per-consumer finalizers#577

Merged
openshift-merge-bot[bot] merged 1 commit into
openstack-k8s-operators:mainfrom
lmiccini:shared-singleton-vhost-user
May 13, 2026
Merged

Use shared singleton vhost/user CRs with per-consumer finalizers#577
openshift-merge-bot[bot] merged 1 commit into
openstack-k8s-operators:mainfrom
lmiccini:shared-singleton-vhost-user

Conversation

@lmiccini

@lmiccini lmiccini commented May 5, 2026

Copy link
Copy Markdown
Contributor

Multiple TransportURL CRs pointing to the same RabbitMQ vhost/user
(e.g., cinder and heat both using vhost "openstack") would each
create separate RabbitMQVhost/RabbitMQUser CRs. Deleting any one
of them destroyed the RabbitMQ resource for all other consumers.

This replaces per-TransportURL CRs with shared singletons named
deterministically via CanonicalVhostName/CanonicalUserName. Each
consumer adds a per-TransportURL finalizer (turl.openstack.org/t-)
so the RabbitMQ resource is only deleted when all consumers are gone.
The RabbitMQ cluster CR is set as owner for automatic GC on cluster
deletion.

Shared CR lifecycle:

  • Orphaned user CRs are labeled (rabbitmq.openstack.org/orphaned)
    instead of deleted, keeping them reclaimable by new consumers.
    Auto-deletion only proceeds after admin removes the
    cleanup-blocked finalizer.
  • Vhost CRs show "Waiting for external finalizers" status when
    deletion is blocked by user finalizer cleanup.
  • User controller cascades vhost CR deletion after its own cleanup.

Migration from legacy per-TransportURL CRs:

  • Legacy CRs (with TransportURL owner refs) are cleaned up after
    canonical CRs are created.
  • Credentials are pre-copied to the canonical secret to avoid
    password regeneration that would break existing connections.
  • Webhook skips legacy CRs and deleting CRs in uniqueness checks.

Webhook hardening:

  • Reject permission changes on shared users with active consumers.
  • Enforce RabbitmqClusterName immutability on user, vhost, and policy.
  • Validate RabbitMQPolicy Pattern as valid regex at admission.
  • Remove duplicate +kubebuilder:webhook markers from apis/ package.
  • Propagate admission context and add timeouts for k8s API calls.

API client hardening:

  • Add context.Context to all RabbitMQ API client methods.
  • Bound error response reads to 1MB and drain bodies on success.
  • Track actual vhost/policy name in RabbitMQPolicyStatus so
    reconcileDelete targets the correct resource.

Controller fixes:

  • Move ObservedGeneration to end of reconcileNormal (user, vhost,
    policy) so it is only set after all work succeeds.
  • Propagate PatchInstance errors via named return instead of
    swallowing them.
  • Guard per-consumer finalizer names against the Kubernetes
    63-char limit with SHA-256 hash truncation.
  • Add periodic 5-minute requeue for long TransportURL names
    where watch-based reverse mapping fails.

Backup/restore:

  • Add backup.openstack.org labels to auto-generated credential
    secrets so Velero includes them.
  • Set restore-order to 10 so the secret is restored before the
    RabbitMQUser CR (order 40).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

@openshift-ci openshift-ci Bot requested review from dprince and stuggi May 5, 2026 09:42
@openshift-ci openshift-ci Bot added the approved label May 5, 2026
@openshift-ci openshift-ci Bot added the approved label May 5, 2026
@lmiccini lmiccini force-pushed the shared-singleton-vhost-user branch 2 times, most recently from 431fc58 to eebffe6 Compare May 8, 2026 05:55
@lmiccini lmiccini changed the title Shared singleton vhost user Use shared singleton vhost/user CRs with per-consumer finalizers May 8, 2026
@centosinfra-prod-github-app

Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/6bb04dac1e48482482250097b177b249

openstack-k8s-operators-content-provider FAILURE in 11m 24s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@lmiccini

lmiccini commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

recheck

@lmiccini

lmiccini commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/test infra-operator-build-deploy-kuttl

@lmiccini lmiccini force-pushed the shared-singleton-vhost-user branch from eebffe6 to 813b0a8 Compare May 12, 2026 14:34
Multiple TransportURL CRs pointing to the same RabbitMQ vhost/user
(e.g., cinder and heat both using vhost "openstack") would each
create separate RabbitMQVhost/RabbitMQUser CRs. Deleting any one
of them destroyed the RabbitMQ resource for all other consumers.

This replaces per-TransportURL CRs with shared singletons named
deterministically via CanonicalVhostName/CanonicalUserName. Each
consumer adds a per-TransportURL finalizer (turl.openstack.org/t-<name>)
so the RabbitMQ resource is only deleted when all consumers are gone.
The RabbitMQ cluster CR is set as owner for automatic GC on cluster
deletion.

Shared CR lifecycle:
- Orphaned user CRs are labeled (rabbitmq.openstack.org/orphaned)
  instead of deleted, keeping them reclaimable by new consumers.
  Auto-deletion only proceeds after admin removes the
  cleanup-blocked finalizer.
- Vhost CRs show "Waiting for external finalizers" status when
  deletion is blocked by user finalizer cleanup.
- User controller cascades vhost CR deletion after its own cleanup.

Migration from legacy per-TransportURL CRs:
- Legacy CRs (with TransportURL owner refs) are cleaned up after
  canonical CRs are created.
- Credentials are pre-copied to the canonical secret to avoid
  password regeneration that would break existing connections.
- Webhook skips legacy CRs and deleting CRs in uniqueness checks.

Webhook hardening:
- Reject permission changes on shared users with active consumers.
- Enforce RabbitmqClusterName immutability on user, vhost, and policy.
- Validate RabbitMQPolicy Pattern as valid regex at admission.
- Remove duplicate +kubebuilder:webhook markers from apis/ package.
- Propagate admission context and add timeouts for k8s API calls.

API client hardening:
- Add context.Context to all RabbitMQ API client methods.
- Bound error response reads to 1MB and drain bodies on success.
- Track actual vhost/policy name in RabbitMQPolicyStatus so
  reconcileDelete targets the correct resource.

Controller fixes:
- Move ObservedGeneration to end of reconcileNormal (user, vhost,
  policy) so it is only set after all work succeeds.
- Propagate PatchInstance errors via named return instead of
  swallowing them.
- Guard per-consumer finalizer names against the Kubernetes
  63-char limit with SHA-256 hash truncation.
- Add periodic 5-minute requeue for long TransportURL names
  where watch-based reverse mapping fails.

Backup/restore:
- Add backup.openstack.org labels to auto-generated credential
  secrets so Velero includes them.
- Set restore-order to 10 so the secret is restored before the
  RabbitMQUser CR (order 40).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lmiccini lmiccini force-pushed the shared-singleton-vhost-user branch from 813b0a8 to fcc8556 Compare May 12, 2026 14:36
@lmiccini

Copy link
Copy Markdown
Contributor Author

retest

@lmiccini

Copy link
Copy Markdown
Contributor Author

/test images

TransportURLFinalizerPrefix = "turl.openstack.org/t-"

// maxFinalizerNameSegment is the Kubernetes limit for the name segment after "/"
maxFinalizerNameSegment = 63

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stuggi stuggi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci

openshift-ci Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmiccini, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot Bot merged commit ae35cd9 into openstack-k8s-operators:main May 13, 2026
7 checks passed
lmiccini added a commit to lmiccini/infra-operator that referenced this pull request May 22, 2026
The RabbitMQUser webhook (added in PR openstack-k8s-operators#577) validates that the
referenced RabbitMQVhost exists before allowing creation. The kuttl
test created the vhost and user in the same step, causing a race
where the user webhook could fire before the vhost was committed.

Split the single create step into two: create and assert the vhost
first, then create the dependent resources (user, policy, transporturl).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lmiccini added a commit to lmiccini/infra-operator that referenced this pull request May 22, 2026
The RabbitMQUser webhook (added in PR openstack-k8s-operators#577) validates that the
referenced RabbitMQVhost exists before allowing creation. The kuttl
test created the vhost and user in the same step, causing a race
where the user webhook could fire before the vhost was committed.

Split the single create step into two: create and assert the vhost
first, then create the dependent resources (user, policy, transporturl).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lmiccini added a commit to lmiccini/infra-operator that referenced this pull request May 22, 2026
The RabbitMQUser webhook (added in PR openstack-k8s-operators#577) validates that the
referenced RabbitMQVhost exists before allowing creation. The kuttl
test created the vhost and user in the same step, causing a race
where the user webhook could fire before the vhost was committed.

Split the single create step into two: create and assert the vhost
first, then create the dependent resources (user, policy, transporturl).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lmiccini lmiccini deleted the shared-singleton-vhost-user branch May 27, 2026 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants