Assignable role catalog in datum-cloud differs between production and staging (bundle version/deployment skew) #244

Description

@yahyafakhroji

Summary

The set of assignable Roles exposed in the datum-cloud namespace differs significantly between production and staging. Production returns only the 3 top-level aggregate roles (owner, editor, viewer). Each has no spec.includedPermissions and instead lists leaf roles in milo-system under spec.inheritedRoles. Staging (preview-pr-746) returns the full ~26-role catalog, including the leaf roles themselves (core-admin, core-editor, core-reader, iam-editor, networking sub-roles, …), each with spec.includedPermissions defined in datum-cloud itself.

This looks like an assignable-roles bundle version/deployment difference between environments, not a permissions-computation bug: the controller-computed status.effectivePermissions is populated and complete in both environments.

Investigation

Results of listing the Roles in datum-cloud in each environment:

| | Production | Staging (preview-pr-746) |
|---|---|---|
| Roles returned | owner, editor, viewer (3) | full catalog (~26) |
| spec.includedPermissions on aggregates | absent (aggregate-only) | absent (aggregate-only) |
| Leaf roles present in datum-cloud | no (they live in milo-system) | yes, with spec.includedPermissions |
| spec.inheritedRoles targets | milo-system roles | mix; many resolvable in-namespace |
| status.effectivePermissions | ✅ present & complete | ✅ present & complete |

Observations:

  • The aggregate roles having empty spec.includedPermissions is expected (they compose purely via inheritedRoles) — not a defect.
  • The meaningful divergence is the catalog/packaging: staging ships the leaf roles (and their includedPermissions) into datum-cloud; production does not.
  • Enforcement is unaffected: status.effectivePermissions (the authoritative flattened list the controller writes) is correct in both environments.

Likely cause

Version/deployment skew of the assignable-organization-roles bundle between environments — e.g. production is reconciled to an older IAM/NSO artifact than staging, or the two environments track different sources/revisions (cf. the Flux OCI drift pattern in #235).

Verification

```shell
# Catalog size + names per environment
kubectl get roles -n datum-cloud -o custom-columns=NAME:.metadata.name --context <prod>
kubectl get roles -n datum-cloud -o custom-columns=NAME:.metadata.name --context <staging>

# Resolved digest / revision of the assignable-roles bundle
flux get kustomization <assignable-organization-roles> -n <flux-ns> --context <prod>
flux get kustomization <assignable-organization-roles> -n <flux-ns> --context <staging>
# Compare the REVISION column (the source artifact revision last applied,
# i.e. the resolved OCIRepository artifact) between the two environments.
```
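To turn the two name listings into a direct diff instead of comparing them by eye, each listing can be saved to a file and compared with `comm`. The following is a minimal sketch, assuming one role name per line (for example the custom-columns commands above with kubectl's `--no-headers` flag added); `compare_role_catalogs` and the file names are hypothetical and not part of any existing datum-cloud tooling.

```shell
# Hypothetical helper (not part of datum-cloud tooling): compare two role-name
# listings, one name per line, and report names present in only one of them.
compare_role_catalogs() {
  if [ "$#" -ne 2 ]; then
    echo "usage: compare_role_catalogs <prod-names-file> <staging-names-file>" >&2
    return 2
  fi

  local prod_sorted staging_sorted
  prod_sorted="$(mktemp)"
  staging_sorted="$(mktemp)"

  # comm(1) requires both inputs sorted under the same collation, so normalise
  # with LC_ALL=C and drop blank lines and duplicate names first.
  sed '/^[[:space:]]*$/d' "$1" | LC_ALL=C sort -u > "$prod_sorted"
  sed '/^[[:space:]]*$/d' "$2" | LC_ALL=C sort -u > "$staging_sorted"

  # Catalog sizes (tr strips the padding some wc implementations add).
  echo "prod: $(wc -l < "$prod_sorted" | tr -d ' ') roles, staging: $(wc -l < "$staging_sorted" | tr -d ' ') roles"

  # Column 1 of comm: names only in the first (prod) file.
  echo "only in prod:"
  LC_ALL=C comm -23 "$prod_sorted" "$staging_sorted" | sed 's/^/  /'

  # Column 2 of comm: names only in the second (staging) file.
  echo "only in staging:"
  LC_ALL=C comm -13 "$prod_sorted" "$staging_sorted" | sed 's/^/  /'

  rm -f "$prod_sorted" "$staging_sorted"
}
```

For example, redirect the production and staging `kubectl get roles ... --no-headers` commands into `prod-roles.txt` and `staging-roles.txt`, then run `compare_role_catalogs prod-roles.txt staging-roles.txt`. Given the listings described in this issue, the leaf roles would be expected to appear under "only in staging".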

Impact

  • Enforcement: none — status.effectivePermissions is correct in both envs.
  • UI: this catalog difference is why cloud-portal's client-side permission resolver behaves differently per environment (empty on prod, partially populated on staging). That is tracked and fixed independently in cloud-portal#1293 ("Effective Permissions panel is empty on production"): the portal reads status.effectivePermissions instead of re-deriving permissions client-side. This issue does not block that fix.

Question for platform owners

Is production intentionally pinned to an older assignable-roles layout (aggregate roles in datum-cloud, leaves in milo-system), or is this unintended drift between environments? If intended, no action beyond confirmation. If drift, reconcile production to the current bundle.

Related: datum-cloud/cloud-portal#1293 (cloud-portal Effective Permissions panel empty on production).
