Skip to content

Fix ROSA cluster tagging for both SSO and IAM access key deployments#1007

Open
pragya811 wants to merge 6 commits into
mainfrom
rosa_tagging_fix
Open

Fix ROSA cluster tagging for both SSO and IAM access key deployments#1007
pragya811 wants to merge 6 commits into
mainfrom
rosa_tagging_fix

Conversation

@pragya811

Copy link
Copy Markdown
Member

Type of change

Note: Fill x in []

  • bug
  • enhancement
  • documentation
  • dependencies

Description

ROSA instances are created by AWS/Red Hat service accounts, hiding the actual user in RunInstances CloudTrail events. This adds multi-strategy username resolution that traces ownership through cluster IAM roles and CloudTrail events, with a fallback that handles deleted roles via direct CloudTrail CreateRole event search (90-day retention).

For security reasons, all pull requests need to be approved first before running any automated CI

Assisted-by: Cursor

ROSA instances are created by AWS/Red Hat service accounts, hiding the
actual user in RunInstances CloudTrail events. This adds multi-strategy
username resolution that traces ownership through cluster IAM roles and
CloudTrail events, with a fallback that handles deleted roles via direct
CloudTrail CreateRole event search (90-day retention).

Co-authored-by: Cursor <cursoragent@cursor.com>
@pragya811 pragya811 self-assigned this Jul 1, 2026
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds CloudTrail-based username resolution helpers and wires them into cluster and non-cluster ownership flows, with automation identity exclusion and a cluster-instance fallback.

Changes

Ownership resolution via CloudTrail

Layer / File(s) Summary
CloudTrail lookup helpers
cloud_governance/common/clouds/aws/cloudtrail/cloudtrail_operations.py, tests/unittest/cloud_governance/common/clouds/aws/cloudtrail/test_cloudtrail_operations.py
Adds resource-event lookup, role-based CreateRole tracing, and cluster-role username resolution in CloudTrailOperations, with tests for excluded users, direct role matching, ambiguity handling, and pagination.
Cluster tagging flow
cloud_governance/policy/policy_operations/aws/tag_cluster/tag_cluster_operations.py, cloud_governance/policy/policy_operations/aws/tag_cluster/tag_cluster_resouces.py, tests/unittest/cloud_governance/aws/tag_cluster/test_tag_cluster_operations.py
TagClusterOperations adds regional CloudTrail wiring, automation-user exclusion, cluster-role lookup, and cluster-instance tag fallback; update_cluster_tags uses the fallback when username resolution stays empty, with tests covering the new resolution order.
Non-cluster tagging flow
cloud_governance/policy/policy_operations/aws/tag_non_cluster/non_cluster_operations.py, tests/unittest/cloud_governance/aws/tag_non_cluster/test_non_cluster_operations.py
Adds automation-user detection and updates get_username to prefer CloudTrail resource-event lookup only after excluding that identity, with tests for the primary and fallback paths.

Estimated code review effort: 4 (Complex) | ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the ROSA cluster tagging and ownership-resolution change.
Description check ✅ Passed The description matches the PR by explaining multi-strategy username resolution for ROSA instances and CloudTrail fallbacks.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Comment @coderabbitai help to get the list of available commands.

@pragya811 pragya811 added the enhancement New feature or request label Jul 1, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cloud_governance/common/clouds/aws/cloudtrail/cloudtrail_operations.py`:
- Around line 400-418: The temporal fallback in the CloudTrail role matching
logic is too broad because it can return any `openshift` role within
`ROSA_ROLE_WINDOW_SECONDS`, which may assign the wrong owner when multiple
clusters exist. Update `cloudtrail_operations.py` in the `candidate_roles`
selection and `__get_username_from_role_cloudtrail` flow to require a
cluster-specific identifier or otherwise confirm the resolved owner is
unambiguous before returning a username. If more than one plausible candidate
remains after sorting `CreateDate`, do not pick one arbitrarily; only return
when there is a single clearly matched role/user, otherwise fall back without
assigning ownership.
- Line 390: The IAM role lookup in `CloudTrailOperations` is only reading a
single page from `self.__iam_client.list_roles`, so large accounts can miss the
target role. Update the role retrieval logic to paginate through all IAM roles
(using the paginator or marker-based loop) before searching for the matching
cluster/OpenShift role, and keep the lookup behavior in the same
`CloudTrailOperations` flow.
- Around line 341-348: The CreateRole fallback in the CloudTrail lookup is only
reading the first page of results and is using the regional CloudTrail client
instead of the global one. Update the lookup flow in the CreateRole recovery
logic to use __global_cloudtrail and continue fetching pages with NextToken
until all matching events are collected, so the matching event is not missed in
busy accounts.

In
`@cloud_governance/policy/policy_operations/aws/tag_cluster/tag_cluster_operations.py`:
- Around line 123-141: The owner resolution in tag_cluster_operations.py only
applies the automation-user exclusion to the final CloudTrail fallback, so
_get_username_from_instance_id_and_time can still return the automation IAM user
and short-circuit the rest of the lookup. Update the owner resolution flow in
the method that calls get_user_name_from_name_tag,
_get_username_from_instance_id_and_time, and get_username_from_cluster_role so
the same exclude set is checked against the primary CloudTrail result before
returning it, keeping the automation user out of all lookup paths.

In
`@cloud_governance/policy/policy_operations/aws/tag_non_cluster/non_cluster_operations.py`:
- Around line 234-243: The first CloudTrail lookup in the non-cluster username
resolution path can still return the automation user, because only the fallback
call applies the exclusion. Update the logic in the username resolution method
that calls _get_username_from_cloudtrail and get_username_from_resource_events
so the automation user is excluded from both results, either by filtering
ct_username before returning it or by passing the same exclude set into the
first lookup if supported.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: b606510c-10b9-462b-aa5f-b1ec09eaf1f2

📥 Commits

Reviewing files that changed from the base of the PR and between b4bcb9b and 5f382fd.

📒 Files selected for processing (4)
  • cloud_governance/common/clouds/aws/cloudtrail/cloudtrail_operations.py
  • cloud_governance/policy/policy_operations/aws/tag_cluster/tag_cluster_operations.py
  • cloud_governance/policy/policy_operations/aws/tag_cluster/tag_cluster_resouces.py
  • cloud_governance/policy/policy_operations/aws/tag_non_cluster/non_cluster_operations.py

Comment thread cloud_governance/common/clouds/aws/cloudtrail/cloudtrail_operations.py Outdated
Comment thread cloud_governance/common/clouds/aws/cloudtrail/cloudtrail_operations.py Outdated
Comment thread cloud_governance/common/clouds/aws/cloudtrail/cloudtrail_operations.py Outdated
@ebattat

ebattat commented Jul 1, 2026

Copy link
Copy Markdown
Member

@pragya811, do we need to add any unittest or integration test ?

Cover CloudTrail fallback lookups, automation-user exclusion, strategy
ordering, and ambiguous cluster role matching introduced in PR #1007.

Co-authored-by: Cursor <cursoragent@cursor.com>
@pragya811

Copy link
Copy Markdown
Member Author

@pragya811, do we need to add any unittest or integration test ?

Added unittests

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/unittest/cloud_governance/common/clouds/aws/cloudtrail/test_cloudtrail_operations.py (2)

359-359: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Mutable class attribute flagged by Ruff (RUF012).

Annotate as ClassVar or move into setup_method/fixture to avoid shared mutable state across test instances.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/cloud_governance/common/clouds/aws/cloudtrail/test_cloudtrail_operations.py`
at line 359, The IAM_USERS list in the CloudTrail test class is a mutable class
attribute flagged by Ruff, so move it into instance/setup scope or annotate it
as a ClassVar in the test class that defines IAM_USERS. Update the surrounding
test setup in the CloudTrail operations test so the list is not treated as
shared mutable state across test instances, keeping the symbol name IAM_USERS
easy to locate.

Source: Linters/SAST tools


444-444: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer unpacking over concatenation (RUF005).

Suggested fix
-            iam_users=self.IAM_USERS + ['otheruser'],
+            iam_users=[*self.IAM_USERS, 'otheruser'],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/cloud_governance/common/clouds/aws/cloudtrail/test_cloudtrail_operations.py`
at line 444, The test setup in the cloudtrail operations case is concatenating
IAM_USERS with a single-item list, which triggers the RUF005 preference for
unpacking. Update the argument near the cloudtrail test data construction that
uses IAM_USERS so it uses unpacking instead of list concatenation, keeping the
same values while making the literal clearer and lint-compliant.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@tests/unittest/cloud_governance/aws/tag_cluster/test_tag_cluster_operations.py`:
- Around line 36-44: The `test_skips_automation_user_from_instance_cloudtrail`
test currently does not exercise the automation-user exclusion path because
`cloud-governance-user` is missing from `iam_users`, so the membership check in
`get_username` short-circuits before `exclude` is applied. Update the test setup
in `_make_tag_cluster_ops` or the test fixture so the automation user is
included in `iam_users`, then keep the assertion that `get_username` returns
`pragchau` and that `cloudtrail.get_username_from_cluster_role` is called,
ensuring the `exclude` logic in `get_username` is what actually filters the
automation user.

In
`@tests/unittest/cloud_governance/aws/tag_non_cluster/test_non_cluster_operations.py`:
- Around line 22-28: The test in
test_excludes_automation_user_from_primary_cloudtrail is not actually covering
the automation-user exclusion path because `_automation_user` is missing from
`iam_users`, so the membership check fails too early. Update the test setup in
`_make_non_cluster_ops`/the test fixture so the automation user is included in
`iam_users`, then keep the assertion on `get_username` to verify the exclusion
logic in `NonClusterOps.get_username` is what returns the empty result.

---

Nitpick comments:
In
`@tests/unittest/cloud_governance/common/clouds/aws/cloudtrail/test_cloudtrail_operations.py`:
- Line 359: The IAM_USERS list in the CloudTrail test class is a mutable class
attribute flagged by Ruff, so move it into instance/setup scope or annotate it
as a ClassVar in the test class that defines IAM_USERS. Update the surrounding
test setup in the CloudTrail operations test so the list is not treated as
shared mutable state across test instances, keeping the symbol name IAM_USERS
easy to locate.
- Line 444: The test setup in the cloudtrail operations case is concatenating
IAM_USERS with a single-item list, which triggers the RUF005 preference for
unpacking. Update the argument near the cloudtrail test data construction that
uses IAM_USERS so it uses unpacking instead of list concatenation, keeping the
same values while making the literal clearer and lint-compliant.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 7a557270-70a7-4199-bb64-464499c3e67f

📥 Commits

Reviewing files that changed from the base of the PR and between 0e2fb2e and 8d1d7d2.

📒 Files selected for processing (3)
  • tests/unittest/cloud_governance/aws/tag_cluster/test_tag_cluster_operations.py
  • tests/unittest/cloud_governance/aws/tag_non_cluster/test_non_cluster_operations.py
  • tests/unittest/cloud_governance/common/clouds/aws/cloudtrail/test_cloudtrail_operations.py

@pragya811 pragya811 requested a review from ebattat July 2, 2026 05:35
pragya811 and others added 2 commits July 2, 2026 11:09
Replace real IAM/AD identifiers with fictional names in ROSA tagging
unit tests, and include the automation user in iam_users so exclude
logic is actually exercised.

Co-authored-by: Cursor <cursoragent@cursor.com>

@ebattat ebattat left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Tagging can resolve owners via get_username_from_resource_events while
teardown still used CloudTrail-only lookup, leaving partial tags that
caused validate_existing_tag to skip re-applying Budget and LaunchTime.
Align removal with tagging and require input tags in validation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

2 participants