Skip to content

fix(cluster): check Kueue Workload admission for list_all_queued()#1035

Merged
openshift-merge-bot[bot] merged 1 commit into
project-codeflare:mainfrom
laurafitzgerald:bugfix/RHOAIENG-54734-fix-queued-filter
Mar 30, 2026
Merged

fix(cluster): check Kueue Workload admission for list_all_queued()#1035
openshift-merge-bot[bot] merged 1 commit into
project-codeflare:mainfrom
laurafitzgerald:bugfix/RHOAIENG-54734-fix-queued-filter

Conversation

@laurafitzgerald

@laurafitzgerald laurafitzgerald commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Fixes: RHOAIENG-54734
Made-with: Cursor

Issue link

https://redhat.atlassian.net/browse/RHOAIENG-54734

What changes have been made

check Kueue Workload admission for list_all_queued()

The previous implementation of list_all_queued() incorrectly filtered by RayCluster state (READY/SUSPENDED), which doesn't reliably indicate whether a cluster is queued. Instead, check the Kueue Workload admission status — a cluster is queued when its associated Workload has not been admitted yet (no admission field in status).

Verification steps

Manual / live cluster verification (if you have access to a cluster with Kueue)
Setup: You need a namespace with Kueue installed and at least one RayCluster managed by Kueue.

Test queued clusters are detected:

Submit a RayCluster that will be queued (e.g. request more resources than the ClusterQueue allows)
Verify the Workload has no admission field:
oc get workloads -n -o jsonpath='{.items[*].status.admission}'
Call list_all_queued("") from the SDK — the cluster should appear
Test admitted clusters are NOT shown as queued:

Once Kueue admits the workload (or use a cluster that fits the quota), verify admission exists:
oc get workloads -n -o jsonpath='{.items[*].status.admission}'
Call list_all_queued("") — the cluster should not appear
Test clusters without Kueue are NOT shown as queued:

Create a RayCluster in a namespace without Kueue (no associated Workload)
Call list_all_queued("") — the cluster should not appear (since it has no Workload, it's not managed by Kueue)

Some Edge cases to consider

Scenario Expected result
No RayClusters in namespace Empty list, "No resources found" message
Kueue not installed (API 404) Empty list (exception caught gracefully)
Workload exists but status is empty/missing Cluster shown as queued (no admission = queued)
Workload has admission: {} (empty dict) Cluster shown as queued (not admission is truthy for {})
Workload has admission: {clusterQueue: "..."} Cluster not shown as queued

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@laurafitzgerald laurafitzgerald force-pushed the bugfix/RHOAIENG-54734-fix-queued-filter branch from 1c8d859 to 01f3b46 Compare March 25, 2026 11:26
@codecov

codecov Bot commented Mar 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.95652% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.88%. Comparing base (d87aac4) to head (ef9851f).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
src/codeflare_sdk/ray/cluster/cluster.py 86.95% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1035      +/-   ##
==========================================
- Coverage   96.15%   95.88%   -0.28%     
==========================================
  Files          23       23              
  Lines        2238     2258      +20     
==========================================
+ Hits         2152     2165      +13     
- Misses         86       93       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The previous implementation of list_all_queued() incorrectly filtered
by RayCluster state (READY/SUSPENDED), which doesn't reliably indicate
whether a cluster is queued. Instead, check the Kueue Workload admission
status — a cluster is queued when its associated Workload has not been
admitted yet (no admission field in status).

Fixes: RHOAIENG-54734
Made-with: Cursor
@laurafitzgerald laurafitzgerald force-pushed the bugfix/RHOAIENG-54734-fix-queued-filter branch from 01f3b46 to ef9851f Compare March 25, 2026 11:29

@pawelpaszki pawelpaszki left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verified the scenarios with cursor and additionally by running e2e tests on openshift

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 30, 2026
@openshift-ci

openshift-ci Bot commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pawelpaszki

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 30, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 6ab47d3 into project-codeflare:main Mar 30, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants