fix(cluster): check Kueue Workload admission for list_all_queued() #1035
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##             main    #1035      +/-   ##
==========================================
- Coverage   96.15%   95.88%   -0.28%
==========================================
  Files          23       23
  Lines        2238     2258      +20
==========================================
+ Hits         2152     2165      +13
- Misses         86       93       +7
pawelpaszki
left a comment
Verified the scenarios with Cursor and additionally by running e2e tests on OpenShift.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: pawelpaszki
Merged commit 6ab47d3 into project-codeflare:main
Fixes: RHOAIENG-54734
Made-with: Cursor
Issue link
https://redhat.atlassian.net/browse/RHOAIENG-54734
What changes have been made
check Kueue Workload admission for list_all_queued()
The previous implementation of list_all_queued() incorrectly filtered by RayCluster state (READY/SUSPENDED), which doesn't reliably indicate whether a cluster is queued. Instead, check the Kueue Workload admission status — a cluster is queued when its associated Workload has not been admitted yet (no admission field in status).
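The admission check described above can be sketched in a few lines of Python. This is a minimal illustration, not the SDK's actual code: `is_workload_queued` is a hypothetical helper, and it assumes the Workload is available as a plain dict shaped like the object returned by the Kubernetes API.

```python
from typing import Any, Mapping


def is_workload_queued(workload: Mapping[str, Any]) -> bool:
    """Return True when the Workload has not been admitted by Kueue.

    Hypothetical helper for illustration only; assumes `workload` is the
    dict form of a Kueue Workload object.
    """
    # `status` may be absent or None on a freshly created Workload.
    status = workload.get("status") or {}
    # A missing, None, or empty `admission` all count as "not admitted yet".
    return not status.get("admission")
```

Under these assumptions, a Workload with no `status`, or with an empty `admission: {}`, is treated as queued, while one whose `admission` names a ClusterQueue is not.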
Verification steps
Manual / live cluster verification (if you have access to a cluster with Kueue)
Setup: You need a namespace with Kueue installed and at least one RayCluster managed by Kueue.
Test queued clusters are detected:
Submit a RayCluster that will be queued (e.g. request more resources than the ClusterQueue allows)
Verify the Workload has no admission field:
oc get workloads -n <namespace> -o jsonpath='{.items[*].status.admission}'
Call list_all_queued("<namespace>") from the SDK; the cluster should appear
Test admitted clusters are NOT shown as queued:
Once Kueue admits the workload (or use a cluster that fits the quota), verify admission exists:
oc get workloads -n <namespace> -o jsonpath='{.items[*].status.admission}'
Call list_all_queued("<namespace>"); the cluster should not appear
Test clusters without Kueue are NOT shown as queued:
Create a RayCluster in a namespace without Kueue (no associated Workload)
Call list_all_queued("<namespace>"); the cluster should not appear (it has no Workload, so it is not managed by Kueue)
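The three scenarios above can also be reasoned about offline with plain dicts. The sketch below is an assumption-laden illustration, not the SDK's implementation: `queued_cluster_names` is a hypothetical function, and it assumes each Workload is linked to its RayCluster through `metadata.ownerReferences` (the PR text only says "associated Workload").

```python
from typing import Any, Dict, Iterable, List, Mapping


def queued_cluster_names(
    cluster_names: Iterable[str],
    workloads: Iterable[Mapping[str, Any]],
) -> List[str]:
    """Hypothetical filter: keep clusters whose Workload is not admitted yet."""
    pending: Dict[str, bool] = {}
    for wl in workloads:
        admitted = bool((wl.get("status") or {}).get("admission"))
        # Assumption: the owning RayCluster is named in ownerReferences.
        for ref in (wl.get("metadata") or {}).get("ownerReferences") or []:
            if ref.get("kind") == "RayCluster":
                pending[ref.get("name")] = not admitted
    # Clusters without any Workload never appear in `pending`,
    # so they are never reported as queued.
    return [name for name in cluster_names if pending.get(name, False)]


# Made-up sample data covering the three verification scenarios.
sample_workloads = [
    {"metadata": {"ownerReferences": [{"kind": "RayCluster", "name": "too-big"}]},
     "status": {}},
    {"metadata": {"ownerReferences": [{"kind": "RayCluster", "name": "fits"}]},
     "status": {"admission": {"clusterQueue": "cq"}}},
]
print(queued_cluster_names(["too-big", "fits", "no-kueue"], sample_workloads))
# prints ['too-big']
```

Only the cluster whose Workload lacks an admission is returned; the admitted cluster and the one with no Workload are both excluded.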
Some edge cases to consider
status is empty/missing
admission: {} (empty dict): not admission is truthy for {}
admission: {clusterQueue: "..."}
Checks