fix(cluster): add bounds check for workerGroupSpecs array access #1041

Merged
openshift-merge-bot[bot] merged 1 commit into project-codeflare:main from laurafitzgerald:bugfix/RHOAIENG-54729-worker-group-specs-bounds-check on Mar 30, 2026

@laurafitzgerald laurafitzgerald commented Mar 25, 2026

Contributor

Fixes: RHOAIENG-54729
Made-with: Cursor

Issue link

https://redhat.atlassian.net/browse/RHOAIENG-54729

What changes have been made

Add defensive bounds checking before accessing workerGroupSpecs[0] in three functions to prevent IndexError on head-only Ray clusters (num_workers=0).
Head-only clusters have an empty workerGroupSpecs list, which caused crashes when calling status(), list_all_clusters(), or get_cluster(). The fix guards all three access points and defaults worker-related values to 0 when no worker groups are defined.
Functions fixed:

  • _head_worker_extended_resources_from_rc_dict() — returns empty worker resources dict
  • get_cluster() — defaults worker CPU/memory limits/requests to 0
  • _map_to_ray_cluster() — defaults worker specs to 0
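
The guard described above can be sketched as follows. This is a minimal illustration, not the SDK's actual code: `worker_summary` and the keys of `HEAD_ONLY_DEFAULTS` are hypothetical names, and the function assumes the RayCluster resource has already been loaded as a plain dict with the usual `spec.workerGroupSpecs[*].template.spec.containers` layout.

```python
from typing import Any, Dict

# Zero-valued defaults for a head-only cluster (hypothetical field names).
HEAD_ONLY_DEFAULTS: Dict[str, Any] = {
    "num_workers": 0,
    "cpu_requests": 0,
    "cpu_limits": 0,
    "memory_requests": 0,
    "memory_limits": 0,
}


def worker_summary(rc: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise the first worker group, or return zeros if there is none."""
    # Treat a missing key, None, and [] the same way: no worker groups.
    worker_groups = rc.get("spec", {}).get("workerGroupSpecs") or []
    if not worker_groups:
        # Head-only cluster: indexing [0] here would raise IndexError.
        return dict(HEAD_ONLY_DEFAULTS)

    group = worker_groups[0]
    resources = group["template"]["spec"]["containers"][0].get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return {
        "num_workers": group.get("replicas", 0),
        "cpu_requests": requests.get("cpu", 0),
        "cpu_limits": limits.get("cpu", 0),
        "memory_requests": requests.get("memory", 0),
        "memory_limits": limits.get("memory", 0),
    }
```

Checking the list for emptiness once, before any indexing, keeps the default values in a single place instead of repeating them at each access point.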

Verification steps

  1. Run the unit tests:
    poetry run pytest src/codeflare_sdk/ray/cluster/test_cluster.py -v
    
  2. The new test_head_only_cluster_no_workers test validates all three code paths with an empty workerGroupSpecs list.
  3. Manual verification (requires KubeRay operator):
    • Create a RayCluster with num_workers=0 (empty workerGroupSpecs)
    • Call cluster.status(), list_all_clusters(), and cluster.details()
    • Confirm no IndexError; results show num_workers=0
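
For readers without a KubeRay operator at hand, the failure mode and the expected post-fix behaviour can be reproduced on a plain dict. The sketch below is not the `test_head_only_cluster_no_workers` test from this PR; `count_workers`, `count_workers_unguarded`, and the `HEAD_ONLY` fixture are hypothetical stand-ins that only mimic the shape of a head-only RayCluster.

```python
import unittest


def count_workers_unguarded(rc: dict) -> int:
    """Pre-fix style: indexes workerGroupSpecs[0] unconditionally."""
    return rc["spec"]["workerGroupSpecs"][0]["replicas"]


def count_workers(rc: dict) -> int:
    """Post-fix style: returns 0 when no worker groups are defined."""
    groups = rc.get("spec", {}).get("workerGroupSpecs") or []
    return groups[0].get("replicas", 0) if groups else 0


# Minimal head-only RayCluster shape: a head group and an empty worker list.
HEAD_ONLY = {
    "metadata": {"name": "head-only"},
    "spec": {"headGroupSpec": {}, "workerGroupSpecs": []},
}


class HeadOnlyClusterTest(unittest.TestCase):
    def test_unguarded_access_raises(self):
        # Indexing an empty list is what produced the original crash.
        with self.assertRaises(IndexError):
            count_workers_unguarded(HEAD_ONLY)

    def test_guarded_access_defaults_to_zero(self):
        self.assertEqual(count_workers(HEAD_ONLY), 0)
```

Saved to a file, the two cases can be run with `python -m unittest <file>`.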

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Add defensive bounds checking before accessing workerGroupSpecs[0] in
three functions to prevent IndexError on head-only Ray clusters
(num_workers=0). When workerGroupSpecs is empty, worker-related values
default to 0.

Functions fixed:
- _head_worker_extended_resources_from_rc_dict()
- get_cluster()
- _map_to_ray_cluster()

Fixes: RHOAIENG-54729
Made-with: Cursor
@openshift-ci openshift-ci Bot requested review from pawelpaszki and szaher March 25, 2026 16:10

codecov Bot commented Mar 25, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.32%. Comparing base (d87aac4) to head (4c05c43).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1041      +/-   ##
==========================================
+ Coverage   96.15%   96.32%   +0.17%     
==========================================
  Files          23       23              
  Lines        2238     2261      +23     
==========================================
+ Hits         2152     2178      +26     
+ Misses         86       83       -3     
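
The percentages in the diff follow from the hit and line counts. A small sketch of the arithmetic, assuming the percentage is truncated rather than rounded to two decimals (an assumption that happens to reproduce the figures shown); `coverage_pct` is a hypothetical helper name:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (hits * 10000 // lines) / 100


base = coverage_pct(2152, 2238)   # coverage on main
head = coverage_pct(2178, 2261)   # coverage with this PR
delta = round(head - base, 2)     # change reported in the diff
print(base, head, delta)
```

The 26 additional hits against 23 additional lines also account for the 3 fewer misses (86 to 83).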


@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 30, 2026

openshift-ci Bot commented Mar 30, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pawelpaszki

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 30, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit f8368da into project-codeflare:main Mar 30, 2026
20 checks passed