[Data] Decouple sub-progress metrics from progress bar implementations#62262
[Data] Decouple sub-progress metrics from progress bar implementations#62262400Ping wants to merge 6 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the sub-progress tracking system in Ray Data, transitioning from a direct progress bar dictionary to a more robust metrics-based approach using ProgressMetrics and SubProgressUpdater. The changes involve updating various physical operators (AllToAll, HashShuffle, GPUShuffle) and progress managers (Logging, Rich, Tqdm) to utilize the new SubProgressMixin. Key feedback includes addressing potential type mismatches and state inconsistencies in legacy setters, ensuring thread safety in the SubProgressUpdater.update method to prevent race conditions, and improving maintainability by using constants for hardcoded strings and name lengths.
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
|
Unstale |
|
|
||
| class NoopSubProgressBar(BaseProgressBar): | ||
| """Sub-Progress Bar for Noop (Disabled) Progress Manager""" | ||
| @dataclass(frozen=True) |
There was a problem hiding this comment.
whats the purpose of keeping this class frozen?
There was a problem hiding this comment.
It is frozen because ProgressMetrics is meant to be an immutable snapshot, not a mutable progress bar object. SubProgressUpdater updates progress by replacing the snapshot, while progress managers only read/sync from it. This keeps rendering code from mutating operator-owned execution state directly.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit f164692. Configure here.
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
8f24ad6 to
827b23a
Compare
|
Opening a new on at #63195 , there seems to be a PR git ref error |

Summary
This PR refactors Ray Data sub-progress reporting so operators expose sub-progress as metrics instead of receiving concrete progress bar objects from progress managers.
Before this change, the flow looked like this:
After this change, the flow is:
Dict[str, ProgressMetrics].This removes the previous bi-directional coupling between operators and progress bar implementations.
What Changed
API changes
get_sub_progress_metrics() -> Dict[str, ProgressMetrics]ProgressMetricsis now a frozen dataclass snapshot with:nametotalcompletedInternal execution changes
SubProgressUpdaterhelpers for driver-side mutation paths that still need:update(...)fetch_until_complete(...)block_until_complete(...)TaskContextnow carries:sub_progress_metricssub_progress_updatersProgress manager changes
tqdm,rich,logging, andnoopprogress managers no longer inject sub-progress bar objects back into operators.NoopSubProgressBaris no longer needed.Operator and scheduler updates
Updated the following sub-progress users to the new model:
AllToAllOperatorWhy This Design
This PR keeps the external sub-progress interface simple and metrics-based, while preserving existing driver-side execution behavior for shuffle/sample paths that still need incremental wait-and-update helpers.
Conceptually:
This is a middle ground that removes bar injection and keeps the metrics interface clean, without rewriting all driver-side progress update paths at once.
Related Issues
SubProgressMixinfrom progress bar #60082Dataset.map_batchesprogress bar is incorrect #33517