[Data] Decouple sub-progress metrics from progress bar implementations#63195
[Data] Decouple sub-progress metrics from progress bar implementations#63195400Ping wants to merge 14 commits into
Conversation
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
There was a problem hiding this comment.
Code Review
This pull request refactors the sub-progress tracking system in Ray Data, transitioning from direct progress bar manipulation to a metrics-based architecture. It replaces SubProgressBarMixin with SubProgressMixin and introduces ProgressMetrics for immutable state snapshots along with SubProgressUpdater for driver-side updates. Several operators and progress managers (Logging, Rich, and TQDM) have been updated to support this new system. Feedback identifies critical issues: SubProgressUpdater is missing required abstract methods (close, refresh) from its base class, which will prevent instantiation; TqdmSubProgressBar lacks the update_absolute method called by the progress manager; and the inclusion of a non-serializable threading.Lock in SubProgressUpdater will likely cause serialization failures when TaskContext is sent to worker tasks.
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
Signed-off-by: 400Ping <jiekaichang@apache.org>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit a1fbc6c. Configure here.
Signed-off-by: 400Ping <jiekaichang@apache.org>
|
cc @kyuds to take a look |
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |

Description
This PR refactors Ray Data sub-progress reporting so operators expose sub-progress
as structured metrics instead of receiving or holding concrete progress bar
objects.
The main goal is to decouple execution state from display/rendering state:
This addresses #60082 by removing the bidirectional coupling where progress
managers created sub-progress bars and injected them back into operators.
What Changed
Sub-progress API
ProgressMetricsas an immutable snapshot of sub-progress state:nametotalcompletedSubProgressMixinto expose:get_sub_progress_metrics()get_sub_progress_updaters()SubProgressUpdater
SubProgressUpdaternow updatesProgressMetricssnapshots instead ofwrapping concrete progress bar objects.
bars back into operators. Progress managers now attach display-sync callbacks
to driver-side updaters instead.
when metrics change.
Progress Managers
tqdmandrichprogress managers now own their concrete display bars andsync them from
ProgressMetricsupdates.loggingprogress manager reads metrics directly instead of using a fakelogging sub-progress bar.
noopprogress manager no longer needs to attachNonedisplay bars.Operators and Schedulers
Updates sub-progress users to the new metrics/updater model, including:
AllToAllOperatorWhy This Design
Operators should own execution state, while progress managers should own
rendering.
This keeps the operator layer focused on reporting progress as data, and keeps
display-specific behavior inside progress managers. It also avoids making
operators depend on
tqdm,rich, logging, or noop progress barimplementations.
The resulting model is:
Related issues
Closes #60082