Feature Request
Problem / Motivation
The current frame-parallel implementation parallelises the FrameGraph/covariance stage, but the recent scaling results suggest that this may not dominate the full CodeEntropy workflow as much as expected, especially for the smaller benchmark systems.
Some expensive frame-dependent work still appears to happen before the current frame-parallel section, particularly in the static stage. This may include the dihedral/conformational analysis and neighbour calculation. As a result, the overall workflow scaling may be limited by serial work outside the current Dask frame execution path.
This also means each parallel task currently has a relatively small unit of work, as workers mainly process the covariance pathway for a frame. A larger frame-based unit of work may reduce overhead and improve scaling.
Proposed Solution
Add clearer profiling/timing around the main LevelDAG stages to identify which parts of the workflow are still dominating runtime. This should include timings for:
- Static setup/stage execution
- Dihedral/conformational analysis
- Neighbour calculation
- FrameGraph/covariance execution
- Frame reduction/finalisation
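As a starting point for the timings listed above, a small accumulating timer could wrap each stage. This is a minimal sketch using only the standard library; `StageTimer` and the stage labels are hypothetical names chosen for illustration and are not existing CodeEntropy identifiers.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raises.
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        """Return (stage, seconds, fraction_of_total) rows, slowest first."""
        total = sum(self.totals.values()) or 1.0
        rows = [(name, t, t / total) for name, t in self.totals.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)


timer = StageTimer()
with timer.stage("static_setup"):
    time.sleep(0.01)  # placeholder for the real stage body
with timer.stage("frame_graph_covariance"):
    time.sleep(0.02)  # placeholder for the real stage body

for name, seconds, fraction in timer.report():
    print(f"{name:28s} {seconds:8.4f} s  {fraction:6.1%}")
```

The fraction column makes it easy to see whether the serial stages or the frame-parallel stage account for most of the wall-clock time in a given benchmark.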
If profiling confirms that frame-dependent work in the static stage is a significant bottleneck, investigate restructuring the workflow so more of this work is moved into the frame-parallel path.
The longer-term structure would be closer to:
for frame or frame_chunk in selected_frames:
    compute covariance contribution
    compute neighbour contribution
    compute heavy frame-dependent dihedral/conformational contributions
    return compact partial results
rather than the current structure where only the covariance path is handled by the frame-parallel FrameGraph.
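The loop above could be sketched as a per-chunk task that returns one small, mergeable result. The example below is an illustration only: it uses `concurrent.futures.ThreadPoolExecutor` from the standard library as a stand-in for the Dask execution path, toy one-dimensional coordinates in place of real frames, and hypothetical names (`Partial`, `process_chunk`, `chunked`) that do not come from CodeEntropy.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Partial:
    """Compact per-chunk result: running sums and counts only (hypothetical)."""
    n_frames: int = 0
    sum_x: float = 0.0       # toy stand-in for a covariance first moment
    sum_xx: float = 0.0      # toy stand-in for a covariance second moment
    neighbour_count: int = 0

    def merge(self, other):
        return Partial(
            self.n_frames + other.n_frames,
            self.sum_x + other.sum_x,
            self.sum_xx + other.sum_xx,
            self.neighbour_count + other.neighbour_count,
        )


def process_chunk(frames, cutoff=1.0):
    """Do all frame-dependent work for one chunk and return a Partial."""
    partial = Partial()
    for coords in frames:
        mean = sum(coords) / len(coords)
        partial.n_frames += 1
        partial.sum_x += mean
        partial.sum_xx += mean * mean
        # Toy neighbour count: pairs closer than the cutoff.
        partial.neighbour_count += sum(
            1
            for i in range(len(coords))
            for j in range(i + 1, len(coords))
            if abs(coords[i] - coords[j]) < cutoff
        )
    return partial


def chunked(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]


frames = [[0.0, 0.5, 3.0], [1.0, 1.2, 5.0], [2.0, 2.1, 2.2], [0.0, 4.0, 8.0]]

# Stand-in for submitting one task per frame chunk to Dask workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(process_chunk, chunked(frames, 2)))

total = Partial()
for partial in partials:
    total = total.merge(partial)

variance = total.sum_xx / total.n_frames - (total.sum_x / total.n_frames) ** 2
print(total, variance)
```

Because each task returns only sums and counts, the reduction step stays cheap and the result does not grow with the number of frames in a chunk.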
For dihedral/conformational analysis, this may require a map-reduce style approach because some parts depend on trajectory-wide information, such as peak/state assignment. For example:
Pass 1:
    workers compute partial dihedral angle/histogram data per frame chunk
Reduce:
    combine partial histograms and identify global peaks/states
Pass 2:
    workers assign conformational states using the global peak/state data
Reduce:
    combine final state counts/populations
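The two-pass scheme above can be illustrated with a toy example. This is a sketch under stated assumptions, not the existing dihedral analysis: the bin width, the local-maximum peak rule, the nearest-peak assignment, and all function names are hypothetical, and the per-chunk calls run serially here where a real implementation would submit them to workers.

```python
from collections import Counter

N_BINS = 36  # 10-degree bins; an illustrative choice


def histogram_chunk(angles):
    """Pass 1 (map): bin dihedral angles in degrees for one frame chunk."""
    counts = [0] * N_BINS
    width = 360.0 / N_BINS
    for angle in angles:
        counts[int((angle % 360.0) // width) % N_BINS] += 1
    return counts


def combine_histograms(parts):
    """Reduce: element-wise sum of the partial histograms."""
    return [sum(column) for column in zip(*parts)]


def find_peaks(hist):
    """Bin centres that are local maxima of the periodic histogram (toy rule)."""
    n = len(hist)
    width = 360.0 / n
    return [
        (i + 0.5) * width
        for i in range(n)
        if hist[i] > 0 and hist[i] > hist[i - 1] and hist[i] >= hist[(i + 1) % n]
    ]


def circular_distance(a, b):
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def assign_chunk(angles, peaks):
    """Pass 2 (map): count frames per state, using the nearest global peak."""
    return Counter(
        min(range(len(peaks)), key=lambda k: circular_distance(angle, peaks[k]))
        for angle in angles
    )


# Toy dihedral angles (degrees) for three frame chunks.
chunks = [
    [58.0, 62.0, 61.0, 179.0],
    [181.0, 183.0, 299.0, 301.0],
    [60.0, 182.0, 300.0, 302.0],
]

global_hist = combine_histograms([histogram_chunk(c) for c in chunks])
peaks = find_peaks(global_hist)
state_counts = sum((assign_chunk(c, peaks) for c in chunks), Counter())
print(peaks, dict(state_counts))
```

Only the histogram and the state counts cross the worker boundary; the peak list is the single piece of trajectory-wide information that has to be broadcast before the second pass.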
Neighbour calculation may be a simpler first candidate, as it already appears to follow a frame-based structure.
Alternatives Considered
- Keep the current implementation as covariance-only frame parallelism.
  - This is useful and provides the initial Dask/HPC infrastructure, but may not give the strongest whole-workflow scaling if other serial stages dominate.
- Only optimise individual functions within the static stage.
  - This may improve runtime locally, but would not address the larger issue that expensive frame-dependent work remains outside the frame-parallel execution path.
- Increase the number of Dask workers without changing the task structure.
  - This is unlikely to fully solve the issue if the parallel task size remains small and significant serial work remains outside the parallel path.
Expected Impact
- Clearer understanding of where CodeEntropy runtime is spent after the initial frame-parallel implementation.
- Better evidence for whether dihedral/conformational analysis, neighbour calculation, or another stage is limiting scaling.
- Potentially stronger Dask/HPC scaling by increasing the amount of useful work done per worker.
- Cleaner long-term parallel structure, closer to an outer frame/chunk loop where all frame-dependent work is grouped together.
- Potential memory improvements by returning compact partial sums, histograms, or counts instead of building larger all-frame objects where possible.
- Better benchmark evidence for future paper edits and performance discussion.
Additional Context
The current frame-parallel implementation is an important first step because it introduces the explicit frame-local boundary and Dask/HPC execution infrastructure.
Initial profiling with SnakeViz suggested that the FrameGraph/covariance pathway was the main runtime cost, which motivated parallelising that section first. However, benchmark scaling suggests that other workflow stages may still be contributing enough serial runtime to limit overall speedup.
This issue is intended as a follow-up investigation and possible restructuring step, rather than a replacement for the current implementation.