Fix separated external source prefetch drain #6397
a17716a to
d972238
|
!build |
|
| Filename | Overview |
|---|---|
| dali/python/nvidia/dali/pipeline.py | Core fix: removes _prefetch_inputs and the is_prefetch variant of _run_input_callbacks, routing both separated and non-separated execution through _legacy_interleaved_prefetch with a corrected loop count; logic is correct and consistent with the C++ change. |
| dali/pipeline/executor/async_separated_pipelined_executor.cc | Prefetch() CPU-only tail reduced from cpu_size to max(0, cpu_size - gpu_size) rounds; InputFeedCount returns max(cpu_size, gpu_size) — both consistent with the Python-side loop count change. |
| dali/test/python/test_pipeline.py | New regression test covers all three queue-depth combinations (symmetric and both asymmetric directions); validates decoded image cardinality, rank, channel count, and non-zero pixel content; expects StopIteration on the over-run call. |
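The queue-depth arithmetic summarized in the table can be sketched in a few lines of Python. This is only an illustration of the two formulas quoted above, not the executor code: the function names `input_feed_count` and `cpu_only_tail_rounds` are hypothetical, and the queue depths in the loop are example values covering one symmetric and two asymmetric shapes.

```python
def input_feed_count(cpu_size: int, gpu_size: int) -> int:
    """Hypothetical mirror of the described InputFeedCount: max of both depths."""
    return max(cpu_size, gpu_size)


def cpu_only_tail_rounds(cpu_size: int, gpu_size: int) -> int:
    """Hypothetical mirror of the described CPU-only tail in Prefetch()."""
    return max(0, cpu_size - gpu_size)


# Illustrative queue shapes: symmetric, CPU deeper, GPU deeper.
for cpu_size, gpu_size in [(2, 2), (3, 2), (2, 3)]:
    feed = input_feed_count(cpu_size, gpu_size)
    tail = cpu_only_tail_rounds(cpu_size, gpu_size)
    print(f"cpu={cpu_size} gpu={gpu_size} feed_count={feed} cpu_only_tail={tail}")
```

The tail is non-zero only when the CPU queue is deeper than the GPU queue. If the tail is preceded by `gpu_size` full rounds (an assumption; the summary does not state this), the total number of CPU rounds equals `max(cpu_size, gpu_size)`, which matches the feed count.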
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Py as pipeline.py (_prefetch)
participant LIP as _legacy_interleaved_prefetch
participant ICS as _iter_setup / _run_input_callbacks
participant Exec as AsyncSeparatedPipelinedExecutor
participant CPU as CPU stage
participant Mix as Mixed stage
participant GPU as GPU stage
Note over Py,GPU: New flow (both separated and non-separated)
Py->>LIP: always delegate
LIP->>LIP: "prefetch_count = max(cpu_size, gpu_size) if exec_separated else cpu_size"
loop prefetch_count times
LIP->>ICS: _iter_setup()
ICS->>ICS: _run_input_callbacks() - feed 1 batch
LIP->>Exec: _pipe.Run()
Exec->>CPU: RunCPU()
Exec->>Mix: RunMixed()
Exec->>GPU: RunGPU()
end
Note over Py,GPU: Old separated flow (removed) - fed cpu_size+gpu_size batches upfront
Note over Py,GPU: then called _pipe.Prefetch() leaving CPU-only tail at epoch boundary
Reviews (23): Last reviewed commit: "Reset all external sources on epoch end"
|
CI MESSAGE: [55012969]: BUILD STARTED |
|
CI MESSAGE: [55012969]: BUILD PASSED |
d972238 to
82c4432
|
@greptile review |
82c4432 to
fe2b252
|
@greptile review |
fe2b252 to
3c8286e
|
@greptile review |
Keep pipeline prefetching interleaved with backend runs so separated execution does not leave CPU-prefetched external source batches without scheduled Mixed/GPU work at end of epoch. Prime separated execution for the maximum of CPU and GPU queue depths to avoid underfilling asymmetric queue configurations. Add a regression test that drains a batch external source through mixed image decoding with symmetric and asymmetric separated CPU/GPU prefetch queues.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
3c8286e to
1276420
|
@greptile review |
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
|
@greptile review |
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
|
@greptile review |
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
|
!build |
|
CI MESSAGE: [55097813]: BUILD STARTED |
|
CI MESSAGE: [55097813]: BUILD FAILED |
|
@greptile review |
Make async separated executor prefetch and InputFeedCount use the same maximum queue-depth contract. Keep the Python separated prefetch path for drainable queue shapes, but use interleaved prefetch when the CPU queue is longer than the GPU queue so end-of-epoch Python sources do not leave CPU-only work without scheduled Mixed/GPU stages.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
a0755d2 to
6fa12d5
|
@greptile review |
|
!build |
|
CI MESSAGE: [55110109]: BUILD STARTED |
Route separated Python prefetch through the interleaved path for all queue shapes. The bulk _prefetch_inputs path could call _run_input_callbacks with a stale argument and underfeed the backend Prefetch schedule for Python external sources, causing either a TypeError or a hang at epoch end.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
|
@greptile review |
|
!build |
|
CI MESSAGE: [55116811]: BUILD STARTED |
|
CI MESSAGE: [55110109]: BUILD FAILED |
|
CI MESSAGE: [55116811]: BUILD FAILED |
|
!build |
|
CI MESSAGE: [55485233]: BUILD STARTED |
|
CI MESSAGE: [55485233]: BUILD FAILED |
|
!build |
|
CI MESSAGE: [55529667]: BUILD STARTED |
|
CI MESSAGE: [55529667]: BUILD FAILED |
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
|
!build |
|
CI MESSAGE: [55541646]: BUILD STARTED |
|
CI MESSAGE: [55541646]: BUILD PASSED |
Category:
Bug fix
Description:
This PR fixes a hang at the end of an epoch when a Python-managed
`external_source` is used with separated CPU/GPU prefetch queues.
The newer prefetch path fed all inputs before running the backend. For
separated execution this could leave CPU-prefetched external source batches
without scheduled Mixed/GPU work once the source reached end of epoch. The
consumer then waited indefinitely for output indexes in the separated queue
policy.
The fix keeps prefetching interleaved with backend runs, so input feeding and
backend scheduling stay aligned at epoch boundaries.
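The interleaving idea can be illustrated with a small, self-contained toy model. It is a sketch under stated assumptions, not the code in `pipeline.py`: `make_source` and `interleaved_prefetch` are hypothetical names, a counter stands in for a backend run, and stopping the loop on `StopIteration` is a simplification of how the end of an epoch is handled. The loop count follows the `max(cpu_size, gpu_size) if exec_separated else cpu_size` rule from the review summary.

```python
from typing import Callable, Iterator, List, Tuple


def make_source(num_batches: int) -> Callable[[], int]:
    """Toy batch source that raises StopIteration once the epoch is exhausted."""
    it: Iterator[int] = iter(range(num_batches))
    return lambda: next(it)


def interleaved_prefetch(
    source: Callable[[], int],
    cpu_size: int,
    gpu_size: int,
    exec_separated: bool = True,
) -> Tuple[List[int], int]:
    """Feed one batch, then schedule one backend run, prefetch_count times.

    Returns the fed batches and the number of scheduled runs. In this toy
    model a "run" stands for the CPU, Mixed and GPU stages being scheduled
    together for the batch that was just fed.
    """
    prefetch_count = max(cpu_size, gpu_size) if exec_separated else cpu_size
    fed: List[int] = []
    runs = 0
    for _ in range(prefetch_count):
        try:
            batch = source()  # feed exactly one batch per iteration
        except StopIteration:
            break  # epoch ended: nothing was fed, so nothing is scheduled
        fed.append(batch)
        runs += 1  # one scheduled backend run per fed batch
    return fed, runs


# A source with only 2 batches and a CPU queue depth of 3.
fed, runs = interleaved_prefetch(make_source(2), cpu_size=3, gpu_size=2)
print(len(fed), runs)
```

In this model every fed batch has a matching scheduled run, even when the source runs out before the queues are full. That is the property the description relies on: no batch is left with only CPU work scheduled at the epoch boundary.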
Related to #5199
Additional information:
Affected modules and functionalities:
- `nvidia.dali.Pipeline` prefetch scheduling in the legacy executor path.
- `external_source(batch=True, cycle="raise")` with mixed image decoding and separated prefetch queues.
Key points relevant for the review:
Review whether routing prefetch through the interleaved path is acceptable for
legacy separated execution. This restores the behavior that avoids a CPU-only
tail when input callbacks reach end of epoch.
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A