Add transparent pipelining in dynamic mode #6301

Open
rostan-t wants to merge 12 commits into NVIDIA:main from rostan-t:ndd-transparent-pipelining

rostan-t commented Apr 16, 2026

Category:

New feature (non-breaking change which adds functionality)

Description:

This PR adds a compile argument to Reader.next_epoch. When it is set to True, the first iteration traces the operator calls and builds a pipeline, which subsequent iterations use in place of the regular dynamic operators.
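
The trace-then-replay idea described above can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions, not the code in this PR: `ToyCompiledLoop` and its `call` method are hypothetical names, the recorded operators are assumed to form a simple linear chain, and no real DALI pipeline or prefetching is involved.

```python
class ToyCompiledLoop:
    """Toy model: record operator calls on the first iteration, replay afterwards."""

    def __init__(self, batches):
        self._batches = list(batches)
        self.chain = []        # operators recorded during the first iteration
        self._tracing = True
        self._results = []     # results computed ahead of the user code
        self._next = 0

    def call(self, fn, value):
        """Stand-in for an intercepted operator call."""
        if self._tracing:
            self.chain.append(fn)
            return fn(value)
        # Replay mode: ignore fn and hand back the precomputed result.
        result = self._results[self._next]
        self._next += 1
        return result

    def __iter__(self):
        for index, batch in enumerate(self._batches):
            if index == 1:
                self._tracing = False  # first iteration done: the "pipeline" is built
            if not self._tracing:
                # Run the recorded chain before yielding control to the user code.
                self._results, value = [], batch
                for fn in self.chain:
                    value = fn(value)
                    self._results.append(value)
                self._next = 0
            yield batch


loop = ToyCompiledLoop([[1, 2], [3, 4], [5, 6]])
outputs = []
for batch in loop:
    doubled = loop.call(lambda b: [2 * x for x in b], batch)
    shifted = loop.call(lambda b: [x + 1 for x in b], doubled)
    outputs.append(shifted)
```

In this model the user loop is unchanged between the first and later iterations; only the source of the results differs, which is the property the PR title calls "transparent".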

To enable prefetching, an operator is traced only if its arguments for the next iteration can be determined statically. Currently, this is limited to arguments that are:

  • Outputs of the reader
  • Outputs of other traced operators
  • DALI constants (e.g., instances of DALI enums)
  • Literals
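
One consequence of the list above is that traceability propagates along the call order: an operator that consumes the output of an untraced operator cannot be traced either. The following is a minimal sketch of that rule, assuming each argument has already been classified. `ArgKind` and `select_traced` are hypothetical names for illustration, not part of this PR.

```python
from enum import Enum


class ArgKind(Enum):
    READER_OUTPUT = "reader output"
    OP_OUTPUT = "operator output"
    DALI_CONSTANT = "DALI constant"
    LITERAL = "literal"
    OTHER = "anything else"


def select_traced(calls):
    """calls: list of (op_name, [(ArgKind, producer_or_None), ...]) in call order.

    Returns the names of the operators that would be traced.
    """
    traced = []
    for name, args in calls:
        traceable = all(
            kind is not ArgKind.OTHER
            and (kind is not ArgKind.OP_OUTPUT or producer in traced)
            for kind, producer in args
        )
        if traceable:
            traced.append(name)
    return traced


calls = [
    ("decode", [(ArgKind.READER_OUTPUT, None)]),
    ("resize", [(ArgKind.OP_OUTPUT, "decode"), (ArgKind.LITERAL, None)]),
    # The angle comes from somewhere the tracer cannot reason about.
    ("rotate", [(ArgKind.OP_OUTPUT, "resize"), (ArgKind.OTHER, None)]),
    # All arguments look fine, but the input comes from an untraced operator.
    ("flip", [(ArgKind.OP_OUTPUT, "rotate"), (ArgKind.DALI_CONSTANT, None)]),
]
traced = select_traced(calls)
```

Here only `decode` and `resize` end up traced; `rotate` and `flip` would run in dynamic mode.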

When an operator cannot be traced, it conservatively falls back to dynamic mode. This is a first version, and the API is subject to change. Known current limitations include:

  • Arguments that are not literals are excluded. This can be addressed later with constant propagation.
  • Multi-line expressions. Expressions that span multiple lines are currently not parsed, which can lead to operators being needlessly excluded.
  • Multiple operators on the same line. Although uncommon, if multiple operators appear on the same line, the first one is used.
  • Random arguments are excluded.
  • Multiple readers are not supported.
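
Several of these limitations follow naturally from analysing one source line at a time. The sketch below uses only the standard `ast` module to show the effect; `analyze_call` is a hypothetical helper, and picking the call with the smallest column offset is an assumption about what "the first one" means, not a description of `_source_analysis.py`.

```python
import ast


def analyze_call(line):
    """Inspect the first call on a single source line.

    Returns (callee, non_literal_args), or None if the line does not parse
    on its own (for example, one line of a multi-line expression).
    """
    try:
        tree = ast.parse(line.strip())
    except SyntaxError:
        return None
    calls = [node for node in ast.walk(tree) if isinstance(node, ast.Call)]
    if not calls:
        return None
    first = min(calls, key=lambda call: call.col_offset)  # leftmost call
    values = list(first.args) + [kw.value for kw in first.keywords]
    non_literal = [ast.unparse(v) for v in values if not isinstance(v, ast.Constant)]
    return ast.unparse(first.func), non_literal


literal_size = analyze_call("out = resize(images, size=224)")
variable_size = analyze_call("out = resize(images, size=target)")
multi_line = analyze_call("out = resize(images,")
same_line = analyze_call("a = flip(x); b = rotate(a, angle=30)")
```

In the second call, `target` is reported as non-literal even if it always holds the same value, which is the gap constant propagation would close. The third call returns `None` because the line is incomplete, and the fourth only sees `flip`.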

Additional information:

Affected modules and functionalities:

Dynamic mode.

Key points relevant for the review:

  • Does this introduce regressions?
  • Does it have limitations that are not listed above?

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

This is a first prototype. Documentation can be added later.

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-4641

@rostan-t rostan-t force-pushed the ndd-transparent-pipelining branch from 3390a57 to c886281 Compare April 16, 2026 13:21

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR adds compile=True to Reader.next_epoch, enabling transparent pipelining: the first epoch traces operator calls via _compile_intercept and AST-based argument classification (_source_analysis.py), builds a DALI Pipeline with prefetch_queue_depth=2, and serves all subsequent epochs from the compiled pipeline — falling back to dynamic mode per-operator when inputs or call sites don't match the traced graph. Operators with non-literal, non-DALI-constant, or non-CompiledBatch arguments are conservatively excluded from the compiled graph.

Confidence Score: 5/5

Safe to merge; no P0/P1 issues found, all findings are P2 style/robustness suggestions.

All discovered issues are P2 (stale tensor_args in a disabled-tracing fallback path, O(depth) trie walk in compiled mode, is_dali_constant AST bypass, linecache fragility). The core tracing/compiled state machine, epoch counting, prefetch interaction, and call-chain trie correctness look sound and are well-covered by the new test suite.

_compile.py and _source_analysis.py carry the two P2 logic notes; all other files are clean.
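
The linecache concern mentioned above can be reproduced with the standard library alone. This sketch does not use any code from the PR; it only shows that `linecache.getline` returns an empty string for a function compiled from a string under a pseudo-filename, while the same source saved to a real file is retrievable.

```python
import linecache
import os
import tempfile

source = "def f():\n    return 42\n"

# Function defined dynamically: its code object points at a pseudo-filename.
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
code = namespace["f"].__code__
dynamic_line = linecache.getline(code.co_filename, code.co_firstlineno)

# Same source written to disk: linecache can read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mod.py")
    with open(path, "w") as fh:
        fh.write(source)
    on_disk_line = linecache.getline(path, 1)
```

Any analysis that relies on the text of the calling line therefore has nothing to parse for the dynamically defined function and would have to fall back.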

Important Files Changed

| Filename | Overview |
| --- | --- |
| dali/python/nvidia/dali/experimental/dynamic/_compile.py | New file implementing the transparent pipelining machinery: compile graph (CompileSource/Node/Ref), _CallTrie for call-site identity, CompileContext state machine (TRACING→COMPILED/DISABLED), CompiledEpochIterator, and the _compile_intercept wrapper. Two P2 issues: stale tensor_args in disabled-tracing fallback path, and O(stack depth) frame walk per compiled operator call. |
| dali/python/nvidia/dali/experimental/dynamic/_source_analysis.py | New file providing AST-based call-site analysis and argument classification (constant vs. CompileRef). Two P2 issues: is_dali_constant bypasses AST position bounds check for positional args, and linecache-based source retrieval is fragile for dynamically-defined functions. |
| dali/python/nvidia/dali/experimental/dynamic/_call_site.py | Renamed from _callsite.py; adds CodeLoc NamedTuple, CallChain type alias, and build_call_chain helper for stack capture. Change is clean and well-scoped. |
| dali/python/nvidia/dali/experimental/dynamic/_ops.py | Reader gains compile=True support in next_epoch, plus helper methods (_shard_epoch_size, _advance_shard, _require_api_type, _run_unchecked, _output_batch_size). Refactoring cleanly deduplicates the epoch-size calculation and shard-advance logic across _samples/_batches. |
| dali/python/nvidia/dali/experimental/dynamic/_op_builder.py | Backend resolution extracted into _resolve_backend, fn_call gains _backend parameter, and _compile_intercept is applied after fn_call definition. Logic is preserved correctly; the closure correctly captures the resolved backend/device for _call(). |
| dali/test/python/experimental_mode/test_compile.py | Comprehensive test suite covering compile mode: stickiness guards, multi-epoch consistency, loop data-dependency, partial tracing, batch-size/device mismatch, stale-batch detection, tensor args, and dtype casting. |
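
To make the "call-site identity" and "O(stack depth)" remarks concrete, here is a minimal trie keyed by a chain of code locations. It is a sketch only: `ToyCodeLoc` and `ToyCallTrie` are hypothetical stand-ins, and the fields of the real CodeLoc and the layout of _CallTrie are assumptions.

```python
from typing import Any, Dict, NamedTuple, Optional, Tuple


class ToyCodeLoc(NamedTuple):
    filename: str
    lineno: int


ToyCallChain = Tuple[ToyCodeLoc, ...]  # outermost frame first


class ToyCallTrie:
    """Maps a chain of call sites to a value (e.g. a recorded graph node)."""

    def __init__(self) -> None:
        self._children: Dict[ToyCodeLoc, "ToyCallTrie"] = {}
        self._value: Optional[Any] = None

    def insert(self, chain: ToyCallChain, value: Any) -> None:
        node = self
        for loc in chain:
            node = node._children.setdefault(loc, ToyCallTrie())
        node._value = value

    def lookup(self, chain: ToyCallChain) -> Optional[Any]:
        node = self
        for loc in chain:  # one step per stack frame in the chain
            child = node._children.get(loc)
            if child is None:
                return None
            node = child
        return node._value


trie = ToyCallTrie()
chain = (ToyCodeLoc("train.py", 10), ToyCodeLoc("augment.py", 5))
trie.insert(chain, "resize-node")

hit = trie.lookup(chain)
other_caller = trie.lookup((ToyCodeLoc("train.py", 11), ToyCodeLoc("augment.py", 5)))
prefix_only = trie.lookup((ToyCodeLoc("train.py", 10),))
```

Keying on the whole chain distinguishes the same helper line reached from two different callers, at the cost of a lookup whose length grows with the captured stack depth.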

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["next_epoch(compile=True)"] --> B{compiled_iter exists?}
    B -- No --> C[Create CompiledEpochIterator]
    C --> D[batches: state=TRACING]
    B -- Yes --> E{state?}
    E -- TRACING --> D
    E -- COMPILED --> F[_batches_compiled]
    E -- DISABLED --> G[reset compile_mode=None]

    D --> H[Run reader batch 0 dynamically]
    H --> I[make_source_batches CompiledBatch iteration=0]
    I --> J[yield batch with compile_ctx.active]
    J --> K{User operator calls via _compile_intercept}
    K -- classify succeeds --> L[record CompileNode in _CallTrie]
    K -- classify fails or stateful --> M[return dynamic result]
    L --> N[return CompiledBatch with CompileRef]

    J --> O[build_pipeline]
    O -- success --> P[state=COMPILED DALI Pipeline built]
    O -- no nodes --> Q[state=DISABLED warn reset reader]
    O -- exception --> R[state=DISABLED raise]

    P --> F
    F --> S[run_pipeline pipeline.run prefetch_queue=2]
    S --> T[cache results in _pipeline_results]
    T --> U[yield batches with compile_ctx.active]
    U --> V{operator call in compiled mode}
    V -- trie lookup hit --> W[return cached CompiledBatch]
    V -- miss --> X[call dynamically]
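
The state transitions in the flowchart above can be summarised in a few lines of Python. This is a sketch derived from the diagram only; `CompileState`, `ToyCompileContext`, and the `builder` callback are hypothetical names, and the real CompileContext is not reproduced here.

```python
from enum import Enum, auto


class CompileState(Enum):
    TRACING = auto()
    COMPILED = auto()
    DISABLED = auto()


class ToyCompileContext:
    def __init__(self):
        self.state = CompileState.TRACING
        self.pipeline = None

    def build_pipeline(self, nodes, builder):
        """Leave TRACING once the first iteration has finished."""
        if not nodes:
            # Nothing was recorded: compilation is pointless.
            self.state = CompileState.DISABLED
            return
        try:
            self.pipeline = builder(nodes)
        except Exception:
            # Building failed: disable compilation and surface the error.
            self.state = CompileState.DISABLED
            raise
        self.state = CompileState.COMPILED


def failing_builder(nodes):
    raise RuntimeError("cannot build")


compiled = ToyCompileContext()
compiled.build_pipeline(["decode", "resize"], builder=tuple)

empty = ToyCompileContext()
empty.build_pipeline([], builder=tuple)

failed = ToyCompileContext()
try:
    failed.build_pipeline(["decode"], builder=failing_builder)
except RuntimeError:
    pass
```
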

Reviews (8): Last reviewed commit: "Simplify kwarg type check"

@rostan-t rostan-t force-pushed the ndd-transparent-pipelining branch from b04551d to b78da96 Compare April 16, 2026 17:27
@mzient mzient self-assigned this Apr 17, 2026
@mzient mzient force-pushed the ndd-transparent-pipelining branch from b78da96 to 4bb5294 Compare April 17, 2026 07:37
@rostan-t

!build

@dali-automaton

CI MESSAGE: [49377550]: BUILD STARTED

@dali-automaton

CI MESSAGE: [49377550]: BUILD PASSED


mzient commented Apr 24, 2026

I have an architectural concern - but please correct me if I misunderstood the design.

  1. The pipeline can have only one reader. All input data must originate in that reader (hard requirement).
  2. The pipeline doesn't have any external sources.
  3. Once we encounter a node that needs to be evaluated immediately, the graph capture ends and we're back in dynamic mode.
  4. The reader cannot have any run-time arguments - it must be usable as a generator.

If so, a reader called with compile=True could be captured directly as a DataNode from the very start. Instead of creating a standalone operator instance, with a workspace and everything else for which we pay the Python tax, we could wire the reader directly into the graph, rather than complicating things (and losing performance) with the intermediate ExternalSource. Once capture is complete (either because something was evaluated or because we reached the next iteration), we would just build the pipeline. All prefetching would be done for us.

Using DataNodes directly in CompileNode/CompileRef would likely simplify the code. We don't need to defer construction of DataNodes until capture is complete; we could construct them eagerly, since (for now) we have to pessimistically assume that everything is going to be used anyway. Even if that doesn't hold, Pipeline will prune DataNodes that don't contribute to any outputs and don't have side effects.
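
The pruning argument can be illustrated with a generic reachability pass over a dependency graph. This is a sketch of the general technique, not DALI's Pipeline code: `contributing_nodes` is a hypothetical helper, and the graph is a plain dictionary mapping each node to the nodes it consumes.

```python
def contributing_nodes(inputs_of, outputs, side_effects=()):
    """Return the nodes that feed an output or a side-effecting node.

    inputs_of: dict mapping a node name to the list of nodes it consumes.
    """
    keep = set()
    stack = list(outputs) + list(side_effects)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(inputs_of.get(node, []))
    return keep


graph = {
    "decode": ["reader"],
    "resize": ["decode"],
    "unused_flip": ["decode"],  # constructed eagerly but never returned
}
kept = contributing_nodes(graph, outputs=["resize"])
```

With eager construction, `unused_flip` would be created during capture but dropped here, since nothing reachable from the outputs depends on it.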

Comment on lines +445 to +451
es = ExternalSource(
    source=reader_callback,
    num_outputs=source.num_outputs,
    batch=True,
    device=source.device,
)
reader_outs = es()
Contributor

This is a crutch that allows us to capture more nodes but also a source of major inefficiency.

Collaborator Author

We'll include the reader directly in the pipeline in a follow-up PR.

@rostan-t rostan-t force-pushed the ndd-transparent-pipelining branch from 83406ec to 3af166b Compare April 28, 2026 09:22
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
rostan-t added 11 commits April 28, 2026 11:40
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
…acing

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
@rostan-t rostan-t force-pushed the ndd-transparent-pipelining branch from 3af166b to d7832b1 Compare April 28, 2026 15:04
@rostan-t

!build

@dali-automaton

CI MESSAGE: [49721250]: BUILD STARTED

@dali-automaton

CI MESSAGE: [49721250]: BUILD PASSED
