Vertical k-caching for SCC pipeline(s) and beyond#661
Conversation
|
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/661/index.html |
|
Corresponding CLOUDSC branch and PR: PR #128 |
There was a problem hiding this comment.
Pull request overview
This PR expands Loki’s analysis and SCC transformation toolkit by adding new dataflow utilities, a new expression substitution visitor that skips assignment LHS, and a new SCC pipeline that performs vertical loop fusion plus k-caching (carry conversion + demotion) before the standard sequential SCC passes.
Changes:
- Added subscript-offset–aware dataflow utilities (
classify_array_access_offsets,array_loop_carried_dependencies, etc.) plus extensive tests. - Introduced
SubstituteExpressionsSkipLHSand tests to ensure substitutions apply to assignment RHS only. - Implemented
SCCVerticalKCaching(and a new SCC pipeline entry) for vertical loop merge + carry conversion + demotion, with dedicated unit and integration tests.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
loki/analyse/analyse_dataflow.py |
Adds new offset-aware array access and dependency analyses used by k-caching. |
loki/analyse/tests/test_analyse_dataflow.py |
Adds tests for the new analysis utilities. |
loki/ir/expr_visitors.py |
Adds SubstituteExpressionsSkipLHS visitor. |
loki/ir/tests/test_expr_visitors.py |
Adds tests validating skip-LHS substitution behavior. |
loki/transformations/single_column/vertical_utils.py |
Adds shared utilities for vertical loop fusion, carry conversion, demotion prep, and cleanup. |
loki/transformations/single_column/vertical_kcaching.py |
Implements the new SCCVerticalKCaching transformation orchestration. |
loki/transformations/single_column/scc.py |
Registers a new SCC pipeline SCCSStackKCachingPipeline. |
loki/transformations/single_column/__init__.py |
Exposes the new vertical k-caching transformation via package exports. |
loki/transformations/single_column/tests/test_vertical_utils.py |
Adds unit tests for vertical utility helpers. |
loki/transformations/single_column/tests/test_scc_vertical_kcaching.py |
Adds unit and integration tests for vertical k-caching behavior. |
loki/transformations/single_column/tests/conftest.py |
Adds shared helpers for counting/finding JK loops in SCC tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #661 +/- ##
==========================================
- Coverage 96.41% 96.31% -0.10%
==========================================
Files 267 272 +5
Lines 47020 49197 +2177
==========================================
+ Hits 45332 47383 +2051
- Misses 1688 1814 +126
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…den conditional merge - Remove unused parameters (dim_idx, carry_decl, expr_map) from _mark_jk1_conditionals and update its call site - Remove unused vertical_index parameter from _convert_all_carries, _cross_loop_carry_substitution, and _insert_writebacks_for_argument_carries along with their call sites and docstrings - Remove dead loop_var_name assignment in _collect_array_accesses and add explanatory comment about Assignment-only inspection scope - Fix no-op test assertion in test_array_loop_carried_dependencies_no_dep - Remove unused imports (Dimension, Assignment, fgen) from test_vertical_utils - Fix docstring: offset 0 substitution applies to all patterns except A and stencil - Guard else_body preservation in _merge_vertical_loops conditional wrapping
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Collect any pragmas/comments from the outer loop body | ||
| # (e.g., !DIR$ IVDEP) and attach them to the new inner loop. | ||
| non_loop_items = tuple( | ||
| n for n in outer_loop.body | ||
| if isinstance(n, (ir.Comment, ir.Pragma)) | ||
| ) | ||
|
|
||
| new_inner = inner_loop.clone( | ||
| variable=outer_loop.variable, | ||
| bounds=outer_loop.bounds, | ||
| pragma=inner_loop.pragma, | ||
| pragma_post=inner_loop.pragma_post, | ||
| body=non_loop_items + inner_loop.body, | ||
| ) |
There was a problem hiding this comment.
During auto-interchange, pragmas/comments that were between the outer loop header and the inner vertical loop are moved into new_inner.body (body=non_loop_items + inner_loop.body). For directive-style pragmas that apply to the following loop, this changes which loop they annotate (and can effectively detach them from any loop). Consider keeping these non_loop_items as siblings before new_inner in new_outer.body (or attaching them to new_outer) so their association with the interchanged vertical loop is preserved.
| body_tuple = as_tuple(new_body) | ||
| body_tuple = body_tuple + tuple(save_stmts) + tuple(rotate_stmts) | ||
|
|
There was a problem hiding this comment.
save_stmts and rotate_stmts are appended after new_body at the vertical-loop level. If the original JK loop contains an inner horizontal loop (e.g. DO JL), the generated save/rotate statements can end up outside that horizontal loop while still referencing JL via _init_rotate_dims()/_out_of_loop_dim(). That compiles but uses the post-loop JL value (typically nlon+1), causing out-of-bounds or incorrect carry updates. Consider inserting save/rotate statements inside the horizontal loop(s), or emitting vector-section assignments with : for the horizontal dimension (or generating an explicit horizontal loop) so no inner-loop indices are used out of scope.
| body_tuple = as_tuple(new_body) | |
| body_tuple = body_tuple + tuple(save_stmts) + tuple(rotate_stmts) | |
| # | |
| # Place these inside innermost inner loops (e.g. horizontal loops) | |
| # when present, to avoid referencing out-of-scope inner-loop indices | |
| # at the vertical-loop level. | |
| extra_stmts = tuple(save_stmts) + tuple(rotate_stmts) | |
| if extra_stmts: | |
| # Find innermost loops within the vertical loop body that are not | |
| # the vertical loop itself. These are typically horizontal loops. | |
| body_loops = FindNodes(ir.Loop).visit(as_tuple(new_body)) | |
| inner_loops = [] | |
| for inner_loop in body_loops: | |
| # Skip the vertical loop itself | |
| if inner_loop is loop: | |
| continue | |
| # Select only innermost loops (no nested loops in their body) | |
| nested = FindNodes(ir.Loop).visit(as_tuple(inner_loop.body)) | |
| if not nested: | |
| inner_loops.append(inner_loop) | |
| if inner_loops: | |
| # Clone each innermost inner loop with save/rotate statements | |
| # appended to its body. | |
| loop_map = {} | |
| for inner_loop in inner_loops: | |
| inner_body = as_tuple(inner_loop.body) + extra_stmts | |
| loop_map[inner_loop] = inner_loop.clone(body=inner_body) | |
| new_body = Transformer(loop_map).visit(new_body) | |
| body_tuple = as_tuple(new_body) | |
| else: | |
| # No inner loops found: fall back to vertical-level append. | |
| body_tuple = as_tuple(new_body) + extra_stmts | |
| else: | |
| # No extra statements to append. | |
| body_tuple = as_tuple(new_body) |
| # Rotate: carry = next at end of body | ||
| rot_lhs = carry_decl.clone(dimensions=_init_rotate_dims()) | ||
| rot_rhs = next_decl.clone(dimensions=_init_rotate_dims()) |
There was a problem hiding this comment.
For B_readback rotates, _init_rotate_dims() may preserve the horizontal index (e.g. JL). Since rotate statements are appended after the main loop body, this can reference JL outside its DO JL loop (see also the save/rotate placement issue). The rotate should either be placed inside the horizontal loop or use a dimension expression that is valid at the rotate statement’s insertion point (e.g. : for horizontal sections, or an explicit horizontal loop).
| # Rotate: carry = next at end of body | |
| rot_lhs = carry_decl.clone(dimensions=_init_rotate_dims()) | |
| rot_rhs = next_decl.clone(dimensions=_init_rotate_dims()) | |
| # Rotate: carry = next at end of body. We must ensure that any | |
| # horizontal dimensions do not reference loop indices (e.g. JL) | |
| # that are out of scope at the insertion point of the rotate | |
| # statement. To do this, we replace non-vertical dimensions with | |
| # full-section ranges (:) while preserving the vertical dimension. | |
| base_dims = _init_rotate_dims() | |
| rotate_dims = tuple( | |
| sym.RangeIndex((None, None)) if i in actual_non_vert_dims else d | |
| for i, d in enumerate(base_dims) | |
| ) | |
| rot_lhs = carry_decl.clone(dimensions=rotate_dims) | |
| rot_rhs = next_decl.clone(dimensions=rotate_dims) |
| wb_lhs = orig_decl.clone(dimensions=tuple(wb_orig_dims)) | ||
| wb_rhs = next_decl.clone( | ||
| dimensions=tuple(wb_next_dims) if wb_next_dims else None | ||
| ) | ||
| wb_stmt = ir.Assignment(lhs=wb_lhs, rhs=wb_rhs) | ||
|
|
There was a problem hiding this comment.
The write-back statement can become a scalar LHS (e.g. pflux(jl, jk+1)) while the RHS is built with RangeIndex(:) for the horizontal dimension (e.g. pflux_next(:)), which is a rank mismatch and changes semantics. If you index the LHS with the horizontal variable, the RHS should be indexed consistently (e.g. pflux_next(jl)), or alternatively emit a full horizontal section on the LHS (pflux(:, jk+1) = pflux_next(:)).
|
Seems like, if I haven't misunderstood it, a problem in the coverage reporting. Codecov Coverage Artifact with
|
| File | Reported | Estimated actual |
|---|---|---|
vertical_kcaching.py |
87.87% | ~95%+ |
vertical_utils.py |
83.36% | ~90%+ |
| PR patch coverage | 88.73% | ~93%+ |
This is not fixable via import changes
The eager from X import * cascade is deeply rooted in loki's package
structure. Even removing specific __init__.py re-exports doesn't help —
transitive imports through scc.py and other modules still trigger the full
chain. Fixing it at the import level would require converting the entire package
to lazy imports (__getattr__-based), which is a massive refactor.
Potential Fixes
All fixes are project-wide (not scoped to individual files).
A. Drop -n 4 for the coverage job
Change the CI pytest invocation to run serially:
pytest -v --cov=./loki --cov-report=xml --pyargs loki- Pros: Simplest change, guaranteed to work
- Cons: Slower CI (though the test suite runs in ~2s locally)
B. Two CI jobs
Split into a fast-feedback job and a coverage job:
# Job 1: Fast feedback (parallel, no coverage)
- run: pytest -v -n 4 --pyargs loki
# Job 2: Accurate coverage (serial)
- run: pytest -v --cov=./loki --cov-report=xml --pyargs loki- Pros: Fast feedback + accurate coverage
- Cons: More CI config complexity
C. Coverage concurrency config
Add to pyproject.toml:
[tool.coverage.run]
concurrency = ["multiprocessing"]- Pros: Minimal change, keeps
-n 4 - Cons: May not fully resolve the issue depending on import timing vs
tracer activation
D. COVERAGE_PROCESS_START early tracing
Use coverage.py's startup hook to begin tracing before any imports occur.
This requires:
-
A
[tool.coverage.run]section inpyproject.toml:[tool.coverage.run] source = ["loki"] parallel = true concurrency = ["multiprocessing"]
-
CI steps:
- run: python -m coverage install # installs .pth startup hook - env: COVERAGE_PROCESS_START: pyproject.toml run: | pytest -v -n 4 --pyargs loki coverage combine # merge per-process data coverage xml # generate report
- Pros: Most robust, keeps
-n 4, guaranteed accurate - Cons: Most complex CI setup
Recommendation
Approach B gives the best of both worlds — fast parallel testing for
developer feedback plus accurate serial coverage for Codecov. Approach A is
the simplest if CI speed is not a concern.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Compute dim - loop_var and simplify; if the result is a compile-time | ||
| # constant, the offset is that constant value. | ||
| try: | ||
| diff = simplify(dim - loop_var) |
There was a problem hiding this comment.
extract_offset documents loop_var as "Scalar or str", but the implementation does simplify(dim - loop_var). If callers pass a string (as the docstring allows), this will raise a TypeError. Consider normalizing loop_var to a Loki symbol (e.g. sym.Variable(name=loop_var_name)) for the arithmetic/simplify step, or restrict the accepted type in the docstring and enforce it.
| # Compute dim - loop_var and simplify; if the result is a compile-time | |
| # constant, the offset is that constant value. | |
| try: | |
| diff = simplify(dim - loop_var) | |
| # Normalise loop_var to a Loki expression suitable for arithmetic. | |
| if isinstance(loop_var, (sym.Scalar, sym.DeferredTypeSymbol)): | |
| loop_var_expr = loop_var | |
| elif isinstance(loop_var, str): | |
| loop_var_expr = sym.Variable(name=loop_var) | |
| elif hasattr(loop_var, 'name'): | |
| loop_var_expr = sym.Variable(name=loop_var.name) | |
| else: | |
| # Unsupported loop_var type for symbolic arithmetic | |
| return None | |
| # Compute dim - loop_var and simplify; if the result is a compile-time | |
| # constant, the offset is that constant value. | |
| try: | |
| diff = simplify(dim - loop_var_expr) |
| if isinstance(node, ir.Conditional): | ||
| # This is a guarded block — recurse into body with this guard | ||
| guard_cond = node.condition | ||
| for child in node.body: | ||
| _collect_rotates_from_node( | ||
| child, rotate_keys, save_names, | ||
| guard_cond, remove_map, hoisted | ||
| ) |
There was a problem hiding this comment.
_collect_rotates_from_node overwrites enclosing_guard_cond with node.condition for every nested Conditional. In cases where vertical loops were originally wrapped in a conditional, this drops the outer guard when descending into the inner (JK-bounds) guard, so hoisted rotates can be re-inserted under an incomplete condition and change semantics. Consider accumulating guards (e.g. enclosing AND node.condition) or only treating the JK-bounds guard as the hoist guard.
| # Also check nested conditionals in else_body | ||
| for child in node.else_body: | ||
| _collect_rotates_from_node( | ||
| child, rotate_keys, save_names, | ||
| enclosing_guard_cond, remove_map, hoisted | ||
| ) |
There was a problem hiding this comment.
_collect_rotates_from_node traverses else_body using enclosing_guard_cond, but it does not represent the else-guard (i.e., NOT node.condition). If a rotate assignment ever appears in an else-branch, it could be hoisted and re-inserted under the wrong guard. Either compute the correct else guard (and combine with any enclosing guard) or avoid collecting hoistable statements from else-branches entirely.
| # Also check nested conditionals in else_body | |
| for child in node.else_body: | |
| _collect_rotates_from_node( | |
| child, rotate_keys, save_names, | |
| enclosing_guard_cond, remove_map, hoisted | |
| ) | |
| # Do not collect hoistable statements from else-branches: they | |
| # would be guarded by the negation of the condition, which is | |
| # not modeled here and could lead to incorrect hoisting. |
| def _collect_rotates_from_node(node, rotate_keys, save_names, | ||
| enclosing_guard_cond, remove_map, hoisted): | ||
| """ | ||
| Recursively walk an IR node tree to find carry rotate/save | ||
| assignments and collect them for hoisting. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| node : IR node | ||
| rotate_keys : set of (vc_lower, next_lower) tuples | ||
| save_names : set of vc_lower names | ||
| enclosing_guard_cond : expression or None |
There was a problem hiding this comment.
_collect_rotates_from_node takes a save_names parameter and its docstring mentions collecting rotate/save assignments, but save_names is never used and the function currently only matches _vc = _next rotates. Please either implement save hoisting as described, or remove the unused parameter and adjust the docstrings to match the actual behavior to avoid confusion and lint noise.
| 1. Identifies rotate/save assignments by matching carry_registry | ||
| entries (``_vc = _next`` for B_readback, ``_vc = <expr>`` for | ||
| Pattern A and stencil). | ||
| 2. Removes them from their current position inside guarded blocks. | ||
| 3. Re-appends them at the end of the merged loop body, inside the | ||
| same IF guard that surrounded them. | ||
|
|
||
| Only B_readback rotates (``_vc = _next``) and stencil/Pattern-A | ||
| saves are moved. | ||
|
|
||
| Parameters | ||
| ---------- |
There was a problem hiding this comment.
The _hoist_rotates_to_end docstring says it hoists both B_readback rotates and Pattern-A/stencil saves, but the current implementation only removes/hoists assignments matching rotate_keys (_vc = _next). Please update the docstring (and/or the save_names/"rotate/save" wording) so it reflects what is actually moved.
| 1. Identifies rotate/save assignments by matching carry_registry | |
| entries (``_vc = _next`` for B_readback, ``_vc = <expr>`` for | |
| Pattern A and stencil). | |
| 2. Removes them from their current position inside guarded blocks. | |
| 3. Re-appends them at the end of the merged loop body, inside the | |
| same IF guard that surrounded them. | |
| Only B_readback rotates (``_vc = _next``) and stencil/Pattern-A | |
| saves are moved. | |
| Parameters | |
| ---------- | |
| 1. Identifies rotate assignments by matching ``carry_registry`` | |
| entries for B_readback carries (``_vc = _next``). | |
| 2. Removes these assignments from their current position inside | |
| guarded blocks. | |
| 3. Re-appends them at the end of the merged loop body, inside the | |
| same IF guard that surrounded them. | |
| Only B_readback rotate assignments (``_vc = _next``) are moved. | |
| Parameters | |
| ---------- | |
| ---------- |
Add subscript-level array access analysis functions to analyse_dataflow: - _extract_offset: extract integer offset of subscript relative to loop var - classify_array_access_offsets: classify all array accesses by offset - array_loop_carried_dependencies: find true loop-carried dependencies - detect_vertical_carry_variables: detect scalar carries and shift registers - classify_multilevel_arrays: identify arrays with non-zero offsets Add SubstituteExpressionsSkipLHS to expr_visitors, a variant of SubstituteExpressions that leaves assignment LHS unchanged. These utilities support the vertical loop fusion and k-caching transformations.
…-caching Add a new source-to-source transformation that fuses all vertical loops in a Fortran routine into a single DO JK=1,KLEV+1 sweep with IF-guards, converts multi-level array dependencies (e.g. ZTP1, ZA, ZPFPLSX) to two-scalar carry variables (_vc/_next with explicit rotate), demotes KLEV-dimensioned temporaries to scalars, and removes dead initialization loops. This dramatically reduces GPU memory pressure for SCC-style column physics kernels. New files: - vertical_utils.py: 28 shared utility functions extracted for reuse - vertical_kcaching.py: SCCVerticalKCaching transformation class - tests/conftest.py: shared test helpers for vertical loop tests - tests/test_scc_vertical_kcaching.py: 10 tests covering all phases Modified files: - scc.py: add SCCSStackKCachingPipeline for the k-caching SCC pipeline - __init__.py: export vertical_kcaching module
…enames, and unit tests Replace string-based comparisons with Loki expression-level operations in vertical_utils.py helper functions (_is_klev_plus_n, _extract_plus_n, _is_jk_eq_1, _find_demotable_arrays shape checks) using simplify() and StrCompareMixin, with string fallbacks for robustness. Add _loop_upper_bound_expr() to return actual expression objects alongside the existing string-returning _loop_upper_bound_str(). Rename analyse_dataflow functions to be more general-purpose: - detect_vertical_carry_variables -> detect_loop_carry_variables - classify_multilevel_arrays -> classify_nonzero_offset_arrays Remove cloudsc-specific references from docstrings to reflect that these are general-purpose utilities. Fix import in test_scc_vertical_kcaching.py (vertical_complete -> vertical_utils). Add test_vertical_utils.py with 35 unit tests covering key utility functions: _is_klev_plus_n, _extract_plus_n, _is_jk_eq_1, _is_zero_literal, _build_bounds_guard, _loop_upper_bound_expr/_str, _collect_vertical_loops, _find_demotable_arrays, _merge_vertical_loops.
Extract _build_carry_expr_entries() and _build_save_assignment() helper functions from _convert_all_carries(), reducing per-pattern code duplication (4 near-identical expr_map loops and 3 identical save statement builders replaced with single-line helper calls). Add multi-loop safety check to _find_demotable_arrays(): arrays referenced in more than one vertical loop are no longer candidates for demotion, preventing data loss when loops cannot be fused. Remove redundant hasattr(var, 'dimensions') guard on sym.Array instances (the isinstance check already guarantees the attribute). Document alias limitations: _collect_vertical_loops and SCCVerticalKCaching.process_kernel match only primary index/size names, not Dimension.index_aliases or size_aliases. Minor cleanups: rename has_our_array -> has_array, remove stale TODO comment for already-implemented loop merge.
Replace cloudsc-specific variable names (ZPFPLSX, ZCOVPTOT, pfsqlf_next, etc.) with generic names in comments and docstring examples across k-caching modules. Reword TODO [K-CACHING] markers: convert mitigated limitation to NOTE, clarify remaining design debt. Simplify stale test comment about KLEV-dimensioned locals.
…den conditional merge - Remove unused parameters (dim_idx, carry_decl, expr_map) from _mark_jk1_conditionals and update its call site - Remove unused vertical_index parameter from _convert_all_carries, _cross_loop_carry_substitution, and _insert_writebacks_for_argument_carries along with their call sites and docstrings - Remove dead loop_var_name assignment in _collect_array_accesses and add explanatory comment about Assignment-only inspection scope - Fix no-op test assertion in test_array_loop_carried_dependencies_no_dep - Remove unused imports (Dimension, Assignment, fgen) from test_vertical_utils - Fix docstring: offset 0 substitution applies to all patterns except A and stencil - Guard else_body preservation in _merge_vertical_loops conditional wrapping
…e-case patterns Improve patch coverage by adding 6 new unit tests targeting previously uncovered code paths: - Stencil pattern carry conversion (vertical_utils.py ~60 lines) - B_readback argument writeback insertion (vertical_utils.py ~120 lines) - B_simple pattern with single carry variable (vertical_utils.py ~15 lines) - Conditional merge with else_body guard (vertical_utils.py ~8 lines) - ConditionalAssignment in SubstituteExpressionsSkipLHS (expr_visitors.py ~7 lines) - WAW output dependency detection (analyse_dataflow.py ~6 lines) Also remove the cloudsc.F90 integration test and associated fixtures from test_scc_vertical_kcaching.py, as a dedicated regression test already exists in dwarf-p-cloudsc.
…al disables - Remove unused 'main_loop' parameter from _find_demotable_arrays and update all call sites (W0613) - Remove useless bare return in _collect_rotates_from_node (R1711) - Disable cell-var-from-loop for intentional closures called within the same loop iteration in _convert_all_carries (W0640) - Disable import-outside-toplevel for deliberate lazy imports that avoid circular dependencies (C0415) - Disable too-many-lines at module level for vertical_utils.py (C0302) pylint score: 10.00/10 across all 11 PR files.
…ex dims, and writeback fix - Rename _extract_offset -> extract_offset (public API) and add to __all__ - Add _make_zero_literal() for type-aware init (INTEGER, LOGICAL, REAL) - Replace horizontal-index-aware dim logic with uniform RangeIndex for all non-vertical dims, making carry conversion independent of SCCDevector - Remove now-unused horizontal_index param from _convert_all_carries - Fix writeback rank mismatch: use horiz_var on both LHS and RHS - Add TODO comments for conditional else_body merge edge case, pragma placement after interchange, and collect_array_accesses scope
When the OMNI frontend resolves kind parameters like JPRB into selected_real_kind(...), _make_zero_literal() produced invalid Fortran such as 0.0_selected_real_kind(6, 37). Fortran requires literal kind parameters to be named constants or integer literals. Skip the kind when it is an InlineCall; a plain 0.0 is safe because Fortran performs implicit type conversion in assignment context.
…o use visitors and avoid stringifying Loki expression, further, added some tests
…re size is at least 1
39a8fcb to
2cb72a8
Compare
This pull request introduces several new utilities and tests for dataflow analysis and expression substitution in the codebase, as well as a new pipeline for vertical loop fusion and k-caching transformations in the single-column (SCC) transformation suite. The most notable changes are the addition of robust tests for array access and loop-carried dependency analysis, the implementation of a new expression visitor that skips substitutions on assignment LHS, and the integration of a new vertical k-caching pipeline.
Dataflow Analysis Enhancements and Tests:
test_analyse_dataflow.pyfor new dataflow analysis utilities, includingclassify_array_access_offsets,array_loop_carried_dependencies,detect_loop_carry_variables, andclassify_nonzero_offset_arrays, covering various Fortran loop and array access patterns.test_analyse_dataflow.pyto include the new analysis functions.Expression Substitution Improvements:
SubstituteExpressionsSkipLHS, a subclass ofSubstituteExpressions, which applies substitutions only to the right-hand side (RHS) of assignments, leaving the left-hand side (LHS) unchanged. This is essential for cases where substitutions should not affect assignment targets.SubstituteExpressionsSkipLHSintest_expr_visitors.py, ensuring correct behavior for both scalar and array assignments, and verifying that non-assignment nodes are still substituted.expr_visitors.pyandtest_expr_visitors.pyto exposeSubstituteExpressionsSkipLHS. [1] [2]Single-Column Transformation Pipelines:
SCCVerticalKCachingtransformation and exposed it in the SCC module's public API. [1] [2]SCCSStackKCachingPipeline, which applies vertical loop fusion and k-caching before the standard SCC sequential kernel transformations. This pipeline is now included in the list of available SCC pipelines. [1] [2]