Skip to content

Vertical k-caching for SCC pipeline(s) and beyond#661

Open
MichaelSt98 wants to merge 14 commits into
mainfrom
nams-vertical-k-caching
Open

Vertical k-caching for SCC pipeline(s) and beyond#661
MichaelSt98 wants to merge 14 commits into
mainfrom
nams-vertical-k-caching

Conversation

@MichaelSt98
Copy link
Copy Markdown
Collaborator

This pull request introduces several new utilities and tests for dataflow analysis and expression substitution in the codebase, as well as a new pipeline for vertical loop fusion and k-caching transformations in the single-column (SCC) transformation suite. The most notable changes are the addition of robust tests for array access and loop-carried dependency analysis, the implementation of a new expression visitor that skips substitutions on assignment LHS, and the integration of a new vertical k-caching pipeline.

Dataflow Analysis Enhancements and Tests:

  • Added comprehensive tests to test_analyse_dataflow.py for new dataflow analysis utilities, including classify_array_access_offsets, array_loop_carried_dependencies, detect_loop_carry_variables, and classify_nonzero_offset_arrays, covering various Fortran loop and array access patterns.
  • Updated imports in test_analyse_dataflow.py to include the new analysis functions.

Expression Substitution Improvements:

  • Introduced SubstituteExpressionsSkipLHS, a subclass of SubstituteExpressions, which applies substitutions only to the right-hand side (RHS) of assignments, leaving the left-hand side (LHS) unchanged. This is essential for cases where substitutions should not affect assignment targets.
  • Added tests for SubstituteExpressionsSkipLHS in test_expr_visitors.py, ensuring correct behavior for both scalar and array assignments, and verifying that non-assignment nodes are still substituted.
  • Updated import lists in expr_visitors.py and test_expr_visitors.py to expose SubstituteExpressionsSkipLHS. [1] [2]

Single-Column Transformation Pipelines:

  • Integrated the new SCCVerticalKCaching transformation and exposed it in the SCC module's public API. [1] [2]
  • Added a new pipeline, SCCSStackKCachingPipeline, which applies vertical loop fusion and k-caching before the standard SCC sequential kernel transformations. This pipeline is now included in the list of available SCC pipelines. [1] [2]

@github-actions
Copy link
Copy Markdown

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/661/index.html

@MichaelSt98
Copy link
Copy Markdown
Collaborator Author

Corresponding CLOUDSC branch and PR: PR #128

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands Loki’s analysis and SCC transformation toolkit by adding new dataflow utilities, a new expression substitution visitor that skips assignment LHS, and a new SCC pipeline that performs vertical loop fusion plus k-caching (carry conversion + demotion) before the standard sequential SCC passes.

Changes:

  • Added subscript-offset–aware dataflow utilities (classify_array_access_offsets, array_loop_carried_dependencies, etc.) plus extensive tests.
  • Introduced SubstituteExpressionsSkipLHS and tests to ensure substitutions apply to assignment RHS only.
  • Implemented SCCVerticalKCaching (and a new SCC pipeline entry) for vertical loop merge + carry conversion + demotion, with dedicated unit and integration tests.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
loki/analyse/analyse_dataflow.py Adds new offset-aware array access and dependency analyses used by k-caching.
loki/analyse/tests/test_analyse_dataflow.py Adds tests for the new analysis utilities.
loki/ir/expr_visitors.py Adds SubstituteExpressionsSkipLHS visitor.
loki/ir/tests/test_expr_visitors.py Adds tests validating skip-LHS substitution behavior.
loki/transformations/single_column/vertical_utils.py Adds shared utilities for vertical loop fusion, carry conversion, demotion prep, and cleanup.
loki/transformations/single_column/vertical_kcaching.py Implements the new SCCVerticalKCaching transformation orchestration.
loki/transformations/single_column/scc.py Registers a new SCC pipeline SCCSStackKCachingPipeline.
loki/transformations/single_column/__init__.py Exposes the new vertical k-caching transformation via package exports.
loki/transformations/single_column/tests/test_vertical_utils.py Adds unit tests for vertical utility helpers.
loki/transformations/single_column/tests/test_scc_vertical_kcaching.py Adds unit and integration tests for vertical k-caching behavior.
loki/transformations/single_column/tests/conftest.py Adds shared helpers for counting/finding JK loops in SCC tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread loki/transformations/single_column/vertical_utils.py Outdated
Comment thread loki/transformations/single_column/vertical_utils.py Outdated
Comment thread loki/transformations/single_column/vertical_utils.py Outdated
Comment thread loki/analyse/analyse_dataflow.py Outdated
Comment thread loki/analyse/tests/test_analyse_dataflow.py Outdated
Comment thread loki/transformations/single_column/vertical_utils.py Outdated
Comment thread loki/transformations/single_column/vertical_utils.py
Comment thread loki/transformations/single_column/vertical_utils.py Outdated
Comment thread loki/analyse/analyse_dataflow.py
Comment thread loki/transformations/single_column/tests/test_vertical_utils.py
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 94.22283% with 126 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.31%. Comparing base (b4258ce) to head (88b0a6b).
⚠️ Report is 28 commits behind head on main.

Files with missing lines Patch % Lines
...ki/transformations/single_column/vertical_utils.py 90.64% 100 Missing ⚠️
loki/analyse/analyse_dataflow.py 87.23% 18 Missing ⚠️
...transformations/single_column/vertical_kcaching.py 96.31% 6 Missing ⚠️
loki/ir/expr_visitors.py 84.61% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #661      +/-   ##
==========================================
- Coverage   96.41%   96.31%   -0.10%     
==========================================
  Files         267      272       +5     
  Lines       47020    49197    +2177     
==========================================
+ Hits        45332    47383    +2051     
- Misses       1688     1814     +126     
Flag Coverage Δ
lint_rules 96.40% <ø> (ø)
loki 96.30% <94.22%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

MichaelSt98 added a commit that referenced this pull request Mar 30, 2026
…den conditional merge

- Remove unused parameters (dim_idx, carry_decl, expr_map) from
  _mark_jk1_conditionals and update its call site
- Remove unused vertical_index parameter from _convert_all_carries,
  _cross_loop_carry_substitution, and _insert_writebacks_for_argument_carries
  along with their call sites and docstrings
- Remove dead loop_var_name assignment in _collect_array_accesses and add
  explanatory comment about Assignment-only inspection scope
- Fix no-op test assertion in test_array_loop_carried_dependencies_no_dep
- Remove unused imports (Dimension, Assignment, fgen) from test_vertical_utils
- Fix docstring: offset 0 substitution applies to all patterns except A and stencil
- Guard else_body preservation in _merge_vertical_loops conditional wrapping
@MichaelSt98 MichaelSt98 requested a review from Copilot March 30, 2026 10:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread loki/transformations/single_column/vertical_utils.py
Comment thread loki/transformations/single_column/vertical_utils.py
Comment on lines +212 to +225
# Collect any pragmas/comments from the outer loop body
# (e.g., !DIR$ IVDEP) and attach them to the new inner loop.
non_loop_items = tuple(
n for n in outer_loop.body
if isinstance(n, (ir.Comment, ir.Pragma))
)

new_inner = inner_loop.clone(
variable=outer_loop.variable,
bounds=outer_loop.bounds,
pragma=inner_loop.pragma,
pragma_post=inner_loop.pragma_post,
body=non_loop_items + inner_loop.body,
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During auto-interchange, pragmas/comments that were between the outer loop header and the inner vertical loop are moved into new_inner.body (body=non_loop_items + inner_loop.body). For directive-style pragmas that apply to the following loop, this changes which loop they annotate (and can effectively detach them from any loop). Consider keeping these non_loop_items as siblings before new_inner in new_outer.body (or attaching them to new_outer) so their association with the interchanged vertical loop is preserved.

Copilot uses AI. Check for mistakes.
Comment thread loki/transformations/single_column/vertical_utils.py Outdated
Comment thread .github/workflows/regression_tests.yml
Comment on lines +1228 to +1230
body_tuple = as_tuple(new_body)
body_tuple = body_tuple + tuple(save_stmts) + tuple(rotate_stmts)

Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save_stmts and rotate_stmts are appended after new_body at the vertical-loop level. If the original JK loop contains an inner horizontal loop (e.g. DO JL), the generated save/rotate statements can end up outside that horizontal loop while still referencing JL via _init_rotate_dims()/_out_of_loop_dim(). That compiles but uses the post-loop JL value (typically nlon+1), causing out-of-bounds or incorrect carry updates. Consider inserting save/rotate statements inside the horizontal loop(s), or emitting vector-section assignments with : for the horizontal dimension (or generating an explicit horizontal loop) so no inner-loop indices are used out of scope.

Suggested change
body_tuple = as_tuple(new_body)
body_tuple = body_tuple + tuple(save_stmts) + tuple(rotate_stmts)
#
# Place these inside innermost inner loops (e.g. horizontal loops)
# when present, to avoid referencing out-of-scope inner-loop indices
# at the vertical-loop level.
extra_stmts = tuple(save_stmts) + tuple(rotate_stmts)
if extra_stmts:
# Find innermost loops within the vertical loop body that are not
# the vertical loop itself. These are typically horizontal loops.
body_loops = FindNodes(ir.Loop).visit(as_tuple(new_body))
inner_loops = []
for inner_loop in body_loops:
# Skip the vertical loop itself
if inner_loop is loop:
continue
# Select only innermost loops (no nested loops in their body)
nested = FindNodes(ir.Loop).visit(as_tuple(inner_loop.body))
if not nested:
inner_loops.append(inner_loop)
if inner_loops:
# Clone each innermost inner loop with save/rotate statements
# appended to its body.
loop_map = {}
for inner_loop in inner_loops:
inner_body = as_tuple(inner_loop.body) + extra_stmts
loop_map[inner_loop] = inner_loop.clone(body=inner_body)
new_body = Transformer(loop_map).visit(new_body)
body_tuple = as_tuple(new_body)
else:
# No inner loops found: fall back to vertical-level append.
body_tuple = as_tuple(new_body) + extra_stmts
else:
# No extra statements to append.
body_tuple = as_tuple(new_body)

Copilot uses AI. Check for mistakes.
Comment on lines +1168 to +1170
# Rotate: carry = next at end of body
rot_lhs = carry_decl.clone(dimensions=_init_rotate_dims())
rot_rhs = next_decl.clone(dimensions=_init_rotate_dims())
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For B_readback rotates, _init_rotate_dims() may preserve the horizontal index (e.g. JL). Since rotate statements are appended after the main loop body, this can reference JL outside its DO JL loop (see also the save/rotate placement issue). The rotate should either be placed inside the horizontal loop or use a dimension expression that is valid at the rotate statement’s insertion point (e.g. : for horizontal sections, or an explicit horizontal loop).

Suggested change
# Rotate: carry = next at end of body
rot_lhs = carry_decl.clone(dimensions=_init_rotate_dims())
rot_rhs = next_decl.clone(dimensions=_init_rotate_dims())
# Rotate: carry = next at end of body. We must ensure that any
# horizontal dimensions do not reference loop indices (e.g. JL)
# that are out of scope at the insertion point of the rotate
# statement. To do this, we replace non-vertical dimensions with
# full-section ranges (:) while preserving the vertical dimension.
base_dims = _init_rotate_dims()
rotate_dims = tuple(
sym.RangeIndex((None, None)) if i in actual_non_vert_dims else d
for i, d in enumerate(base_dims)
)
rot_lhs = carry_decl.clone(dimensions=rotate_dims)
rot_rhs = next_decl.clone(dimensions=rotate_dims)

Copilot uses AI. Check for mistakes.
Comment on lines +2224 to +2229
wb_lhs = orig_decl.clone(dimensions=tuple(wb_orig_dims))
wb_rhs = next_decl.clone(
dimensions=tuple(wb_next_dims) if wb_next_dims else None
)
wb_stmt = ir.Assignment(lhs=wb_lhs, rhs=wb_rhs)

Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The write-back statement can become a scalar LHS (e.g. pflux(jl, jk+1)) while the RHS is built with RangeIndex(:) for the horizontal dimension (e.g. pflux_next(:)), which is a rank mismatch and changes semantics. If you index the LHS with the horizontal variable, the RHS should be indexed consistently (e.g. pflux_next(jl)), or alternatively emit a full horizontal section on the LHS (pflux(:, jk+1) = pflux_next(:)).

Copilot uses AI. Check for mistakes.
Comment thread loki/transformations/single_column/vertical_utils.py
@MichaelSt98
Copy link
Copy Markdown
Collaborator Author

Seems like, if I haven't misunderstood it, a problem in the coverage reporting.

Codecov Coverage Artifact with pytest-xdist

Problem

Our CI runs tests with:

pytest -v -n 4 --cov=./loki --cov-report=xml --pyargs loki

The -n 4 flag uses pytest-xdist to run tests across 4 worker processes.
This causes module-level code (imports, __all__, def header lines) to
appear as uncovered in Codecov.

Root cause: The loki package uses eager from X import * chains across
__init__.py files:

from loki import Subroutine            # every test file does this
  → loki/__init__.py
      from loki.transformations import *
    → transformations/__init__.py
        from ...single_column import *
      → single_column/__init__.py
          from .scc import *           → scc.py imports vertical_kcaching
          from .vertical_kcaching import *  → imports vertical_utils

All module-level code executes during test collection in the main process,
before pytest-xdist spawns workers and before pytest-cov starts tracing in
those workers. Function bodies called during tests are traced correctly, but
module-level lines are reported as "uncovered."

Impact: This affects every module in loki, not just the new k-caching
files. For the new files specifically:

File Reported Estimated actual
vertical_kcaching.py 87.87% ~95%+
vertical_utils.py 83.36% ~90%+
PR patch coverage 88.73% ~93%+

This is not fixable via import changes

The eager from X import * cascade is deeply rooted in loki's package
structure. Even removing specific __init__.py re-exports doesn't help —
transitive imports through scc.py and other modules still trigger the full
chain. Fixing it at the import level would require converting the entire package
to lazy imports (__getattr__-based), which is a massive refactor.

Potential Fixes

All fixes are project-wide (not scoped to individual files).

A. Drop -n 4 for the coverage job

Change the CI pytest invocation to run serially:

pytest -v --cov=./loki --cov-report=xml --pyargs loki
  • Pros: Simplest change, guaranteed to work
  • Cons: Slower CI (though the test suite runs in ~2s locally)

B. Two CI jobs

Split into a fast-feedback job and a coverage job:

# Job 1: Fast feedback (parallel, no coverage)
- run: pytest -v -n 4 --pyargs loki

# Job 2: Accurate coverage (serial)
- run: pytest -v --cov=./loki --cov-report=xml --pyargs loki
  • Pros: Fast feedback + accurate coverage
  • Cons: More CI config complexity

C. Coverage concurrency config

Add to pyproject.toml:

[tool.coverage.run]
concurrency = ["multiprocessing"]
  • Pros: Minimal change, keeps -n 4
  • Cons: May not fully resolve the issue depending on import timing vs
    tracer activation

D. COVERAGE_PROCESS_START early tracing

Use coverage.py's startup hook to begin tracing before any imports occur.
This requires:

  1. A [tool.coverage.run] section in pyproject.toml:

    [tool.coverage.run]
    source = ["loki"]
    parallel = true
    concurrency = ["multiprocessing"]
  2. CI steps:

    - run: python -m coverage install    # installs .pth startup hook
    - env:
        COVERAGE_PROCESS_START: pyproject.toml
      run: |
        pytest -v -n 4 --pyargs loki
        coverage combine                  # merge per-process data
        coverage xml                      # generate report
  • Pros: Most robust, keeps -n 4, guaranteed accurate
  • Cons: Most complex CI setup

Recommendation

Approach B gives the best of both worlds — fast parallel testing for
developer feedback plus accurate serial coverage for Codecov. Approach A is
the simplest if CI speed is not a concern.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +701 to +704
# Compute dim - loop_var and simplify; if the result is a compile-time
# constant, the offset is that constant value.
try:
diff = simplify(dim - loop_var)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_offset documents loop_var as "Scalar or str", but the implementation does simplify(dim - loop_var). If callers pass a string (as the docstring allows), this will raise a TypeError. Consider normalizing loop_var to a Loki symbol (e.g. sym.Variable(name=loop_var_name)) for the arithmetic/simplify step, or restrict the accepted type in the docstring and enforce it.

Suggested change
# Compute dim - loop_var and simplify; if the result is a compile-time
# constant, the offset is that constant value.
try:
diff = simplify(dim - loop_var)
# Normalise loop_var to a Loki expression suitable for arithmetic.
if isinstance(loop_var, (sym.Scalar, sym.DeferredTypeSymbol)):
loop_var_expr = loop_var
elif isinstance(loop_var, str):
loop_var_expr = sym.Variable(name=loop_var)
elif hasattr(loop_var, 'name'):
loop_var_expr = sym.Variable(name=loop_var.name)
else:
# Unsupported loop_var type for symbolic arithmetic
return None
# Compute dim - loop_var and simplify; if the result is a compile-time
# constant, the offset is that constant value.
try:
diff = simplify(dim - loop_var_expr)

Copilot uses AI. Check for mistakes.
Comment on lines +1856 to +1863
if isinstance(node, ir.Conditional):
# This is a guarded block — recurse into body with this guard
guard_cond = node.condition
for child in node.body:
_collect_rotates_from_node(
child, rotate_keys, save_names,
guard_cond, remove_map, hoisted
)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_collect_rotates_from_node overwrites enclosing_guard_cond with node.condition for every nested Conditional. In cases where vertical loops were originally wrapped in a conditional, this drops the outer guard when descending into the inner (JK-bounds) guard, so hoisted rotates can be re-inserted under an incomplete condition and change semantics. Consider accumulating guards (e.g. enclosing AND node.condition) or only treating the JK-bounds guard as the hoist guard.

Copilot uses AI. Check for mistakes.
Comment on lines +1864 to +1869
# Also check nested conditionals in else_body
for child in node.else_body:
_collect_rotates_from_node(
child, rotate_keys, save_names,
enclosing_guard_cond, remove_map, hoisted
)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_collect_rotates_from_node traverses else_body using enclosing_guard_cond, but it does not represent the else-guard (i.e., NOT node.condition). If a rotate assignment ever appears in an else-branch, it could be hoisted and re-inserted under the wrong guard. Either compute the correct else guard (and combine with any enclosing guard) or avoid collecting hoistable statements from else-branches entirely.

Suggested change
# Also check nested conditionals in else_body
for child in node.else_body:
_collect_rotates_from_node(
child, rotate_keys, save_names,
enclosing_guard_cond, remove_map, hoisted
)
# Do not collect hoistable statements from else-branches: they
# would be guarded by the negation of the condition, which is
# not modeled here and could lead to incorrect hoisting.

Copilot uses AI. Check for mistakes.
Comment on lines +1838 to +1849
def _collect_rotates_from_node(node, rotate_keys, save_names,
enclosing_guard_cond, remove_map, hoisted):
"""
Recursively walk an IR node tree to find carry rotate/save
assignments and collect them for hoisting.

Parameters
----------
node : IR node
rotate_keys : set of (vc_lower, next_lower) tuples
save_names : set of vc_lower names
enclosing_guard_cond : expression or None
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_collect_rotates_from_node takes a save_names parameter and its docstring mentions collecting rotate/save assignments, but save_names is never used and the function currently only matches _vc = _next rotates. Please either implement save hoisting as described, or remove the unused parameter and adjust the docstrings to match the actual behavior to avoid confusion and lint noise.

Copilot uses AI. Check for mistakes.
Comment on lines +1912 to +1923
1. Identifies rotate/save assignments by matching carry_registry
entries (``_vc = _next`` for B_readback, ``_vc = <expr>`` for
Pattern A and stencil).
2. Removes them from their current position inside guarded blocks.
3. Re-appends them at the end of the merged loop body, inside the
same IF guard that surrounded them.

Only B_readback rotates (``_vc = _next``) and stencil/Pattern-A
saves are moved.

Parameters
----------
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _hoist_rotates_to_end docstring says it hoists both B_readback rotates and Pattern-A/stencil saves, but the current implementation only removes/hoists assignments matching rotate_keys (_vc = _next). Please update the docstring (and/or the save_names/"rotate/save" wording) so it reflects what is actually moved.

Suggested change
1. Identifies rotate/save assignments by matching carry_registry
entries (``_vc = _next`` for B_readback, ``_vc = <expr>`` for
Pattern A and stencil).
2. Removes them from their current position inside guarded blocks.
3. Re-appends them at the end of the merged loop body, inside the
same IF guard that surrounded them.
Only B_readback rotates (``_vc = _next``) and stencil/Pattern-A
saves are moved.
Parameters
----------
1. Identifies rotate assignments by matching ``carry_registry``
entries for B_readback carries (``_vc = _next``).
2. Removes these assignments from their current position inside
guarded blocks.
3. Re-appends them at the end of the merged loop body, inside the
same IF guard that surrounded them.
Only B_readback rotate assignments (``_vc = _next``) are moved.
Parameters
----------
----------

Copilot uses AI. Check for mistakes.
Add subscript-level array access analysis functions to analyse_dataflow:
- _extract_offset: extract integer offset of subscript relative to loop var
- classify_array_access_offsets: classify all array accesses by offset
- array_loop_carried_dependencies: find true loop-carried dependencies
- detect_vertical_carry_variables: detect scalar carries and shift registers
- classify_multilevel_arrays: identify arrays with non-zero offsets

Add SubstituteExpressionsSkipLHS to expr_visitors, a variant of
SubstituteExpressions that leaves assignment LHS unchanged.

These utilities support the vertical loop fusion and k-caching
transformations.
…-caching

Add a new source-to-source transformation that fuses all vertical loops
in a Fortran routine into a single DO JK=1,KLEV+1 sweep with IF-guards,
converts multi-level array dependencies (e.g. ZTP1, ZA, ZPFPLSX) to
two-scalar carry variables (_vc/_next with explicit rotate), demotes
KLEV-dimensioned temporaries to scalars, and removes dead initialization
loops. This dramatically reduces GPU memory pressure for SCC-style
column physics kernels.

New files:
- vertical_utils.py: 28 shared utility functions extracted for reuse
- vertical_kcaching.py: SCCVerticalKCaching transformation class
- tests/conftest.py: shared test helpers for vertical loop tests
- tests/test_scc_vertical_kcaching.py: 10 tests covering all phases

Modified files:
- scc.py: add SCCSStackKCachingPipeline for the k-caching SCC pipeline
- __init__.py: export vertical_kcaching module
…enames, and unit tests

Replace string-based comparisons with Loki expression-level operations
in vertical_utils.py helper functions (_is_klev_plus_n, _extract_plus_n,
_is_jk_eq_1, _find_demotable_arrays shape checks) using simplify() and
StrCompareMixin, with string fallbacks for robustness.

Add _loop_upper_bound_expr() to return actual expression objects
alongside the existing string-returning _loop_upper_bound_str().

Rename analyse_dataflow functions to be more general-purpose:
- detect_vertical_carry_variables -> detect_loop_carry_variables
- classify_multilevel_arrays -> classify_nonzero_offset_arrays

Remove cloudsc-specific references from docstrings to reflect that
these are general-purpose utilities.

Fix import in test_scc_vertical_kcaching.py (vertical_complete ->
vertical_utils).

Add test_vertical_utils.py with 35 unit tests covering key utility
functions: _is_klev_plus_n, _extract_plus_n, _is_jk_eq_1,
_is_zero_literal, _build_bounds_guard, _loop_upper_bound_expr/_str,
_collect_vertical_loops, _find_demotable_arrays, _merge_vertical_loops.
Extract _build_carry_expr_entries() and _build_save_assignment() helper
functions from _convert_all_carries(), reducing per-pattern code
duplication (4 near-identical expr_map loops and 3 identical save
statement builders replaced with single-line helper calls).

Add multi-loop safety check to _find_demotable_arrays(): arrays
referenced in more than one vertical loop are no longer candidates
for demotion, preventing data loss when loops cannot be fused.

Remove redundant hasattr(var, 'dimensions') guard on sym.Array
instances (the isinstance check already guarantees the attribute).

Document alias limitations: _collect_vertical_loops and
SCCVerticalKCaching.process_kernel match only primary index/size
names, not Dimension.index_aliases or size_aliases.

Minor cleanups: rename has_our_array -> has_array, remove stale TODO
comment for already-implemented loop merge.
Replace cloudsc-specific variable names (ZPFPLSX, ZCOVPTOT, pfsqlf_next, etc.) with generic names in comments and docstring examples across k-caching modules. Reword TODO [K-CACHING] markers: convert mitigated limitation to NOTE, clarify remaining design debt. Simplify stale test comment about KLEV-dimensioned locals.
…den conditional merge

- Remove unused parameters (dim_idx, carry_decl, expr_map) from
  _mark_jk1_conditionals and update its call site
- Remove unused vertical_index parameter from _convert_all_carries,
  _cross_loop_carry_substitution, and _insert_writebacks_for_argument_carries
  along with their call sites and docstrings
- Remove dead loop_var_name assignment in _collect_array_accesses and add
  explanatory comment about Assignment-only inspection scope
- Fix no-op test assertion in test_array_loop_carried_dependencies_no_dep
- Remove unused imports (Dimension, Assignment, fgen) from test_vertical_utils
- Fix docstring: offset 0 substitution applies to all patterns except A and stencil
- Guard else_body preservation in _merge_vertical_loops conditional wrapping
…e-case patterns

Improve patch coverage by adding 6 new unit tests targeting previously
uncovered code paths:
- Stencil pattern carry conversion (vertical_utils.py ~60 lines)
- B_readback argument writeback insertion (vertical_utils.py ~120 lines)
- B_simple pattern with single carry variable (vertical_utils.py ~15 lines)
- Conditional merge with else_body guard (vertical_utils.py ~8 lines)
- ConditionalAssignment in SubstituteExpressionsSkipLHS (expr_visitors.py ~7 lines)
- WAW output dependency detection (analyse_dataflow.py ~6 lines)

Also remove the cloudsc.F90 integration test and associated fixtures
from test_scc_vertical_kcaching.py, as a dedicated regression test
already exists in dwarf-p-cloudsc.
…al disables

- Remove unused 'main_loop' parameter from _find_demotable_arrays and
  update all call sites (W0613)
- Remove useless bare return in _collect_rotates_from_node (R1711)
- Disable cell-var-from-loop for intentional closures called within the
  same loop iteration in _convert_all_carries (W0640)
- Disable import-outside-toplevel for deliberate lazy imports that avoid
  circular dependencies (C0415)
- Disable too-many-lines at module level for vertical_utils.py (C0302)

pylint score: 10.00/10 across all 11 PR files.
…ex dims, and writeback fix

- Rename _extract_offset -> extract_offset (public API) and add to __all__
- Add _make_zero_literal() for type-aware init (INTEGER, LOGICAL, REAL)
- Replace horizontal-index-aware dim logic with uniform RangeIndex for all
  non-vertical dims, making carry conversion independent of SCCDevector
- Remove now-unused horizontal_index param from _convert_all_carries
- Fix writeback rank mismatch: use horiz_var on both LHS and RHS
- Add TODO comments for conditional else_body merge edge case,
  pragma placement after interchange, and collect_array_accesses scope
When the OMNI frontend resolves kind parameters like JPRB into
selected_real_kind(...), _make_zero_literal() produced invalid Fortran
such as 0.0_selected_real_kind(6, 37). Fortran requires literal kind
parameters to be named constants or integer literals.

Skip the kind when it is an InlineCall; a plain 0.0 is safe because
Fortran performs implicit type conversion in assignment context.
…o use visitors and avoid stringifying Loki expression, further, added some tests
@MichaelSt98 MichaelSt98 force-pushed the nams-vertical-k-caching branch from 39a8fcb to 2cb72a8 Compare April 21, 2026 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants