Commit 9c26ea2

Use uneven mapping in temporal reuse test
1 parent 6931b59 commit 9c26ea2

2 files changed

Lines changed: 23 additions & 11 deletions

tests/input_files/temporal_reuse_minimal.yaml

Lines changed: 12 additions & 0 deletions
@@ -1,8 +1,20 @@
+# Uneven mapping: GlobalBuffer holds T1 above the m loop (output
+# accumulation) and W0 below it (weight reuse). This is the pattern
+# used in fused_matmuls_to_simple.yaml and eyeriss-style architectures
+# where different tensors are pegged to the same buffer at different
+# loop-nest levels.
+#
+# Because T1 already claims the above-m slot at GlobalBuffer, W0 is
+# forced below the m loop — even though m is irrelevant to W0.
+# Temporal reuse must suppress the redundant parent fills of W0.
 mapping:
   nodes:
   - !Storage
     tensors: [W0, T0, T1]
     component: MainMemory
+  - !Storage
+    tensors: [T1]
+    component: GlobalBuffer
   - !Temporal
     rank_variable: m
     tile_shape: 1

tests/test_temporal_reuse_minimal.py

Lines changed: 11 additions & 11 deletions
@@ -8,24 +8,24 @@
 Workload: Single matmul T1[m,n1] = T0[m,n0] * W0[n0,n1] (M=4, KN=4)
 bits_per_value = 8

-Mapping:
+Mapping (uneven — two Storage nodes for GlobalBuffer):
   Storage [W0, T0, T1] @ MainMemory
+  Storage [T1] @ GlobalBuffer ← T1 pegged above m (output accumulation)
   Temporal m=1 ← m is IRRELEVANT to W0[n0,n1]
-  Storage [W0] @ GlobalBuffer ← W0 lives here, below the m loop
+  Storage [W0] @ GlobalBuffer ← W0 pegged below m (weight reuse)
   Temporal n0=1
   Temporal n1=1
   Compute Matmul0 @ MAC

-The m loop sits above GlobalBuffer[W0], but m does not appear in W0's
-dimensions [n0, n1]. The model should recognize this and fill W0 only
-ONCE rather than once per m iteration.
+T1 (the output) depends on m and must accumulate across the inner
+loops, so it is stored at GlobalBuffer above the m loop. W0 (the
+weight matrix) does not depend on m but is forced below the m loop
+because T1 already claims the above-m slot at GlobalBuffer. This is
+the same split-storage pattern used in fused_matmuls_to_simple.yaml
+and eyeriss-style architectures.

-Note: in this simple example, reordering W0 above the m loop would
-avoid the issue entirely. In real architectures (e.g. eyeriss), the
-mapper may place a tensor below an irrelevant loop because the overall
-mapping is globally optimal across all tensors and buffer capacities.
-This test validates the model's temporal reuse computation for such
-mappings.
+The model should recognize that m is irrelevant to W0 and fill W0 only
+ONCE rather than once per m iteration.

 Action counts are in bits (elements * bits_per_value).
 W0 shape = [n0, n1] = [4, 4] = 16 elements = 128 bits.
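
The fill-count arithmetic described in the docstring can be illustrated with a small standalone sketch. It is not the repository's model: the function name fill_bits and the rule it applies (skip the innermost loops above a storage node that do not index the tensor) are assumptions chosen to reproduce the numbers stated above. It takes the m loop to have 4 iterations (M=4, tile_shape 1) and W0's tile at GlobalBuffer to be the full [4, 4] matrix.

```python
from math import prod


def fill_bits(tile_shape, bits_per_value, loops_above, relevant_ranks,
              temporal_reuse=True):
    """Bits fetched from the parent to fill one buffer-resident tile.

    Hypothetical helper, not part of the project under test.
    loops_above: list of (rank_variable, iterations), outermost first,
    covering the temporal loops between the parent and this storage node.
    """
    loops = list(loops_above)
    if temporal_reuse:
        # The tile is unchanged across iterations of the innermost loops
        # that do not index the tensor, so those loops add no refills.
        while loops and loops[-1][0] not in relevant_ranks:
            loops.pop()
    tile_bits = prod(tile_shape) * bits_per_value
    return tile_bits * prod(iters for _, iters in loops)


# W0[n0, n1] sits at GlobalBuffer below the m loop (4 iterations).
w0_tile = (4, 4)
loops_above_w0 = [("m", 4)]
w0_ranks = {"n0", "n1"}

with_reuse = fill_bits(w0_tile, 8, loops_above_w0, w0_ranks)
without_reuse = fill_bits(w0_tile, 8, loops_above_w0, w0_ranks,
                          temporal_reuse=False)
print(with_reuse, without_reuse)  # 128 512
```

With reuse, W0 is filled once (128 bits). Without it, the same 128 bits would be fetched on each of the 4 iterations of m (512 bits), which is the redundant traffic the test expects the model to suppress.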
