Commit 1a87a41

feat(as_of): set_input_sparse + transition_formula for sparse forward simulation (#1368)
2 parents 81cb3ab + 473ca77 commit 1a87a41

15 files changed

Lines changed: 1640 additions & 135 deletions

CHANGELOG.md

Lines changed: 39 additions & 13 deletions
@@ -1,17 +1,47 @@
 # Changelog
 
-## 44.5.0
+## 44.5.0 [#1368](https://github.com/openfisca/openfisca-core/pull/1368)
 
 #### New features
 
-- Add `as_of` attribute to `Variable` for persistent vector variables.
-  - A variable declared with `as_of = True` (or `"start"` / `"end"`) stores its value at a given instant and automatically returns that value for any later period, until a new value is explicitly set — the vectorial analogue of OpenFisca parameters.
-  - `as_of = "start"` (default when `True`): lookup uses the start of the requested period as reference instant.
-  - `as_of = "end"`: lookup uses the end of the requested period (useful for annual variables like income tax).
-  - Lookup is O(log P) via `bisect` on a sorted instants index.
-  - Reference sharing: when consecutive stored values are identical, the same array object is reused, reducing memory usage for stable variables (e.g. `marital_status`, `housing_occupancy_status`).
-  - Stored arrays are read-only (`writeable=False`) to prevent accidental in-place mutation.
-  - Combining `as_of` with `set_input` dispatch helpers raises a `ValueError` at variable definition time.
+- Add `transition_formula` to `Variable` for formula-driven `as_of` forward simulation.
+  - A variable with `transition_formula` computes sparse updates instead of full arrays: the formula returns `(selector, values)` where `selector` is a boolean mask or index array, and `values` is the new values for the selected individuals.
+  - Each call to `get_array` at a new period triggers the transition formula once (guarded by `_as_of_transition_computed`), applies the sparse diff via `set_input_sparse`, and caches the result.
+  - `set_input_sparse` is also exposed as a public method on `Holder` for callers that want to apply sparse patches directly.
+
+- Add `initial_formula` to `Variable` for seeding `as_of` variables without a prior `set_input`.
+  - When a `transition_formula` needs to read the variable at `period - 1` but no base snapshot exists, OpenFisca now calls `initial_formula` instead of raising an error.
+  - `initial_formula` follows the same date-dispatch convention as regular formulas (`initial_formula_YYYY`, `initial_formula_YYYY_MM`, etc.).
+  - Requires `as_of = True` on the same variable; a `ValueError` is raised at definition time otherwise.
+
+- Add multi-snapshot LRU cache to `as_of` variable holders.
+  - Replaces the previous single-entry snapshot cursor with an `OrderedDict`-based LRU cache keeping the K most-recently-used reconstructed snapshots.
+  - Cache size defaults to 3 and is configurable per variable (`Variable.snapshot_count`) or globally (`MemoryConfig.asof_max_snapshots`), with variable-level taking priority.
+  - Retroactive `set_input` (out-of-order writes) evicts all cached snapshots at or after the written instant to preserve correctness.
+
+- Add `formula_type` field to `TraceNode` for `as_of` formula visibility.
+  - When `transition_formula` or `initial_formula` runs, the tracer records `formula_type = "transition"` or `formula_type = "initial"` on the corresponding trace node.
+
+- Add `show_formula_type` option to `computation_log`.
+  - `simulation.tracer.computation_log.print_log(show_formula_type=True)` appends `[transition]` or `[initial]` tags to the relevant lines, making it easy to see which `as_of` formula ran during a simulation.
+
+#### Bug fixes
+
+- Fix false `SpiralError` when a `transition_formula` reads its own variable at the previous period.
+  - The existing spiral detector raised `SpiralError` immediately when the same variable appeared in the call stack at any different period, which always triggers for temporal recursion (`V@P` → `V@P-1` → `V@P-2`).
+  - Fix: in `_calculate_transition`, the cycle check is replaced by `_check_for_strict_cycle`, which only raises `CycleError` for the exact same `(variable, period)` pair. Termination is guaranteed by `_as_of_transition_computed`.
+
+## 44.4.1
+
+#### Performance improvements
+
+- Fix quadratic reconstruction cost in `as_of` forward simulations.
+  - In the typical GET(M-1) → compute → SET(M) monthly loop, `_set_as_of` was unconditionally clearing the snapshot cursor after each write, forcing the next `get_array(M)` to reconstruct from the base through all M patches — O(N + M·k) per step, quadratic overall.
+  - Root cause: `_reconstruct_at` advanced the snapshot to `instant` during the internal diff computation, so the invalidation guard `snapshot[0] >= instant` triggered on equality even for strictly forward writes.
+  - Fix: when the new patch is appended at the end of the list (forward-sequential SET), the snapshot is updated to the new state instead of being discarded. Retroactive (out-of-order) writes still invalidate the snapshot correctly.
+  - Benchmark (N=1M, forward simulation): 1 yr / 10% change ×1.4, 5 yr / 10% ×4.1, 5 yr / 30% ×5.4.
+
+## 44.4.0 [#1366](https://github.com/openfisca/openfisca-core/pull/1366)
 
 #### Performance improvements
 
@@ -23,10 +53,6 @@
 - No change to the public API (`set_input`, `get_array`, `Variable.as_of`).
 - Fix quadratic reconstruction cost in `as_of` forward simulations: when the new patch is appended at the end (forward-sequential SET), the snapshot is updated instead of discarded so the next GET does not reconstruct from base through all patches; retroactive writes still invalidate correctly.
 
-#### Technical changes
-
-- Lint: black and flake8 fixes in `tests/core/test_asof_variable.py` and `benchmarks/test_bench_asof.py`.
-
 ## 44.4.0 [#1364](https://github.com/openfisca/openfisca-core/pull/1364)
 
 #### New features

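The changelog entries above describe `as_of` storage as a base snapshot plus sparse patches, with lookups bisecting a sorted index of instants. The following is a minimal sketch of that idea, using plain Python lists instead of NumPy arrays. `AsOfStore`, `set_sparse`, and `get` are hypothetical names chosen for illustration; this is not OpenFisca's `Holder` implementation, and it only supports strictly forward writes.

```python
import bisect


class AsOfStore:
    """Toy as-of store: one dense base snapshot plus sparse patches.

    Hypothetical illustration only; not OpenFisca's Holder.
    Instants are ISO month strings such as "2020-03", which sort
    chronologically when compared as strings.
    """

    def __init__(self, instant, base):
        self._instants = [instant]  # sorted list of write instants
        self._patches = [None]      # patch i applies at _instants[i]; slot 0 is the base
        self._base = list(base)

    def set_sparse(self, instant, indices, values):
        """Record new values for the selected individuals only."""
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        if instant <= self._instants[-1]:
            raise ValueError("this sketch only supports strictly forward writes")
        self._instants.append(instant)
        self._patches.append((list(indices), list(values)))

    def get(self, instant):
        """Return the state in force at `instant` (latest write at or before it)."""
        pos = bisect.bisect_right(self._instants, instant)
        if pos == 0:
            raise KeyError(f"no value stored at or before {instant}")
        state = list(self._base)
        # Replay every patch written at or before the requested instant.
        for indices, values in self._patches[1:pos]:
            for i, v in zip(indices, values):
                state[i] = v
        return state


store = AsOfStore("2020-01", [0, 0, 0, 0])
store.set_sparse("2020-03", [1, 3], [5, 7])
store.set_sparse("2020-06", [1], [9])
```

In this sketch, `store.get("2020-05")` returns the state written in March because no later write applies yet. Every `get` replays all earlier patches from the base, which is the reconstruction cost that caching reconstructed snapshots is meant to avoid.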
benchmarks/test_bench_asof.py

Lines changed: 82 additions & 5 deletions
@@ -129,9 +129,11 @@ def _holder(self):
         _populate(self.holder, self.N_PATCHES, self.CHANGE_RATE, rng)
         # Build the list of period strings that were stored
         self.periods = ["2020-01"]
-        for p in range(1, self.N_PATCHES + 1):
+        for _p in range(1, self.N_PATCHES + 1):
             month = (
-                f"2020-{p + 1:02d}" if p < 12 else f"{2020 + p // 12}-{p % 12 + 1:02d}"
+                f"2020-{_p + 1:02d}"
+                if _p < 12
+                else f"{2020 + _p // 12}-{_p % 12 + 1:02d}"
             )
             self.periods.append(month)
 
@@ -144,9 +146,7 @@ def test_get_sequential(self, benchmark):
                 periods_objs.append(periods_objs[-1])
 
         def _run():
-            for _ in periods_objs:
-                holder._as_of_snapshot = None  # reset snapshot for fair comparison
-            holder._as_of_snapshot = None
+            holder._as_of_snapshots.clear()  # reset LRU cache for fair comparison
             for p in periods_objs:
                 holder.get_array(p)
 
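The hunk above resets `holder._as_of_snapshots`, which the changelog describes as an `OrderedDict`-based LRU cache of reconstructed snapshots with a default size of 3. The sketch below shows how such a cache could behave, including eviction after a retroactive write. `SnapshotCache` and its methods are hypothetical names for illustration, not OpenFisca internals.

```python
from collections import OrderedDict


class SnapshotCache:
    """Toy LRU cache of reconstructed snapshots, keyed by instant.

    Hypothetical illustration; not OpenFisca's internal structure.
    """

    def __init__(self, max_snapshots=3):
        self._max = max_snapshots
        self._entries = OrderedDict()  # instant -> snapshot, least recently used first

    def get(self, instant):
        """Return the cached snapshot, marking it most recently used."""
        if instant not in self._entries:
            return None
        self._entries.move_to_end(instant)
        return self._entries[instant]

    def put(self, instant, snapshot):
        """Store a snapshot, evicting the least recently used one if full."""
        self._entries[instant] = snapshot
        self._entries.move_to_end(instant)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)

    def invalidate_from(self, instant):
        """Drop snapshots at or after `instant`, e.g. after a retroactive write."""
        for key in [k for k in self._entries if k >= instant]:
            del self._entries[key]

    def instants(self):
        """Return cached instants, least recently used first."""
        return list(self._entries)


cache = SnapshotCache(max_snapshots=3)
for month in ["2020-01", "2020-02", "2020-03"]:
    cache.put(month, [month])
cache.get("2020-01")               # touch: "2020-01" becomes most recently used
cache.put("2020-04", ["2020-04"])  # evicts "2020-02", the least recently used
```

Calling `cache.invalidate_from("2020-03")` afterwards leaves only the `"2020-01"` snapshot, mirroring the rule that an out-of-order write must discard every snapshot it could have changed.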
@@ -218,6 +218,52 @@ def test_forward_simulation(self, benchmark, n_months, change_rate):
             rng.integers(0, 10, size=k).astype(numpy.int32) for _ in range(n_months)
         ]
 
+        base = rng.integers(0, 10, size=N).astype(numpy.int32)
+        months = ["2020-01"] + [
+            f"{2020 + m // 12}-{m % 12 + 1:02d}" for m in range(1, n_months + 1)
+        ]
+
+        def _run():
+            h = _make_holder(N)
+            h.set_input(months[0], base.copy())
+            for m in range(1, n_months + 1):
+                h.set_input_sparse(months[m], all_idx[m - 1], all_vals[m - 1])
+
+        benchmark.pedantic(_run, rounds=3, iterations=1)
+
+
+# ---------------------------------------------------------------------------
+# set_input_sparse vs set_input comparison
+# ---------------------------------------------------------------------------
+
+
+class TestSetInputSparseVsDense:
+    """Compare set_input (dense O(N) diff) vs set_input_sparse (O(k) + O(N) snapshot).
+
+    Run with:
+        .venv/bin/pytest benchmarks/test_bench_asof.py -v --benchmark-sort=name -k "sparse"
+    """
+
+    N = 1_000_000
+
+    @pytest.mark.parametrize(
+        "n_months,change_rate",
+        [
+            (12, 0.10),
+            (60, 0.10),
+            (60, 0.30),
+        ],
+        ids=["1yr-10%", "5yr-10%", "5yr-30%"],
+    )
+    def test_dense(self, benchmark, n_months, change_rate):
+        """Forward simulation using set_input — O(N) diff + copy per SET."""
+        N = self.N
+        rng = numpy.random.default_rng(42)
+        k = max(1, int(N * change_rate))
+        all_idx = [rng.choice(N, size=k, replace=False) for _ in range(n_months)]
+        all_vals = [
+            rng.integers(0, 10, size=k).astype(numpy.int32) for _ in range(n_months)
+        ]
         base = rng.integers(0, 10, size=N).astype(numpy.int32)
         months = ["2020-01"] + [
             f"{2020 + m // 12}-{m % 12 + 1:02d}" for m in range(1, n_months + 1)
@@ -234,3 +280,34 @@ def _run():
                 h.set_input(months[m], new_val)
 
         benchmark.pedantic(_run, rounds=3, iterations=1)
+
+    @pytest.mark.parametrize(
+        "n_months,change_rate",
+        [
+            (12, 0.10),
+            (60, 0.10),
+            (60, 0.30),
+        ],
+        ids=["1yr-10%", "5yr-10%", "5yr-30%"],
+    )
+    def test_sparse(self, benchmark, n_months, change_rate):
+        """Forward simulation using set_input_sparse — skips O(N) diff entirely."""
+        N = self.N
+        rng = numpy.random.default_rng(42)
+        k = max(1, int(N * change_rate))
+        all_idx = [rng.choice(N, size=k, replace=False) for _ in range(n_months)]
+        all_vals = [
+            rng.integers(0, 10, size=k).astype(numpy.int32) for _ in range(n_months)
+        ]
+        base = rng.integers(0, 10, size=N).astype(numpy.int32)
+        months = ["2020-01"] + [
+            f"{2020 + m // 12}-{m % 12 + 1:02d}" for m in range(1, n_months + 1)
+        ]
+
+        def _run():
+            h = _make_holder(N)
+            h.set_input(months[0], base.copy())
+            for m in range(1, n_months + 1):
+                h.set_input_sparse(months[m], all_idx[m - 1], all_vals[m - 1])
+
+        benchmark.pedantic(_run, rounds=3, iterations=1)

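The two benchmarks above contrast a dense write, where the holder must compare the full new array with the previous state, against a sparse write, where the caller supplies the changed indices directly. The sketch below illustrates why the sparse route can skip the O(N) comparison. It uses plain lists, and `dense_patch` and `apply_patch` are hypothetical helpers, not OpenFisca functions.

```python
def dense_patch(previous, new):
    """Derive a patch by comparing two full arrays: O(N) whatever the change rate."""
    indices = [i for i, (old, cur) in enumerate(zip(previous, new)) if old != cur]
    return indices, [new[i] for i in indices]


def apply_patch(state, indices, values):
    """Return a copy of `state` with `values` written at `indices`."""
    out = list(state)
    for i, v in zip(indices, values):
        out[i] = v
    return out


previous = [3, 1, 4, 1, 5, 9]
idx, vals = [1, 4], [7, 5]  # the caller already knows which entries it touched

# Dense route: build the full new array, then diff it against the previous one.
new_full = apply_patch(previous, idx, vals)
dense_idx, dense_vals = dense_patch(previous, new_full)

# Sparse route: hand (idx, vals) over directly; no comparison pass is needed.
sparse_state = apply_patch(previous, idx, vals)
```

Both routes reconstruct the same state. In this sketch the dense diff also drops index 4, whose value did not actually change, while the sparse patch stores exactly what the caller passed in.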