Skip to content

Add the DC-EGM solver (discrete-continuous endogenous grid method)#390

Draft
hmgaudecker wants to merge 186 commits into
feat/type-local-continuation-vfrom
feat/dcegm
Draft

Add the DC-EGM solver (discrete-continuous endogenous grid method)#390
hmgaudecker wants to merge 186 commits into
feat/type-local-continuation-vfrom
feat/dcegm

Conversation

@hmgaudecker

@hmgaudecker hmgaudecker commented Jun 19, 2026

Copy link
Copy Markdown
Member

Draft. Stacked on #391 (feat/type-local-continuation-v) → main. (The earlier
#388/#389 stack has been consolidated into this branch.)

Adds the DC-EGM solver (discrete-continuous endogenous grid method) behind the
Solver ABC seam. It inverts the Euler equation on an exogenous end-of-period grid
instead of grid-searching the continuous action, with a FUES upper envelope to drop
dominated candidates. Selected per regime via solver=DCEGM(...); GridSearch (the
default) is untouched, and the engine dispatches polymorphically — no fork on solver
type.

Commit structure (reviewable in steps)

  1. DC-EGM solver implementation (squashed) — the kernel: Euler inversion, the FUES
    envelope, the closed-form constrained segment, multi-target continuation, and
    asset-row mode; wired through the ABC DCEGM.validate / build_period_kernels.
  2. Stochastic-node expectation in lax.scan blocks — a memory knob for the child
    stochastic-node mesh.
  3. FUES provenance + credits (3 commits) — NOTICE/license for the FUES derivation and
    an acknowledgments page.
  4. Modular split of step.py (5 pure-move commits) — regime_introspection
    kernel_scopecontinuationstep_coreasset_row, taking the monolith from
    ~2900 to ~800 lines so the simple single-post-state path reads cleanly apart from the
    asset-row and continuation machinery.
  5. bind_continuation — collapse the continuation behind one bound
    savings → (expected_value, expected_marginal) callable (ContinuationPlan).
  6. Beartype + test fixes (2 commits) — the contract dataclasses are beartyped by the
    package claw, so cyclic field types use the two-definition alias pattern (precise for
    ty, bare container at runtime); plus a valid-DCEGM seam build test.
  7. Seam Layer 2 — a uniform PeriodKernel/KernelResult adapter; the solve loop
    calls one kernel and branches only on optional generic outputs, not on solver type.
  8. Streaming refine-to-query — fold the asset-row single-query interpolation into the
    FUES scan (refine_to_bracket + publish_node_from_bracket), removing the n_pad
    intra-node envelope scratch. Single-post-state still publishes the full refined
    envelope as its carry, unchanged.

Solvers and accuracy work built on the DC-EGM seam

On top of the kernel, this branch adds the prime-time solvers and the Dobrescu–Shanker
benchmark reproductions:

  • NEGM (nested EGM) and the 1-D RFC / MSS / LTM upper-envelope backends
    (DCEGM(upper_envelope=...)), plus the 2-D RFC/G2EGM two-asset foundation for the
    DS-2024 pension comparison.
  • DS-2026 Applications 1/2/3 reproduced (retirement, continuous-housing NEGM +
    EGM-FUES, discrete-housing-with-tax); full paper-scale matrix run on gpu-01.
  • Jit internal functions on a per-period level #129 — branch-aware VFI oracle (tests/solution/_branch_aware_vfi_oracle.py): pins
    the App.2 (S,s) EGM-vs-VFI gap to a comparator-ordering term that is identically zero
    on the scored interior, correcting the earlier attribution; the tolerated interior gap
    is DC-EGM durable-corner discretization. Standard-mode oracle == brute to 8e-5.
  • DS-2024 housing RFC-vs-NEGM comparison (ds2024_housing*.py,
    benchmarks/ds_replication/ds2024_housing_comparison.py): NEGM (nested continuous
    housing) vs RFC/FUES (discrete-housing DC-EGM, the source's per-housing-column 1-D
    rooftop cut). Reproduces the paper headline — RFC runtime < NEGM runtime — against a
    grid-search VFI oracle (faithful at delta = 0; the paper's delta = 0.10 keeper
    depreciation awaits a shared housing-axis carry extension).

Validation

  • Full gpu-01 oracle battery (Tesla V100, -n 4, on a free GPU): 1405 passed, 15
    skipped, 2 xfailed
    . Covers the DC-EGM-vs-brute value-function-parity battery, the
    19-case refine-to-query unit equivalence (streamed ≡ row-then-interp), and the Layer-2
    distinct-compiled-cores regression.
  • ty and prek clean.

Pending (does not block reviewing the engine)

  • DS-2024 housing delta = 0.10 fidelity: the keeper depreciates the held stock off
    the housing grid; the strict-identity NEGM keeper needs a housing-axis carry extension.
  • ACA-scale memory (Gate C): refine-to-query's n_pad runtime win is only visible at
    ACA scale — the in-suite oracles are too small to exercise it; handed to the aca side.
  • Throughput vs the brute-force solver at matched accuracy (Gate E) is a downstream
    experiment this work unlocks, not settled here.

🤖 Generated with Claude Code

hmgaudecker and others added 3 commits June 17, 2026 15:01
Route the existing brute-force grid search through a per-regime solver
configuration and a builder registry, with no change to the numerics.

- `Regime.solver: BruteForce | DCEGM` (default `BruteForce()`), exported from
  `lcm` alongside the `DCEGM` configuration class.
- `_lcm.solution.registry`: `SolverBuildContext`, `SolverKernels`, the
  `SolverKernelBuilder` protocol, and `_build_brute_force_kernels` (the former
  `_build_max_Q_over_a_per_period`), dispatched on `type(regime.solver)` via
  `SOLVER_KERNEL_BUILDERS`.
- `DCEGM` is published as the final configuration surface (fields + field
  validation), but its engine is not yet wired in; a regime requesting it is
  rejected at model build with `NotImplementedError`.

Behavior-preserving: the full test suite passes unchanged and an explicit
`BruteForce()` yields the same value function as the default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Points the benchmark feature's aca-model at feat/dcegm-solver's tip — main's
audit fixes + the DC-EGM solver + smooth-share eligibility — the version the
solver-seam work is developed against.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the `type(solver)`-keyed builder registry with a polymorphic `Solver`
ABC. The engine now calls `solver.validate(context)` then
`solver.build_period_kernels(context)` — no `SOLVER_KERNEL_BUILDERS` dict, no
`BruteForce | DCEGM` union, no standalone DC-EGM guard.

- `_lcm/solution/contract.py` (new): the `Solver` ABC (abstract
  `build_period_kernels`, default no-op `validate`), `SolverBuildContext`, and
  `SolverKernels`. An engine leaf — imports nothing that reaches `lcm.solvers`,
  so the façade can re-export it without an import cycle.
- `_lcm/solution/solvers.py` (new): `GridSearch(Solver)` (the relocated
  grid-search builder, with function-local `jax`/`get_max_Q_over_a` imports) and
  `DCEGM(Solver)` (the published config; `validate` raises the not-yet-available
  guard, so a regime requesting it is rejected at model build).
- `lcm/solvers.py` → thin re-export façade; `registry.py` deleted; the
  `processing` dispatch and `Regime.solver` field updated.
- Rename the default solver `BruteForce` → `GridSearch` (more descriptive;
  alpha permits the break).

Faithful to dcegm-solver-seam-abc-design.md (ABC, not Protocol). Layer 2
(generic KernelResult) does not apply here — the stub seam has no EGM fork.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@read-the-docs-community

read-the-docs-community Bot commented Jun 19, 2026

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown

Benchmark comparison (main → HEAD)

Comparing 7d8d2aac (main) → e3620099 (HEAD)

Benchmark Statistic before after Ratio Alert
aca-baseline execution time 16.099 s 15.333 s 0.95
peak GPU mem 588 MB 588 MB 1.00
compilation time 521.91 s 444.98 s 0.85
peak CPU mem 7.64 GB 7.42 GB 0.97
aca-baseline-debug execution time 103.913 s 102.546 s 0.99
peak GPU mem 587 MB 587 MB 1.00
compilation time 658.61 s 648.49 s 0.98
peak CPU mem 8.13 GB 8.15 GB 1.00
Mahler-Yum execution time 4.612 s 4.533 s 0.98
peak GPU mem 520 MB 520 MB 1.00
compilation time 11.78 s 11.21 s 0.95
peak CPU mem 1.59 GB 1.59 GB 1.00
Precautionary Savings - Solve execution time 26.9 ms 23.8 ms 0.89
peak GPU mem 8 MB 8 MB 1.00
compilation time 1.55 s 1.57 s 1.02
peak CPU mem 1.16 GB 1.16 GB 1.00
Precautionary Savings - Simulate execution time 65.9 ms 60.3 ms 0.91
peak GPU mem 157 MB 157 MB 1.00
compilation time 3.79 s 3.57 s 0.94
peak CPU mem 1.32 GB 1.32 GB 1.00
Precautionary Savings - Solve & Simulate execution time 89.6 ms 89.2 ms 1.00
peak GPU mem 566 MB 566 MB 1.00
compilation time 4.81 s 4.74 s 0.99
peak CPU mem 1.30 GB 1.31 GB 1.00
Precautionary Savings - Solve & Simulate (irreg) execution time 214.0 ms 208.1 ms 0.97
peak GPU mem 2.18 GB 2.18 GB 1.00
compilation time 5.55 s 5.00 s 0.90
peak CPU mem 1.37 GB 1.37 GB 1.00
IskhakovEtAl2017DCEGMSimulate execution time 192.9 ms
compilation time 4.38 s
peak CPU mem 1.70 GB
IskhakovEtAl2017DCEGMSolve execution time 1.966 s
compilation time 7.23 s
peak CPU mem 1.59 GB
IskhakovEtAl2017Simulate execution time 243.7 ms 201.2 ms 0.83
compilation time 5.19 s 4.33 s 0.83
peak CPU mem 1.29 GB 1.30 GB 1.01
IskhakovEtAl2017Solve execution time 53.9 ms 50.9 ms 0.94
compilation time 0.89 s 0.68 s 0.76
peak CPU mem 1.15 GB 1.15 GB 1.00
IskhakovEtAl2017DCEGMSimulateGpuPeakMem peak GPU mem 282 MB
IskhakovEtAl2017DCEGMSolveGpuPeakMem peak GPU mem 0 MB
IskhakovEtAl2017SimulateGpuPeakMem peak GPU mem 281 MB 281 MB 1.00
IskhakovEtAl2017SolveGpuPeakMem peak GPU mem 67 MB 67 MB 1.00

hmgaudecker and others added 21 commits June 21, 2026 18:09
The full discrete-continuous endogenous grid method, re-rooted as a single
commit on top of the solver-selection seam (PR #388). The tree reproduces the
reviewed feat/dcegm tip exactly, minus two root-level audit scratch files.

On top of the seam this adds the `_lcm.egm` engine, DC-EGM build-time
validation, the `_build_dcegm_kernels` builder and the EGM-specific
SolverBuildContext / SolverKernels / SolutionPhase fields, and replaces the
seam's build-time `NotImplementedError` guard with the real solver. To be
split into reviewable sub-PRs at review time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The child stochastic-node expectation built the full node-stacked
`node_values`/`node_marginals` arrays and reduced them with a weighted sum
downstream of the map. So `stochastic_node_batch_size` (a `lax.map`) shed only
the per-node gather working-set while still materialising the full
`(..., n_nodes)` output — half a fix, and at large scale the residual stack can
bind and even raise peak.

Fold the weighted sum into a `lax.scan` carry instead: each block reads only its
nodes and accumulates the running expectation, so neither the per-node gather
nor the full node-stacked output is materialised. `0` (or a size >= the mesh)
keeps the single fused vmap + reduction.

The block reduction reorders the floating-point adds, so the value function
matches the fused solve to tight numerical tolerance rather than bit-identically;
the node-batch invariance test asserts `allclose` at rtol/atol 1e-9.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- New `docs/credits.md` (linked into the docs nav and the README), crediting
  the methods pylcm builds on (Carroll EGM; Iskhakov–Jørgensen–Rust–Schjerning
  DC-EGM; Dobrescu–Shanker FUES; Fella generalized EGM; the Druedahl /
  Druedahl–Jørgensen / Dobrescu–Shanker multidimensional-EGM methods behind the
  reserved upper-envelope backend), the discretization methods (Tauchen,
  Rouwenhorst, Kopecky–Suen, Fella–Gallipoli–Pan), the replicated example models
  shipped in pylcm (IJRS 2017, Mahler–Yum 2024), QuantEcon, and the open-source
  ecosystem (OSE, NumEconCopenhagen, akshayshanker, JeppeDruedahl, fediskhakov).
- Add the missing BibTeX entries (Carroll 2006, Fella 2014, Dobrescu–Shanker
  FUES 2022 and RFC 2024, Druedahl 2021, Druedahl–Jørgensen 2017).
- Fix the Mahler & Yum (2024) page range in references.bib: 1697–1733
  (Econometrica 92(5)), not 1307–1343.

Every citation verified against the journal/DOI or the authors' repositories.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pylcm's Fast Upper-Envelope Scan was adapted from the
`OpenSourceEconomics/upper-envelope` package (Apache-2.0, © The Upper-Envelope
Authors) and since substantially modified. Both projects are Apache-2.0, so the
derivation is permitted; this records the upstream attribution and the
modification notice (Apache-2.0 §4(b)/(c)) in the `fues.py` header and the
Credits page, which previously credited only the method's paper.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pylcm's Fast Upper-Envelope Scan is adapted from OpenSourceEconomics/upper-envelope
(Apache-2.0). Add a root `NOTICE` recording that derivation and the upstream
copyright, and vendor the upstream license text verbatim at
`licenses/upper-envelope-LICENSE`. The in-source attribution in `fues.py` (which
ships in the wheel via `src/`) already carries the modification notice; these
files add the conventional repo/sdist-level attribution.

No other part of the codebase derives from third-party source — the DC-EGM kernel
and the NEGM Phase-0 toys are written against pylcm's own API — so only the
upper-envelope derivation is recorded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the pure spec-introspection helpers (discrete/passive/process state
names, child state name, child discrete actions, child resources function and
its arg names, regime-function concatenation) out of step.py into
egm/regime_introspection.py. These read only a regime's static spec and hold
no kernel state, so they form the dependency leaf that the kernel build, the
continuation subsystem, and the kernel-scope checks all import from — breaking
the otherwise-circular dependency between step.py and the scope checks that
move next. Behavior-preserving pure move.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the four `_find_unsupported_*` functions — the build-time checks that
name features outside the kernel's current scope — out of step.py into
egm/kernel_scope.py. They read the processed kernel-build context (period
carry targets, qualified flat params, the regime's VInterpolationInfo), which
the model-time `validation` module does not have, so they belong with the
kernel build rather than with the model-construction validators. step.py keeps
calling `_find_unsupported_feature` to install the raising step. Behavior-
preserving pure move.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the multi-target continuation machinery — target splitting, the per-child
carry reader, per-row resources queries and gradients, the stochastic-node
expectation, passive-state blending, discrete-choice aggregation (hard max or
EV1 logsum), and the child-read build — out of step.py into egm/continuation.py.
The core EGM step and the asset-row step keep calling `_get_child_carry_reader`
/ `_build_child_reads` / `get_egm_continuation_targets`, so step.py shrinks by
~870 lines and the textbook Euler inversion no longer sits in the same file as
the multi-target, passive-state, taste-shock, and stochastic-node logic.
Behavior-preserving pure move; the bound continuation contract is a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the certifiable core algorithm — _EgmKernelPieces, _get_solve_one_combo,
_compute_nodes_over_savings, _get_compute_node, _compute_constrained_candidates,
_publish_V_and_carry_rows, and the constrained-offset constant — out of step.py
into egm/step_core.py. It reads the expected continuation through
continuation's per-target carry reader and depends on nothing in asset-row mode
or the kernel orchestration. step.py keeps the orchestration (build, dispatch,
combo map) and the asset-row path (extracted next). The one test importing
_publish_V_and_carry_rows from step is repointed to step_core. Behavior-
preserving pure move.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the per-Euler-node solve — _get_solve_one_combo_asset_rows, its
differentiated continuation closure _get_expected_continuation_value, and the
scalar-query publisher _publish_node_V_and_policy — out of step.py into
egm/asset_row.py. It maps step_core's single-post-state pipeline over the Euler
grid and consumes continuation's per-target reader; step.py keeps only the
orchestration (build, dispatch, combo map, feasibility, raising step). This
completes the split: the textbook core (step_core) and the asset-row extension
(asset_row) are now separate, certifiable modules. Behavior-preserving pure
move; the streaming refine-to-query feature lands here next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Collapse the multi-target continuation aggregation into `ContinuationPlan`
(build-time statics) plus `bind_continuation`, which binds the plan to one
combo pool and the next period's carries and returns a
`savings -> (expected value, expected marginal)` callable. The textbook step
(`_get_compute_node`) and the asset-row step (`_get_expected_continuation_value`)
read only that bound callable instead of each re-deriving the regime-transition
probabilities, child carry readers, and per-target aggregation.

Behavior-preserving: the bound callable performs the identical aggregation, and
it evaluates the probabilities and child reads from the combo pool *inside* the
builder, so the asset-row Euler-slot gradient still carries the direct
dP/da . EV terms (precomputing them outside the differentiated closure would
silently drop them).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…the GridSearch rename.

The re-rooted `SolverBuildContext` / `SolverKernels` are beartyped by the package
claw, so under PEP 649 their field annotations resolve to real objects when a
context is constructed at model build. `UserRegime` and `VInterpolationInfo`
(whose module imports `lcm.regime`) close an import cycle through the
`lcm.solvers` façade, so each is referenced via a two-form alias — precise for ty
under `TYPE_CHECKING`, a bare container for the beartype claw at runtime. The
other engine types import normally.

The `BruteForce` → `GridSearch` rename had not reached the DC-EGM oracle test
modules; propagate it, including the validation-message `match=` string.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`test_model_with_dcegm_solver_builds` asserts that selecting the DC-EGM solver
builds the model rather than being rejected, but it placed the solver on the
stock iskhakov `working_life` regime, which lacks the resources, post-decision,
and inverse-marginal-utility functions the DC-EGM contract requires — so
validation correctly rejected it. Point the test at `build_dcegm_model`, whose
regimes satisfy the contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r 2)

Remove the residual solver-type fork in the backward-induction loop. The
loop no longer branches `if regime.solution.egm_step is not None`; every
regime — grid search or DC-EGM — exposes one uniform per-period adapter that
wraps its shared jitted core, calls it with the solver's own argument layout,
and assembles a `KernelResult(V_arr, carry, sim_policy)` outside JIT. The only
branches left in the loop are on the optional generic outputs
(`result.carry is not None` / `result.sim_policy is not None`), never on
solver type.

Contract (`_lcm/solution/contract.py`):
- Add `KernelResult` (frozen dataclass) and `PeriodKernel` (runtime_checkable
  structural Protocol exposing `core`, `build_lower_args`, `with_fixed_params`,
  `__call__`).
- Reshape `SolverKernels` → `SolutionKernels(period_kernels, continuation_template)`,
  dropping the `max_Q_over_a` / `egm_step` / `egm_carry_template` /
  `egm_reachable_targets` quartet.
- Name the continuation channel solver-agnostically via `ContinuationPayload`
  (= `EGMCarry` today).

Solvers (`_lcm/solution/solvers.py`):
- `GridSearch`/`DCEGM.build_period_kernels` build per-period adapter closures
  (`_GridSearchPeriodKernel` / `_DCEGMPeriodKernel`) over the existing jitted
  core; the EGM core's reachable-carry filter and param projection move onto
  the DC-EGM adapter.

Engine wiring:
- `SolutionPhase` carries `period_kernels` + `continuation_template` (plus a
  `solves_via_dcegm` property); the cyclic `PeriodKernel` annotation uses the
  TYPE_CHECKING-precise / runtime-widened two-definition pattern so the
  beartype claw resolves the dataclass field at construction.
- The terminal-continuation publisher is composed engine-side
  (`_TerminalCarryPeriodKernel` in regime_building/processing.py) as an
  output decorator, not inside `GridSearch`.
- `model_processing._partial_fixed_params_into_regimes` binds fixed params via
  each adapter's `with_fixed_params`; `simulation/additional_targets.py` reads
  `solves_via_dcegm`.

Compilation reuse and the loop's memory/host-offload/gc discipline are
preserved: only the shared core is jitted and identity-deduped (by
`id(Q_and_F)` / function identity), no adapter is jitted per period, and the
solve loop's device eviction / carry-buffer release / reachable-carry subset
logic is untouched (dispatch re-routed only). Add
`test_period_kernels_sharing_a_config_reuse_one_compiled_core` asserting the
distinct-compiled-core count.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In asset-row mode the per-node solve refines a full NaN-padded upper
envelope of static length n_pad and interpolates it at exactly one
query (resources_at_node) to publish a scalar (V_node, policy_node),
then discards it. Batched over combos x Euler-nodes, that n_pad scratch
axis is the binder that OOMs large solves.

Fold the single-query interpolation into the upper-envelope scan so the
n_pad rows never materialize:

- Extract `_interp_between_nodes` in interp.py — the pure two-bracket-node
  arithmetic plus the Hermite correction, shared by `interp_on_prepared_grid`
  and the streamed path. The existing interp unit tests pin it (behavior-
  preserving). This guarantees the streamed value cannot diverge from
  row-then-interp: only which two nodes differs.
- Add `refine_to_bracket` + `QueryBracket` in fues.py — a new scan driver
  reusing `_inspect_candidate` verbatim, changing only the emission sink: each
  step's up-to-3 emitted points fold into an O(1) bracket-capture carry
  (first/second, rolling prev/last, lo = latest emitted with grid <= q, hi =
  first with grid > q, running counts), no [n_input,3] stack and no [n_pad]
  rows. Post-scan, the bracket reproduces
  clip(searchsorted(side="right"), 1, max(n_kept-1, 1)) node-for-node: the
  side="right" tie-break at a duplicated kink (right copy wins the lower slot),
  the below-first clamp (first, second), and the at-or-above-last clamp
  (second-last, last; single-live falls back to the NaN-padded slot like the
  row path). Geometry only — no utility, borrowing limit, or floor.
- Add `publish_node_from_bracket` in asset_row.py — the scalar-bracket
  counterpart of `_publish_node_V_and_policy`: the shared `_interp_between_nodes`
  with the value Hermite slope = grad(utility) at the two bracket policies only,
  plus the constrained floor and the n_kept > n_pad overflow poison, preserved
  identically. Wire the asset-row node solve to refine_to_bracket ->
  publish_node_from_bracket and drop the n_pad envelope from that path.

Scope: asset-row mode only. Single-post-state mode publishes the refined
envelope AS its inter-period carry (queried later at many parent points), so
step_core's single-post-state path keeps calling `refine` unchanged.

The new equivalence test pins the streamed pair against refine +
_publish_node_V_and_policy for the same candidates + query, to fp tolerance:
smooth, kinked, multi-crossing, all-dead, single-live, overflow, and query
below-first / above-last / exactly on a duplicated kink abscissa.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The backward-induction loop forced a `gc.collect()` every period to return the
device pool from a rolled-off continuation carry (a registered pytree CPython
frees only on a cyclic collection). A grid-search period rolls no such carry, so
the collection bought nothing there while its fixed cost dominated the warm solve
of small / many-period models — deteriorating those benchmarks by up to ~25x
(the Iskhakov-et-al grid-search warm solve: 1.01s -> 0.35s).

Gate the collection on the loop's own per-period carry output
(`period_egm_carries`), empty exactly when no carry rolled. The backward-
induction loop stays backend-agnostic — it does not fork on solver type or read
`solution.solves_via_dcegm`; the clean separation of the solve backends is a
separate design task.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Dobrescu-Shanker housing keeper (no-house-trade) regime is a plain
1-D DC-EGM model: liquid assets is the Euler state, consumption the
continuous action, and housing a passive continuous state whose service
flow `alpha * log(housing)` is read directly by utility.

The cornerstone test locks in that the DC-EGM envelope-condition guard
admits utility reading a *passive* continuous state (it forbids only the
Euler state) by calling `validate_dcegm_regimes` directly on the
finalized keeper regimes — no kernel trace, no solve. The full GPU solve
is gated off the local box.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase-0 of the NEGM / multidimensional DC-EGM build (general 2-asset capability):

- `negm_phase0/brute_feasibility_probe.py`: sweeps a 2-continuous-state
  (liquid + illiquid) + 2-continuous-action + income-shock lifecycle solved with
  BruteForce, reporting wall-clock and GPU peak. Finding: brute is single-solve
  feasible at fine grids (~4-9 s, <600 MB on a V100 up to 2.5 B combos/period) —
  the "brute breaks at Laibson scale" premise does not hold; NEGM is an
  accuracy/throughput/generality play, not a feasibility rescue.
- `negm_phase0/kinked_toy_oracle.py`: the smallest 2-asset toy carrying the
  Laibson frictions (credit-card rate kink at a^X=0, withdrawal penalty at Iz=0,
  Z>=0 floor, u(C+iota*Z)), brute-solved as the V-parity oracle for a future NEGM
  prototype.
- `negm_phase0/negm-phase0-findings.md`: brute-feasibility table, the NEGM
  nesting derivation with the P1 kink-candidate handling, build-vs-buy, and the
  gate recommendation (build for the general capability; reframe the motivation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…del.

Implements P2 of the Dobrescu–Shanker replication: the NEGM solver as a
third `Solver` on the ABC seam, composing an inner 1-D `DCEGM` solve of the
consumption-savings problem with an outer deterministic grid search over a
durable/illiquid margin.

- `NEGM(Solver)` in `_lcm/solution/solvers.py` (re-exported from
  `lcm.solvers` and `lcm`): frozen dataclass with `inner: DCEGM`,
  `outer_action`, `outer_post_decision`, `outer_grid`, optional
  `outer_no_adjustment_candidate`. `__post_init__` guards reject a stochastic
  outer grid and margins that coincide with the inner ones
  (`RegimeInitializationError`), mirroring DCEGM's `_fail_if_*` helpers.
- `_NEGMPeriodKernel`: non-jitted outer adapter over the shared jitted inner
  DC-EGM core; sweeps the outer grid plus the per-node no-adjustment candidate,
  binds the outer post-decision into the regime's flat params per node, and
  collapses the outer axis by `max`. Static only (ty + prek green); not run.
- `validate_negm_regimes` in `_lcm/egm/negm_validation.py`, wired into the same
  model-build path as `validate_dcegm_regimes`. Fail-loudly contract
  (`ModelInitializationError`, each message naming the alternative solver): no
  outer margin → use `DCEGM`; outer margin coincides with the inner margin →
  reject; coupled-2-Euler (outer post-decision feeds the inner Euler law /
  savings-stage pool, or a non-additively-separable utility cross-term) → use
  the 2-D EGM foundation; taste-shocked discrete choice → reject (ordering).
- Kinked-toy NEGM model (`tests/test_models/negm_kinked_toy.py`) and the G1
  parity test (`tests/solution/test_negm_kinked_toy.py`, skipped — GPU-only).
- Unit tests: 6 config-guard cases, 8 contract-validator cases (construct and
  assert-raise, no solve).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The DS housing-keeper DC-EGM probe gains a brute-force (GridSearch) twin and a
DC-EGM-vs-brute value-parity test, mirroring tests/solution/test_egm_passive.

The brute twin solves the same keeper economics without the Euler/passive
split: housing is a regular fixed continuous state, consumption a grid-searched
action, the liquid-asset law of motion reads consumption directly, and a
borrowing constraint floors post-decision liquid assets at the borrowing limit
(the floor of the DC-EGM savings grid). build_working_regime, build_model, and
build_params gain a variant arg ("dcegm" | "brute").

The parity test asserts DC-EGM >= brute and allclose at the passive-
interpolation tolerance, excluding the lowest liquid-asset nodes where the
coarse consumption grid makes brute itself unreliable. It is skip-marked
(gpu-01 only) since it solves and OOMs the local box.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The splay/batch-size invariance assertions hard-coded float64 tolerances
(rtol/atol 1e-12 for the combo axes, 1e-9 for the stochastic-node reduction),
which the 32-bit CI job cannot meet: rescheduling the lax.map / reordering the
weighted-sum adds shifts results at single-precision scale. Key the tolerance
off conftest's X64_ENABLED, matching the established pattern in
test_economic_validation / test_grids.

Also stop test_egm_interp forcing dtype=jnp.float64 for its interpolation grid;
under --precision=32 that raises a float64-truncated-to-float32 UserWarning that
CI treats as an error. Use the canonical default float dtype instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The solve module's job is backward induction; name it for that. Renames
src/_lcm/solution/solve_brute.py → backward_induction.py and
tests/solution/test_solve_brute.py → test_backward_induction.py, updates
every import (model.py, simulation/compile.py, test_beartype_claw.py),
the two test-function names, and the doc/docstring references.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
hmgaudecker and others added 30 commits June 25, 2026 13:17
Add `envelope_at_query`: the query-side counterpart of the full-row refiners. It
evaluates the branch-aware upper envelope directly at a set of query abscissae —
a fixed-shape (n_query, n_segment) bracket-and-reduce with no sequential scan and
no NaN-padded refined row. For each query the value is the maximum over the live
bracketing segments; the policy and marginal are the winner's. A folded branch
contributes several bracketing segments, so the maximum is exact for the
piecewise-linear correspondence, and a value tie breaks right-continuously (the
larger-slope segment) to match the kernel's side="right" read.

Topology is explicit: a segment is a consecutive-pair link whose endpoints share
a segment_id, so unrelated branches are never bridged. Verified against the host
oracle on a clean crossing, a folded branch, a non-bridging branch, and an
out-of-range query.

This is the backend asset-row mode wants — one query per Euler node, no full
envelope to refine — and the machinery the NEGM outer envelope migrates onto so
a sub-grid adjustment island is never sampled away.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The nested-EGM outer fold read every adjuster candidate onto the keeper's
shared cash-on-hand grid and took the running maximum at those nodes — a
sampled maximum that silently drops an adjuster branch winning only on a coh
interval strictly between two shared nodes (a thin (S, s) adjustment island).

Recover the island with a bounded, fixed-size merge: each fold also reads the
running envelope at the candidate's own coh nodes and records that adjuster's
single best interior win in its own pre-allocated column, so the published
carry grows by the adjuster count (additive), never the grid product. A win
must clear a relative noise floor — the shared-grid read is a cubic-Hermite
interpolant, so a smooth branch's apparent excess is O(h^2) representation
noise, whereas a genuine island beats the envelope by an economically
meaningful margin (measured separation ~1000x). 'finalize' splices the peaks in
and sorts into ascending coh order.

The published carry is therefore n_adjusters wider than the keeper's. That
width is an AOT-compilation invariant — the parent period's kernel is lowered
against the regime's continuation template — so widen the NEGM regime's
template by the same bounded count.

Flips the strict-xfail (S,s) island red gate to green; the synthetic
build_outer_envelope_carry battery and all NEGM solve tests stay green. The
kinked-toy CPU simulate near-tie (a documented cross-backend grid-search argmax
difference) is unchanged — the splice is transparent to it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The LTM/MSS topology fix (consume `segment_id`) lets the upper-envelope
backends and the host oracle avoid bridging unrelated branches when an explicit
topology is supplied. The production single-post-state kernel supplies none, and
that is correct, not an omission: the candidate chain it refines is one
connected branch (the constrained segment joins the unconstrained Euler branch
continuously at the borrowing limit), within-period discrete choices are
maximized across combos in the V array rather than stacked into one refine, and
a future-kink-induced fold is data-emergent (a decrease in the inverted
endogenous grid) rather than known a priori. So there is no explicit multi-branch
topology for the kernel to emit; explicit `segment_id` is the oracle/test path.
Document the contract at the call site.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…RFC spatial-index)

The four 2-D region clouds (ucon/dcon/acon/con) each invert their FOCs assuming
their own constraints bind, so a node where the region's inequalities fail is a
finite but KKT-inconsistent candidate. These were computed (`valid_region`) but
not dropped before the rooftop cut and publish, because doing so regressed the
pension RFC solver: masking thins the valid survivors at coarse grids, and the
publisher preferred the *largest-area* containing simplex, whose linear
interpolant spans a wide arc of the curved value surface and errs at the
low-pension corner (p90 0.20 -> 0.246 on `test_ds_pension_rfc_solver`).

Root cause was the simplex selection, not the mask: barycentric interpolation
error scales with triangle area x surface curvature, and the degeneracy floor
already rejects ill-conditioned triangles, so the *smallest* non-degenerate
containing simplex is both well-conditioned and most accurate. Prefer it.

With that publish, applying the KKT mask now improves accuracy instead of
regressing: `test_rfc_two_asset_parity` p90 0.044 -> 0.015, and the pension RFC
solver passes (p90 < 0.20). 80 two-asset RFC/G2EGM/pension/publisher/oracle
tests stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The segment mesh marked a node valid on finiteness and positive consumption
alone, so a finite-but-KKT-inconsistent cloud node (a region's FOCs solved where
its inequalities fail) entered the triangulation. AND the region's `valid_region`
mask into `valid_node` so only genuine KKT candidates form the mesh.

Unlike the RFC interpolating publish, G2EGM's across-segment maximum already
dominates these spurious nodes (they carry a suboptimal, lower value), so the
mask does not regress: a target a masked node used to cover becomes a hole and
falls to the accurate direct-Bellman fill. 22 G2EGM/two-asset parity tests stay
green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
EGM needs (u')^{-1} to recover the action from the Euler equation; a model whose
marginal utility has no closed-form inverse could not be solved. Add a bracketed
bisection that inverts any continuous, strictly decreasing u' numerically.

The load-bearing piece is the gradient: a fixed-iteration bisection returns
correct values but, differentiated naively, wrong or iteration-dependent
gradients. The forward solve is detached (stop_gradient) and the derivative is
supplied by the implicit-function theorem through one corrector step, so

  dc*/dm = 1/u''(c*),  dc*/dθ = -∂_θ u'(c*) / u''(c*)

exactly, independent of the iteration count. The unrolled where-based loop is
jittable and vmap-safe.

Tests: CRRA value parity to rtol 1e-7; the parameter derivative matched against
the analytic CRRA derivative AND a finite-difference reference (the contract that
guards against the wrong-gradient failure mode); residual convergence on
u'(c)=c^{-2}+e^{-c} (no closed-form inverse); vmap+jit. Kernel auto-selection
(use this when no analytic inverse_marginal_utility is supplied) is the
follow-on wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make 'inverse_marginal_utility' optional for the DCEGM solver: a regime that
omits it is solved by the iEGM path, which inverts u' numerically from 'utility'.

- Validation: drop 'inverse_marginal_utility' from the required-functions set,
  and skip the (u')^-1-vs-jax.grad(utility) consistency check when it is absent.
- step.py: extract the analytic inverse only when present, else None.
- step_core.py: '_EgmKernelPieces.inverse_marginal_utility_func' is now optional;
  when None, build the inverter from the action-derivative of the combo-bound
  'utility_of_action' (jax.grad) over a savings-scaled bracket. The analytic path
  is unchanged (byte-identical) when the inverse is supplied.

End-to-end test: the IJRS CRRA retirement model solved with the analytic inverse
removed reproduces the analytic-inverse value function to rtol 1e-6, so the
numeric root finder recovers the same (u')^-1 through the full solve. Existing
DC-EGM/NEGM analytic-path and validation tests stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Generalize the linear continuation E[V'] over the one-step stochastic lottery to
a certainty-equivalent g_inv(E[g(V')]) — the form the Epstein-Zin / power-mean
and entropic-risk operators take. A regime opts in by supplying a
value_transform (g) / inverse_value_transform (g_inv) pair in its functions;
both read continuation_value (the next value array) from the engine plus any
params. With neither supplied the operator is the identity and the expectation
stays the plain weighted average — byte-identical, default fixtures unchanged.

- Q_and_F seam: the stochastic-state average becomes
  inverse_value_transform(average(value_transform(V'), weights)) when a pair is
  present; regime mixing stays linear (the documented first implementation).
- Build: _build_continuation_operator concatenates the pair from the regime
  functions (excluding H); supplying only one half is a ModelInitializationError.
  Their non-continuation_value args join the Q_and_F signature so params reach
  them at solve time.
- Params template: continuation_value is engine-provided, so it is dropped from
  the discovered params (alongside the existing marginal_continuation handling,
  refactored into _drop_engine_provided_args).

Tests: identity pair reproduces the linear solution byte-for-byte; the entropic
CE on a known lottery equals the hand value and lies below the mean (risk
penalty); a level-shift operator changes the solved value function (end-to-end
wiring + param read); supplying one transform without its inverse is rejected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dness caveat

From Pro's adjudication of the roadmap-tail work:

- W1 (A.4): the implicit-function corrector is valid only at an interior bracketed
  root; outside the bracket the detached bisection clamps to a bound and the
  interior 1/u'' Jacobian would silently extrapolate a wrong gradient. Guard it:
  the target is bracketed iff u'(c_upper) <= m <= u'(c_lower); the unbracketed
  branch returns the clamped root detached, so the binding-bound active-set
  derivative is 0, not the bogus interior slope. New test pins value-at-bound +
  zero gradient.
- RFC publish (A.3): the docstring still claimed largest-area simplex selection
  while the code selects smallest; correct it, and document that the area floor is
  not a shape gate (a sliver can pass it) — the robust refinement is a most-local
  well-conditioned simplex, deferred since the area gate suffices on the models
  exercised.
- W4a (A.5): document the soundness boundary — wrapping only the per-target
  stochastic average with linear regime mixing equals the joint certainty
  equivalent only for affine operators or a single reachable target; a non-linear
  operator (EZ/entropic) over multiple targets is unsound and needs the
  joint-lottery form (with is_linear capability metadata), not yet implemented.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pro's review flagged that wrapping only the per-target stochastic average and
then mixing target regimes linearly is unsound for a non-linear operator:
sum_r p_r g_inv(E_z[g(V_rz)]) != g_inv(sum_rz p_r p_z g(V_rz)) unless g is affine
or there is one reachable target.

Fix it properly (not a guard): split the operator into _transform_and_average
(transform + stochastic average per target, no inverse) and _invert_joint (apply
g_inv once). The seam now accumulates the regime-mixed transformed continuation
sum_r p_r E_z[g(V_rz)] and inverts once, giving the genuine joint certainty
equivalent g_inv(sum_rz p_r p_z g(V_rz)) — sound across both regime and shock
uncertainty. The diagnostic compute_intermediates path mirrors the seam so its
reported E_next_V / Q match the solved value for an operator model.

Default (no operator) and affine operators are byte-identical (g_inv outside the
linear mix collapses to the old form). New test pins the joint CE against the
hand value and shows it differs from the per-target form when the two regimes'
continuations differ. _apply_continuation_operator is kept as the single-lottery
composition for the unit tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pro: the area floor (DEGENERATE_AREA_FLOOR) rejects collinear triangles by area
only, not shape, so an ill-conditioned sliver can pass it and give an unstable
affine interpolant. Add a normalized mean-ratio shape gate Q = sqrt(3)*2A /
(sum of squared edges) in (0,1] (1=equilateral, ->0 sliver) and select the
smallest-area *well-conditioned* containing simplex (Q > SHAPE_QUALITY_FLOOR),
falling back to the smallest containing one only when every container is a sliver
(coverage over conditioning), then the nearest survivor. Replaces the bare
smallest-area rule. 25 RFC/G2EGM/publisher/parity tests stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lity

Replace the fixed 60-iteration bisection forward solve with a safeguarded
Newton iteration in log-consumption. In log c the CRRA family is exactly
linear (log u' = -crra log c), so Newton converges in a single step and stays
well-conditioned for any power-law-like u' whose steep-near-zero,
flat-at-large-c curvature makes a plain-c Newton overshoot into a slow
bisection tail. The bisection fallback (taken when a Newton step leaves the
live bracket) preserves robustness. The detached-root + analytic IFT corrector
gradient contract and the bracket/active-set guard are unchanged.

A 15-iteration convergence test on the wide [1e-8, 1e6] bracket — where 15
bisection steps cannot reach the root — pins the solve as Newton-driven.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thread a scan_unroll factor through refine_envelope and refine_to_bracket
into both FUES jax.lax.scan call sites, exposed as DCEGM.fues_scan_unroll
(validated >= 1) and wired through the get_upper_envelope / get_bracket_finder
dispatch. The FUES scan is sequential and latency-bound on accelerators;
unrolling trades compile time for fewer loop-carry round trips. Pure
performance knob — the refined envelope is bit-identical across unroll values
(numerical-invariance test at unroll in {2,4,8}); config rejects unroll < 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The query-side backend's dense (n_query, n_segment) bracket-and-reduce is
itself the memory wall at scale. Add an opt-in segment_block_size that swaps it
for a two-pass blocked scan over segment blocks, peaking at (n_query, block):

- pass 1 accumulates the running per-query max (the envelope value);
- pass 2 re-scans, keeping the global max-slope winner among segments within
  the tie tolerance of that fixed envelope value (the right-continuous
  tie-break).

Both passes are exact associative folds against a fixed target, so the result
matches the dense path up to floating-point reassociation between the two XLA
lowerings (parity test at block sizes {1,2,3,4}, rtol 1e-12). Default
(segment_block_size=0) keeps the dense path unchanged. Refactored the shared
link construction into a _SegmentLinks struct. The memory/GPU payoff is
gpu-01-validated; this lands the correct, opt-in mechanism.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ming doc

A.3 (rfc_2d.py): the shape-quality formula returned half the standard mean
ratio (0.5 at equilateral, not 1). Use the standard Q = 4*sqrt(3)*A / sum(l^2)
so Q=1 is equilateral; the gate is unchanged (SHAPE_QUALITY_FLOOR 0.05 -> 0.10
to match the doubled scale). Docstring corrected.

W1 (numeric_inverse.py): the log-consumption solve needs u' > 0 and m > 0. Add
a log_well_defined gate (m>0, u'(c_lower)>0, u'(c_upper)>0, finite log m); when
it fails (bliss point, marginal touching zero, non-finite branch) return NaN so
the kernel's NaN diagnostics name the (regime, period) instead of the log path
silently returning a clamped bound. Byte-identical for valid CRRA inputs; two
fail-loud tests added.

A.5 (Q_and_F.py): document the joint-unresolved-lottery timing assumption on
the continuation operator — g_inv applies once over the full joint regime+shock
lottery; staged resolution (regime revealed before continuation risk, or
shock-dependent regime probs) would need an explicit operator-scope option.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Generalize the keeper's injected durable transition from a hard-coded identity
to the regime's outer_no_adjustment_candidate function: the keeper holds the
durable at next_<durable> = keep(<durable>) instead of always = <durable>. With
no candidate (None) the auto-identity is kept, so the default path is unchanged;
a depreciating keep(d) = d(1-delta) lands the kept stock off the durable grid
and the inner DC-EGM's passive read blends it over the grid's nodes.

Byte-identical at delta=0: keep_housing returns the stock unchanged, so the
DS-2024 and App.2 housing keeper solves are unaffected (3 + 12 build/parity
tests green). Enables the DS-2024 delta=0.10 keeper (next).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thread `delta` into `build_model` and bake it into the keeper's no-adjustment
map via `_make_keep_housing(delta)`: `keep_housing(h) = h(1 - delta)`. With the
keeper-durable wiring (outer_no_adjustment_candidate), the NEGM keeper holds the
house at the depreciated level for free and the inner DC-EGM's passive read
blends the continuation off the housing grid, so the model is faithful at any
delta. At delta=0 the map returns the stock unchanged (h * 1.0 == h), so the
keeper solves are byte-identical and the 3 build/parity tests stay green.

The brute grid-search twin remains a valid oracle only at delta=0 (it searches
next_housing on the grid, where the free-keep level h(1-delta) lands only when
delta=0); the delta>0 keeper is validated against a dense host VFI oracle (next).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oracle

Add a dense host VFI oracle (`tests/solution/_ds2024_housing_vfi_oracle.py`) that
mirrors the NEGM nest and searches the free-keep level `h(1-delta)` as an explicit
candidate alongside the outer house grid. At delta>0 that level is off the house
grid, so the on-grid brute twin cannot represent it — the oracle is the only valid
reference for the depreciating keeper.

Two interior tests:
- the oracle reproduces the brute solve at delta=0 (pins its economics);
- the NEGM solve at delta=0.10 reproduces the oracle (mean interior |dV| ~0.08,
  the same grid-resolution agreement as the delta=0 brute pair) — the keeper's
  depreciated hold is faithful.

Exclude the oracle helper from the test-naming hook, matching the other
tests/solution/_*_oracle.py helpers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…durable

A reviewer flagged that `_build_coh_shift_function` lifts adjusters using the
identity keeper reference (`next = durable`) rather than the keeper core's
no-adjustment level `keep(durable) = durable(1-delta)`, which the outer-envelope
design doc says should be the credited-cost reference. Switching to `keep(durable)`
in isolation is the design-exact shift, but a differential VFI-oracle test shows it
*regresses* DS-2024 delta=0.10: the corrected lift places each adjuster at a coh the
fixed keeper-grid envelope reads by extrapolation, and the one-island-per-adjuster
splice over-counts, moving the converging ~0.08 interior agreement to a plateaued
~0.18. The identity reference and the island splice are tuned together; the shipped
solver converges (mean 0.0825 -> 0.0759 as n_grid 10 -> 18). Correcting both belongs
with the segment-topology outer-envelope redesign, so document the coupling here
rather than ship a regression.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Australia-2015-16 capital-income tax brackets are not printed in the DS-2026
paper text (only Figure 9); they are transcribed from the co-authors' replication
repo. Record the exact provenance so upstream drift is detectable: GitHub
akshayshanker/FUES, examples/housing_renting/config_HR/STD_RES_SETTINGS_4_TAXES/
master.yml, pinned at commit 00961e0b (blob 41661779), and a sha256[:16] checksum
(4e8d1bf0748f6933) of the resolved (LOWER, OFFSET, RATE) arrays. All 10 brackets
verified to match the source tax_table.brackets exactly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The GPU peak-memory tracker spawns `python -m benchmarks.asv._gpu_mem` with
cwd=_PROJECT_ROOT, but _PROJECT_ROOT resolved to the benchmarks/ directory, not
the repo root, so the subprocess could not import `benchmarks` and every
GpuPeakMem benchmark failed with `ModuleNotFoundError: No module named
'benchmarks'` (the whole benchmark-pr workflow red). The file moved one level
deeper in 7dc5887 (benchmarks/ -> benchmarks/asv/) but the `.parent.parent` chain
was not updated. Point it at the true repo root (three parents up). Verified
locally: `python -m benchmarks.asv._gpu_mem` resolves from the corrected cwd and
fails only on missing argv, no longer on import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
W1 made `inverse_marginal_utility` optional: when a regime omits it, EGM derives
the inverse numerically (`numeric_inverse.py`; validation.py:298 documents this,
and tests/solution/test_iegm_wiring.py covers a model solving without it). So
removing that function is no longer a contract violation, and the negative case
in test_dcegm_contract_violation_raises stopped raising — failing `main`
deterministically on every platform. Drop the now-contradictory case; the iEGM
path's positive coverage already lives in test_iegm_wiring.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The DC-EGM solve/oracle battery added on this branch makes the serial suite
~86 min on CI runners. Bake `-n 4` into the tests / tests-32bit / tests-with-cov
tasks so every CI job (and local `pixi run tests`) distributes across 4 workers;
`--dist loadfile` (already in addopts) keeps each file on one worker. Set
`XLA_PYTHON_CLIENT_PREALLOCATE=false` on the tasks — a no-op on CPU, but required
on GPU so the 4 worker processes don't each preallocate ~75% of the device and
OOM; they grow to their actual small per-test need, and loadfile keeps the one
memory-heavy file (Mahler-Yum) isolated to a single worker. Verified locally:
4/4 workers spin up and pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The state-action-param bundle threaded through `Q_and_F`, `max_Q_over_a`,
and their `_transform_and_average` / `_invert_joint` helpers carries every
flat param, including lookup-table params represented as `MappingLeaf` /
`SequenceLeaf` (tax schedules, pension accrual tables). The bundle was
annotated `FloatND | IntND | BoolND`, which excludes those leaf types, so
the package beartype claw rejected a `MappingLeaf` param whenever its O(1)
sampling happened to draw that key — an intermittent failure independent of
the model's correctness.

Widen the bundle annotation to the canonical `_ParamsLeaf` union. The two
concrete taste-shock extractions (`taste_shocks__scale`, `taste_shock_key`)
are narrowed back at their call sites with `cast`, since `logsum_and_softmax`
and `draw_taste_shock_noise` require the scalar/array types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iles)

The single self-hosted GPU runner is starved of host CPU/memory when four
xdist workers each AOT-compile the DC-EGM oracle suite in parallel, and the
runner loses communication with the server (~60 min, then death) instead of
finishing the suite in roughly a quarter of that serially. Append `-n0` to the
two GPU steps so pytest-xdist's last `-n` wins and the suite runs in-process.
The CPU matrix keeps `-n 4`, where the four workers are a clear speedup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The macOS runner has 3 vCPUs and 7 GB of RAM; four xdist workers each
AOT-compiling the DC-EGM solve/simulate/oracle tests peak past its memory and
swap-thrash the suite to ~4 hours. A `pytest_collection_modifyitems` hook marks
that battery `slow` (all of `tests/solution/`, plus the DC-EGM-family modules
elsewhere), and the macOS matrix leg runs `-m "not slow"`. Linux and the GPU
runner still run the full suite, so the platform-independent kernel stays
covered; macOS keeps the lighter engine/regime/grid/simulation tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The committed `pixi.lock` is written in the format pixi 0.71.x emits; pixi
0.70.1 cannot parse it and aborts every job at install with
`expected a string, found table`. Pin the `setup-pixi` steps to the version
that produced the lock so the frozen install succeeds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`test_simulate_using_raw_inputs` calls `simulate` directly, bypassing the
public `Model.simulate` canonicalization, so its `flat_params` must already
carry canonical dtypes. `jnp.asarray(0)` is `int64` under x64, but the
params-leaf contract is `int32`; `_invert_joint`'s beartype check rejected it
whenever its O(1) sampling drew that key. Pin the leaf to `int32` — the dtype
`cast_params_to_canonical_dtypes` produces on the public path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant