Add the DC-EGM solver (discrete-continuous endogenous grid method)#390
Draft
hmgaudecker wants to merge 186 commits into
Draft
Add the DC-EGM solver (discrete-continuous endogenous grid method)#390hmgaudecker wants to merge 186 commits into
hmgaudecker wants to merge 186 commits into
Conversation
Route the existing brute-force grid search through a per-regime solver configuration and a builder registry, with no change to the numerics. - `Regime.solver: BruteForce | DCEGM` (default `BruteForce()`), exported from `lcm` alongside the `DCEGM` configuration class. - `_lcm.solution.registry`: `SolverBuildContext`, `SolverKernels`, the `SolverKernelBuilder` protocol, and `_build_brute_force_kernels` (the former `_build_max_Q_over_a_per_period`), dispatched on `type(regime.solver)` via `SOLVER_KERNEL_BUILDERS`. - `DCEGM` is published as the final configuration surface (fields + field validation), but its engine is not yet wired in; a regime requesting it is rejected at model build with `NotImplementedError`. Behavior-preserving: the full test suite passes unchanged and an explicit `BruteForce()` yields the same value function as the default. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Points the benchmark feature's aca-model at feat/dcegm-solver's tip — main's audit fixes + the DC-EGM solver + smooth-share eligibility — the version the solver-seam work is developed against. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the `type(solver)`-keyed builder registry with a polymorphic `Solver` ABC. The engine now calls `solver.validate(context)` then `solver.build_period_kernels(context)` — no `SOLVER_KERNEL_BUILDERS` dict, no `BruteForce | DCEGM` union, no standalone DC-EGM guard. - `_lcm/solution/contract.py` (new): the `Solver` ABC (abstract `build_period_kernels`, default no-op `validate`), `SolverBuildContext`, and `SolverKernels`. An engine leaf — imports nothing that reaches `lcm.solvers`, so the façade can re-export it without an import cycle. - `_lcm/solution/solvers.py` (new): `GridSearch(Solver)` (the relocated grid-search builder, with function-local `jax`/`get_max_Q_over_a` imports) and `DCEGM(Solver)` (the published config; `validate` raises the not-yet-available guard, so a regime requesting it is rejected at model build). - `lcm/solvers.py` → thin re-export façade; `registry.py` deleted; the `processing` dispatch and `Regime.solver` field updated. - Rename the default solver `BruteForce` → `GridSearch` (more descriptive; alpha permits the break). Faithful to dcegm-solver-seam-abc-design.md (ABC, not Protocol). Layer 2 (generic KernelResult) does not apply here — the stub seam has no EGM fork. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Benchmark comparison (main → HEAD)Comparing
|
e0fecdd to
44e8522
Compare
The full discrete-continuous endogenous grid method, re-rooted as a single commit on top of the solver-selection seam (PR #388). The tree reproduces the reviewed feat/dcegm tip exactly, minus two root-level audit scratch files. On top of the seam this adds the `_lcm.egm` engine, DC-EGM build-time validation, the `_build_dcegm_kernels` builder and the EGM-specific SolverBuildContext / SolverKernels / SolutionPhase fields, and replaces the seam's build-time `NotImplementedError` guard with the real solver. To be split into reviewable sub-PRs at review time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The child stochastic-node expectation built the full node-stacked `node_values`/`node_marginals` arrays and reduced them with a weighted sum downstream of the map. So `stochastic_node_batch_size` (a `lax.map`) shed only the per-node gather working-set while still materialising the full `(..., n_nodes)` output — half a fix, and at large scale the residual stack can bind and even raise peak. Fold the weighted sum into a `lax.scan` carry instead: each block reads only its nodes and accumulates the running expectation, so neither the per-node gather nor the full node-stacked output is materialised. `0` (or a size >= the mesh) keeps the single fused vmap + reduction. The block reduction reorders the floating-point adds, so the value function matches the fused solve to tight numerical tolerance rather than bit-identically; the node-batch invariance test asserts `allclose` at rtol/atol 1e-9. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- New `docs/credits.md` (linked into the docs nav and the README), crediting the methods pylcm builds on (Carroll EGM; Iskhakov–Jørgensen–Rust–Schjerning DC-EGM; Dobrescu–Shanker FUES; Fella generalized EGM; the Druedahl / Druedahl–Jørgensen / Dobrescu–Shanker multidimensional-EGM methods behind the reserved upper-envelope backend), the discretization methods (Tauchen, Rouwenhorst, Kopecky–Suen, Fella–Gallipoli–Pan), the replicated example models shipped in pylcm (IJRS 2017, Mahler–Yum 2024), QuantEcon, and the open-source ecosystem (OSE, NumEconCopenhagen, akshayshanker, JeppeDruedahl, fediskhakov). - Add the missing BibTeX entries (Carroll 2006, Fella 2014, Dobrescu–Shanker FUES 2022 and RFC 2024, Druedahl 2021, Druedahl–Jørgensen 2017). - Fix the Mahler & Yum (2024) page range in references.bib: 1697–1733 (Econometrica 92(5)), not 1307–1343. Every citation verified against the journal/DOI or the authors' repositories. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pylcm's Fast Upper-Envelope Scan was adapted from the `OpenSourceEconomics/upper-envelope` package (Apache-2.0, © The Upper-Envelope Authors) and since substantially modified. Both projects are Apache-2.0, so the derivation is permitted; this records the upstream attribution and the modification notice (Apache-2.0 §4(b)/(c)) in the `fues.py` header and the Credits page, which previously credited only the method's paper. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pylcm's Fast Upper-Envelope Scan is adapted from OpenSourceEconomics/upper-envelope (Apache-2.0). Add a root `NOTICE` recording that derivation and the upstream copyright, and vendor the upstream license text verbatim at `licenses/upper-envelope-LICENSE`. The in-source attribution in `fues.py` (which ships in the wheel via `src/`) already carries the modification notice; these files add the conventional repo/sdist-level attribution. No other part of the codebase derives from third-party source — the DC-EGM kernel and the NEGM Phase-0 toys are written against pylcm's own API — so only the upper-envelope derivation is recorded. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the pure spec-introspection helpers (discrete/passive/process state names, child state name, child discrete actions, child resources function and its arg names, regime-function concatenation) out of step.py into egm/regime_introspection.py. These read only a regime's static spec and hold no kernel state, so they form the dependency leaf that the kernel build, the continuation subsystem, and the kernel-scope checks all import from — breaking the otherwise-circular dependency between step.py and the scope checks that move next. Behavior-preserving pure move. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the four `_find_unsupported_*` functions — the build-time checks that name features outside the kernel's current scope — out of step.py into egm/kernel_scope.py. They read the processed kernel-build context (period carry targets, qualified flat params, the regime's VInterpolationInfo), which the model-time `validation` module does not have, so they belong with the kernel build rather than with the model-construction validators. step.py keeps calling `_find_unsupported_feature` to install the raising step. Behavior- preserving pure move. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the multi-target continuation machinery — target splitting, the per-child carry reader, per-row resources queries and gradients, the stochastic-node expectation, passive-state blending, discrete-choice aggregation (hard max or EV1 logsum), and the child-read build — out of step.py into egm/continuation.py. The core EGM step and the asset-row step keep calling `_get_child_carry_reader` / `_build_child_reads` / `get_egm_continuation_targets`, so step.py shrinks by ~870 lines and the textbook Euler inversion no longer sits in the same file as the multi-target, passive-state, taste-shock, and stochastic-node logic. Behavior-preserving pure move; the bound continuation contract is a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the certifiable core algorithm — _EgmKernelPieces, _get_solve_one_combo, _compute_nodes_over_savings, _get_compute_node, _compute_constrained_candidates, _publish_V_and_carry_rows, and the constrained-offset constant — out of step.py into egm/step_core.py. It reads the expected continuation through continuation's per-target carry reader and depends on nothing in asset-row mode or the kernel orchestration. step.py keeps the orchestration (build, dispatch, combo map) and the asset-row path (extracted next). The one test importing _publish_V_and_carry_rows from step is repointed to step_core. Behavior- preserving pure move. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the per-Euler-node solve — _get_solve_one_combo_asset_rows, its differentiated continuation closure _get_expected_continuation_value, and the scalar-query publisher _publish_node_V_and_policy — out of step.py into egm/asset_row.py. It maps step_core's single-post-state pipeline over the Euler grid and consumes continuation's per-target reader; step.py keeps only the orchestration (build, dispatch, combo map, feasibility, raising step). This completes the split: the textbook core (step_core) and the asset-row extension (asset_row) are now separate, certifiable modules. Behavior-preserving pure move; the streaming refine-to-query feature lands here next. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Collapse the multi-target continuation aggregation into `ContinuationPlan` (build-time statics) plus `bind_continuation`, which binds the plan to one combo pool and the next period's carries and returns a `savings -> (expected value, expected marginal)` callable. The textbook step (`_get_compute_node`) and the asset-row step (`_get_expected_continuation_value`) read only that bound callable instead of each re-deriving the regime-transition probabilities, child carry readers, and per-target aggregation. Behavior-preserving: the bound callable performs the identical aggregation, and it evaluates the probabilities and child reads from the combo pool *inside* the builder, so the asset-row Euler-slot gradient still carries the direct dP/da . EV terms (precomputing them outside the differentiated closure would silently drop them). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…the GridSearch rename. The re-rooted `SolverBuildContext` / `SolverKernels` are beartyped by the package claw, so under PEP 649 their field annotations resolve to real objects when a context is constructed at model build. `UserRegime` and `VInterpolationInfo` (whose module imports `lcm.regime`) close an import cycle through the `lcm.solvers` façade, so each is referenced via a two-form alias — precise for ty under `TYPE_CHECKING`, a bare container for the beartype claw at runtime. The other engine types import normally. The `BruteForce` → `GridSearch` rename had not reached the DC-EGM oracle test modules; propagate it, including the validation-message `match=` string. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`test_model_with_dcegm_solver_builds` asserts that selecting the DC-EGM solver builds the model rather than being rejected, but it placed the solver on the stock iskhakov `working_life` regime, which lacks the resources, post-decision, and inverse-marginal-utility functions the DC-EGM contract requires — so validation correctly rejected it. Point the test at `build_dcegm_model`, whose regimes satisfy the contract. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r 2) Remove the residual solver-type fork in the backward-induction loop. The loop no longer branches `if regime.solution.egm_step is not None`; every regime — grid search or DC-EGM — exposes one uniform per-period adapter that wraps its shared jitted core, calls it with the solver's own argument layout, and assembles a `KernelResult(V_arr, carry, sim_policy)` outside JIT. The only branches left in the loop are on the optional generic outputs (`result.carry is not None` / `result.sim_policy is not None`), never on solver type. Contract (`_lcm/solution/contract.py`): - Add `KernelResult` (frozen dataclass) and `PeriodKernel` (runtime_checkable structural Protocol exposing `core`, `build_lower_args`, `with_fixed_params`, `__call__`). - Reshape `SolverKernels` → `SolutionKernels(period_kernels, continuation_template)`, dropping the `max_Q_over_a` / `egm_step` / `egm_carry_template` / `egm_reachable_targets` quartet. - Name the continuation channel solver-agnostically via `ContinuationPayload` (= `EGMCarry` today). Solvers (`_lcm/solution/solvers.py`): - `GridSearch`/`DCEGM.build_period_kernels` build per-period adapter closures (`_GridSearchPeriodKernel` / `_DCEGMPeriodKernel`) over the existing jitted core; the EGM core's reachable-carry filter and param projection move onto the DC-EGM adapter. Engine wiring: - `SolutionPhase` carries `period_kernels` + `continuation_template` (plus a `solves_via_dcegm` property); the cyclic `PeriodKernel` annotation uses the TYPE_CHECKING-precise / runtime-widened two-definition pattern so the beartype claw resolves the dataclass field at construction. - The terminal-continuation publisher is composed engine-side (`_TerminalCarryPeriodKernel` in regime_building/processing.py) as an output decorator, not inside `GridSearch`. - `model_processing._partial_fixed_params_into_regimes` binds fixed params via each adapter's `with_fixed_params`; `simulation/additional_targets.py` reads `solves_via_dcegm`. Compilation reuse and the loop's memory/host-offload/gc discipline are preserved: only the shared core is jitted and identity-deduped (by `id(Q_and_F)` / function identity), no adapter is jitted per period, and the solve loop's device eviction / carry-buffer release / reachable-carry subset logic is untouched (dispatch re-routed only). Add `test_period_kernels_sharing_a_config_reuse_one_compiled_core` asserting the distinct-compiled-core count. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In asset-row mode the per-node solve refines a full NaN-padded upper envelope of static length n_pad and interpolates it at exactly one query (resources_at_node) to publish a scalar (V_node, policy_node), then discards it. Batched over combos x Euler-nodes, that n_pad scratch axis is the binder that OOMs large solves. Fold the single-query interpolation into the upper-envelope scan so the n_pad rows never materialize: - Extract `_interp_between_nodes` in interp.py — the pure two-bracket-node arithmetic plus the Hermite correction, shared by `interp_on_prepared_grid` and the streamed path. The existing interp unit tests pin it (behavior- preserving). This guarantees the streamed value cannot diverge from row-then-interp: only which two nodes differs. - Add `refine_to_bracket` + `QueryBracket` in fues.py — a new scan driver reusing `_inspect_candidate` verbatim, changing only the emission sink: each step's up-to-3 emitted points fold into an O(1) bracket-capture carry (first/second, rolling prev/last, lo = latest emitted with grid <= q, hi = first with grid > q, running counts), no [n_input,3] stack and no [n_pad] rows. Post-scan, the bracket reproduces clip(searchsorted(side="right"), 1, max(n_kept-1, 1)) node-for-node: the side="right" tie-break at a duplicated kink (right copy wins the lower slot), the below-first clamp (first, second), and the at-or-above-last clamp (second-last, last; single-live falls back to the NaN-padded slot like the row path). Geometry only — no utility, borrowing limit, or floor. - Add `publish_node_from_bracket` in asset_row.py — the scalar-bracket counterpart of `_publish_node_V_and_policy`: the shared `_interp_between_nodes` with the value Hermite slope = grad(utility) at the two bracket policies only, plus the constrained floor and the n_kept > n_pad overflow poison, preserved identically. Wire the asset-row node solve to refine_to_bracket -> publish_node_from_bracket and drop the n_pad envelope from that path. Scope: asset-row mode only. Single-post-state mode publishes the refined envelope AS its inter-period carry (queried later at many parent points), so step_core's single-post-state path keeps calling `refine` unchanged. The new equivalence test pins the streamed pair against refine + _publish_node_V_and_policy for the same candidates + query, to fp tolerance: smooth, kinked, multi-crossing, all-dead, single-live, overflow, and query below-first / above-last / exactly on a duplicated kink abscissa. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The backward-induction loop forced a `gc.collect()` every period to return the device pool from a rolled-off continuation carry (a registered pytree CPython frees only on a cyclic collection). A grid-search period rolls no such carry, so the collection bought nothing there while its fixed cost dominated the warm solve of small / many-period models — deteriorating those benchmarks by up to ~25x (the Iskhakov-et-al grid-search warm solve: 1.01s -> 0.35s). Gate the collection on the loop's own per-period carry output (`period_egm_carries`), empty exactly when no carry rolled. The backward- induction loop stays backend-agnostic — it does not fork on solver type or read `solution.solves_via_dcegm`; the clean separation of the solve backends is a separate design task. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Dobrescu-Shanker housing keeper (no-house-trade) regime is a plain 1-D DC-EGM model: liquid assets is the Euler state, consumption the continuous action, and housing a passive continuous state whose service flow `alpha * log(housing)` is read directly by utility. The cornerstone test locks in that the DC-EGM envelope-condition guard admits utility reading a *passive* continuous state (it forbids only the Euler state) by calling `validate_dcegm_regimes` directly on the finalized keeper regimes — no kernel trace, no solve. The full GPU solve is gated off the local box. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase-0 of the NEGM / multidimensional DC-EGM build (general 2-asset capability): - `negm_phase0/brute_feasibility_probe.py`: sweeps a 2-continuous-state (liquid + illiquid) + 2-continuous-action + income-shock lifecycle solved with BruteForce, reporting wall-clock and GPU peak. Finding: brute is single-solve feasible at fine grids (~4-9 s, <600 MB on a V100 up to 2.5 B combos/period) — the "brute breaks at Laibson scale" premise does not hold; NEGM is an accuracy/throughput/generality play, not a feasibility rescue. - `negm_phase0/kinked_toy_oracle.py`: the smallest 2-asset toy carrying the Laibson frictions (credit-card rate kink at a^X=0, withdrawal penalty at Iz=0, Z>=0 floor, u(C+iota*Z)), brute-solved as the V-parity oracle for a future NEGM prototype. - `negm_phase0/negm-phase0-findings.md`: brute-feasibility table, the NEGM nesting derivation with the P1 kink-candidate handling, build-vs-buy, and the gate recommendation (build for the general capability; reframe the motivation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…del. Implements P2 of the Dobrescu–Shanker replication: the NEGM solver as a third `Solver` on the ABC seam, composing an inner 1-D `DCEGM` solve of the consumption-savings problem with an outer deterministic grid search over a durable/illiquid margin. - `NEGM(Solver)` in `_lcm/solution/solvers.py` (re-exported from `lcm.solvers` and `lcm`): frozen dataclass with `inner: DCEGM`, `outer_action`, `outer_post_decision`, `outer_grid`, optional `outer_no_adjustment_candidate`. `__post_init__` guards reject a stochastic outer grid and margins that coincide with the inner ones (`RegimeInitializationError`), mirroring DCEGM's `_fail_if_*` helpers. - `_NEGMPeriodKernel`: non-jitted outer adapter over the shared jitted inner DC-EGM core; sweeps the outer grid plus the per-node no-adjustment candidate, binds the outer post-decision into the regime's flat params per node, and collapses the outer axis by `max`. Static only (ty + prek green); not run. - `validate_negm_regimes` in `_lcm/egm/negm_validation.py`, wired into the same model-build path as `validate_dcegm_regimes`. Fail-loudly contract (`ModelInitializationError`, each message naming the alternative solver): no outer margin → use `DCEGM`; outer margin coincides with the inner margin → reject; coupled-2-Euler (outer post-decision feeds the inner Euler law / savings-stage pool, or a non-additively-separable utility cross-term) → use the 2-D EGM foundation; taste-shocked discrete choice → reject (ordering). - Kinked-toy NEGM model (`tests/test_models/negm_kinked_toy.py`) and the G1 parity test (`tests/solution/test_negm_kinked_toy.py`, skipped — GPU-only). - Unit tests: 6 config-guard cases, 8 contract-validator cases (construct and assert-raise, no solve). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The DS housing-keeper DC-EGM probe gains a brute-force (GridSearch) twin and a
DC-EGM-vs-brute value-parity test, mirroring tests/solution/test_egm_passive.
The brute twin solves the same keeper economics without the Euler/passive
split: housing is a regular fixed continuous state, consumption a grid-searched
action, the liquid-asset law of motion reads consumption directly, and a
borrowing constraint floors post-decision liquid assets at the borrowing limit
(the floor of the DC-EGM savings grid). build_working_regime, build_model, and
build_params gain a variant arg ("dcegm" | "brute").
The parity test asserts DC-EGM >= brute and allclose at the passive-
interpolation tolerance, excluding the lowest liquid-asset nodes where the
coarse consumption grid makes brute itself unreliable. It is skip-marked
(gpu-01 only) since it solves and OOMs the local box.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The splay/batch-size invariance assertions hard-coded float64 tolerances (rtol/atol 1e-12 for the combo axes, 1e-9 for the stochastic-node reduction), which the 32-bit CI job cannot meet: rescheduling the lax.map / reordering the weighted-sum adds shifts results at single-precision scale. Key the tolerance off conftest's X64_ENABLED, matching the established pattern in test_economic_validation / test_grids. Also stop test_egm_interp forcing dtype=jnp.float64 for its interpolation grid; under --precision=32 that raises a float64-truncated-to-float32 UserWarning that CI treats as an error. Use the canonical default float dtype instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The solve module's job is backward induction; name it for that. Renames src/_lcm/solution/solve_brute.py → backward_induction.py and tests/solution/test_solve_brute.py → test_backward_induction.py, updates every import (model.py, simulation/compile.py, test_beartype_claw.py), the two test-function names, and the doc/docstring references. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add `envelope_at_query`: the query-side counterpart of the full-row refiners. It evaluates the branch-aware upper envelope directly at a set of query abscissae — a fixed-shape (n_query, n_segment) bracket-and-reduce with no sequential scan and no NaN-padded refined row. For each query the value is the maximum over the live bracketing segments; the policy and marginal are the winner's. A folded branch contributes several bracketing segments, so the maximum is exact for the piecewise-linear correspondence, and a value tie breaks right-continuously (the larger-slope segment) to match the kernel's side="right" read. Topology is explicit: a segment is a consecutive-pair link whose endpoints share a segment_id, so unrelated branches are never bridged. Verified against the host oracle on a clean crossing, a folded branch, a non-bridging branch, and an out-of-range query. This is the backend asset-row mode wants — one query per Euler node, no full envelope to refine — and the machinery the NEGM outer envelope migrates onto so a sub-grid adjustment island is never sampled away. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The nested-EGM outer fold read every adjuster candidate onto the keeper's shared cash-on-hand grid and took the running maximum at those nodes — a sampled maximum that silently drops an adjuster branch winning only on a coh interval strictly between two shared nodes (a thin (S, s) adjustment island). Recover the island with a bounded, fixed-size merge: each fold also reads the running envelope at the candidate's own coh nodes and records that adjuster's single best interior win in its own pre-allocated column, so the published carry grows by the adjuster count (additive), never the grid product. A win must clear a relative noise floor — the shared-grid read is a cubic-Hermite interpolant, so a smooth branch's apparent excess is O(h^2) representation noise, whereas a genuine island beats the envelope by an economically meaningful margin (measured separation ~1000x). 'finalize' splices the peaks in and sorts into ascending coh order. The published carry is therefore n_adjusters wider than the keeper's. That width is an AOT-compilation invariant — the parent period's kernel is lowered against the regime's continuation template — so widen the NEGM regime's template by the same bounded count. Flips the strict-xfail (S,s) island red gate to green; the synthetic build_outer_envelope_carry battery and all NEGM solve tests stay green. The kinked-toy CPU simulate near-tie (a documented cross-backend grid-search argmax difference) is unchanged — the splice is transparent to it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The LTM/MSS topology fix (consume `segment_id`) lets the upper-envelope backends and the host oracle avoid bridging unrelated branches when an explicit topology is supplied. The production single-post-state kernel supplies none, and that is correct, not an omission: the candidate chain it refines is one connected branch (the constrained segment joins the unconstrained Euler branch continuously at the borrowing limit), within-period discrete choices are maximized across combos in the V array rather than stacked into one refine, and a future-kink-induced fold is data-emergent (a decrease in the inverted endogenous grid) rather than known a priori. So there is no explicit multi-branch topology for the kernel to emit; explicit `segment_id` is the oracle/test path. Document the contract at the call site. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…RFC spatial-index) The four 2-D region clouds (ucon/dcon/acon/con) each invert their FOCs assuming their own constraints bind, so a node where the region's inequalities fail is a finite but KKT-inconsistent candidate. These were computed (`valid_region`) but not dropped before the rooftop cut and publish, because doing so regressed the pension RFC solver: masking thins the valid survivors at coarse grids, and the publisher preferred the *largest-area* containing simplex, whose linear interpolant spans a wide arc of the curved value surface and errs at the low-pension corner (p90 0.20 -> 0.246 on `test_ds_pension_rfc_solver`). Root cause was the simplex selection, not the mask: barycentric interpolation error scales with triangle area x surface curvature, and the degeneracy floor already rejects ill-conditioned triangles, so the *smallest* non-degenerate containing simplex is both well-conditioned and most accurate. Prefer it. With that publish, applying the KKT mask now improves accuracy instead of regressing: `test_rfc_two_asset_parity` p90 0.044 -> 0.015, and the pension RFC solver passes (p90 < 0.20). 80 two-asset RFC/G2EGM/pension/publisher/oracle tests stay green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The segment mesh marked a node valid on finiteness and positive consumption alone, so a finite-but-KKT-inconsistent cloud node (a region's FOCs solved where its inequalities fail) entered the triangulation. AND the region's `valid_region` mask into `valid_node` so only genuine KKT candidates form the mesh. Unlike the RFC interpolating publish, G2EGM's across-segment maximum already dominates these spurious nodes (they carry a suboptimal, lower value), so the mask does not regress: a target a masked node used to cover becomes a hole and falls to the accurate direct-Bellman fill. 22 G2EGM/two-asset parity tests stay green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
EGM needs (u')^{-1} to recover the action from the Euler equation; a model whose
marginal utility has no closed-form inverse could not be solved. Add a bracketed
bisection that inverts any continuous, strictly decreasing u' numerically.
The load-bearing piece is the gradient: a fixed-iteration bisection returns
correct values but, differentiated naively, wrong or iteration-dependent
gradients. The forward solve is detached (stop_gradient) and the derivative is
supplied by the implicit-function theorem through one corrector step, so
dc*/dm = 1/u''(c*), dc*/dθ = -∂_θ u'(c*) / u''(c*)
exactly, independent of the iteration count. The unrolled where-based loop is
jittable and vmap-safe.
Tests: CRRA value parity to rtol 1e-7; the parameter derivative matched against
the analytic CRRA derivative AND a finite-difference reference (the contract that
guards against the wrong-gradient failure mode); residual convergence on
u'(c)=c^{-2}+e^{-c} (no closed-form inverse); vmap+jit. Kernel auto-selection
(use this when no analytic inverse_marginal_utility is supplied) is the
follow-on wiring.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make 'inverse_marginal_utility' optional for the DCEGM solver: a regime that omits it is solved by the iEGM path, which inverts u' numerically from 'utility'. - Validation: drop 'inverse_marginal_utility' from the required-functions set, and skip the (u')^-1-vs-jax.grad(utility) consistency check when it is absent. - step.py: extract the analytic inverse only when present, else None. - step_core.py: '_EgmKernelPieces.inverse_marginal_utility_func' is now optional; when None, build the inverter from the action-derivative of the combo-bound 'utility_of_action' (jax.grad) over a savings-scaled bracket. The analytic path is unchanged (byte-identical) when the inverse is supplied. End-to-end test: the IJRS CRRA retirement model solved with the analytic inverse removed reproduces the analytic-inverse value function to rtol 1e-6, so the numeric root finder recovers the same (u')^-1 through the full solve. Existing DC-EGM/NEGM analytic-path and validation tests stay green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Generalize the linear continuation E[V'] over the one-step stochastic lottery to a certainty-equivalent g_inv(E[g(V')]) — the form the Epstein-Zin / power-mean and entropic-risk operators take. A regime opts in by supplying a value_transform (g) / inverse_value_transform (g_inv) pair in its functions; both read continuation_value (the next value array) from the engine plus any params. With neither supplied the operator is the identity and the expectation stays the plain weighted average — byte-identical, default fixtures unchanged. - Q_and_F seam: the stochastic-state average becomes inverse_value_transform(average(value_transform(V'), weights)) when a pair is present; regime mixing stays linear (the documented first implementation). - Build: _build_continuation_operator concatenates the pair from the regime functions (excluding H); supplying only one half is a ModelInitializationError. Their non-continuation_value args join the Q_and_F signature so params reach them at solve time. - Params template: continuation_value is engine-provided, so it is dropped from the discovered params (alongside the existing marginal_continuation handling, refactored into _drop_engine_provided_args). Tests: identity pair reproduces the linear solution byte-for-byte; the entropic CE on a known lottery equals the hand value and lies below the mean (risk penalty); a level-shift operator changes the solved value function (end-to-end wiring + param read); supplying one transform without its inverse is rejected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dness caveat From Pro's adjudication of the roadmap-tail work: - W1 (A.4): the implicit-function corrector is valid only at an interior bracketed root; outside the bracket the detached bisection clamps to a bound and the interior 1/u'' Jacobian would silently extrapolate a wrong gradient. Guard it: the target is bracketed iff u'(c_upper) <= m <= u'(c_lower); the unbracketed branch returns the clamped root detached, so the binding-bound active-set derivative is 0, not the bogus interior slope. New test pins value-at-bound + zero gradient. - RFC publish (A.3): the docstring still claimed largest-area simplex selection while the code selects smallest; correct it, and document that the area floor is not a shape gate (a sliver can pass it) — the robust refinement is a most-local well-conditioned simplex, deferred since the area gate suffices on the models exercised. - W4a (A.5): document the soundness boundary — wrapping only the per-target stochastic average with linear regime mixing equals the joint certainty equivalent only for affine operators or a single reachable target; a non-linear operator (EZ/entropic) over multiple targets is unsound and needs the joint-lottery form (with is_linear capability metadata), not yet implemented. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pro's review flagged that wrapping only the per-target stochastic average and then mixing target regimes linearly is unsound for a non-linear operator: sum_r p_r g_inv(E_z[g(V_rz)]) != g_inv(sum_rz p_r p_z g(V_rz)) unless g is affine or there is one reachable target. Fix it properly (not a guard): split the operator into _transform_and_average (transform + stochastic average per target, no inverse) and _invert_joint (apply g_inv once). The seam now accumulates the regime-mixed transformed continuation sum_r p_r E_z[g(V_rz)] and inverts once, giving the genuine joint certainty equivalent g_inv(sum_rz p_r p_z g(V_rz)) — sound across both regime and shock uncertainty. The diagnostic compute_intermediates path mirrors the seam so its reported E_next_V / Q match the solved value for an operator model. Default (no operator) and affine operators are byte-identical (g_inv outside the linear mix collapses to the old form). New test pins the joint CE against the hand value and shows it differs from the per-target form when the two regimes' continuations differ. _apply_continuation_operator is kept as the single-lottery composition for the unit tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pro: the area floor (DEGENERATE_AREA_FLOOR) rejects collinear triangles by area only, not shape, so an ill-conditioned sliver can pass it and give an unstable affine interpolant. Add a normalized mean-ratio shape gate Q = sqrt(3)*2A / (sum of squared edges) in (0,1] (1=equilateral, ->0 sliver) and select the smallest-area *well-conditioned* containing simplex (Q > SHAPE_QUALITY_FLOOR), falling back to the smallest containing one only when every container is a sliver (coverage over conditioning), then the nearest survivor. Replaces the bare smallest-area rule. 25 RFC/G2EGM/publisher/parity tests stay green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lity Replace the fixed 60-iteration bisection forward solve with a safeguarded Newton iteration in log-consumption. In log c the CRRA family is exactly linear (log u' = -crra log c), so Newton converges in a single step and stays well-conditioned for any power-law-like u' whose steep-near-zero, flat-at-large-c curvature makes a plain-c Newton overshoot into a slow bisection tail. The bisection fallback (taken when a Newton step leaves the live bracket) preserves robustness. The detached-root + analytic IFT corrector gradient contract and the bracket/active-set guard are unchanged. A 15-iteration convergence test on the wide [1e-8, 1e6] bracket — where 15 bisection steps cannot reach the root — pins the solve as Newton-driven. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thread a scan_unroll factor through refine_envelope and refine_to_bracket
into both FUES jax.lax.scan call sites, exposed as DCEGM.fues_scan_unroll
(validated >= 1) and wired through the get_upper_envelope / get_bracket_finder
dispatch. The FUES scan is sequential and latency-bound on accelerators;
unrolling trades compile time for fewer loop-carry round trips. Pure
performance knob — the refined envelope is bit-identical across unroll values
(numerical-invariance test at unroll in {2,4,8}); config rejects unroll < 1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The query-side backend's dense (n_query, n_segment) bracket-and-reduce is
itself the memory wall at scale. Add an opt-in segment_block_size that swaps it
for a two-pass blocked scan over segment blocks, peaking at (n_query, block):
- pass 1 accumulates the running per-query max (the envelope value);
- pass 2 re-scans, keeping the global max-slope winner among segments within
the tie tolerance of that fixed envelope value (the right-continuous
tie-break).
Both passes are exact associative folds against a fixed target, so the result
matches the dense path up to floating-point reassociation between the two XLA
lowerings (parity test at block sizes {1,2,3,4}, rtol 1e-12). Default
(segment_block_size=0) keeps the dense path unchanged. Refactored the shared
link construction into a _SegmentLinks struct. The memory/GPU payoff is
gpu-01-validated; this lands the correct, opt-in mechanism.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ming doc A.3 (rfc_2d.py): the shape-quality formula returned half the standard mean ratio (0.5 at equilateral, not 1). Use the standard Q = 4*sqrt(3)*A / sum(l^2) so Q=1 is equilateral; the gate is unchanged (SHAPE_QUALITY_FLOOR 0.05 -> 0.10 to match the doubled scale). Docstring corrected. W1 (numeric_inverse.py): the log-consumption solve needs u' > 0 and m > 0. Add a log_well_defined gate (m>0, u'(c_lower)>0, u'(c_upper)>0, finite log m); when it fails (bliss point, marginal touching zero, non-finite branch) return NaN so the kernel's NaN diagnostics name the (regime, period) instead of the log path silently returning a clamped bound. Byte-identical for valid CRRA inputs; two fail-loud tests added. A.5 (Q_and_F.py): document the joint-unresolved-lottery timing assumption on the continuation operator — g_inv applies once over the full joint regime+shock lottery; staged resolution (regime revealed before continuation risk, or shock-dependent regime probs) would need an explicit operator-scope option. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Generalize the keeper's injected durable transition from a hard-coded identity to the regime's outer_no_adjustment_candidate function: the keeper holds the durable at next_<durable> = keep(<durable>) instead of always = <durable>. With no candidate (None) the auto-identity is kept, so the default path is unchanged; a depreciating keep(d) = d(1-delta) lands the kept stock off the durable grid and the inner DC-EGM's passive read blends it over the grid's nodes. Byte-identical at delta=0: keep_housing returns the stock unchanged, so the DS-2024 and App.2 housing keeper solves are unaffected (3 + 12 build/parity tests green). Enables the DS-2024 delta=0.10 keeper (next). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thread `delta` into `build_model` and bake it into the keeper's no-adjustment map via `_make_keep_housing(delta)`: `keep_housing(h) = h(1 - delta)`. With the keeper-durable wiring (outer_no_adjustment_candidate), the NEGM keeper holds the house at the depreciated level for free and the inner DC-EGM's passive read blends the continuation off the housing grid, so the model is faithful at any delta. At delta=0 the map returns the stock unchanged (h * 1.0 == h), so the keeper solves are byte-identical and the 3 build/parity tests stay green. The brute grid-search twin remains a valid oracle only at delta=0 (it searches next_housing on the grid, where the free-keep level h(1-delta) lands only when delta=0); the delta>0 keeper is validated against a dense host VFI oracle (next). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oracle Add a dense host VFI oracle (`tests/solution/_ds2024_housing_vfi_oracle.py`) that mirrors the NEGM nest and searches the free-keep level `h(1-delta)` as an explicit candidate alongside the outer house grid. At delta>0 that level is off the house grid, so the on-grid brute twin cannot represent it — the oracle is the only valid reference for the depreciating keeper. Two interior tests: - the oracle reproduces the brute solve at delta=0 (pins its economics); - the NEGM solve at delta=0.10 reproduces the oracle (mean interior |dV| ~0.08, the same grid-resolution agreement as the delta=0 brute pair) — the keeper's depreciated hold is faithful. Exclude the oracle helper from the test-naming hook, matching the other tests/solution/_*_oracle.py helpers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…durable A reviewer flagged that `_build_coh_shift_function` lifts adjusters using the identity keeper reference (`next = durable`) rather than the keeper core's no-adjustment level `keep(durable) = durable(1-delta)`, which the outer-envelope design doc says should be the credited-cost reference. Switching to `keep(durable)` in isolation is the design-exact shift, but a differential VFI-oracle test shows it *regresses* DS-2024 delta=0.10: the corrected lift places each adjuster at a coh the fixed keeper-grid envelope reads by extrapolation, and the one-island-per-adjuster splice over-counts, moving the converging ~0.08 interior agreement to a plateaued ~0.18. The identity reference and the island splice are tuned together; the shipped solver converges (mean 0.0825 -> 0.0759 as n_grid 10 -> 18). Correcting both belongs with the segment-topology outer-envelope redesign, so document the coupling here rather than ship a regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Australia-2015-16 capital-income tax brackets are not printed in the DS-2026 paper text (only Figure 9); they are transcribed from the co-authors' replication repo. Record the exact provenance so upstream drift is detectable: GitHub akshayshanker/FUES, examples/housing_renting/config_HR/STD_RES_SETTINGS_4_TAXES/ master.yml, pinned at commit 00961e0b (blob 41661779), and a sha256[:16] checksum (4e8d1bf0748f6933) of the resolved (LOWER, OFFSET, RATE) arrays. All 10 brackets verified to match the source tax_table.brackets exactly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The GPU peak-memory tracker spawns `python -m benchmarks.asv._gpu_mem` with cwd=_PROJECT_ROOT, but _PROJECT_ROOT resolved to the benchmarks/ directory, not the repo root, so the subprocess could not import `benchmarks` and every GpuPeakMem benchmark failed with `ModuleNotFoundError: No module named 'benchmarks'` (the whole benchmark-pr workflow red). The file moved one level deeper in 7dc5887 (benchmarks/ -> benchmarks/asv/) but the `.parent.parent` chain was not updated. Point it at the true repo root (three parents up). Verified locally: `python -m benchmarks.asv._gpu_mem` resolves from the corrected cwd and fails only on missing argv, no longer on import. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
W1 made `inverse_marginal_utility` optional: when a regime omits it, EGM derives the inverse numerically (`numeric_inverse.py`; validation.py:298 documents this, and tests/solution/test_iegm_wiring.py covers a model solving without it). So removing that function is no longer a contract violation, and the negative case in test_dcegm_contract_violation_raises stopped raising — failing `main` deterministically on every platform. Drop the now-contradictory case; the iEGM path's positive coverage already lives in test_iegm_wiring.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The DC-EGM solve/oracle battery added on this branch makes the serial suite ~86 min on CI runners. Bake `-n 4` into the tests / tests-32bit / tests-with-cov tasks so every CI job (and local `pixi run tests`) distributes across 4 workers; `--dist loadfile` (already in addopts) keeps each file on one worker. Set `XLA_PYTHON_CLIENT_PREALLOCATE=false` on the tasks — a no-op on CPU, but required on GPU so the 4 worker processes don't each preallocate ~75% of the device and OOM; they grow to their actual small per-test need, and loadfile keeps the one memory-heavy file (Mahler-Yum) isolated to a single worker. Verified locally: 4/4 workers spin up and pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The state-action-param bundle threaded through `Q_and_F`, `max_Q_over_a`, and their `_transform_and_average` / `_invert_joint` helpers carries every flat param, including lookup-table params represented as `MappingLeaf` / `SequenceLeaf` (tax schedules, pension accrual tables). The bundle was annotated `FloatND | IntND | BoolND`, which excludes those leaf types, so the package beartype claw rejected a `MappingLeaf` param whenever its O(1) sampling happened to draw that key — an intermittent failure independent of the model's correctness. Widen the bundle annotation to the canonical `_ParamsLeaf` union. The two concrete taste-shock extractions (`taste_shocks__scale`, `taste_shock_key`) are narrowed back at their call sites with `cast`, since `logsum_and_softmax` and `draw_taste_shock_noise` require the scalar/array types. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iles) The single self-hosted GPU runner is starved of host CPU/memory when four xdist workers each AOT-compile the DC-EGM oracle suite in parallel, and the runner loses communication with the server (~60 min, then death) instead of finishing the suite in roughly a quarter of that serially. Append `-n0` to the two GPU steps so pytest-xdist's last `-n` wins and the suite runs in-process. The CPU matrix keeps `-n 4`, where the four workers are a clear speedup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The macOS runner has 3 vCPUs and 7 GB of RAM; four xdist workers each AOT-compiling the DC-EGM solve/simulate/oracle tests peak past its memory and swap-thrash the suite to ~4 hours. A `pytest_collection_modifyitems` hook marks that battery `slow` (all of `tests/solution/`, plus the DC-EGM-family modules elsewhere), and the macOS matrix leg runs `-m "not slow"`. Linux and the GPU runner still run the full suite, so the platform-independent kernel stays covered; macOS keeps the lighter engine/regime/grid/simulation tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The committed `pixi.lock` is written in the format pixi 0.71.x emits; pixi 0.70.1 cannot parse it and aborts every job at install with `expected a string, found table`. Pin the `setup-pixi` steps to the version that produced the lock so the frozen install succeeds. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`test_simulate_using_raw_inputs` calls `simulate` directly, bypassing the public `Model.simulate` canonicalization, so its `flat_params` must already carry canonical dtypes. `jnp.asarray(0)` is `int64` under x64, but the params-leaf contract is `int32`; `_invert_joint`'s beartype check rejected it whenever its O(1) sampling drew that key. Pin the leaf to `int32` — the dtype `cast_params_to_canonical_dtypes` produces on the public path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds the DC-EGM solver (discrete-continuous endogenous grid method) behind the
SolverABC seam. It inverts the Euler equation on an exogenous end-of-period gridinstead of grid-searching the continuous action, with a FUES upper envelope to drop
dominated candidates. Selected per regime via
solver=DCEGM(...);GridSearch(thedefault) is untouched, and the engine dispatches polymorphically — no fork on solver
type.
Commit structure (reviewable in steps)
envelope, the closed-form constrained segment, multi-target continuation, and
asset-row mode; wired through the ABC
DCEGM.validate/build_period_kernels.lax.scanblocks — a memory knob for the childstochastic-node mesh.
an acknowledgments page.
step.py(5 pure-move commits) —regime_introspection→kernel_scope→continuation→step_core→asset_row, taking the monolith from~2900 to ~800 lines so the simple single-post-state path reads cleanly apart from the
asset-row and continuation machinery.
savings → (expected_value, expected_marginal)callable (ContinuationPlan).package claw, so cyclic field types use the two-definition alias pattern (precise for
ty, bare container at runtime); plus a valid-DCEGM seam build test.
PeriodKernel/KernelResultadapter; the solve loopcalls one kernel and branches only on optional generic outputs, not on solver type.
FUES scan (
refine_to_bracket+publish_node_from_bracket), removing then_padintra-node envelope scratch. Single-post-state still publishes the full refined
envelope as its carry, unchanged.
Solvers and accuracy work built on the DC-EGM seam
On top of the kernel, this branch adds the prime-time solvers and the Dobrescu–Shanker
benchmark reproductions:
(
DCEGM(upper_envelope=...)), plus the 2-D RFC/G2EGM two-asset foundation for theDS-2024 pension comparison.
EGM-FUES, discrete-housing-with-tax); full paper-scale matrix run on gpu-01.
tests/solution/_branch_aware_vfi_oracle.py): pinsthe App.2 (S,s) EGM-vs-VFI gap to a comparator-ordering term that is identically zero
on the scored interior, correcting the earlier attribution; the tolerated interior gap
is DC-EGM durable-corner discretization. Standard-mode oracle == brute to 8e-5.
ds2024_housing*.py,benchmarks/ds_replication/ds2024_housing_comparison.py): NEGM (nested continuoushousing) vs RFC/FUES (discrete-housing DC-EGM, the source's per-housing-column 1-D
rooftop cut). Reproduces the paper headline — RFC runtime < NEGM runtime — against a
grid-search VFI oracle (faithful at
delta = 0; the paper'sdelta = 0.10keeperdepreciation awaits a shared housing-axis carry extension).
Validation
-n 4, on a free GPU): 1405 passed, 15skipped, 2 xfailed. Covers the DC-EGM-vs-brute value-function-parity battery, the
19-case refine-to-query unit equivalence (streamed ≡ row-then-interp), and the Layer-2
distinct-compiled-cores regression.
tyandprekclean.Pending (does not block reviewing the engine)
delta = 0.10fidelity: the keeper depreciates the held stock offthe housing grid; the strict-identity NEGM keeper needs a housing-axis carry extension.
n_padruntime win is only visible atACA scale — the in-suite oracles are too small to exercise it; handed to the aca side.
experiment this work unlocks, not settled here.
🤖 Generated with Claude Code