OpenSourceEconomics
diff --git a/.ai-instructions b/.ai-instructions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 144 additions & 5 deletions
diff --git a/benchmarks/bench_aca_baseline.py b/benchmarks/bench_aca_baseline.py
Lines changed: 18 additions & 2 deletions
diff --git a/pixi.lock b/pixi.lock
Lines changed: 4 additions & 4 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 19 additions & 2 deletions
diff --git a/src/lcm/ages.py b/src/lcm/ages.py
Lines changed: 8 additions & 6 deletions
@@ -294,6 +294,150 @@ initial_conditions = {
 - `model.n_periods` - Number of periods in the model (derived from `ages`)
 - `model.regime_names_to_ids` - Immutable mapping from regime names to integer indices
 
+## Testing
+
+### Test-Driven Development — always
+
+**Always write the test first, watch it fail, then implement.** No exceptions for new
+behavior or bug fixes. Tests are not an afterthought; they are the spec.
+
+The cycle:
+
+1. **Red.** Write a failing test that asserts the desired behavior in user-facing terms.
+   Run it. Confirm it fails for the *right* reason (the missing behavior — not a typo,
+   not an import error).
+1. **Green.** Write the smallest amount of code that makes the test pass.
+1. **Refactor.** Clean up while keeping the test green.
+
+Apply per case:
+
+- **New feature** → red-green-refactor.
+- **Bug fix** → reproduce as a failing test before writing the fix. The test then
+  prevents regression.
+- **Refactor (no behavior change)** → existing tests are the spec. Keep them green
+  before, during, and after. No new test needed if behavior is unchanged; if you find a
+  behavior gap, fill it with a new test *before* refactoring.
+
+### Test docstrings — describe behavior, not history
+
+Test docstrings state what *should* be true, in user-facing terms. Pretend the reader
+has never seen the PR. They should not need to.
+
+```python
+# Good — behavior, in plain language
+def test_simulate_with_chained_transitions_yields_expected_next_wealth():
+    """`next_wealth_t = wealth_t - c_t + 0.1 * next_aime_t` holds in simulation."""
+
+
+# Bad — rehearses the prior bug or implementation history
+def test_solve_resolves_chain_via_dags():
+    """Before the fix, `_resolve_fixed_params` raised
+    `InvalidParamsError: Missing required parameter: ...` because
+    `create_regime_params_template` classified ..."""
+```
+
+Rule of thumb: **would the docstring still make sense in 9 months without the PR
+context?** If not, rewrite it.
+
+### Concrete-value assertions
+
+Assert *what* the result is, not just that it didn't crash.
+
+```python
+# Good — analytical value with explicit tolerance
+np.testing.assert_allclose(curr["wealth"], expected_next_wealth, atol=1e-6)
+
+# Bad — passes whether the math is right or not
+assert not jnp.any(jnp.isnan(V_arr))
+assert df["wealth"].notna().all()
+```
+
+`not isnan` and `no exception raised` belong in CI smoke tests, not in the unit tests
+for the feature itself.
+
+### Mechanics
+
+- Use plain pytest functions, never test classes (`class TestFoo`)
+- Use `@pytest.mark.parametrize` for test variations
+
+## Docstring Style
+
+Docstrings and inline comments describe the code's *current* state in user-facing terms.
+The 9-month-without-PR-context reader is the audience: a docstring that survives that
+test stays useful; one that rehearses the diff or the prior implementation rots
+immediately.
+
+This applies to **all** docstrings and comments — source and tests. For tests
+specifically, see also "Test docstrings — describe behavior, not history" above.
+
+### Describe state, not history
+
+State what is true now. Don't reference prior designs, removed code, or what was
+changed. Words like "earlier", "previously", "now", "formerly", "the old", "before the
+fix" are red flags.
+
+```python
+# Good — forward-looking constraint
+class _DiagnosticRow:
+    """Metadata captured during the backward-induction loop.
+
+    Holds only Python-scalar metadata — no device-array references —
+    so every (regime, period) row stays at a few bytes regardless of
+    grid size.
+    """
+
+
+# Bad — rehearses prior design
+class _DiagnosticRow:
+    """Metadata captured during the backward-induction loop.
+
+    Holds only Python-scalar metadata. The earlier design captured
+    state_action_space and a closure directly on each row, which
+    pinned every period's V template in device memory until the
+    post-loop flush.
+    """
+```
+
+### No PR numbers, no model-specific magic numbers
+
+PR references (`#334 removed the host stalls`, `the bug was fixed in #42`) rot as the
+codebase evolves and provide no useful signal to a reader who isn't already in context.
+Magic numbers tied to a specific model size or hardware
+(`~2 MB at production grid sizes`, `fits on a 16 GB device`) imply a fixed scale that's
+only true on whichever model/box the comment was written against. State the qualitative
+dependency instead.
+
+```python
+# Good — qualitative dependency
+# Frees per-period intermediate buffers (V_arr-shaped, so
+# model-dependent) so they don't stack up across the loop.
+
+# Bad — PR reference + magic number
+# Frees per-period intermediate buffers (~2 MB each at production
+# grid sizes) so we don't re-introduce the host stalls that #334
+# removed.
+```
+
+### Bulleted lists for enumerated cases
+
+When describing a fixed set of cases (log levels, regime kinds, parameter types,
+dispatch strategies), use one bullet per case rather than running prose. Bullets scan;
+prose hides cases.
+
+```python
+# Good — scannable
+# Gate falls out of the public log level:
+# - `"off"` ⇒ nothing (skips even the NaN fail-fast)
+# - `"warning"` / `"progress"` ⇒ NaN/Inf only
+# - `"debug"` ⇒ adds the min/max/mean trio
+
+
+# Bad — buried in prose
+# Gate falls out of the public log level: `"off"` ⇒ nothing,
+# `"warning"` / `"progress"` ⇒ NaN/Inf only, `"debug"` ⇒ adds the
+# min/max/mean trio. `"off"` skips even the NaN fail-fast.
+```
+
 ## Development Notes
 
 ### JAX Integration
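
Review note on the "Describe state, not history" and "No PR numbers" rules added in the hunk above: the red-flag words it lists lend themselves to a mechanical check. The following is a minimal sketch of such a check, not something this PR adds. `find_history_markers` and `RED_FLAG_PATTERN` are hypothetical names, and the `#\d+` pattern for PR references is an assumption that will also match other uses of `#` followed by digits.

```python
import re

# Red-flag phrases quoted in AGENTS.md, plus a pattern for PR references
# such as `#334`. Hypothetical helper, not part of the repository.
RED_FLAG_PATTERN = re.compile(
    r"\b(earlier|previously|now|formerly|the old|before the fix)\b|#\d+",
    re.IGNORECASE,
)


def find_history_markers(docstring: str) -> list[str]:
    """Return the red-flag phrases in `docstring`, lowercased, in order."""
    return [match.group(0).lower() for match in RED_FLAG_PATTERN.finditer(docstring)]


good = "Holds only Python-scalar metadata, so every row stays small regardless of grid size."
bad = "Holds only Python-scalar metadata. The earlier design pinned memory until #334."

print(find_history_markers(good))
print(find_history_markers(bad))
```

A word like "now" has legitimate uses, so a check like this is better suited to flagging lines for a human reviewer than to failing a build.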
@@ -401,11 +545,6 @@ Code structure should be self-evident from function names and ordering.
   display math, and `[text](url)` for links. Never use rST-style ``` `` code `` ```,
   `:math:`, `:func:`, or `` `link <url>`_ ``.
 
-### Testing Style
-
-- Use plain pytest functions, never test classes (`class TestFoo`)
-- Use `@pytest.mark.parametrize` for test variations
-
 ### Plotting
 
 - Always use **plotly** for visualizations, never matplotlib. Use `plotly.graph_objects`
 
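Review note on the "Concrete-value assertions" guidance in the AGENTS.md hunks above: the sketch below illustrates why a NaN check alone is too weak. It assumes a hypothetical scalar law of motion modeled on the docstring example (`wealth - consumption + 0.1 * next_aime`) and uses only the standard library. In the repository's own tests the same idea would be expressed with `np.testing.assert_allclose` and `@pytest.mark.parametrize`.

```python
import math
from collections.abc import Callable


def next_wealth(wealth: float, consumption: float, next_aime: float) -> float:
    """Hypothetical law of motion: wealth - consumption + 0.1 * next_aime."""
    return wealth - consumption + 0.1 * next_aime


def next_wealth_buggy(wealth: float, consumption: float, next_aime: float) -> float:
    """Deliberately wrong variant: the sign on consumption is flipped."""
    return wealth + consumption + 0.1 * next_aime


def smoke_check(func: Callable[[float, float, float], float]) -> bool:
    """Weak check: only verifies that the result is not NaN."""
    return not math.isnan(func(10.0, 2.0, 5.0))


def concrete_check(func: Callable[[float, float, float], float]) -> bool:
    """Strong check: compares against the hand-computed value 10 - 2 + 0.5 = 8.5."""
    return math.isclose(func(10.0, 2.0, 5.0), 8.5, abs_tol=1e-6)


def test_next_wealth_matches_law_of_motion():
    """`next_wealth = wealth - consumption + 0.1 * next_aime` holds."""
    assert concrete_check(next_wealth)
```

The buggy variant passes `smoke_check` and fails `concrete_check`, which is the distinction the guideline draws between smoke tests and unit tests.
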
@@ -47,14 +47,30 @@
 
 
 def _build() -> tuple[object, object, object]:
-    """Build the aca-baseline model, params, and initial conditions."""
+    """Build the aca-baseline model, params, and initial conditions.
+
+    aca_model and lcm imports are deferred to the function body — ASV's
+    forkserver runs `preimport` to discover benchmarks across every
+    `bench_*.py` module before forking workers. Importing JAX at module
+    top loads the multithreaded XLA backend into the forkserver; every
+    subsequent `os.fork()` inherits a corrupted CUDA context and the
+    first device op in the worker aborts with
+    `CUDA_ERROR_NOT_INITIALIZED`. Per-call imports keep JAX out of the
+    forkserver and confine it to the worker process.
+    """
+    from aca_model.agent.preferences import BenchmarkPrefType
     from aca_model.benchmark import (
         create_benchmark_model,
         get_benchmark_initial_conditions,
         get_benchmark_params,
     )
 
-    model = create_benchmark_model()
+    from lcm import DiscreteGrid
+
+    model = create_benchmark_model(
+        n_subjects=_N_SUBJECTS,
+        pref_type_grid=DiscreteGrid(BenchmarkPrefType),
+    )
     _, model_params = get_benchmark_params(model=model)
     initial_conditions = get_benchmark_initial_conditions(
         model=model, n_subjects=_N_SUBJECTS, seed=0
 
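Review note on the deferred imports in `_build` above: the docstring relies on the fact that an import inside a function body runs when the function is called, not when it is defined. The sketch below demonstrates that with the standard library only. `record_imports` and `define_build` are hypothetical helpers, and `fractions` is a stand-in for the JAX-backed packages. It says nothing about ASV or CUDA behavior itself.

```python
import builtins
from collections.abc import Callable
from typing import Any


def record_imports(func: Callable[[], Any]) -> tuple[Any, list[str]]:
    """Run `func` and return its result plus every module name imported meanwhile."""
    seen: list[str] = []
    original_import = builtins.__import__

    def tracking_import(name, *args, **kwargs):
        seen.append(name)
        return original_import(name, *args, **kwargs)

    # Temporarily route all import statements through the tracker.
    builtins.__import__ = tracking_import
    try:
        result = func()
    finally:
        builtins.__import__ = original_import
    return result, seen


def define_build() -> Callable[[], Any]:
    """Mimic benchmark discovery: defining `_build` should import nothing."""

    def _build():
        # Deferred import, executed only when `_build` is called.
        from fractions import Fraction

        return Fraction(1, 2)

    return _build


_build, imports_at_definition = record_imports(define_build)
value, imports_at_call = record_imports(_build)
```

Here `imports_at_definition` stays empty while `imports_at_call` contains `"fractions"`, which mirrors how a discovery pass over the benchmark module can define `_build` without pulling in JAX.
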
@@ -98,7 +98,7 @@ tests-cuda13 = { features = [ "tests", "cuda13" ], solve-group = "cuda13" }
 tests-metal = { features = [ "tests", "metal" ], solve-group = "metal" }
 type-checking = { features = [ "type-checking", "tests" ], solve-group = "default" }
 [tool.pixi.feature.benchmarks.pypi-dependencies]
-aca-model = { git = "https://github.com/OpenSourceEconomics/aca-model.git", rev = "134286108b7445f3e17e8824bcdd1739a98b6089" }
+aca-model = { git = "https://github.com/OpenSourceEconomics/aca-model.git", rev = "9ac20430f499a8b1cdb056af85bc2a26e850bad2" }
 [tool.pixi.feature.cuda12]
 platforms = [ "linux-64" ]
 system-requirements = { cuda = "12" }
@@ -242,6 +242,15 @@ per-file-ignores."tests/*" = [
   "S301",    # Use of pickle
   "SLF001",  # Private member access
 ]
+per-file-ignores."tests/test_dtypes.py" = [
+  "ARG001", # Unused function argument (x64_enabled / x64_disabled fixtures)
+]
+per-file-ignores."tests/test_explicit_dtype_filter.py" = [
+  "ARG001", # Unused function argument (x64_disabled fixture)
+]
+per-file-ignores."tests/test_float_dtype_invariants.py" = [
+  "ARG001", # Unused function argument (x64_disabled fixture)
+]
 per-file-ignores."tests/test_next_state.py" = [
   "ARG001", # Unused function argument
   "ARG005", # Unused lambda argument
@@ -294,7 +303,15 @@ ini_options.addopts = [
   "--dist",
   "loadfile",
 ]
-ini_options.filterwarnings = []
+ini_options.filterwarnings = [
+  # JAX emits this UserWarning when user code asks for a dtype wider
+  # than the active x64 setting allows. Under `--precision=32` it
+  # surfaces every stray `jnp.int64` / `jnp.float64` / `dtype=int64`
+  # literal in src/ — the only files that legitimately trigger it are
+  # the dtype-invariant test modules, which opt out via a local
+  # `pytestmark` filter.
+  "error:Explicitly requested dtype.*:UserWarning",
+]
 ini_options.markers = [
   "illustrative: Tests are designed for illustrative purposes",
   "gpu: Tests that require a GPU (skipped on CPU-only machines)",
 
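Review note on the new `filterwarnings` entry above: its `action:message:category` fields map onto the arguments of the standard library's `warnings.filterwarnings`. The sketch below reproduces the effect outside pytest. The warning text passed to `emit` is an abbreviated stand-in, not the exact JAX message, and `emit` and `escalates` are hypothetical helpers.

```python
import warnings


def emit(message: str) -> None:
    """Emit a UserWarning, standing in for a library call that warns."""
    warnings.warn(message, UserWarning, stacklevel=2)


def escalates(message: str) -> bool:
    """Return True if the filter turns a warning with `message` into an exception."""
    with warnings.catch_warnings():
        # Silence everything first so unrelated warnings stay quiet in this demo.
        warnings.simplefilter("ignore")
        # Mirrors "error:Explicitly requested dtype.*:UserWarning". The message
        # argument is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "error", message="Explicitly requested dtype.*", category=UserWarning
        )
        try:
            emit(message)
        except UserWarning:
            return True
    return False


print(escalates("Explicitly requested dtype int64 is not available"))
print(escalates("Some unrelated notice"))
```

Only the warning whose text starts with the configured prefix is raised as an error, so other `UserWarning`s keep their default handling.
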
@@ -10,7 +10,7 @@
 import jax.numpy as jnp
 
 from lcm.exceptions import GridInitializationError, format_messages
-from lcm.typing import Age, Float1D, Int1D
+from lcm.typing import Float1D, Int1D
 
 STEP_UNITS: MappingProxyType[str, Fraction] = MappingProxyType(
     {
@@ -129,7 +129,7 @@ def exact_step_size(self) -> int | Fraction | None:
         """
         return self._exact_step_size
 
-    def period_to_age(self, period: int) -> Age:
+    def period_to_age(self, period: int) -> int | float:
         """Convert a period index to the corresponding age.
 
         Args:
@@ -151,7 +151,7 @@ def period_to_age(self, period: int) -> Age:
             return int(self._values[period])
         return float(self._values[period])
 
-    def age_to_period(self, age: Age) -> int:
+    def age_to_period(self, age: float) -> int:
         """Convert an age to the corresponding period index.
 
         Args:
@@ -172,12 +172,14 @@ def age_to_period(self, age: Age) -> int:
             raise ValueError(msg) from None
 
     @functools.cached_property
-    def _age_to_period_map(self) -> dict[Age, int]:
+    def _age_to_period_map(self) -> dict[int | float, int]:
         if self._is_integer:
             return {int(v): i for i, v in enumerate(self._exact_values)}
         return {float(v): i for i, v in enumerate(self._exact_values)}
 
-    def get_periods_where(self, predicate: Callable[[Age], bool]) -> tuple[int, ...]:
+    def get_periods_where(
+        self, predicate: Callable[[int | float], bool]
+    ) -> tuple[int, ...]:
         """Get period indices where predicate is True.
 
         Args:
@@ -187,7 +189,7 @@ def get_periods_where(self, predicate: Callable[[Age], bool]) -> tuple[int, ...]
             Tuple of period indices where predicate(age) is True.
 
         """
-        _convert: Callable[[object], Age] = int if self._is_integer else float  # ty: ignore[invalid-assignment]
+        _convert: Callable[[object], int | float] = int if self._is_integer else float  # ty: ignore[invalid-assignment]
         return tuple(
             period
             for period in range(self.n_periods)