@@ -294,6 +294,150 @@ initial_conditions = {
294294- `model.n_periods` - Number of periods in the model (derived from `ages`)
295295- `model.regime_names_to_ids` - Immutable mapping from regime names to integer indices
296296
297+ ## Testing
298+
299+ ### Test-Driven Development — always
300+
301+ **Always write the test first, watch it fail, then implement.** No exceptions for new
302+ behavior or bug fixes. Tests are not an afterthought; they are the spec.
303+
304+ The cycle:
305+
306+ 1. **Red.** Write a failing test that asserts the desired behavior in user-facing terms.
307+ Run it. Confirm it fails for the *right* reason (the missing behavior, not a typo or
308+ an import error).
309+ 1. **Green.** Write the smallest amount of code that makes the test pass.
310+ 1. **Refactor.** Clean up while keeping the test green.
311+
312+ Apply per case:
313+
314+ - **New feature** → red-green-refactor.
315+ - **Bug fix** → reproduce as a failing test before writing the fix. The test then
316+ prevents regression.
317+ - **Refactor (no behavior change)** → existing tests are the spec. Keep them green
318+ before, during, and after. No new test needed if behavior is unchanged; if you find a
319+ behavior gap, fill it with a new test *before* refactoring.
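The cycle above can be sketched with a small, self-contained example. The helper `next_wealth` and its budget rule are hypothetical and not part of this codebase; the point is only the order of work: the test is written first and fails with a `NameError` or a wrong value, then the helper is filled in until the test passes.

```python
import math


# Hypothetical helper, written only after the test below had been seen to fail.
def next_wealth(wealth, consumption, interest_rate):
    """Return next-period wealth: savings grown at the interest rate."""
    return (wealth - consumption) * (1 + interest_rate)


def test_next_wealth_applies_interest_to_savings():
    """Next-period wealth equals savings times one plus the interest rate."""
    result = next_wealth(wealth=100.0, consumption=40.0, interest_rate=0.05)
    # Savings of 60 at 5% interest give 63.
    assert math.isclose(result, 63.0, abs_tol=1e-9)
```

For a bug fix, the same shape applies: the test encodes the correct value, fails against the buggy code, and stays in the suite as a regression guard.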
320+
321+ ### Test docstrings — describe behavior, not history
322+
323+ Test docstrings state what *should* be true, in user-facing terms. Pretend the reader
324+ has never seen the PR. They should not need to.
325+
326+ ```python
327+ # Good: behavior, in plain language
328+ def test_simulate_with_chained_transitions_yields_expected_next_wealth():
329+     """`next_wealth_t = wealth_t - c_t + 0.1 * next_aime_t` holds in simulation."""
330+
331+
332+ # Bad: rehearses the prior bug or implementation history
333+ def test_solve_resolves_chain_via_dags():
334+     """Before the fix, `_resolve_fixed_params` raised
335+     `InvalidParamsError: Missing required parameter: ...` because
336+     `create_regime_params_template` classified ..."""
337+ ```
338+
339+ Rule of thumb: **would the docstring still make sense in 9 months without the PR
340+ context?** If not, rewrite it.
341+
342+ ### Concrete-value assertions
343+
344+ Assert *what* the result is, not just that it didn't crash.
345+
346+ ```python
347+ # Good: analytical value with explicit tolerance
348+ np.testing.assert_allclose(curr["wealth"], expected_next_wealth, atol=1e-6)
349+
350+ # Bad: passes whether the math is right or not
351+ assert not jnp.any(jnp.isnan(V_arr))
352+ assert df["wealth"].notna().all()
353+ ```
354+
355+ `not isnan` and `no exception raised` checks belong in CI smoke tests, not in the unit
356+ tests for the feature itself.
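A minimal sketch of why the distinction matters, using two hypothetical implementations of a discounted sum (neither exists in this codebase). The buggy version has an off-by-one in the discount exponent, yet it still returns a finite number, so a NaN check cannot tell the two apart.

```python
import math


def discounted_sum_correct(payoffs, beta):
    """Sum of payoffs, with the payoff in period t discounted by beta**t."""
    return sum(beta**t * p for t, p in enumerate(payoffs))


def discounted_sum_buggy(payoffs, beta):
    """Same as above, but wrongly discounts period t by beta**(t + 1)."""
    return sum(beta ** (t + 1) * p for t, p in enumerate(payoffs))


def passes_smoke_check(fn):
    """Weak check: only asserts that the result is not NaN."""
    return not math.isnan(fn([1.0, 1.0], 0.5))


def passes_value_check(fn):
    """Concrete-value check: 1.0 + 0.5 * 1.0 = 1.5."""
    return math.isclose(fn([1.0, 1.0], 0.5), 1.5, abs_tol=1e-12)
```

Both functions pass the smoke check; only `discounted_sum_correct` passes the value check.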
357+
358+ ### Mechanics
359+
360+ - Use plain pytest functions, never test classes (`class TestFoo`)
361+ - Use `@pytest.mark.parametrize` for test variations
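As an illustration of both rules, here is a sketch of a plain test function that covers three variations through `@pytest.mark.parametrize`. The helper `clip_consumption` is hypothetical and stands in for whatever function is under test.

```python
import pytest


def clip_consumption(consumption, wealth):
    """Hypothetical helper: keep consumption within the interval [0, wealth]."""
    return min(max(consumption, 0.0), wealth)


@pytest.mark.parametrize(
    ("consumption", "wealth", "expected"),
    [
        (5.0, 10.0, 5.0),  # feasible choice is left unchanged
        (15.0, 10.0, 10.0),  # choice above wealth is capped at wealth
        (-1.0, 10.0, 0.0),  # negative choice is raised to zero
    ],
)
def test_clip_consumption_stays_within_budget(consumption, wealth, expected):
    """Consumption is clipped to the interval [0, wealth]."""
    assert clip_consumption(consumption, wealth) == expected
```

Each tuple becomes its own test case in the pytest report, so a failure points at the specific variation rather than at a shared loop.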
362+
363+ ## Docstring Style
364+
365+ Docstrings and inline comments describe the code's *current* state in user-facing terms.
366+ The 9-month-without-PR-context reader is the audience: a docstring that survives that
367+ test stays useful; one that rehearses the diff or the prior implementation rots
368+ immediately.
369+
370+ This applies to **all** docstrings and comments, in source and tests alike. For tests
371+ specifically, see also "Test docstrings — describe behavior, not history" above.
372+
373+ ### Describe state, not history
374+
375+ State what is true now. Don't reference prior designs, removed code, or what was
376+ changed. Words like "earlier", "previously", "now", "formerly", "the old", "before the
377+ fix" are red flags.
378+
379+ ```python
380+ # Good: forward-looking constraint
381+ class _DiagnosticRow:
382+     """Metadata captured during the backward-induction loop.
383+
384+     Holds only Python-scalar metadata (no device-array references),
385+     so every (regime, period) row stays at a few bytes regardless of
386+     grid size.
387+     """
388+
389+
390+ # Bad: rehearses prior design
391+ class _DiagnosticRow:
392+     """Metadata captured during the backward-induction loop.
393+
394+     Holds only Python-scalar metadata. The earlier design captured
395+     state_action_space and a closure directly on each row, which
396+     pinned every period's V template in device memory until the
397+     post-loop flush.
398+     """
399+ ```
400+
401+ ### No PR numbers, no model-specific magic numbers
402+
403+ PR references (`#334 removed the host stalls`, `the bug was fixed in #42`) rot as the
404+ codebase evolves and provide no useful signal to a reader who isn't already in context.
405+ Magic numbers tied to a specific model size or hardware
406+ (`~2 MB at production grid sizes`, `fits on a 16 GB device`) imply a fixed scale that's
407+ only true on whichever model or machine the comment was written against. State the
408+ qualitative dependency instead.
409+
410+ ```python
411+ # Good: qualitative dependency
412+ # Frees per-period intermediate buffers (V_arr-shaped, so
413+ # model-dependent) so they don't stack up across the loop.
414+
415+ # Bad: PR reference + magic number
416+ # Frees per-period intermediate buffers (~2 MB each at production
417+ # grid sizes) so we don't re-introduce the host stalls that #334
418+ # removed.
419+ ```
420+
421+ ### Bulleted lists for enumerated cases
422+
423+ When describing a fixed set of cases (log levels, regime kinds, parameter types,
424+ dispatch strategies), use one bullet per case rather than running prose. Bullets scan;
425+ prose hides cases.
426+
427+ ```python
428+ # Good: scannable
429+ # Gate falls out of the public log level:
430+ # - `"off"` ⇒ nothing (skips even the NaN fail-fast)
431+ # - `"warning"` / `"progress"` ⇒ NaN/Inf only
432+ # - `"debug"` ⇒ adds the min/max/mean trio
433+
434+
435+ # Bad: buried in prose
436+ # Gate falls out of the public log level: `"off"` ⇒ nothing,
437+ # `"warning"` / `"progress"` ⇒ NaN/Inf only, `"debug"` ⇒ adds the
438+ # min/max/mean trio. `"off"` skips even the NaN fail-fast.
439+ ```
440+
297441## Development Notes
298442
299443### JAX Integration
@@ -401,11 +545,6 @@ Code structure should be self-evident from function names and ordering.
401545 display math, and `[text](url)` for links. Never use rST-style ``` ``code`` ```,
402546 `:math:`, `:func:`, or `` `link <url>`_ ``.
403547
404- ### Testing Style
405-
406- - Use plain pytest functions, never test classes (`class TestFoo`)
407- - Use `@pytest.mark.parametrize` for test variations
408-
409548### Plotting
410549
411550- Always use **plotly** for visualizations, never matplotlib. Use `plotly.graph_objects`