Wire MicrocalibrateAdapter into us.py pipeline (G1 unblocker)

MaxGhenis · claude · MaxGhenis · commit 31bae2af67b4 · 2026-04-17T07:30:55.000-04:00
Adds "microcalibrate" to the calibration_backend literal and to
_build_weight_calibrator's dispatch in USMicroplexPipeline. The existing
_apply_policyengine_constraint_stage call site needs no change because
MicrocalibrateAdapter.fit_transform / .validate match the legacy
Calibrator interface exactly.

Usage in the checkpoint pipeline:

  uv run python -m microplex_us.pipelines.pe_us_data_rebuild_checkpoint \
    ... \
    --calibration-backend microcalibrate

Effect:
  - Replaces the entropy-backend solve that killed v4 and v6 (1.5M
    households x ~1.2k constraints on a 48 GB workstation) with
    microcalibrate's gradient-descent chi-squared, which is
    identity-preserving and what PE-US-data uses in production.
  - No other pipeline changes. Backend swap only.
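
For a sense of scale, the dense footprint of a records-by-constraints
float64 matrix at the v6 size can be estimated directly. This is a
back-of-envelope sketch, not a measurement of either backend:
`dense_matrix_bytes` is a hypothetical helper, and the 1255-constraint
count is the v6 target-set size quoted in docs/microcalibrate-wiring-plan.md.

```python
def dense_matrix_bytes(n_records: int, n_constraints: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense (n_records, n_constraints) array.

    itemsize=8 assumes float64 entries (an assumption of this sketch).
    """
    return n_records * n_constraints * itemsize


V6_HOUSEHOLDS = 1_500_000   # v6 scale quoted in this commit
V6_CONSTRAINTS = 1_255      # v6 target-set size quoted in the wiring plan

size = dense_matrix_bytes(V6_HOUSEHOLDS, V6_CONSTRAINTS)
print(f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

That is roughly 15 GB for a single dense copy, which is why any
intermediate that duplicates the matrix matters on a 48 GB workstation.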

Tests:
  - tests/calibration/test_us_pipeline_dispatch.py (3 tests):
      * backend string resolves to MicrocalibrateAdapter instance
      * end-to-end fit_transform + validate through the pipeline path
      * unknown backend still raises ValueError
  - All 18 calibration + bakeoff tests pass.
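
The dispatch behaviour these tests pin down can be sketched in
isolation. The classes below (`ToyConfig`, `ToyEntropyCalibrator`,
`ToyMicrocalibrateAdapter`) are hypothetical stand-ins, not the real
microplex_us types; only the branch shape, the `max(..., 32)` epoch
floor, and the error message mirror the diff.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical stand-in for USMicroplexBuildConfig.
    calibration_backend: str = "entropy"
    calibration_max_iter: int = 100


class ToyEntropyCalibrator:
    """Placeholder for the legacy calibrator."""


class ToyMicrocalibrateAdapter:
    """Placeholder for MicrocalibrateAdapter; records its epoch budget."""

    def __init__(self, epochs: int) -> None:
        self.epochs = epochs


def build_weight_calibrator(config: ToyConfig):
    """Map the backend string to a calibrator, or fail loudly."""
    if config.calibration_backend == "entropy":
        return ToyEntropyCalibrator()
    if config.calibration_backend == "microcalibrate":
        # Same floor as the diff: never run fewer than 32 epochs.
        return ToyMicrocalibrateAdapter(
            epochs=max(config.calibration_max_iter, 32)
        )
    raise ValueError(
        f"Unsupported calibration backend: {config.calibration_backend}"
    )


adapter = build_weight_calibrator(
    ToyConfig(calibration_backend="microcalibrate", calibration_max_iter=10)
)
print(type(adapter).__name__, adapter.epochs)
```

Falling through to a `ValueError` keeps unknown strings from silently
selecting a default backend, which is the property the third test guards.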

Docs:
  - docs/microcalibrate-wiring-plan.md: rationale, contract-compat
    checks, validation plan, risk register, rollout order.
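
The validate contract described in the plan (a `converged` flag at
max relative error < 5%, plus per-constraint `target`, `estimate`,
`relative_error`, `absolute_error`) can be illustrated with a minimal
sketch. `validate_linear` is a hypothetical function, not the adapter's
code; the zero-target guard and the sparsity definition are assumptions.

```python
import numpy as np


def validate_linear(weights, constraints, threshold=0.05):
    """Summarize how well a weight vector meets linear constraints.

    constraints maps name -> (coefficients, target). The 5% threshold
    follows the plan doc; the formulas here are assumptions.
    """
    weights = np.asarray(weights, dtype=float)
    linear_errors = {}
    for name, (coefficients, target) in constraints.items():
        estimate = float(weights @ np.asarray(coefficients, dtype=float))
        absolute_error = abs(estimate - target)
        # Assumed guard: fall back to absolute error when target is zero.
        relative_error = absolute_error / abs(target) if target else absolute_error
        linear_errors[name] = {
            "target": target,
            "estimate": estimate,
            "relative_error": relative_error,
            "absolute_error": absolute_error,
        }
    max_error = max(
        (e["relative_error"] for e in linear_errors.values()), default=0.0
    )
    return {
        "converged": max_error < threshold,
        "max_error": max_error,
        # Assumed definition: share of records with exactly zero weight.
        "sparsity": float(np.mean(weights == 0)),
        "linear_errors": linear_errors,
    }


mask = np.array([1.0, 1.0, 0.0, 0.0])
report = validate_linear(np.ones(4), {"above_80k": (mask, 2.0)})
print(report["converged"], report["max_error"])
```

Because downstream code reads `converged` as a gate and pulls only
`relative_error` per constraint, any implementation that fills these
keys satisfies the call sites.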

Not in this commit:
  - No v7 run. Full-scale validation is the next production run.
  - No benchmark comparison of microcalibrate vs entropy numerical
    accuracy. v6 evidence is that entropy can't even complete, so
    microcalibrate is not competing for accuracy — it's the only
    backend that gets us past the OOM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/docs/microcalibrate-wiring-plan.md b/docs/microcalibrate-wiring-plan.md
@@ -0,0 +1,112 @@
+# Wiring `MicrocalibrateAdapter` into `calibrate_policyengine_tables`
+
+*Concrete plan for the G1 unblocker: swap `Calibrator(backend="entropy")`
+— the v4/v6 OOM killer — for `microcalibrate` inside the existing pipeline.
+No changes to pipeline topology; backend swap only.*
+
+## Location
+
+`src/microplex_us/pipelines/us.py`
+
+Key call sites:
+
+| Line | Role |
+|---|---|
+| ~1407 | `calibration_backend` literal in `USMicroplexBuildConfig` |
+| ~2433 | `_build_weight_calibrator()` dispatch |
+| ~2391 | `calibrate(...)` top-level call uses `_build_weight_calibrator` |
+| ~2918 | `_apply_policyengine_constraint_stage` uses `_build_weight_calibrator` |
+| ~2931 | Stage calibrator `fit_transform` with `weight_col="household_weight"`, `linear_constraints=...` |
+
+## What to add
+
+Three small edits:
+
+### 1. Extend the `calibration_backend` Literal
+
+```python
+# us.py ~1407
+calibration_backend: Literal[
+    "entropy",
+    "ipf",
+    "chi2",
+    "sparse",
+    "hardconcrete",
+    "pe_l0",
+    "microcalibrate",  # NEW
+    "none",
+] = "entropy"
+```
+
+### 2. Add a dispatch branch in `_build_weight_calibrator`
+
+```python
+# us.py ~2433
+def _build_weight_calibrator(self):
+    ...
+    if self.config.calibration_backend == "microcalibrate":
+        from microplex_us.calibration import (
+            MicrocalibrateAdapter,
+            MicrocalibrateAdapterConfig,
+        )
+        return MicrocalibrateAdapter(
+            MicrocalibrateAdapterConfig(
+                epochs=max(self.config.calibration_max_iter, 32),
+                learning_rate=1e-3,
+                device=self.config.device,
+                seed=self.config.random_seed,
+            )
+        )
+    # ... existing branches unchanged ...
+```
+
+### 3. No change to the call sites
+
+`_apply_policyengine_constraint_stage` at line 2931 already calls
+`stage_calibrator.fit_transform(households.copy(), {}, weight_col=..., linear_constraints=...)` — that is exactly the `MicrocalibrateAdapter.fit_transform` signature. No further wiring needed.
+
+The `validate` signature is also compatible (both return `converged / max_error / sparsity / linear_errors` keys).
+
+## Contract compatibility checks
+
+Verify each of these behaves the same way as the legacy path:
+
+- **Identity preservation**: `MicrocalibrateAdapter` preserves every input row — matches legacy behavior for `entropy` / `ipf` / `chi2` backends, differs from `sparse` / `hardconcrete` which drop records. No downstream consumer is assuming entity IDs disappear.
+- **Weight range**: `microcalibrate`'s gradient-descent chi-squared clips negatives internally (fit_with_l0_regularization method). Output weights are non-negative. Same as legacy.
+- **`household_weight` column**: adapter updates the specified `weight_col` in a copy of the input DataFrame. Matches legacy.
+- **`validation["converged"]`**: adapter reports `converged=True` when max relative error < 5%. Legacy `Calibrator.validate` uses a different convergence check (tolerance parameter). Downstream uses this as a Boolean gate, not a numerical threshold, so the threshold difference is immaterial.
+- **`validation["linear_errors"]`**: both dicts keyed by constraint name. Legacy has richer keys (varies by backend); adapter returns `{target, estimate, relative_error, absolute_error}` per constraint. Downstream pulls `relative_error` only; adapter provides it. Compatible.
+
+## Validation / test plan
+
+1. **Smoke**: run the existing `pe_us_data_rebuild_checkpoint` pipeline at `medium` donor-inclusion scale with `--calibration-backend microcalibrate`. Confirm it completes without the OOM that killed v4/v6.
+2. **Numerical sanity**: on the same seed, compare `calibration.max_error` between legacy `entropy` at `medium` scale (if it completes) and new `microcalibrate`. Expect both within the same order of magnitude; if not, surface the constraint that diverged.
+3. **Parity artifact diff**: generate `pe_us_data_rebuild_parity.json` with each backend and diff the two at the target level. Expected: modest per-target variation, no systematic bias.
+4. **Full-scale**: launch `broader-donors-puf-native-challenger-v7` with the `microcalibrate` backend at the v6 scale (1.5M households). This is the actual production test. If it completes without OOM, G1 is unblocked.
+
+## Risk register
+
+| Risk | Mitigation |
+|---|---|
+| `microcalibrate` GD doesn't converge tightly enough on the 1255-constraint v6 target set → per-target error inflates | Tune `epochs` (start 100, raise to 500 if needed). The OOM risk is vastly larger than the convergence risk. |
+| `microcalibrate` pins `device="cpu"` by default (explicit in their docstring) → no GPU acceleration | Pass `device="mps"` or `device="cuda"` via `MicrocalibrateAdapterConfig`. Existing config flow supports it. |
+| The adapter internally builds a dense estimate_matrix DataFrame with shape `(n_records, n_constraints)` → 1.5M x 1255 x 8 bytes ≈ 15 GB, tight on a 48 GB machine | Expected to fit at v6 scale, though not yet verified in this pipeline: `microcalibrate` is what PE-US-data uses in production, so they have presumably already hit this. If it proves a problem, add sparse-matrix support. |
+| Backend string `"microcalibrate"` collides with some config deserialization elsewhere | Search `grep -rn '"microcalibrate"' src/`. Add only if clean. |
+
+## Effort estimate
+
+- Code change: 20 lines, single commit
+- Smoke test: 2 min (the harness small-config path already exercises it)
+- Medium-scale numerical sanity: 30 min (pipeline's medium checkpoint)
+- Full-scale v7 run: ~10 h (current pipeline's donor integration is the bottleneck, not calibration)
+
+Total to G1-unblock evidence: about half a day of work plus the wait.
+
+## Order of operations
+
+1. Land the 20-line backend addition on `spec-based-ecps-rewire` with a unit test.
+2. Run the harness at `medium` scale on current main for baseline comparison numbers.
+3. Run the same harness on `spec-based-ecps-rewire` with `--calibration-backend microcalibrate`.
+4. Diff parity JSONs.
+5. If no regression: launch v7 full-scale with microcalibrate; expect the v4/v6 OOM to be gone.
+6. If a regression: tune epochs + learning_rate, iterate.
diff --git a/src/microplex_us/pipelines/us.py b/src/microplex_us/pipelines/us.py
@@ -1405,7 +1405,14 @@ class USMicroplexBuildConfig:
     n_synthetic: int = 100_000
     synthesis_backend: Literal["bootstrap", "synthesizer", "seed"] = "synthesizer"
     calibration_backend: Literal[
-        "entropy", "ipf", "chi2", "sparse", "hardconcrete", "pe_l0", "none"
+        "entropy",
+        "ipf",
+        "chi2",
+        "sparse",
+        "hardconcrete",
+        "pe_l0",
+        "microcalibrate",
+        "none",
     ] = "entropy"
     calibration_tol: float = 1e-6
     calibration_max_iter: int = 100
@@ -2465,6 +2472,20 @@ def _build_weight_calibrator(
                 device=self.config.device,
                 tol=self.config.calibration_tol,
             )
+        if self.config.calibration_backend == "microcalibrate":
+            from microplex_us.calibration import (
+                MicrocalibrateAdapter,
+                MicrocalibrateAdapterConfig,
+            )
+
+            return MicrocalibrateAdapter(
+                MicrocalibrateAdapterConfig(
+                    epochs=max(self.config.calibration_max_iter, 32),
+                    learning_rate=1e-3,
+                    device=self.config.device,
+                    seed=self.config.random_seed,
+                )
+            )
         raise ValueError(
             f"Unsupported calibration backend: {self.config.calibration_backend}"
         )
diff --git a/tests/calibration/test_us_pipeline_dispatch.py b/tests/calibration/test_us_pipeline_dispatch.py
@@ -0,0 +1,84 @@
+"""Pipeline-level test: `calibration_backend="microcalibrate"` dispatches to
+`MicrocalibrateAdapter` and round-trips one calibration call inside the
+USMicroplexPipeline context.
+
+This is the final link between the adapter and the production pipeline:
+the backend string needs to be valid in `USMicroplexBuildConfig`, and
+`_build_weight_calibrator` must return an adapter instance that
+satisfies the same `fit_transform` / `validate` contract the rest of
+`calibrate_policyengine_tables` expects.
+"""
+
+from __future__ import annotations
+
+import numpy as np
+import pandas as pd
+import pytest
+from microplex.calibration import LinearConstraint
+
+from microplex_us.calibration import MicrocalibrateAdapter
+from microplex_us.pipelines.us import USMicroplexBuildConfig, USMicroplexPipeline
+
+
+def _toy_households(n: int = 100, seed: int = 0) -> pd.DataFrame:
+    rng = np.random.default_rng(seed)
+    return pd.DataFrame(
+        {
+            "household_id": np.arange(n),
+            "household_weight": np.ones(n, dtype=float),
+            "income": rng.normal(80_000, 40_000, n).clip(0, None),
+        }
+    )
+
+
+def test_backend_string_resolves_to_adapter() -> None:
+    cfg = USMicroplexBuildConfig(calibration_backend="microcalibrate")
+    pipeline = USMicroplexPipeline(cfg)
+    calibrator = pipeline._build_weight_calibrator()
+    assert isinstance(calibrator, MicrocalibrateAdapter)
+
+
+def test_backend_dispatch_fit_transform_end_to_end() -> None:
+    """Full path: pipeline config → dispatch → fit_transform → validate."""
+    cfg = USMicroplexBuildConfig(
+        calibration_backend="microcalibrate",
+        calibration_max_iter=200,
+    )
+    pipeline = USMicroplexPipeline(cfg)
+    calibrator = pipeline._build_weight_calibrator()
+
+    data = _toy_households(n=200, seed=1)
+    # Constraint: weighted count of households with income > 80k should be 1.4x current.
+    mask = (data["income"] > 80_000).to_numpy(dtype=float)
+    target = 1.4 * float(mask.sum())
+    constraint = LinearConstraint(
+        name="above_80k", coefficients=mask, target=target
+    )
+
+    result = calibrator.fit_transform(
+        data,
+        marginal_targets={},
+        weight_col="household_weight",
+        linear_constraints=(constraint,),
+    )
+
+    assert len(result) == len(data)
+    assert "household_weight" in result.columns
+    assert (result["household_weight"] >= 0).all()
+
+    validation = calibrator.validate(result)
+    assert set(validation) == {"converged", "max_error", "sparsity", "linear_errors"}
+    assert "above_80k" in validation["linear_errors"]
+
+
+def test_invalid_backend_still_raises() -> None:
+    """Regression test: unknown backend strings surface a clear error."""
+    # The Literal annotation is only enforced by static type checkers, so
+    # an unknown string reaches runtime dispatch, which must keep raising
+    # ValueError. Set the field via object.__setattr__ so the assignment
+    # also works if the dataclass is frozen.
+    bad_cfg = USMicroplexBuildConfig()
+    object.__setattr__(bad_cfg, "calibration_backend", "no_such_backend")
+    pipeline = USMicroplexPipeline(bad_cfg)
+    with pytest.raises(ValueError, match="Unsupported calibration backend"):
+        pipeline._build_weight_calibrator()