PolicyEngine
diff --git a/.gitignore b/.gitignore
Lines changed: 7 additions & 0 deletions
diff --git a/.python-version b/.python-version
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 24 additions & 0 deletions
diff --git a/docs/README.md b/docs/README.md
Lines changed: 14 additions & 0 deletions
diff --git a/docs/architecture.md b/docs/architecture.md
Lines changed: 92 additions & 0 deletions
diff --git a/docs/benchmarking.md b/docs/benchmarking.md
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,7 @@
+.venv/
+.pytest_cache/
+.ruff_cache/
+artifacts/
+.DS_Store
+__pycache__/
+*.pyc
@@ -0,0 +1 @@
+3.14
@@ -0,0 +1,24 @@
+# microplex-us
+
+US-specific survey adapters, calibration targets, pipelines, and PolicyEngine integration
+built on top of the generic `microplex` engine.
+
+## Docs
+
+- [Docs index](./docs/README.md)
+- [Architecture](./docs/architecture.md)
+- [Source semantics](./docs/source-semantics.md)
+- [Benchmarking](./docs/benchmarking.md)
+
+## Current focus
+
+`microplex-us` is being built as a library-first replacement path for
+`policyengine-us-data`. Work currently centers on:
+
+- canonical source and target metadata
+- PE-US-compatible export
+- full-target benchmarking against the active targets DB
+- run registry and DuckDB index for frontier analysis
+
+The architecture is still evolving, so the docs are deliberately technical and
+operational rather than paper-like.
@@ -0,0 +1,14 @@
+# microplex-us docs
+
+- [Architecture](./architecture.md)
+- [Source semantics](./source-semantics.md)
+- [Benchmarking](./benchmarking.md)
+
+This doc set is intentionally technical. It is meant to answer three questions:
+
+1. What is the current architecture?
+2. How do source semantics and variable semantics drive donor integration?
+3. How do we measure progress against `policyengine-us-data` on real targets?
+
+The docs describe the code that exists today. They do not try to freeze a final
+paper narrative while the architecture is still moving.
@@ -0,0 +1,92 @@
+# Architecture
+
+`microplex-us` is the US-specific country package built on top of the generic
+`microplex` engine.
+
+## Package split
+
+- `microplex`: generic engine pieces
+  - source descriptors and observation frames
+  - fusion planning
+  - synthesis and calibration
+  - canonical target spec and provider protocol
+  - generic geography and entity abstractions
+- `microplex-us`: US-specific implementations
+  - CPS, PUF, and other source providers
+  - PE-US target import and compilation
+  - PE-US export and evaluation
+  - US experiment, registry, and artifact layers
+
+## Current build flow
+
+Main entrypoint:
+
+- `microplex_us.pipelines.USMicroplexPipeline`
+
+Current broad flow:
+
+1. Load one or more `SourceProvider`s into `ObservationFrame`s.
+2. Build a `FusionPlan` from the source descriptors.
+3. Choose a public structured scaffold source.
+4. Prepare canonical seed data from the scaffold.
+5. Integrate donor-only variables from other sources using source and variable
+   capability metadata, with donor-block-specific automatic condition selection,
+   declared condition-entity policy, and native-entity projection when entity
+   IDs are available.
+6. Synthesize a new population.
+7. Build PolicyEngine-style entity tables.
+8. Materialize PE-derived features needed by targets.
+9. Calibrate against PE-US DB targets.
+10. Export a PE-ingestable H5 and evaluate against the full active target set.
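Note on step 5 (illustrative, not part of the added file): donor integration can be pictured as a nearest-neighbor match on shared condition variables. The following is a minimal sketch of generic hot-deck matching, not the `microplex-us` implementation. The function name, field names, and records are all hypothetical, and the toy skips automatic condition selection and entity projection entirely.

```python
# Toy hot-deck donor matching. All names and data are hypothetical.

def integrate_donor_variable(scaffold, donor, conditions, variable):
    """Copy `variable` from the nearest donor row onto each scaffold row.

    Distance is the unscaled squared difference over the `conditions`
    columns, so large-valued columns dominate the match.
    """
    fused = []
    for row in scaffold:
        nearest = min(
            donor,
            key=lambda d: sum((d[c] - row[c]) ** 2 for c in conditions),
        )
        fused.append({**row, variable: nearest[variable]})
    return fused


# Scaffold rows lack `capital_gains`; donor rows carry it.
scaffold = [
    {"age": 30, "wages": 40_000.0},
    {"age": 62, "wages": 95_000.0},
]
donor = [
    {"age": 29, "wages": 42_000.0, "capital_gains": 0.0},
    {"age": 60, "wages": 90_000.0, "capital_gains": 12_000.0},
]

fused = integrate_donor_variable(
    scaffold, donor, conditions=["age", "wages"], variable="capital_gains"
)
```

A real matcher would normally scale or rank-transform the condition variables first. The unscaled distance is kept here only to keep the sketch short.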
+
+Important files:
+
+- `src/microplex_us/pipelines/us.py`
+- `src/microplex_us/policyengine/us.py`
+- `src/microplex_us/policyengine/comparison.py`
+- `src/microplex_us/pipelines/artifacts.py`
+- `src/microplex_us/pipelines/index_db.py`
+
+## What is already true
+
+- The package is library-first. The core build, artifact saving, experiment
+  running, and frontier tracking all live in importable APIs.
+- PolicyEngine evaluation uses the real `policyengine-us-data` targets DB as
+  the source of truth.
+- Saved runs persist:
+  - artifact bundle
+  - `policyengine_harness.json`
+  - `run_registry.jsonl`
+  - `run_index.duckdb`
+
+## What is not final yet
+
+- Broad PE-US parity is not stable yet.
+- The current US path is still scaffold-plus-donors rather than a fully
+  symmetric multientity latent-population model.
+- Held-out target evaluation is not the default loop yet.
+- Local-area production replacement is still future work.
+
+## Design direction
+
+The intended long-run shape is:
+
+- canonical source metadata
+- canonical variable semantics
+- multientity fusion
+- derived-variable materialization after atomic modeling
+- target compilation as a generic feature/filter/aggregation problem
+
+The current implementation is already moving in that direction:
+
+- canonical target spec
+- source capability registry
+- variable semantic registry
+- donor block specs with declared match strategies
+- donor block specs with declared condition-entity policy
+- variable semantics with declared projection aggregation for group-level donor fits
+- automatic donor condition selection from source overlap plus data signal
+- native-entity donor execution for tax-unit-native blocks when IDs are present
+- full-target PE-US harness
+
+But it is still an actively evolving system, not a finished paper architecture.
@@ -0,0 +1,123 @@
+# Benchmarking
+
+The benchmark question is:
+
+> Is Microplex closer to the real target DB than `policyengine-us-data` is?
+
+## What is truth
+
+Truth is the active target set loaded from the PE-US targets DB.
+
+Main provider:
+
+- `microplex_us.policyengine.PolicyEngineUSDBTargetProvider`
+
+The baseline dataset is not truth. It is only the incumbent comparator.
+
+## What PolicyEngine does
+
+`policyengine-us` is the shared measurement operator.
+
+Both:
+
+- the Microplex candidate dataset
+- the `policyengine-us-data` baseline dataset
+
+are run through the same PE-US variable materialization and the same target
+compiler before being compared to the same targets.
+
+So the benchmark shape is:
+
+`dataset -> policyengine-us -> implied aggregates -> compare to target DB`
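Illustrative note (not part of the added file): the benchmark shape above can be sketched with a toy measurement operator. Everything in this block is an assumption made for illustration. `measure` stands in for PE-US variable materialization plus target compilation, and the records, weights, and target values are invented.

```python
# Toy sketch of the benchmark shape. Not the microplex-us or policyengine-us API.

def measure(dataset):
    """Hypothetical shared operator: weighted aggregates implied by a dataset."""
    return {
        "total_wages": sum(r["weight"] * r["wages"] for r in dataset),
        "population": sum(r["weight"] for r in dataset),
    }


def relative_errors(aggregates, targets):
    """Signed relative error of each implied aggregate against its target."""
    return {
        name: (aggregates[name] - value) / value
        for name, value in targets.items()
    }


targets = {"total_wages": 1_000.0, "population": 10.0}
candidate = [{"weight": 4.0, "wages": 100.0}, {"weight": 6.0, "wages": 90.0}]
baseline = [{"weight": 5.0, "wages": 120.0}, {"weight": 6.0, "wages": 80.0}]

# Both datasets go through the same operator and the same targets.
candidate_errors = relative_errors(measure(candidate), targets)
baseline_errors = relative_errors(measure(baseline), targets)
```

The point of the shape is symmetry: neither dataset gets its own measurement path, so differences in error come from the data rather than from the evaluator.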
+
+## Current default harness
+
+Default saved-build evaluation now uses:
+
+- the full active PE-US target set
+- one `all_targets` slice
+
+Main files:
+
+- `src/microplex_us/policyengine/harness.py`
+- `src/microplex_us/policyengine/comparison.py`
+
+## Main metrics
+
+Per run:
+
+- `candidate_composite_parity_loss`
+- `baseline_composite_parity_loss`
+- `candidate_mean_abs_relative_error`
+- `baseline_mean_abs_relative_error`
+- `target_win_rate`
+- `supported_target_rate`
+
+The frontier metric is currently:
+
+- `candidate_composite_parity_loss`
+
+This is a diversity-aware outer loss over the target set rather than a raw
+target-count-weighted mean alone.
+
+## Saved outputs
+
+Every serious saved run can write:
+
+- artifact bundle directory
+- `policyengine_harness.json`
+- `run_registry.jsonl`
+- `run_index.duckdb`
+
+These live under the selected artifact root.
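Illustrative note (not part of the added file): a JSON Lines registry such as `run_registry.jsonl` can be read with the standard library alone. This is a sketch under assumptions. The record schema is hypothetical (only the metric name `candidate_composite_parity_loss` comes from the docs), and `load_registry` and `select_frontier` are made-up helpers, not the package's `select_us_microplex_frontier_entry(...)`.

```python
import json
import tempfile
from pathlib import Path


def load_registry(path):
    """Read one JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def select_frontier(entries, metric="candidate_composite_parity_loss"):
    """Return the entry with the lowest value of `metric`."""
    return min(entries, key=lambda entry: entry[metric])


# Demo against a throwaway registry with a hypothetical schema.
with tempfile.TemporaryDirectory() as root:
    registry_path = Path(root) / "run_registry.jsonl"
    rows = [
        {"artifact_id": "run_a", "candidate_composite_parity_loss": 1.25},
        {"artifact_id": "run_b", "candidate_composite_parity_loss": 0.75},
    ]
    registry_path.write_text(
        "".join(json.dumps(row) + "\n" for row in rows), encoding="utf-8"
    )
    frontier = select_frontier(load_registry(registry_path))
```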
+
+## Inspecting runs
+
+Useful Python APIs:
+
+- `select_us_microplex_frontier_entry(...)`
+- `select_us_microplex_frontier_index_row(...)`
+- `list_us_microplex_target_delta_rows(...)`
+- `compare_us_microplex_target_delta_rows(...)`
+
+The last helper is meant for questions like:
+
+- what changed between two broad runs?
+- which targets improved under a source-policy change?
+- which target families regressed even when overall loss improved?
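Illustrative note (not part of the added file): the questions above reduce to a per-target diff between two runs, optionally rolled up by target family. The sketch below is not `compare_us_microplex_target_delta_rows(...)`, whose signature is not shown in this diff. The helper names, target names, family labels, and error values are all hypothetical.

```python
from collections import defaultdict


def compare_target_errors(before, after):
    """Change in absolute relative error per shared target (negative = improved)."""
    shared = before.keys() & after.keys()
    return {t: abs(after[t]) - abs(before[t]) for t in sorted(shared)}


def regressed_families(deltas, family_of):
    """Families whose summed error change is positive, i.e. got worse overall."""
    totals = defaultdict(float)
    for target, delta in deltas.items():
        totals[family_of[target]] += delta
    return sorted(family for family, total in totals.items() if total > 0)


# Hypothetical relative errors per target for two runs.
run_a = {"wages/national": 0.20, "wages/CA": 0.30, "snap/national": 0.05}
run_b = {"wages/national": 0.10, "wages/CA": 0.25, "snap/national": 0.09}
family_of = {
    "wages/national": "wages",
    "wages/CA": "wages",
    "snap/national": "snap",
}

deltas = compare_target_errors(run_a, run_b)
regressed = regressed_families(deltas, family_of)
```

In this toy data the total error falls while the `snap` family gets worse, which is the kind of regression the third question is asking about.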
+
+## Current broad reference point
+
+As of March 27, 2026, the best recorded broad `national + state` `CPS+PUF`
+frontier in the main artifact root was:
+
+- artifact id: `cps_puf_500_native_wages`
+- candidate composite parity loss: `0.8906`
+- baseline composite parity loss: `4.5412`
+- candidate mean absolute relative error: `0.9928`
+- baseline mean absolute relative error: `1.1920`
+
+That does **not** mean Microplex is already better on most targets. The same run
+had a low `target_win_rate`, so the gain comes from a lower aggregate loss
+across the target set rather than from beating the incumbent on a majority of
+individual targets.
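Illustrative note (not part of the added file): a small worked example shows how a run can win on an aggregate error metric while losing most individual targets. The definitions below are assumptions. The docs do not define `candidate_composite_parity_loss`, so plain mean absolute relative error is used as a stand-in, and the win-rate formula is a guess at what `target_win_rate` measures. The error values are invented.

```python
def mean_abs_relative_error(errors):
    """Unweighted mean of absolute relative errors."""
    return sum(abs(e) for e in errors) / len(errors)


def target_win_rate(candidate, baseline):
    """Share of targets where the candidate is strictly closer than the baseline."""
    wins = sum(abs(c) < abs(b) for c, b in zip(candidate, baseline))
    return wins / len(candidate)


# Hypothetical per-target relative errors for four targets.
candidate = [0.10, 0.12, 0.11, 0.20]
baseline = [0.08, 0.10, 0.09, 2.00]

candidate_mare = mean_abs_relative_error(candidate)
baseline_mare = mean_abs_relative_error(baseline)
win_rate = target_win_rate(candidate, baseline)
```

Here the candidate is slightly worse on three of four targets but far better on the one target where the baseline misses badly, so its mean error is lower even though its win rate is one in four.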
+
+## Important caveats
+
+- This is parity evaluation, not held-out evaluation.
+- Calibration and evaluation still overlap unless explicitly separated in build
+  config.
+- A broad win on the composite loss is not the same thing as a majority-target
+  win.
+- Local-area production parity is not finished yet.
+
+## Repro pattern
+
+Broad versioned builds use:
+
+- `build_and_save_versioned_us_microplex(...)`
+- `build_and_save_versioned_us_microplex_from_source_provider(...)`
+- `build_and_save_versioned_us_microplex_from_source_providers(...)`
+
+The resulting run can then be inspected through the JSON artifacts or via the
+DuckDB index.