This repository was archived by the owner on Jun 14, 2026. It is now read-only.

Commit 74553a8

Merge pull request #1 from CosilicoAI/codex/us-atomic-ladder
Advance PE rebuild diagnostics and source alignment
2 parents 9c8dd11 + 87f6034 commit 74553a8

123 files changed

Lines changed: 58594 additions & 2370 deletions

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
+name: Site Snapshot
+
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  site-snapshot:
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: microplex-us
+    steps:
+      - name: Check out microplex-us
+        uses: actions/checkout@v4
+        with:
+          path: microplex-us
+
+      - name: Check out core microplex
+        uses: actions/checkout@v4
+        with:
+          repository: CosilicoAI/microplex
+          ref: 71f270edecac3ef748411deb3beb77109c56a721
+          path: microplex
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.13"
+
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Verify snapshot tooling
+        run: |
+          uv run --extra dev --with pydantic --with-editable ../microplex pytest -q \
+            tests/test_package_imports.py \
+            tests/pipelines/test_check_site_snapshot.py \
+            tests/pipelines/test_imputation_ablation.py \
+            tests/pipelines/test_site_snapshot.py \
+            tests/pipelines/test_version_benchmark.py
+
+      - name: Check generated site snapshot
+        run: |
+          snapshot_path="$(uv run python - <<'PY'
+          import json
+          import tempfile
+          from pathlib import Path
+
+          from microplex_us.pipelines.site_snapshot import write_us_microplex_site_snapshot
+
+          root = Path(tempfile.mkdtemp()).resolve()
+          artifact_dir = root / "run-1"
+          artifact_dir.mkdir()
+          for filename in (
+              "seed_data.parquet",
+              "synthetic_data.parquet",
+              "calibrated_data.parquet",
+              "targets.json",
+          ):
+              (artifact_dir / filename).write_text("{}" if filename == "targets.json" else "")
+
+          (artifact_dir / "manifest.json").write_text(
+              json.dumps(
+                  {
+                      "created_at": "2026-03-29T00:00:00+00:00",
+                      "config": {"n_synthetic": 2000},
+                      "artifacts": {
+                          "seed_data": "seed_data.parquet",
+                          "synthetic_data": "synthetic_data.parquet",
+                          "calibrated_data": "calibrated_data.parquet",
+                          "targets": "targets.json",
+                          "policyengine_harness": "policyengine_harness.json",
+                      },
+                      "synthesis": {
+                          "scaffold_source": "cps_asec_2023",
+                          "state_program_support_proxies": {
+                              "available": ["ssi"],
+                              "missing": ["snap"],
+                          },
+                      },
+                      "calibration": {
+                          "n_loaded_targets": 100,
+                          "n_supported_targets": 90,
+                          "converged": False,
+                          "weight_collapse_suspected": False,
+                      },
+                      "policyengine_harness": {
+                          "candidate_mean_abs_relative_error": 0.9,
+                          "baseline_mean_abs_relative_error": 1.1,
+                          "mean_abs_relative_error_delta": -0.2,
+                      },
+                  }
+              )
+          )
+          (artifact_dir / "policyengine_harness.json").write_text(
+              json.dumps(
+                  {
+                      "summary": {
+                          "candidate_mean_abs_relative_error": 0.9,
+                          "baseline_mean_abs_relative_error": 1.1,
+                          "mean_abs_relative_error_delta": -0.2,
+                          "candidate_composite_parity_loss": 0.8,
+                          "baseline_composite_parity_loss": 1.2,
+                          "target_win_rate": 0.2,
+                          "slice_win_rate": 0.5,
+                          "supported_target_rate": 0.9,
+                          "tag_summaries": {},
+                          "parity_scorecard": {},
+                          "attribute_cell_summaries": {},
+                      }
+                  }
+              )
+          )
+          snapshot_path = root / "snapshots" / "site_snapshot_us.json"
+          write_us_microplex_site_snapshot(artifact_dir, snapshot_path)
+          print(snapshot_path)
+          PY
+          )"
+          uv run microplex-us-check-site-snapshot "$snapshot_path"
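
The fixture in the last workflow step lists five files under `"artifacts"` in `manifest.json` and writes each of them before calling the snapshot writer. When adapting that fixture locally, a quick pre-check that every file named in the manifest exists can catch a forgotten file early. The sketch below uses only the standard library. `missing_artifacts` and `build_demo_dir` are hypothetical helpers, not part of `microplex_us`, and the sketch assumes nothing about what `write_us_microplex_site_snapshot` itself validates.

```python
import json
import tempfile
from pathlib import Path


def missing_artifacts(artifact_dir: Path) -> list[str]:
    """Return the manifest artifact keys whose files are absent from artifact_dir.

    Hypothetical helper: it only mirrors the manifest layout used by the
    workflow fixture ({"artifacts": {key: filename}}).
    """
    manifest = json.loads((artifact_dir / "manifest.json").read_text())
    return sorted(
        key
        for key, filename in manifest.get("artifacts", {}).items()
        if not (artifact_dir / filename).exists()
    )


def build_demo_dir() -> Path:
    """Create a tiny artifact directory that deliberately omits one file."""
    root = Path(tempfile.mkdtemp())
    (root / "targets.json").write_text("{}")
    (root / "manifest.json").write_text(
        json.dumps(
            {
                "artifacts": {
                    "targets": "targets.json",
                    "policyengine_harness": "policyengine_harness.json",
                }
            }
        )
    )
    return root


demo = build_demo_dir()
# policyengine_harness.json has not been written yet, so it is reported.
print(missing_artifacts(demo))  # ['policyengine_harness']
```

Running a check like this before the snapshot writer keeps fixture mistakes separate from genuine failures in the snapshot tooling.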

AGENTS.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# AGENTS.md
+
+This repo is the US country pack for `microplex`. Keep it thin where possible and push shared abstractions upstream into core.
+
+## Default posture
+
+- Prefer spec-driven behavior over ad hoc logic in large pipeline files.
+- If a seam is useful for both UK and US, move it to `microplex` instead of polishing a US-only local helper.
+- Keep PolicyEngine-US execution details local unless there is a clean shared protocol.
+
+## Current architectural intent
+
+- `microplex-us` owns:
+  - US source manifests and raw source adapters
+  - PolicyEngine-US execution/materialization
+  - US-specific target providers and benchmark harnesses
+  - US-local pipeline orchestration
+- `microplex` core owns:
+  - targets specs/providers/protocols
+  - reweighting bundles and solver
+  - benchmark metrics/comparisons/suites
+  - shared result-based benchmark builders
+
+## Current mission notes
+
+- For US, the canonical mission metric is the PE-native broad loss frontier, not composite parity.
+- When evaluating progress, prefer:
+  - matched-size `Microplex@N` vs `PE@N`
+  - full `enhanced_cps_2024` only as a stretch reference
+- Recent direct-objective testing showed that changing only the post-export weight objective moves loss very little on the same fixed candidate.
+- Bias effort toward:
+  - better candidate records
+  - fuller support coverage
+  - budgeted selection on larger candidates
+- Bias away from:
+  - repeated small-candidate donor-backend A/Bs
+  - more entropy tuning without evidence that the candidate population itself improved
+
+## Review checklist
+
+When reviewing recent changes here, check:
+
+1. Is this still duplicating something that should now live in core?
+2. Is the US harness using shared core benchmarking helpers instead of rebuilding them inline?
+3. Are any benchmark claims relying on non-common-target comparisons?
+4. Is the work using PE-native broad loss when it claims mission progress?
+5. Does PE-US materialization handle dependency chains and partial failures safely?
+6. Is this baking in fixed tax-unit structure more deeply than necessary?
+
+## Be careful around
+
+- `src/microplex_us/policyengine/us.py`
+  - Large file with execution/materialization logic and remaining monolith risk.
+- `src/microplex_us/policyengine/harness.py`
+  - Should keep delegating more suite/result logic to core.
+- `src/microplex_us/pipelines/local_reweighting.py`
+  - Should remain a thin adapter over core bundle/reweighting surfaces.
+
+## Standard commands
+
+- Ruff: `uv run ruff check src tests`
+- Focused comparison/harness tests: `uv run pytest -q tests/policyengine/test_comparison.py tests/policyengine/test_harness.py`
+- Local reweighting tests: `uv run pytest -q tests/pipelines/test_local_reweighting.py`
+
+## Claude/Codex review shortcut
+
+For a quick review, read:
+
+1. [`/Users/maxghenis/CosilicoAI/microplex-us/AGENTS.md`](/Users/maxghenis/CosilicoAI/microplex-us/AGENTS.md)
+2. [`/Users/maxghenis/CosilicoAI/microplex-us/_WORKSPACE.md`](/Users/maxghenis/CosilicoAI/microplex-us/_WORKSPACE.md)
+3. [`/Users/maxghenis/CosilicoAI/microplex-us/_BUILD_LOG.md`](/Users/maxghenis/CosilicoAI/microplex-us/_BUILD_LOG.md)
+
+Then inspect changed files and return findings first.
+
+## Review handoff
+
+To avoid rebuilding long prompts in chat:
+
+1. Treat [`/Users/maxghenis/CosilicoAI/microplex-us/reviews/PENDING_CLAUDE_REVIEW.md`](/Users/maxghenis/CosilicoAI/microplex-us/reviews/PENDING_CLAUDE_REVIEW.md) as the current review request.
+2. Read that file after the standard repo context files above.
+3. Write the full review to a dated file under [`/Users/maxghenis/CosilicoAI/microplex-us/reviews/`](/Users/maxghenis/CosilicoAI/microplex-us/reviews/).
+4. Append only a concise summary to [`/Users/maxghenis/CosilicoAI/microplex-us/_BUILD_LOG.md`](/Users/maxghenis/CosilicoAI/microplex-us/_BUILD_LOG.md).
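
Item 3 of the review checklist warns against benchmark claims that rely on non-common-target comparisons. The sketch below illustrates one way to guard against that: restrict both datasets to the targets they share before computing mean absolute relative error. Everything here is an assumption for illustration. `compare_on_common_targets` is a hypothetical helper, not the core `microplex` benchmarking API, and the error formula is a plain per-target `|estimate - target| / |target|` average. The output keys and the delta sign convention (candidate minus baseline) follow the fixture values in the workflow above, where 0.9 - 1.1 = -0.2.

```python
from __future__ import annotations


def mean_abs_relative_error(
    estimates: dict[str, float], targets: dict[str, float], names: set[str]
) -> float:
    """Average |estimate - target| / |target| over the given target names."""
    if not names:
        raise ValueError("no targets to compare")
    total = sum(abs(estimates[n] - targets[n]) / abs(targets[n]) for n in names)
    return total / len(names)


def compare_on_common_targets(
    candidate: dict[str, float],
    baseline: dict[str, float],
    targets: dict[str, float],
) -> dict[str, float]:
    """Score candidate and baseline on the same target set only.

    Targets missing from either dataset, or with a zero or absent target
    value, are dropped so neither side is scored on an easier subset.
    """
    common = {n for n in candidate.keys() & baseline.keys() if targets.get(n)}
    cand = mean_abs_relative_error(candidate, targets, common)
    base = mean_abs_relative_error(baseline, targets, common)
    return {
        "n_common_targets": len(common),
        "candidate_mean_abs_relative_error": cand,
        "baseline_mean_abs_relative_error": base,
        # Negative delta means the candidate is closer to the targets.
        "mean_abs_relative_error_delta": cand - base,
    }


# Made-up numbers for illustration only; target "c" is absent from the
# candidate, so it is excluded from both scores.
targets = {"a": 100.0, "b": 200.0, "c": 50.0}
candidate = {"a": 110.0, "b": 180.0}
baseline = {"a": 130.0, "b": 200.0, "c": 100.0}
result = compare_on_common_targets(candidate, baseline, targets)
print(result)
```

Reporting `n_common_targets` alongside the errors makes it visible when a comparison silently shrank to a small shared subset.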

README.md

Lines changed: 14 additions & 2 deletions
@@ -8,12 +8,19 @@ built on top of the generic `microplex` engine.
 - [Docs index](./docs/README.md)
 - [Architecture](./docs/architecture.md)
 - [Source semantics](./docs/source-semantics.md)
+- [Imputation conditioning contract](./docs/imputation-conditioning-contract.md)
 - [Benchmarking](./docs/benchmarking.md)
+- [Methodology ledger](./docs/methodology-ledger.md)
+- [PolicyEngine oracle compatibility path](./docs/policyengine-oracle-compatibility.md)
+- [PE construction parity](./docs/pe-construction-parity.md)
+- [Superseding `policyengine-us-data`](./docs/superseding-policyengine-us-data.md)
 
 ## Current focus
 
-`microplex-us` is being built as a library-first replacement path for
-`policyengine-us-data`:
+`microplex-us` is being built as a library-first US runtime with
+`policyengine-us` as the shared measurement operator and
+`policyengine-us-data` as the incumbent comparator, not as the thing we are
+trying to clone wholesale:
 
 - canonical source and target metadata
 - PE-US-compatible export
@@ -22,3 +29,8 @@ built on top of the generic `microplex` engine.
 
 The architecture is still evolving, so the docs are deliberately technical and
 operational rather than paper-like.
+
+Method-level decomposable-family bakeoffs now live in the sibling eval repo:
+`/Users/maxghenis/CosilicoAI/microplex-evals`. `microplex-us` should keep the
+runtime helpers and pipeline-adjacent diagnostics, not the long-lived eval
+orchestration and artifact curation.
