Commit ae20bab

Add Stage 2 package payload reader
1 parent 4544761 commit ae20bab

14 files changed

Lines changed: 983 additions & 239 deletions

changelog.d/1073.changed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Add typed Stage 2 calibration package payload reader and writer helpers.
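
For orientation, the changelog entry above can be illustrated with a minimal sketch of a typed reader and writer around a pickled dictionary. This is not the code added in this commit: `PayloadSketch`, `write_payload`, `read_payload`, and the `target_names` / `metadata` keys used here are hypothetical stand-ins, and the real `CalibrationPackagePayload`, `CalibrationPackageReader`, and `CalibrationPackageWriter` may be structured quite differently.

```python
from __future__ import annotations

import hashlib
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class PayloadSketch:
    """Hypothetical typed view over the dictionary stored in the pickle."""

    raw: dict[str, Any]

    @property
    def target_names(self) -> list[str]:
        # A missing required key raises KeyError instead of returning None.
        return list(self.raw["target_names"])

    @property
    def metadata(self) -> dict[str, Any]:
        # Metadata is treated as optional in this sketch.
        return dict(self.raw.get("metadata", {}))


def write_payload(path: Path, payload: PayloadSketch) -> None:
    """Persist the underlying dictionary with pickle."""
    with path.open("wb") as handle:
        pickle.dump(payload.raw, handle, protocol=pickle.HIGHEST_PROTOCOL)


def read_payload(path: Path) -> tuple[PayloadSketch, str]:
    """Load the pickle and return a typed view plus a SHA-256 of the file bytes."""
    data = path.read_bytes()
    checksum = hashlib.sha256(data).hexdigest()
    raw = pickle.loads(data)  # only safe for files you produced yourself
    if not isinstance(raw, dict):
        raise TypeError(f"expected a dict payload, got {type(raw).__name__}")
    return PayloadSketch(raw), checksum


def round_trip_demo() -> tuple[PayloadSketch, str]:
    """Write a tiny payload to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "calibration_package.pkl"
        original = PayloadSketch({"target_names": ["a", "b"], "metadata": {"seed": 1}})
        write_payload(path, original)
        return read_payload(path)
```

Routing every access through one typed wrapper means a malformed package fails at a named property instead of at an arbitrary dictionary lookup deeper in the pipeline.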

docs/engineering/pipeline-map.md

Lines changed: 13 additions & 3 deletions
@@ -379,6 +379,7 @@ Build sparse calibration matrix (targets x households x clones)
 | `takeup_rerand` Block-Level Takeup Re-randomization | `process` | `unknown` | `unknown` | |
 | `sparse_build` Sparse Matrix Construction | `process` | `unknown` | `unknown` | |
 | `out_pkg` calibration_package.pkl | `artifact` | `unknown` | `unknown` | |
+| `out_metadata` calibration_package_meta.json | `artifact` | `unknown` | `unknown` | |
 | `out_contract` calibration_package_contract.json | `artifact` | `unknown` | `unknown` | |
 | `util_sql` sqlalchemy | `utility` | `unknown` | `unknown` | |
 | `util_pool` ProcessPoolExecutor | `utility` | `unknown` | `unknown` | |
@@ -395,6 +396,9 @@ Build sparse calibration matrix (targets x households x clones)
 | `clone_assembly` Clone Value Assembly | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder._assemble_clone_values_standalone` |
 | `build_matrix` Build Calibration Matrix | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix` |
 | `build_matrix_chunked` Build Calibration Matrix In Chunks | `library` | `current` | `experimental` | `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix_chunked` |
+| `stage2_payload_boundary` Stage 2 Package Payload | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.payload.CalibrationPackagePayload` |
+| `stage2_payload_writer` Stage 2 Payload Writer | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.payload.CalibrationPackageWriter` |
+| `stage2_payload_reader` Stage 2 Payload Reader | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.payload.CalibrationPackageReader` |
 | `stage2_calibration_package_contract_writer` Stage 2 Contract Writer | `library` | `current` | `moving` | `policyengine_us_data.stage_contracts.calibration_package.write_calibration_package_contract` |
 | `stage2_calibration_package_contract_validator` Stage 2 Contract Validator | `validation` | `current` | `moving` | `policyengine_us_data.stage_contracts.calibration_package.validate_calibration_package_contract` |

@@ -423,13 +427,19 @@ Build sparse calibration matrix (targets x households x clones)
 - `takeup_rerand` -> `sparse_build` `data_flow`
 - `sparse_build` -> `build_matrix` `uses_library` (non-chunked path)
 - `sparse_build` -> `build_matrix_chunked` `uses_library` (chunked path)
-- `build_matrix` -> `stage2_calibration_package_writer` `data_flow`
-- `build_matrix_chunked` -> `stage2_calibration_package_writer` `data_flow`
+- `build_matrix` -> `stage2_payload_boundary` `data_flow`
+- `build_matrix_chunked` -> `stage2_payload_boundary` `data_flow`
+- `stage2_payload_boundary` -> `stage2_calibration_package_writer` `data_flow` (typed package payload)
 - `stage2_artifact_specs` -> `stage2_calibration_package_writer` `uses_utility` (package path)
-- `stage2_calibration_package_writer` -> `out_pkg` `produces_artifact`
+- `stage2_calibration_package_writer` -> `stage2_payload_writer` `uses_library` (pickle write)
+- `stage2_payload_writer` -> `out_pkg` `produces_artifact`
+- `out_pkg` -> `stage2_payload_reader` `data_flow`
 - `out_pkg` -> `stage2_calibration_package_contract_writer` `data_flow`
+- `stage2_payload_reader` -> `stage2_calibration_package_contract_writer` `uses_library` (summary and checksum)
 - `stage2_artifact_specs` -> `stage2_calibration_package_contract_writer` `uses_utility` (contract path)
 - `stage2_calibration_package_contract_writer` -> `out_contract` `produces_artifact`
+- `out_contract` -> `stage2_payload_writer` `data_flow` (sidecar contract material)
+- `stage2_payload_writer` -> `out_metadata` `produces_artifact` (sidecar metadata)
 - `out_pkg` -> `stage2_calibration_package_contract_validator` `validates`
 - `out_contract` -> `stage2_calibration_package_contract_validator` `validates`
 - `in_cps_s5` -> `stage2_calibration_package_contract_validator` `validates`
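
The updated edge list in the hunk above can be read as a small directed graph. The sketch below copies a subset of the post-change edges (it omits the `stage2_artifact_specs` and validator edges) and walks everything reachable from a starting node. It treats every arrow as a directed edge regardless of its label, which is a simplification, and the `downstream` helper is a hypothetical name, not part of the repository.

```python
from collections import defaultdict, deque

# Subset of the post-change edges listed in pipeline-map.md (labels dropped).
EDGES = [
    ("takeup_rerand", "sparse_build"),
    ("sparse_build", "build_matrix"),
    ("sparse_build", "build_matrix_chunked"),
    ("build_matrix", "stage2_payload_boundary"),
    ("build_matrix_chunked", "stage2_payload_boundary"),
    ("stage2_payload_boundary", "stage2_calibration_package_writer"),
    ("stage2_calibration_package_writer", "stage2_payload_writer"),
    ("stage2_payload_writer", "out_pkg"),
    ("out_pkg", "stage2_payload_reader"),
    ("out_pkg", "stage2_calibration_package_contract_writer"),
    ("stage2_payload_reader", "stage2_calibration_package_contract_writer"),
    ("stage2_calibration_package_contract_writer", "out_contract"),
    ("out_contract", "stage2_payload_writer"),
    ("stage2_payload_writer", "out_metadata"),
]


def downstream(start: str) -> set[str]:
    """Return every node reachable from `start` by following edges forward."""
    graph: dict[str, list[str]] = defaultdict(list)
    for source, target in EDGES:
        graph[source].append(target)

    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            # The visited set matters: out_contract feeds back into
            # stage2_payload_writer, so a naive walk would never terminate.
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(downstream("build_matrix")))
```

A traversal like this makes it easy to confirm that both matrix-building paths now reach the pickle, the contract, and the metadata sidecar only through the typed payload boundary.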

docs/generated/pipeline_api.json

Lines changed: 95 additions & 11 deletions
@@ -727,7 +727,7 @@
 "docstring": "",
 "id": "calibration_diagnostics",
 "kind": "function",
-"line": 1249,
+"line": 1245,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration.unified_calibration.compute_diagnostics"
@@ -1091,7 +1091,7 @@
 "docstring": "Fit L0-regularized calibration weights.\n\nArgs:\n X_sparse: Sparse matrix (targets x records).\n targets: Target values array.\n lambda_l0: L0 regularization strength.\n epochs: Training epochs.\n device: Torch device.\n verbose_freq: Print frequency. Defaults to 10%.\n beta: L0 gate temperature.\n lambda_l2: L2 regularization strength.\n learning_rate: Optimizer learning rate.\n log_freq: Epochs between per-target CSV logs.\n None disables logging.\n log_path: Path for the per-target calibration log CSV.\n target_names: Human-readable target names for the log.\n initial_weights: Pre-computed initial weights. If None,\n computed from targets_df age targets.\n targets_df: Targets DataFrame, used to compute\n initial_weights when not provided.\n target_groups: Optional group ID per target row for balanced loss.\n resume_from: Path to a `.checkpoint.pt` file or `.npy`\n weights file to continue fitting from.\n checkpoint_path: Where to save resumable fit checkpoints.\n\nReturns:\n Weight array of shape (n_records,).",
 "id": "fit_model",
 "kind": "function",
-"line": 893,
+"line": 889,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration.unified_calibration.fit_l0_weights"
@@ -1410,7 +1410,7 @@
 "docstring": "Compute population-based initial weights from age targets.\n\nFor each congressional district, sums person_count targets where\ndomain_variable == \"age\" to get district population, then divides\nby the number of columns (households) active in that district.\n\nArgs:\n X_sparse: Sparse matrix (targets x records).\n targets_df: Targets DataFrame with columns: variable,\n domain_variable, geo_level, geographic_id, value.\n\nReturns:\n Weight array of shape (n_records,).",
 "id": "init_weights",
 "kind": "function",
-"line": 814,
+"line": 810,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration.unified_calibration.compute_initial_weights"
@@ -3472,7 +3472,7 @@
 "docstring": "Run unified calibration pipeline.\n\nArgs:\n dataset_path: Path to CPS h5 file.\n db_path: Path to policy_data.db.\n n_clones: Number of dataset clones.\n lambda_l0: L0 regularization strength.\n epochs: Training epochs.\n device: Torch device.\n seed: Random seed.\n domain_variables: Filter targets by domain variable.\n hierarchical_domains: Domains for hierarchical\n uprating + CD reconciliation.\n skip_takeup_rerandomize: Skip takeup step.\n skip_source_impute: Skip ACS/SIPP/SCF imputations.\n target_config: Parsed target config dict.\n target_config_path: Path to target config, for provenance.\n target_config_identity: Resolved target config path/checksum identity.\n build_only: If True, save package and skip fitting.\n package_path: Load pre-built package (skip build).\n package_output_path: Where to save calibration package.\n beta: L0 gate temperature.\n lambda_l2: L2 regularization strength.\n learning_rate: Optimizer learning rate.\n log_freq: Epochs between per-target CSV logs.\n log_path: Path for per-target calibration log CSV.\n resume_from: Path to a checkpoint or weights file to\n continue fitting from.\n checkpoint_path: Where to save resumable fit checkpoints.\n chunked_matrix: Build matrix in clone-household chunks.\n chunk_size: Clone-household columns per chunk.\n chunk_dir: Directory for chunked COO/H5 artifacts.\n keep_chunks: Keep temporary chunk H5 files.\n resume_chunks: Reuse existing chunk COO files.\n\nReturns:\n (weights, targets_df, X_sparse, target_names, geography_info)\n weights is None when build_only=True.\n geography_info is a dict with cd_geoid and base_n_records.",
 "id": "run_calibration",
 "kind": "function",
-"line": 1375,
+"line": 1371,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration.unified_calibration.run_calibration"
@@ -3801,7 +3801,7 @@
 "docstring": "Validate that a Stage 2 sidecar describes the calibration package.",
 "id": "stage2_calibration_package_contract_validator",
 "kind": "function",
-"line": 379,
+"line": 252,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.stage_contracts.calibration_package.validate_calibration_package_contract"
@@ -3822,14 +3822,14 @@
 ]
 },
 "object_path": "policyengine_us_data.stage_contracts.calibration_package.validate_calibration_package_contract",
-"signature": "def validate_calibration_package_contract(*, package_path: Path, contract_path: Path | None = None, package: Mapping[str, Any] | None = None, dataset_path: Path | None = None, db_path: Path | None = None) -> StageContract",
+"signature": "def validate_calibration_package_contract(*, package_path: Path, contract_path: Path | None = None, package: CalibrationPackagePayload | Mapping[str, Any] | None = None, dataset_path: Path | None = None, db_path: Path | None = None) -> StageContract",
 "source_file": "policyengine_us_data/stage_contracts/calibration_package.py"
 },
 "stage2_calibration_package_contract_writer": {
 "docstring": "Write and return the Stage 2 calibration-package contract.",
 "id": "stage2_calibration_package_contract_writer",
 "kind": "function",
-"line": 322,
+"line": 195,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.stage_contracts.calibration_package.write_calibration_package_contract"
@@ -3853,14 +3853,14 @@
 ]
 },
 "object_path": "policyengine_us_data.stage_contracts.calibration_package.write_calibration_package_contract",
-"signature": "def write_calibration_package_contract(*, package_path: Path, dataset_path: Path, db_path: Path, package: Mapping[str, Any], parameters: CalibrationPackageParameters | Mapping[str, Any], run_id: str | None, completed_at: str, started_at: str | None = None, duration_s: float | None = None, code_sha: str | None = None, package_version: str | None = None, contract_path: Path | None = None) -> StageContract",
+"signature": "def write_calibration_package_contract(*, package_path: Path, dataset_path: Path, db_path: Path, package: CalibrationPackagePayload | Mapping[str, Any], parameters: CalibrationPackageParameters | Mapping[str, Any], run_id: str | None, completed_at: str, started_at: str | None = None, duration_s: float | None = None, code_sha: str | None = None, package_version: str | None = None, contract_path: Path | None = None) -> StageContract",
 "source_file": "policyengine_us_data/stage_contracts/calibration_package.py"
 },
 "stage2_calibration_package_writer": {
 "docstring": "Save calibration package to pickle.\n\nArgs:\n path: Output file path.\n X_sparse: Sparse matrix.\n targets_df: Targets DataFrame.\n target_names: Target name list.\n metadata: Run metadata dict.\n initial_weights: Pre-computed initial weight array.\n cd_geoid: CD GEOID array from geography assignment.\n block_geoid: Block GEOID array from geography assignment.",
 "id": "stage2_calibration_package_writer",
 "kind": "function",
-"line": 661,
+"line": 663,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration.unified_calibration.save_calibration_package"
@@ -3914,11 +3914,95 @@
 "signature": "def stage2_input_bundle_from_artifacts_dir(artifacts_dir: str | Path) -> Stage2InputBundle",
 "source_file": "policyengine_us_data/calibration_package/specs.py"
 },
+"stage2_payload_boundary": {
+"docstring": "Typed access to the dictionary persisted in `calibration_package.pkl`.",
+"id": "stage2_payload_boundary",
+"kind": "class",
+"line": 114,
+"metadata": {
+"api_refs": [
+"policyengine_us_data.calibration_package.payload.CalibrationPackagePayload"
+],
+"artifacts_in": "[CALIBRATION_PACKAGE_FILENAME]",
+"description": "Typed access to the calibration_package.pkl matrix, targets, metadata, geography arrays, and compatibility warnings.",
+"id": "stage2_payload_boundary",
+"label": "Stage 2 Package Payload",
+"node_type": "library",
+"pathways": [
+"calibration_package"
+],
+"source_file": "policyengine_us_data/calibration_package/payload.py",
+"stability": "moving",
+"status": "current",
+"validation_commands": [
+"uv run pytest tests/unit/calibration_package/test_payload.py"
+]
+},
+"object_path": "policyengine_us_data.calibration_package.payload.CalibrationPackagePayload",
+"signature": "class CalibrationPackagePayload",
+"source_file": "policyengine_us_data/calibration_package/payload.py"
+},
+"stage2_payload_reader": {
+"docstring": "Read typed Stage 2 package payloads from disk.",
+"id": "stage2_payload_reader",
+"kind": "class",
+"line": 328,
+"metadata": {
+"api_refs": [
+"policyengine_us_data.calibration_package.payload.CalibrationPackageReader"
+],
+"artifacts_in": "[CALIBRATION_PACKAGE_FILENAME]",
+"description": "Load calibration_package.pkl through the typed Stage 2 payload boundary and expose checksum/summary material.",
+"id": "stage2_payload_reader",
+"label": "Stage 2 Payload Reader",
+"node_type": "library",
+"pathways": [
+"calibration_package"
+],
+"source_file": "policyengine_us_data/calibration_package/payload.py",
+"stability": "moving",
+"status": "current",
+"validation_commands": [
+"uv run pytest tests/unit/calibration_package/test_payload.py"
+]
+},
+"object_path": "policyengine_us_data.calibration_package.payload.CalibrationPackageReader",
+"signature": "class CalibrationPackageReader",
+"source_file": "policyengine_us_data/calibration_package/payload.py"
+},
+"stage2_payload_writer": {
+"docstring": "Write typed Stage 2 package payloads and metadata sidecars.",
+"id": "stage2_payload_writer",
+"kind": "class",
+"line": 385,
+"metadata": {
+"api_refs": [
+"policyengine_us_data.calibration_package.payload.CalibrationPackageWriter"
+],
+"artifacts_out": "[CALIBRATION_PACKAGE_FILENAME, CALIBRATION_PACKAGE_METADATA_FILENAME]",
+"description": "Persist calibration_package.pkl and derive calibration_package_meta.json from typed payload and contract material.",
+"id": "stage2_payload_writer",
+"label": "Stage 2 Payload Writer",
+"node_type": "library",
+"pathways": [
+"calibration_package"
+],
+"source_file": "policyengine_us_data/calibration_package/payload.py",
+"stability": "moving",
+"status": "current",
+"validation_commands": [
+"uv run pytest tests/unit/calibration_package/test_payload.py"
+]
+},
+"object_path": "policyengine_us_data.calibration_package.payload.CalibrationPackageWriter",
+"signature": "class CalibrationPackageWriter",
+"source_file": "policyengine_us_data/calibration_package/payload.py"
+},
 "stage2_target_config_apply": {
 "docstring": "Filter target rows before matrix construction.",
 "id": "stage2_target_config_apply",
 "kind": "function",
-"line": 631,
+"line": 633,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration.unified_calibration.apply_target_config_to_targets"
@@ -3973,7 +4057,7 @@
 "docstring": "Load target include/exclude config from YAML.\n\nArgs:\n path: Path to YAML config file.\n\nReturns:\n Parsed config dict with include and exclude lists.",
 "id": "stage2_target_config_load",
 "kind": "function",
-"line": 525,
+"line": 527,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration.unified_calibration.load_target_config"

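In the `pipeline_api.json` diff above, the `package` parameter of `write_calibration_package_contract` and `validate_calibration_package_contract` widens from `Mapping[str, Any]` to `CalibrationPackagePayload | Mapping[str, Any]`. One common way to support such a union is to normalize the argument once at the function boundary. The sketch below shows that pattern only; `PackageView` and `coerce_package` are hypothetical names, and the diff does not show how the repository actually handles the union.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PackageView:
    """Hypothetical stand-in for a typed payload wrapper."""

    raw: Mapping[str, Any]


def coerce_package(package: PackageView | Mapping[str, Any]) -> PackageView:
    """Accept either a typed view or a plain mapping and return a typed view."""
    if isinstance(package, PackageView):
        # Already typed: pass through unchanged so identity is preserved.
        return package
    if isinstance(package, Mapping):
        # Legacy callers pass a plain dict; copy it into the typed wrapper.
        return PackageView(dict(package))
    raise TypeError(
        f"package must be a PackageView or a Mapping, got {type(package).__name__}"
    )
```

Normalizing at the boundary keeps older callers that pass a plain dictionary working while the rest of the function body deals with a single type.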