Commit 25cf067

Add Stage 2 input artifact bundles
1 parent a475205 commit 25cf067

12 files changed

Lines changed: 751 additions & 79 deletions

changelog.d/1065.changed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Stage 2 calibration package construction now resolves its inputs and outputs through run-scoped artifact bundles.
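
To make "run-scoped artifact bundles" concrete, here is a minimal sketch, not the code from this commit. It assumes a layout of `<mount>/artifacts/<run_id>/`, which the diff does not confirm. `RunBundle` and `bundle_for_run` are hypothetical names, and the two output filenames are placeholders because the real constant values are not shown. The two input filenames are taken from the pipeline map.

```python
from dataclasses import dataclass
from pathlib import Path

# Input filenames as they appear in the pipeline map in this commit.
SOURCE_DATASET = "source_imputed_stratified_extended_cps.h5"
TARGET_DATABASE = "policy_data.db"
# Placeholder output names: the real constant values are not shown in this diff.
PACKAGE = "calibration_package.pkl"
PACKAGE_CONTRACT = "calibration_package_contract.json"


@dataclass(frozen=True)
class RunBundle:
    """Hypothetical bundle of input and output paths for one run."""

    run_id: str
    dataset_path: Path
    database_path: Path
    package_path: Path
    contract_path: Path


def bundle_for_run(pipeline_mount, run_id=""):
    """Derive every Stage 2 path from a single run-scoped directory.

    Assumed layout: <mount>/artifacts/<run_id>/ when run_id is set,
    otherwise <mount>/artifacts/.
    """
    root = Path(pipeline_mount) / "artifacts"
    if run_id:
        root = root / run_id
    return RunBundle(
        run_id=run_id or "",
        dataset_path=root / SOURCE_DATASET,
        database_path=root / TARGET_DATABASE,
        package_path=root / PACKAGE,
        contract_path=root / PACKAGE_CONTRACT,
    )


bundle = bundle_for_run("/pipeline", "run-123")
print(bundle.package_path)
```

Deriving all paths from one root means two concurrent runs cannot read or overwrite each other's files as long as their run IDs differ.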

docs/engineering/pipeline-map.md

Lines changed: 12 additions & 3 deletions
@@ -367,6 +367,7 @@ Build sparse calibration matrix (targets x households x clones)
 
 | Node | Type | Status | Stability | API refs |
 | --- | --- | --- | --- | --- |
+| `in_stage1_contract_s2` dataset_build_output.json | `artifact` | `unknown` | `unknown` | |
 | `in_cps_s5` source_imputed_stratified_extended_cps.h5 | `artifact` | `unknown` | `unknown` | |
 | `in_db_s5` policy_data.db | `external` | `unknown` | `unknown` | |
 | `in_config_s5` target_config.yaml | `artifact` | `unknown` | `unknown` | |
@@ -383,21 +384,29 @@ Build sparse calibration matrix (targets x households x clones)
 | `util_pool` ProcessPoolExecutor | `utility` | `unknown` | `unknown` | |
 | `util_takeup_s5` compute_block_takeup_for_entities() | `utility` | `unknown` | `unknown` | |
 | `util_scipy` scipy.sparse | `utility` | `unknown` | `unknown` | |
+| `stage2_input_bundle` Stage 2 Input Bundle | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.specs.stage2_input_bundle_from_artifacts_dir` |
+| `stage2_build_context` Stage 2 Build Context | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.specs.stage2_build_context_for_run` |
+| `stage2_artifact_specs` Stage 2 Artifact Specs | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.specs.calibration_package_artifact_paths` |
+| `stage2_calibration_package_writer` Stage 2 Package Writer | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_calibration.save_calibration_package` |
 | `stage2_target_config_identity` Stage 2 Target Config Identity | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.specs.resolve_target_config_identity` |
 | `stage2_target_config_load` Load Stage 2 Target Config | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_calibration.load_target_config` |
 | `stage2_target_config_apply` Apply Stage 2 Target Config | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_calibration.apply_target_config_to_targets` |
 | `state_precomp` Per-State Simulation Precomputation | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder._compute_single_state` |
 | `clone_assembly` Clone Value Assembly | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder._assemble_clone_values_standalone` |
 | `build_matrix` Build Calibration Matrix | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix` |
 | `build_matrix_chunked` Build Calibration Matrix In Chunks | `library` | `current` | `experimental` | `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix_chunked` |
-| `stage2_calibration_package_writer` Stage 2 Package Writer | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_calibration.save_calibration_package` |
-| `stage2_artifact_specs` Stage 2 Artifact Specs | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.specs.calibration_package_artifact_paths` |
 | `stage2_calibration_package_contract_writer` Stage 2 Contract Writer | `library` | `current` | `moving` | `policyengine_us_data.stage_contracts.calibration_package.write_calibration_package_contract` |
 | `stage2_calibration_package_contract_validator` Stage 2 Contract Validator | `validation` | `current` | `moving` | `policyengine_us_data.stage_contracts.calibration_package.validate_calibration_package_contract` |
 
 #### Edges
 
-- `in_cps_s5` -> `target_resolve` `data_flow`
+- `in_stage1_contract_s2` -> `stage2_input_bundle` `data_flow` (preferred input contract)
+- `in_cps_s5` -> `stage2_input_bundle` `data_flow` (compatibility fallback)
+- `in_db_s5` -> `stage2_input_bundle` `external_source` (compatibility fallback)
+- `stage2_input_bundle` -> `stage2_build_context` `data_flow` (validated inputs)
+- `stage2_artifact_specs` -> `stage2_build_context` `uses_utility` (output bundle paths)
+- `stage2_build_context` -> `target_resolve` `data_flow` (dataset and database paths)
+- `stage2_build_context` -> `stage2_calibration_package_writer` `uses_utility` (package output bundle)
 - `in_db_s5` -> `target_resolve` `external_source` (SQL targets)
 - `in_config_s5` -> `stage2_target_config_identity` `data_flow` (config file)
 - `stage2_target_config_identity` -> `stage2_target_config_load` `data_flow` (resolved path and checksum)
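
The edges added in this hunk route both the Stage 1 contract and the compatibility inputs through `stage2_input_bundle` and `stage2_build_context` before anything reaches `target_resolve`. As a rough illustration, not code from the repository, the added edges can be loaded into Python's standard-library `graphlib` to check the ordering they imply. The edge list is transcribed by hand from the hunk; `NEW_EDGES` and `build_order` are names made up for this sketch.

```python
from graphlib import TopologicalSorter

# Edges added by this hunk, transcribed from the "Edges" list as (source, target).
NEW_EDGES = [
    ("in_stage1_contract_s2", "stage2_input_bundle"),
    ("in_cps_s5", "stage2_input_bundle"),
    ("in_db_s5", "stage2_input_bundle"),
    ("stage2_input_bundle", "stage2_build_context"),
    ("stage2_artifact_specs", "stage2_build_context"),
    ("stage2_build_context", "target_resolve"),
    ("stage2_build_context", "stage2_calibration_package_writer"),
]


def build_order(edges):
    """Return one valid topological order of the nodes in `edges`."""
    predecessors = {}
    for source, target in edges:
        # TopologicalSorter expects a mapping of node -> set of predecessors.
        predecessors.setdefault(target, set()).add(source)
        predecessors.setdefault(source, set())
    return list(TopologicalSorter(predecessors).static_order())


order = build_order(NEW_EDGES)
print(order)
```

Any order it returns places `stage2_input_bundle` before `stage2_build_context`, and `stage2_build_context` before both `target_resolve` and the package writer. `TopologicalSorter` raises `CycleError` if an edit to the map ever introduces a cycle.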

docs/generated/pipeline_api.json

Lines changed: 66 additions & 8 deletions
@@ -3086,7 +3086,7 @@
 "docstring": "Promote a completed pipeline run to production.\n\n1. Verify run status is \"completed\"\n2. Promote every staged artifact in one Hugging Face commit\n3. Upload/copy every artifact to GCS\n4. Finalize release_manifest.json, tag the release, and update\n version_manifest.json\n5. Update run status to \"promoted\"\n\nArgs:\n run_id: The run ID to promote.\n candidate_version: Candidate staging scope used for staged source files.\n release_version: Stable version used for final release metadata.\n\nReturns:\n Summary message.",
 "id": "promote_pipeline_run",
 "kind": "function",
-"line": 1910,
+"line": 1922,
 "metadata": {
 "api_refs": [
 "modal_app.pipeline.promote_run"
@@ -3507,7 +3507,7 @@
 "docstring": "Run the full pipeline end-to-end.\n\nArgs:\n branch: Git branch to build from.\n gpu: GPU type for regional calibration.\n epochs: Training epochs for regional calibration.\n national_gpu: GPU type for national calibration.\n national_epochs: Training epochs for national.\n num_workers: Number of parallel H5 workers.\n n_clones: Number of clones for H5 building.\n skip_national: Skip national calibration/H5.\n resume_run_id: Resume a previously failed run.\n clear_checkpoints: Wipe ALL checkpoints before building\n (default False). Normally not needed \u2014 checkpoints are\n scoped by commit SHA, so stale ones from other commits\n are cleaned automatically. Use True only to force a\n full rebuild of the current commit.\n candidate_version: Candidate staging scope used for HF staging.\n release_version: Final stable release version. Usually empty until\n promotion.\n base_release_version: Stable release current when this candidate was\n built.\n release_bump: Intended SemVer bump for this candidate.\n sha_override: Exact source SHA deployed by GitHub Actions. When\n provided, this is recorded instead of reading the current\n branch tip.\n run_id: Cross-system run ID created by GitHub.\n run_context: Serialized run context from the launcher workflow.\n modal_app_name: Deployed Modal app name for this run.\n modal_environment: Modal environment used for this run.\n chunked_matrix: Build the calibration matrix in clone-household\n chunks instead of the non-chunked path. Opt-in; default off.\n chunk_size: Clone-household columns per chunk when\n ``chunked_matrix`` is True.\n parallel_matrix: Fan chunked matrix building across Modal\n workers via ``build_matrix_chunk_worker``. Only meaningful\n when ``chunked_matrix`` is True; ignored otherwise.\n num_matrix_workers: Number of Modal workers when\n ``parallel_matrix`` is True.\n\nReturns:\n The run ID for use with promote.",
 "id": "run_modal_pipeline",
 "kind": "function",
-"line": 943,
+"line": 944,
 "metadata": {
 "api_refs": [
 "modal_app.pipeline.run_pipeline"
@@ -3709,13 +3709,14 @@
 "docstring": "Return canonical Stage 2 paths rooted in an artifacts directory.",
 "id": "stage2_artifact_specs",
 "kind": "function",
-"line": 96,
+"line": 378,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration_package.specs.calibration_package_artifact_paths"
 ],
-"artifacts_out": "[CALIBRATION_PACKAGE_FILENAME, CALIBRATION_PACKAGE_CONTRACT_FILENAME]",
-"description": "Centralize calibration package, contract, metadata, and matrix-build artifact paths.",
+"artifacts_in": "[SOURCE_DATASET_FILENAME, TARGET_DATABASE_FILENAME]",
+"artifacts_out": "[CALIBRATION_PACKAGE_FILENAME, CALIBRATION_PACKAGE_METADATA_FILENAME, CALIBRATION_PACKAGE_CONTRACT_FILENAME]",
+"description": "Centralize Stage 2 input, package, contract, metadata, report, and matrix-build artifact names.",
 "id": "stage2_artifact_specs",
 "label": "Stage 2 Artifact Specs",
 "node_type": "library",
@@ -3730,7 +3731,36 @@
 ]
 },
 "object_path": "policyengine_us_data.calibration_package.specs.calibration_package_artifact_paths",
-"signature": "def calibration_package_artifact_paths(artifacts_dir: str | Path) -> CalibrationPackageArtifactPaths",
+"signature": "def calibration_package_artifact_paths(artifacts_dir: str | Path) -> CalibrationPackageOutputBundle",
+"source_file": "policyengine_us_data/calibration_package/specs.py"
+},
+"stage2_build_context": {
+"docstring": "Return Stage 2 run context, preferring the Stage 1 handoff contract.",
+"id": "stage2_build_context",
+"kind": "function",
+"line": 323,
+"metadata": {
+"api_refs": [
+"policyengine_us_data.calibration_package.specs.stage2_build_context_for_run"
+],
+"artifacts_in": "[DATASET_BUILD_OUTPUT_CONTRACT_FILENAME, SOURCE_DATASET_FILENAME, TARGET_DATABASE_FILENAME]",
+"artifacts_out": "[CALIBRATION_PACKAGE_FILENAME, CALIBRATION_PACKAGE_CONTRACT_FILENAME]",
+"description": "Bind one run_id to canonical Stage 2 input and output bundles before remote package construction starts.",
+"id": "stage2_build_context",
+"label": "Stage 2 Build Context",
+"node_type": "library",
+"pathways": [
+"calibration_package"
+],
+"source_file": "policyengine_us_data/calibration_package/specs.py",
+"stability": "moving",
+"status": "current",
+"validation_commands": [
+"uv run pytest tests/unit/calibration_package/test_specs.py"
+]
+},
+"object_path": "policyengine_us_data.calibration_package.specs.stage2_build_context_for_run",
+"signature": "def stage2_build_context_for_run(pipeline_mount: str | Path, run_id: str | None = '', *, stage1_contract_path: str | Path | None = None) -> Stage2BuildContext",
 "source_file": "policyengine_us_data/calibration_package/specs.py"
 },
 "stage2_calibration_package_contract_validator": {
@@ -3822,6 +3852,34 @@
 "signature": "def save_calibration_package(path: str, X_sparse, targets_df: 'pd.DataFrame', target_names: list, metadata: dict, initial_weights: np.ndarray = None, cd_geoid: np.ndarray = None, block_geoid: np.ndarray = None) -> None",
 "source_file": "policyengine_us_data/calibration/unified_calibration.py"
 },
+"stage2_input_bundle": {
+"docstring": "Return a compatibility Stage 2 input bundle from canonical filenames.",
+"id": "stage2_input_bundle",
+"kind": "function",
+"line": 237,
+"metadata": {
+"api_refs": [
+"policyengine_us_data.calibration_package.specs.stage2_input_bundle_from_artifacts_dir"
+],
+"artifacts_in": "[DATASET_BUILD_OUTPUT_CONTRACT_FILENAME, SOURCE_DATASET_FILENAME, TARGET_DATABASE_FILENAME]",
+"description": "Resolve the source-imputed dataset and policy target database from a Stage 1 contract or compatibility filename fallback.",
+"id": "stage2_input_bundle",
+"label": "Stage 2 Input Bundle",
+"node_type": "library",
+"pathways": [
+"calibration_package"
+],
+"source_file": "policyengine_us_data/calibration_package/specs.py",
+"stability": "moving",
+"status": "current",
+"validation_commands": [
+"uv run pytest tests/unit/calibration_package/test_specs.py"
+]
+},
+"object_path": "policyengine_us_data.calibration_package.specs.stage2_input_bundle_from_artifacts_dir",
+"signature": "def stage2_input_bundle_from_artifacts_dir(artifacts_dir: str | Path) -> Stage2InputBundle",
+"source_file": "policyengine_us_data/calibration_package/specs.py"
+},
 "stage2_target_config_apply": {
 "docstring": "Filter target rows before matrix construction.",
 "id": "stage2_target_config_apply",
@@ -3853,7 +3911,7 @@
 "docstring": "Resolve the target config identity used by Stage 2 package construction.",
 "id": "stage2_target_config_identity",
 "kind": "function",
-"line": 127,
+"line": 410,
 "metadata": {
 "api_refs": [
 "policyengine_us_data.calibration_package.specs.resolve_target_config_identity"
@@ -4387,7 +4445,7 @@
 "docstring": "Verify deployed-image imports and subprocess seams.",
 "id": "verify_runtime_seams",
 "kind": "function",
-"line": 570,
+"line": 570,
 "metadata": {
 "api_refs": [
 "modal_app.pipeline.verify_runtime_seams"
