Commit c7929cd

Refresh Stage 2 docs and identity checks
1 parent: 578d1bf

7 files changed

Lines changed: 601 additions & 88 deletions

docs/engineering/pipeline-map.md

Lines changed: 29 additions & 20 deletions
````diff
@@ -378,19 +378,32 @@ Build sparse calibration matrix (targets x households x clones)
 | `takeup_rerand` Block-Level Takeup Re-randomization | `process` | `unknown` | `unknown` | |
 | `sparse_build` Sparse Matrix Construction | `process` | `unknown` | `unknown` | |
 | `out_pkg` calibration_package.pkl | `artifact` | `unknown` | `unknown` | |
+| `out_contract` calibration_package_contract.json | `artifact` | `unknown` | `unknown` | |
 | `util_sql` sqlalchemy | `utility` | `unknown` | `unknown` | |
 | `util_pool` ProcessPoolExecutor | `utility` | `unknown` | `unknown` | |
 | `util_takeup_s5` compute_block_takeup_for_entities() | `utility` | `unknown` | `unknown` | |
 | `util_scipy` scipy.sparse | `utility` | `unknown` | `unknown` | |
+| `stage2_target_config_identity` Stage 2 Target Config Identity | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.specs.resolve_target_config_identity` |
+| `stage2_target_catalog_load` Load Stage 2 Target Config | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_calibration.load_target_config` |
+| `stage2_target_config_apply` Apply Stage 2 Target Config | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_calibration.apply_target_config_to_targets` |
 | `state_precomp` Per-State Simulation Precomputation | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder._compute_single_state` |
 | `clone_assembly` Clone Value Assembly | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder._assemble_clone_values_standalone` |
+| `build_matrix` Build Calibration Matrix | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix` |
+| `build_matrix_chunked` Build Calibration Matrix In Chunks | `library` | `current` | `experimental` | `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix_chunked` |
+| `stage2_calibration_package_writer` Stage 2 Package Writer | `library` | `current` | `moving` | `policyengine_us_data.calibration.unified_calibration.save_calibration_package` |
+| `stage2_artifact_specs` Stage 2 Artifact Specs | `library` | `current` | `moving` | `policyengine_us_data.calibration_package.specs.calibration_package_artifact_paths` |
+| `stage2_calibration_package_contract_writer` Stage 2 Contract Writer | `library` | `current` | `moving` | `policyengine_us_data.stage_contracts.calibration_package.write_calibration_package_contract` |
+| `stage2_calibration_package_contract_validator` Stage 2 Contract Validator | `validation` | `current` | `moving` | `policyengine_us_data.stage_contracts.calibration_package.validate_calibration_package_contract` |
 
 #### Edges
 
 - `in_cps_s5` -> `target_resolve` `data_flow`
 - `in_db_s5` -> `target_resolve` `external_source` (SQL targets)
-- `in_config_s5` -> `target_resolve` `data_flow` (include list)
-- `target_resolve` -> `target_uprate` `data_flow`
+- `in_config_s5` -> `stage2_target_config_identity` `data_flow` (config file)
+- `stage2_target_config_identity` -> `stage2_target_catalog_load` `data_flow` (resolved path and checksum)
+- `stage2_target_catalog_load` -> `stage2_target_config_apply` `data_flow` (include/exclude rules)
+- `target_resolve` -> `stage2_target_config_apply` `data_flow` (candidate targets)
+- `stage2_target_config_apply` -> `target_uprate` `data_flow` (selected targets)
 - `target_uprate` -> `geo_build` `data_flow`
 - `geo_build` -> `constraint_resolve` `data_flow`
 - `constraint_resolve` -> `state_precomp` `data_flow`
````
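The new Stage 2 edges route the target config through an identity step (resolved path and checksum) before it is loaded and applied to the candidate targets. The sketch below illustrates that ordering under stated assumptions: `ConfigIdentity`, `resolve_identity`, `load_config`, and `apply_rules` are hypothetical stand-ins, not the signatures of `resolve_target_config_identity`, `load_target_config`, or `apply_target_config_to_targets`, which this diff does not show. It assumes a JSON config holding `include`/`exclude` lists of variable names, SHA-256 as the checksum, and illustrative target names.

```python
import hashlib
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ConfigIdentity:
    """Hypothetical stand-in for a target config identity: resolved path plus content hash."""
    path: str
    sha256: str


def resolve_identity(config_path: str) -> ConfigIdentity:
    # Resolve to an absolute path and hash the file contents (SHA-256 is an assumption).
    resolved = Path(config_path).resolve()
    return ConfigIdentity(str(resolved), hashlib.sha256(resolved.read_bytes()).hexdigest())


def load_config(identity: ConfigIdentity) -> dict:
    # Re-read the file and refuse to parse it if it no longer matches the recorded hash.
    raw = Path(identity.path).read_bytes()
    if hashlib.sha256(raw).hexdigest() != identity.sha256:
        raise ValueError(f"target config changed after identity was resolved: {identity.path}")
    return json.loads(raw)


def apply_rules(candidates: list[dict], rules: dict) -> list[dict]:
    # Hypothetical rule shape: an empty include list keeps everything; exclude always wins.
    include = set(rules.get("include", []))
    exclude = set(rules.get("exclude", []))
    return [
        t for t in candidates
        if (not include or t["variable"] in include) and t["variable"] not in exclude
    ]


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "target_config.json"
    cfg.write_text(json.dumps({"include": ["snap", "aca_ptc"], "exclude": ["aca_ptc"]}))
    identity = resolve_identity(str(cfg))
    rules = load_config(identity)

candidates = [{"variable": v, "geo": "06"} for v in ("snap", "aca_ptc", "eitc")]
selected = apply_rules(candidates, rules)
print([t["variable"] for t in selected])  # ['snap']
```

Hashing the same bytes that are later parsed means a config edited between identity resolution and loading is rejected instead of being used silently.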
````diff
@@ -399,7 +412,19 @@ Build sparse calibration matrix (targets x households x clones)
 - `in_blocks_s5` -> `clone_assembly` `data_flow` (block populations)
 - `clone_assembly` -> `takeup_rerand` `data_flow`
 - `takeup_rerand` -> `sparse_build` `data_flow`
-- `sparse_build` -> `out_pkg` `produces_artifact`
+- `sparse_build` -> `build_matrix` `uses_library` (non-chunked path)
+- `sparse_build` -> `build_matrix_chunked` `uses_library` (chunked path)
+- `build_matrix` -> `stage2_calibration_package_writer` `data_flow`
+- `build_matrix_chunked` -> `stage2_calibration_package_writer` `data_flow`
+- `stage2_artifact_specs` -> `stage2_calibration_package_writer` `uses_utility` (package path)
+- `stage2_calibration_package_writer` -> `out_pkg` `produces_artifact`
+- `out_pkg` -> `stage2_calibration_package_contract_writer` `data_flow`
+- `stage2_artifact_specs` -> `stage2_calibration_package_contract_writer` `uses_utility` (contract path)
+- `stage2_calibration_package_contract_writer` -> `out_contract` `produces_artifact`
+- `out_pkg` -> `stage2_calibration_package_contract_validator` `validates`
+- `out_contract` -> `stage2_calibration_package_contract_validator` `validates`
+- `in_cps_s5` -> `stage2_calibration_package_contract_validator` `validates`
+- `in_db_s5` -> `stage2_calibration_package_contract_validator` `validates`
 - `util_sql` -> `target_resolve` `uses_utility`
 - `util_pool` -> `state_precomp` `uses_utility`
 - `util_takeup_s5` -> `takeup_rerand` `uses_utility`
````
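The contract edges say the package writer produces `calibration_package.pkl`, the contract writer turns it into `calibration_package_contract.json`, and the validator checks the package, the contract, and the CPS and database inputs together. One way that could work is sketched below, assuming the contract records a SHA-256 checksum per file and validation recomputes them. `write_contract` and `validate_contract` are hypothetical; the actual schema written by `write_calibration_package_contract` is not shown in this diff, and the input labels and file names are placeholders.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    # Stream the file in 1 MiB blocks so large packages are not read into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def write_contract(package: Path, inputs: dict[str, Path], contract: Path) -> dict:
    # Hypothetical schema: one checksum for the package and one per named input.
    payload = {
        "package": {"path": package.name, "sha256": sha256_file(package)},
        "inputs": {name: {"path": p.name, "sha256": sha256_file(p)} for name, p in inputs.items()},
    }
    contract.write_text(json.dumps(payload, indent=2, sort_keys=True))
    return payload


def validate_contract(package: Path, inputs: dict[str, Path], contract: Path) -> list[str]:
    # Recompute every checksum and report each disagreement with the recorded contract.
    recorded = json.loads(contract.read_text())
    problems = []
    if sha256_file(package) != recorded["package"]["sha256"]:
        problems.append("package checksum mismatch")
    for name, p in inputs.items():
        entry = recorded["inputs"].get(name)
        if entry is None:
            problems.append(f"input {name!r} missing from contract")
        elif sha256_file(p) != entry["sha256"]:
            problems.append(f"input {name!r} checksum mismatch")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = root / "calibration_package.pkl"
    pkg.write_bytes(b"matrix bytes")
    # Placeholder inputs standing in for the CPS dataset and the targets database.
    inputs = {"dataset": root / "cps.h5", "db": root / "targets.db"}
    for p in inputs.values():
        p.write_bytes(p.name.encode())
    contract = root / "calibration_package_contract.json"
    write_contract(pkg, inputs, contract)
    clean = validate_contract(pkg, inputs, contract)
    inputs["db"].write_bytes(b"rebuilt database")  # simulate the database changing later
    stale = validate_contract(pkg, inputs, contract)

print(clean)  # []
print(stale)  # ["input 'db' checksum mismatch"]
```

Checking the inputs alongside the package catches the case where the package file is intact but was built from a dataset or database that has since changed.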
````diff
@@ -778,22 +803,6 @@ def build_datasets(upload: bool = False, branch: str = 'main', sequential: bool
 
 Build all datasets with preemption-resilient checkpointing.
 
-### `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix`
-
-```python
-def build_matrix(self, geography, sim, target_filter: Optional[dict] = None, hierarchical_domains: Optional[List[str]] = None, cache_dir: Optional[str] = None, sim_modifier = None, rerandomize_takeup: bool = True, county_level: bool = True, workers: int = 1) -> Tuple[pd.DataFrame, sparse.csr_matrix, List[str]]
-```
-
-Build sparse calibration matrix.
-
-### `policyengine_us_data.calibration.unified_matrix_builder.UnifiedMatrixBuilder.build_matrix_chunked`
-
-```python
-def build_matrix_chunked(self, geography, sim, target_filter: Optional[dict] = None, hierarchical_domains: Optional[List[str]] = None, chunk_size: int = 25000, chunk_dir: Optional[str] = None, keep_chunks: bool = False, resume_chunks: bool = False, rerandomize_takeup: bool = True, parallel: bool = False, num_matrix_workers: int = 50, run_id: str = '') -> Tuple[pd.DataFrame, sparse.csr_matrix, List[str]]
-```
-
-Build a sparse matrix by materializing mixed-geography chunks.
-
 ### `modal_app.local_area._build_publishing_input_bundle`
 
 ```python
````
````diff
@@ -1389,7 +1398,7 @@ Compute the scope fingerprint while preserving pinned resume values.
 ### `policyengine_us_data.calibration.unified_calibration.run_calibration`
 
 ```python
-def run_calibration(dataset_path: str, db_path: str, n_clones: int = DEFAULT_N_CLONES, lambda_l0: float = 1e-08, epochs: int = DEFAULT_EPOCHS, device: str = 'cpu', seed: int = 42, domain_variables: list = None, hierarchical_domains: list = None, skip_takeup_rerandomize: bool = False, skip_source_impute: bool = True, skip_county: bool = True, target_config: dict = None, target_config_path: str = None, build_only: bool = False, package_path: str = None, package_output_path: str = None, beta: float = BETA, lambda_l2: float = LAMBDA_L2, learning_rate: float = LEARNING_RATE, log_freq: int = None, log_path: str = None, workers: int = 1, resume_from: str = None, checkpoint_path: str = None, chunked_matrix: bool = False, chunk_size: int = 25000, chunk_dir: str = None, keep_chunks: bool = False, resume_chunks: bool = False, parallel: bool = False, num_matrix_workers: int = 50, run_id: str = '')
+def run_calibration(dataset_path: str, db_path: str, n_clones: int = DEFAULT_N_CLONES, lambda_l0: float = 1e-08, epochs: int = DEFAULT_EPOCHS, device: str = 'cpu', seed: int = 42, domain_variables: list = None, hierarchical_domains: list = None, skip_takeup_rerandomize: bool = False, skip_source_impute: bool = True, skip_county: bool = True, target_config: dict = None, target_config_path: str = None, target_config_identity: TargetConfigIdentity | None = None, build_only: bool = False, package_path: str = None, package_output_path: str = None, beta: float = BETA, lambda_l2: float = LAMBDA_L2, learning_rate: float = LEARNING_RATE, log_freq: int = None, log_path: str = None, workers: int = 1, resume_from: str = None, checkpoint_path: str = None, chunked_matrix: bool = False, chunk_size: int = 25000, chunk_dir: str = None, keep_chunks: bool = False, resume_chunks: bool = False, parallel: bool = False, num_matrix_workers: int = 50, run_id: str = '')
 ```
 
 Run unified calibration pipeline.
````
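`run_calibration` exposes both matrix paths through `chunked_matrix` and `chunk_size`, and the pipeline map feeds `build_matrix` and `build_matrix_chunked` into the same package writer, which suggests the two paths should produce the same matrix. The dependency-free sketch below shows why column chunking can be exact: each chunk computes entries with chunk-local column indices, which are shifted by the chunk's start offset before concatenation. It assumes chunks split the household-clone columns and uses COO-style `(row, col, value)` triplets in place of `scipy.sparse`; `column_entries`, `build_full`, `build_chunked`, and `toy_value` are hypothetical, and the real chunking strategy inside `build_matrix_chunked` is not shown in this diff.

```python
from typing import Callable, List, Tuple

Triplet = Tuple[int, int, float]  # (target row, household-clone column, value)


def column_entries(columns: range, n_targets: int,
                   value_fn: Callable[[int, int], float]) -> List[Triplet]:
    """Hypothetical per-chunk work: nonzero entries for `columns`, indexed locally from 0."""
    entries = []
    for local_col, global_col in enumerate(columns):
        for row in range(n_targets):
            value = value_fn(row, global_col)
            if value != 0.0:
                entries.append((row, local_col, value))
    return entries


def build_full(n_targets: int, n_columns: int, value_fn) -> List[Triplet]:
    # Single pass over all columns: local and global indices coincide.
    return sorted(column_entries(range(n_columns), n_targets, value_fn))


def build_chunked(n_targets: int, n_columns: int, value_fn, chunk_size: int) -> List[Triplet]:
    triplets: List[Triplet] = []
    for start in range(0, n_columns, chunk_size):
        stop = min(start + chunk_size, n_columns)
        chunk = column_entries(range(start, stop), n_targets, value_fn)
        # Shift chunk-local column indices back to global positions before concatenating.
        triplets.extend((row, col + start, value) for row, col, value in chunk)
    return sorted(triplets)


def toy_value(row: int, col: int) -> float:
    # Toy contribution of household-clone `col` to target `row`; mostly zeros.
    return float(col + 1) if (row + col) % 3 == 0 else 0.0


full = build_full(4, 10, toy_value)
chunked = build_chunked(4, 10, toy_value, chunk_size=3)
print(full == chunked)  # True
```

With `scipy.sparse`, the same triplets could be passed to `coo_matrix((data, (rows, cols)), shape=(n_targets, n_columns))` and converted with `.tocsr()`; keeping the offset logic separate from the per-chunk computation is what lets chunks be built, cached, or resumed independently.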
