Commit 0ecf3d1

baogorek and claude committed
Upload source_imputed H5 to HF calibration/ path in data_build.py
The data_build.py upload step now pushes source_imputed to calibration/source_imputed_stratified_extended_cps.h5 on HF so the downstream calibration pipeline (build-matrices, calibrate) can download it. This closes the gap in the all-Modal pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
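For orientation, the downstream side of this hand-off could fetch the file roughly as follows. This is a minimal sketch, not the pipeline's actual code: it assumes the `huggingface_hub` package is installed and that the repo is reachable with the default repo type and credentials. The function name `fetch_source_imputed` is hypothetical. Only the repo ID and the remote path come from the commit.

```python
# Both values are taken from the commit's upload call.
REPO_ID = "policyengine/policyengine-us-data"
REMOTE_PATH = "calibration/source_imputed_stratified_extended_cps.h5"


def fetch_source_imputed(local_dir: str = ".") -> str:
    """Download the calibration input and return its local path.

    Hypothetical helper: the real build-matrices / calibrate steps
    may fetch the file differently.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=REMOTE_PATH,
        local_dir=local_dir,
    )
```

Calling `fetch_source_imputed("/tmp/calibration")` would place the file under that directory, assuming the upload step has already run for the current build.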
1 parent c31e88a commit 0ecf3d1

1 file changed

Lines changed: 20 additions & 0 deletions

modal_app/data_build.py

@@ -476,6 +476,26 @@ def build_datasets(
         "policyengine_us_data/storage/upload_completed_datasets.py",
         env=env,
     )
+    # Upload source_imputed to calibration/ path for downstream pipeline
+    print("Uploading source_imputed dataset to HF calibration/...")
+    subprocess.run(
+        [
+            "uv",
+            "run",
+            "python",
+            "-c",
+            "from policyengine_us_data.utils.huggingface import upload; "
+            "upload("
+            "'policyengine_us_data/storage/"
+            "source_imputed_stratified_extended_cps_2024.h5', "
+            "'policyengine/policyengine-us-data', "
+            "'calibration/"
+            "source_imputed_stratified_extended_cps.h5')",
+        ],
+        check=True,
+        env=env,
+    )
+    print("Source imputed dataset uploaded to HF")

     # Clean up checkpoints after successful completion
     cleanup_checkpoints(branch, checkpoint_volume)
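The adjacent string literals in the hunk above concatenate into a single `-c` argument, which can be hard to read in diff form. The sketch below builds an equivalent command list from its three inputs. It is illustrative only: `build_upload_command` is a hypothetical helper that does not exist in the repository, and it assumes the paths contain no quote or backslash characters, so that `repr()` yields the same single-quoted literals as the commit.

```python
def build_upload_command(
    local_path: str, repo_id: str, remote_path: str
) -> list[str]:
    """Return the argv list for the `uv run python -c ...` upload call.

    Hypothetical helper mirroring the command in the commit.
    """
    # repr() on a plain path string produces a single-quoted literal,
    # matching the quoting used in the commit's one-liner.
    snippet = (
        "from policyengine_us_data.utils.huggingface import upload; "
        f"upload({local_path!r}, {repo_id!r}, {remote_path!r})"
    )
    return ["uv", "run", "python", "-c", snippet]


cmd = build_upload_command(
    "policyengine_us_data/storage/"
    "source_imputed_stratified_extended_cps_2024.h5",
    "policyengine/policyengine-us-data",
    "calibration/source_imputed_stratified_extended_cps.h5",
)
```

The resulting `cmd` could then be passed to `subprocess.run(cmd, check=True, env=env)` as in the commit. Note that the local file name carries a `_2024` suffix while the remote name does not.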

0 commit comments
