Add Stage 1 artifact staging boundary #1050

Merged
anth-volk merged 1 commit into main from agent/stage-1/pr-2-build-context-artifact-staging
May 21, 2026
anth-volk (Collaborator) commented May 19, 2026

Fixes #1048

Summary

  • Add DatasetBuildContext, PipelineArtifactStager, and a contract-builder facade for the Stage 1 dataset-build handoff.
  • Replace open-coded Modal artifact copy logic with declared Stage 1 artifact staging, including the yearless source-imputed alias and checkpoint stats artifact.
  • Emit dataset_inventory.json, source_dataset_schema_summary.json, and target_database_schema_summary.json, then attach them as canonical DiagnosticRef entries on dataset_build_output.json.
  • Rebase the PR onto current main and keep the docs-facing pipeline metadata in sync with the new diagnostic artifacts.
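
To illustrate the shift from open-coded copy logic to declared staging, here is a minimal sketch. It is not the code in this PR: `ArtifactSpec`, `stage_artifacts`, `write_inventory`, the inventory layout, and the example file names in the comments are all hypothetical stand-ins, and the real `PipelineArtifactStager` and `DatasetBuildContext` may look quite different. The idea shown is that each artifact (including a yearless alias) is declared once as data, and a single loop does the copying.

```python
from __future__ import annotations

import json
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArtifactSpec:
    """Hypothetical declaration of one Stage 1 artifact to stage."""

    source_name: str                 # file name as produced in the build directory
    staged_name: str | None = None   # optional alias, e.g. a yearless name
    required: bool = True            # missing required artifacts are an error

    @property
    def target_name(self) -> str:
        # Fall back to the source name when no alias is declared.
        return self.staged_name or self.source_name


def stage_artifacts(
    build_dir: Path, staging_dir: Path, specs: list[ArtifactSpec]
) -> list[Path]:
    """Copy every declared artifact into staging_dir and return the staged paths."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged: list[Path] = []
    for spec in specs:
        src = build_dir / spec.source_name
        if not src.exists():
            if spec.required:
                raise FileNotFoundError(f"required artifact missing: {src}")
            continue  # optional artifacts are skipped silently
        dst = staging_dir / spec.target_name
        shutil.copy2(src, dst)
        staged.append(dst)
    return staged


def write_inventory(staging_dir: Path, staged: list[Path]) -> Path:
    """Write a simple inventory file (assumed layout) listing the staged artifacts."""
    entries = [
        {"name": path.name, "size_bytes": path.stat().st_size}
        for path in sorted(staged)
    ]
    out = staging_dir / "dataset_inventory.json"
    out.write_text(json.dumps({"artifacts": entries}, indent=2))
    return out
```

Under these assumptions, a caller would pass something like `ArtifactSpec("source_imputed_2024.h5", staged_name="source_imputed.h5")` to get the yearless alias, and `ArtifactSpec("checkpoint_stats.json", required=False)` for an artifact that may be absent. Keeping the declarations as data means a new artifact is one extra spec rather than another copy branch.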

Validation

  • ruff check modal_app/data_build.py policyengine_us_data/build_datasets/__init__.py policyengine_us_data/build_datasets/artifacts.py policyengine_us_data/build_datasets/context.py policyengine_us_data/build_datasets/contracts.py policyengine_us_data/build_datasets/diagnostics.py policyengine_us_data/build_datasets/staging.py policyengine_us_data/stage_contracts/dataset_build.py tests/unit/test_build_dataset_specs.py tests/unit/test_build_dataset_staging.py tests/unit/test_dataset_build_stage_contract.py tests/unit/test_modal_data_build.py
  • ruff format --check modal_app/data_build.py policyengine_us_data/build_datasets/__init__.py policyengine_us_data/build_datasets/artifacts.py policyengine_us_data/build_datasets/context.py policyengine_us_data/build_datasets/contracts.py policyengine_us_data/build_datasets/diagnostics.py policyengine_us_data/build_datasets/staging.py policyengine_us_data/stage_contracts/dataset_build.py tests/unit/test_build_dataset_specs.py tests/unit/test_build_dataset_staging.py tests/unit/test_dataset_build_stage_contract.py tests/unit/test_modal_data_build.py
  • uv run --no-sync pytest tests/unit/test_build_dataset_specs.py tests/unit/test_build_dataset_staging.py tests/unit/test_dataset_build_stage_contract.py tests/unit/test_modal_data_build.py tests/unit/test_pipeline_doc_guards.py tests/unit/test_pipeline_docs_extractor.py (45 passed)
  • uv run --no-sync --with pyyaml python scripts/run_quality_guards.py
  • uv run --no-sync --with pyyaml python scripts/extract_pipeline_docs.py --json /private/tmp/pr2-pipeline-docs-check/pipeline_map.json --api-json /private/tmp/pr2-pipeline-docs-check/pipeline_api.json --markdown /private/tmp/pr2-pipeline-docs-check/pipeline-map.md
  • make lint

anth-volk changed the base branch from agent/stage-1/pr-1-specs-foundation to main (May 20, 2026 20:37)
anth-volk force-pushed the agent/stage-1/pr-2-build-context-artifact-staging branch 3 times, most recently from 21eba44 to 1c95362 (May 21, 2026 16:55)
anth-volk force-pushed the agent/stage-1/pr-2-build-context-artifact-staging branch from 1c95362 to 9877333 (May 21, 2026 17:11)
anth-volk marked this pull request as ready for review (May 21, 2026 17:48)
anth-volk merged commit c968309 into main (May 21, 2026)
13 checks passed

Development

Successfully merging this pull request may close these issues.

Stage 1 build context and artifact staging boundary
