Commit ec1b9c7

Merge remote-tracking branch 'origin/main' into agent/stage-1/pr-1-specs-foundation
2 parents f4d5a4d + 1017bb8 commit ec1b9c7

18 files changed

Lines changed: 3272 additions & 2 deletions

changelog.d/1022.added

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Added typed Stage 5 release candidate schemas and Stage 4 candidate bundle readers.

docs/engineering/skills/README.md

Lines changed: 7 additions & 0 deletions
@@ -24,3 +24,10 @@ Current skills:
 Stage-specific AI-facing engineering guides live under `docs/engineering/stages/`.
 Use them alongside these cross-cutting skills when modifying a stage-specific
 pipeline path.
+
+Current stage guides:
+
+- `build_outputs.md`: Stage 4 output-build library boundaries and test
+  expectations.
+- `release_promotion.md`: Stage 5 release candidate identity, validation-report
+  schema, rerun comparison material, and side-effect boundaries.
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+# Release Promotion Stage AI Guide
+
+This guide is for AI agents and maintainers modifying Stage 5
+(`5_validate_and_promote_release`) code. Stage 5 validates a staged release
+candidate, promotes the exact candidate to public Hugging Face and GCS
+destinations, writes release/version/completion metadata, and cleans staging
+only after completion is certified.
+
+## Candidate Identity
+
+Use `policyengine_us_data.release_promotion.ReleasePromotionContext` as the
+typed Stage 5 identity boundary. The context must keep these values distinct:
+
+- `run_id`: the canonical publication run correlation key.
+- `candidate_version`: the candidate staging scope used in Hugging Face staging
+  paths such as `staging/{candidate_version}-{run_id}/...`.
+- `release_version`: the final stable public release version.
+- `base_release_version` and `release_bump`: optional provenance for how the
+  candidate scope was chosen.
+
+Do not resolve a different run ID from the environment inside lower-level
+release-promotion logic. Environment resolution belongs at orchestration edges;
+Stage 5 library code should receive explicit context.
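
The identity boundary above can be sketched as a frozen dataclass. This is a minimal illustration, not the package's actual class: `SketchPromotionContext`, `resolve_context_at_edge`, and the environment variable names are hypothetical. The field names come from the guide, and the prefix format follows the `staging/{candidate_version}-{run_id}` pattern, but whether the real context builds its prefix this way is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPromotionContext:
    """Hypothetical stand-in for ReleasePromotionContext (illustration only)."""

    run_id: str
    candidate_version: str
    release_version: str
    base_release_version: str | None = None
    release_bump: str | None = None

    def __post_init__(self) -> None:
        # Reject empty identity values so they cannot silently collapse.
        for name in ("run_id", "candidate_version", "release_version"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be a non-empty string")

    @property
    def hf_staging_prefix(self) -> str:
        # Assumed format, taken from the staging path pattern in the guide.
        return f"staging/{self.candidate_version}-{self.run_id}"


def resolve_context_at_edge(env: dict[str, str]) -> SketchPromotionContext:
    """Orchestration-edge helper: the only place that reads the environment."""
    # The variable names below are hypothetical.
    return SketchPromotionContext(
        run_id=env["RUN_ID"],
        candidate_version=env["CANDIDATE_VERSION"],
        release_version=env["RELEASE_VERSION"],
    )


context = resolve_context_at_edge(
    {"RUN_ID": "run-42", "CANDIDATE_VERSION": "1.2.0rc1", "RELEASE_VERSION": "1.2.0"}
)
print(context.hf_staging_prefix)  # staging/1.2.0rc1-run-42
```

Lower-level helpers would then accept a context argument rather than reading the environment themselves, which keeps the run ID fixed for the whole promotion.
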
+
+## Release Candidate Bundles
+
+Use `ReleaseCandidateInputBundle` to describe the artifacts Stage 5 is allowed
+to validate and promote. Each artifact should be represented by a
+`ReleaseArtifactSpec` with a production-relative path, artifact family, source
+stage, and optional checksum/size metadata.
+
+The current compatibility path may build a bundle from the legacy staged path
+set produced by Modal orchestration. Mark that reader as compatibility-only and
+keep it retirable.
+
+The Stage 4 contract/inventory reader API now exists for migration work:
+`build_release_candidate_bundle_from_stage4_contract()` accepts an in-memory
+Stage 4 contract plus inventory records, and
+`read_stage4_release_candidate_bundle()` reads the same shape from files.
+Production Stage 5 code should not depend on Stage 4 contracts until the
+contract and inventory are canonical, complete, and populated with semantic
+artifact identity plus checksum/size material.
+
+Candidate bundles may record validation reports as path-only
+`validation_report_paths` for compatibility. When Stage 4 or another upstream
+producer can provide report checksums, prefer `validation_report_refs` with
+canonical `DiagnosticRef` / `ArtifactRef` identity so rerun comparison can
+distinguish an overwritten report at the same diagnostics path.
+
+## Validation Reports
+
+Stage 5 must use the shared validation schema for durable validation output:
+
+- `policyengine_us_data.stage_contracts.ValidationReport`
+- `policyengine_us_data.stage_contracts.ValidationFinding`
+- `policyengine_us_data.stage_contracts.DiagnosticRef`
+
+Do not create a Stage 5-specific durable validation report, check, finding, or
+error schema for contracts, diagnostics, release candidates, status endpoints,
+or step manifests. Release-specific details such as missing staged artifacts,
+missing validation reports, finalized-release conflicts, version mismatches, or
+destination conflicts should live in canonical finding metadata.
+
+## Rerun Comparison Material
+
+Before public writes, rerun and reuse decisions should compare semantic
+candidate identity rather than only checking whether output files exist. The
+comparison material should include:
+
+- run ID, candidate version, release version, HF repository, and GCS bucket;
+- Stage 4 output contract fingerprint when available;
+- output inventory paths/checksums when available;
+- validation report paths and `DiagnosticRef` checksum identities when
+  available;
+- expected production-relative artifact paths;
+- the Stage 5 candidate bundle fingerprint.
+
+When required artifacts only have paths and no checksum/size identity, treat
+the bundle as path-only and do not use its fingerprint for promotion reuse
+decisions.
+
+Already-finalized releases are an idempotency case, not a shortcut around
+candidate identity. A finalized release can be reused only when its completion
+marker is valid and it matches the requested candidate.
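
The rerun rules above can be sketched as three small pure functions. This is an illustration under stated assumptions: `SketchArtifact`, `is_path_only`, `bundle_fingerprint`, and `can_reuse_finalized_release` are hypothetical names, the fingerprint covers only a subset of the comparison material listed in the guide, and the SHA-256-over-canonical-JSON scheme is a stand-in rather than the package's actual fingerprint.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchArtifact:
    """Hypothetical artifact record: a path plus optional checksum/size."""

    path: str
    sha256: str | None = None
    size_bytes: int | None = None


def is_path_only(artifacts: list[SketchArtifact]) -> bool:
    """True when any artifact lacks checksum or size identity."""
    return any(a.sha256 is None or a.size_bytes is None for a in artifacts)


def bundle_fingerprint(
    run_id: str,
    candidate_version: str,
    release_version: str,
    artifacts: list[SketchArtifact],
) -> str:
    """Hash a canonical JSON view of (a subset of) the comparison material."""
    material = {
        "run_id": run_id,
        "candidate_version": candidate_version,
        "release_version": release_version,
        # Sort by path so artifact order does not change the fingerprint.
        "artifacts": [
            [a.path, a.sha256, a.size_bytes]
            for a in sorted(artifacts, key=lambda a: a.path)
        ],
    }
    encoded = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def can_reuse_finalized_release(
    *,
    completion_marker_valid: bool,
    finalized_fingerprint: str,
    requested_fingerprint: str,
    requested_is_path_only: bool,
) -> bool:
    """Reuse only a certified release that matches the requested candidate."""
    if requested_is_path_only:
        # Path-only fingerprints carry no content identity, so never reuse on them.
        return False
    return completion_marker_valid and finalized_fingerprint == requested_fingerprint
```

Two bundles that list the same artifacts in a different order produce the same fingerprint here, while a bundle with any checksum missing is refused for reuse regardless of whether its fingerprint happens to match.
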
+
+## Side Effects
+
+Candidate builders, schema adapters, and rerun comparison helpers should not
+perform Hugging Face writes, GCS uploads, Modal calls, staging cleanup, or
+release-manifest publication. Keep those operations behind explicit adapters or
+services so tests can exercise candidate shape and validation logic without
+credentials or network access.
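
One way to keep side effects behind an adapter, sketched with `typing.Protocol`. All names here (`StagingUploader`, `RecordingUploader`, `missing_required_paths`, `promote`) are hypothetical and do not reflect the existing transaction engine; the point is only that the pure helper and the service can be tested with an in-memory fake.

```python
from __future__ import annotations

from typing import Protocol


class StagingUploader(Protocol):
    """Hypothetical adapter interface for the side-effecting upload step."""

    def upload(self, release_path: str, payload: bytes) -> None: ...


class RecordingUploader:
    """In-memory fake used by tests; performs no network I/O."""

    def __init__(self) -> None:
        self.uploaded: dict[str, bytes] = {}

    def upload(self, release_path: str, payload: bytes) -> None:
        self.uploaded[release_path] = payload


def missing_required_paths(required: list[str], staged: list[str]) -> list[str]:
    """Pure helper: report required paths absent from the staged set."""
    staged_set = set(staged)
    return [path for path in required if path not in staged_set]


def promote(
    required: list[str],
    staged: dict[str, bytes],
    uploader: StagingUploader,
) -> None:
    """Service layer: validate shape first, then delegate writes to the adapter."""
    missing = missing_required_paths(required, list(staged))
    if missing:
        raise ValueError(f"missing staged artifacts: {missing}")
    for path in required:
        uploader.upload(path, staged[path])
```

A test can pass `RecordingUploader()` to `promote` and inspect `uploaded` afterwards, so no credentials are needed; a real Hugging Face or GCS adapter would implement the same `upload` method at the orchestration edge.
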

docs/pipeline_map.yaml

Lines changed: 2 additions & 2 deletions
@@ -759,7 +759,7 @@ stages:
   node_type: artifact
   description: Policy target database copied into the pipeline volume
 - id: hf_staging_base_s1g
-  label: HuggingFace staging/{candidate_version}/{run_id}
+  label: HuggingFace staging/{candidate_version}-{run_id}
   node_type: external
   description: Run-scoped staging prefix for base datasets
 - id: stage_base_datasets
@@ -1554,7 +1554,7 @@ stages:
   node_type: artifact
   description: Output set from substage 5a
 - id: hf_staging_s5b
-  label: HuggingFace staging/{candidate_version}/{run_id}
+  label: HuggingFace staging/{candidate_version}-{run_id}
   node_type: external
   description: Run-scoped staging prefix containing validated artifacts
 - id: out_hf_prod
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+"""Typed Stage 5 release promotion boundaries.
+
+This package starts with release-candidate identity and candidate-bundle
+schemas. Promotion side effects still live in the existing transaction engine
+until later Stage 5 migration slices move them behind typed services.
+"""
+
+from .artifacts import (
+    BASE_RELEASE_ARTIFACT_PATHS,
+    ReleaseArtifactSpec,
+    dedupe_normalized_release_paths,
+    infer_artifact_identity,
+    infer_release_artifact_spec,
+    logical_name_for_release_path,
+    normalize_release_path,
+    strip_staging_prefix,
+)
+from .candidate import (
+    ReleaseCandidateInputBundle,
+)
+from .candidate_builders import build_legacy_release_candidate_bundle
+from .context import ReleasePromotionContext
+from .stage4_reader import (
+    build_release_candidate_bundle_from_stage4_contract,
+    read_stage4_release_candidate_bundle,
+)
+from .validation import build_release_candidate_shape_report
+
+__all__ = [
+    "BASE_RELEASE_ARTIFACT_PATHS",
+    "ReleaseArtifactSpec",
+    "ReleaseCandidateInputBundle",
+    "ReleasePromotionContext",
+    "build_legacy_release_candidate_bundle",
+    "build_release_candidate_bundle_from_stage4_contract",
+    "build_release_candidate_shape_report",
+    "dedupe_normalized_release_paths",
+    "infer_artifact_identity",
+    "infer_release_artifact_spec",
+    "logical_name_for_release_path",
+    "normalize_release_path",
+    "read_stage4_release_candidate_bundle",
+    "strip_staging_prefix",
+]
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+"""URI parsing helpers for Stage 5 release candidate artifacts."""
+
+from __future__ import annotations
+
+from urllib.parse import ParseResult, urlparse
+
+from .artifacts import (
+    BASE_RELEASE_ARTIFACT_PATHS,
+    normalize_release_path,
+    strip_staging_prefix,
+)
+from .context import ReleasePromotionContext
+
+
+def release_path_from_artifact_uri(
+    uri: str,
+    *,
+    context: ReleasePromotionContext,
+) -> str | None:
+    """Return a release-relative path from a staged artifact URI."""
+
+    parsed = urlparse(uri)
+    if parsed.scheme or parsed.netloc:
+        validate_uri_repo(parsed, context)
+    raw_path = parsed.path.lstrip("/") if parsed.scheme else uri
+    candidate_paths = [raw_path]
+    if parsed.netloc:
+        candidate_paths.append(f"{parsed.netloc}/{raw_path}")
+    for candidate in candidate_paths:
+        if context.hf_staging_prefix and context.hf_staging_prefix in candidate:
+            return strip_staging_prefix(
+                candidate[candidate.index(context.hf_staging_prefix) :],
+                context.hf_staging_prefix,
+            )
+        if parsed.scheme or parsed.netloc:
+            if contains_release_artifact_path(candidate):
+                raise ValueError(
+                    "external artifact URI must point under the expected staging prefix"
+                )
+            continue
+        if candidate.startswith("staging/"):
+            return strip_staging_prefix(candidate, context.hf_staging_prefix)
+        for prefix in ("states/", "districts/", "cities/", "national/"):
+            if prefix in candidate:
+                return normalize_release_path(candidate[candidate.index(prefix) :])
+        for path in BASE_RELEASE_ARTIFACT_PATHS:
+            if candidate == path or candidate.endswith(f"/{path}"):
+                return normalize_release_path(
+                    candidate[candidate.rindex(path) :],
+                )
+    return None
+
+
+def contains_release_artifact_path(candidate: str) -> bool:
+    """Return whether a path-shaped string names a release artifact."""
+
+    if any(
+        prefix in candidate
+        for prefix in ("states/", "districts/", "cities/", "national/")
+    ):
+        return True
+    return any(
+        candidate.endswith(f"/{path}") or candidate == path
+        for path in BASE_RELEASE_ARTIFACT_PATHS
+    )
+
+
+def validate_uri_repo(
+    parsed_uri: ParseResult,
+    context: ReleasePromotionContext,
+) -> None:
+    """Require external artifact URIs to reference the expected HF repository."""
+
+    if not parsed_uri.scheme or not parsed_uri.netloc:
+        return
+    path_parts = parsed_uri.path.strip("/").split("/")
+    if not path_parts or not path_parts[0]:
+        return
+    repo_name = f"{parsed_uri.netloc}/{path_parts[0]}"
+    if repo_name != context.hf_repo_name:
+        raise ValueError("external artifact URI repo must match context.hf_repo_name")
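
To see what the repository check compares, the derivation in `validate_uri_repo` can be reproduced in isolation. This is a standalone sketch: `repo_name_from_uri` is a hypothetical helper that mirrors the `"<netloc>/<first path segment>"` logic from the diff, and the `hf://` scheme and `example-org/example-data` repository are assumptions chosen for illustration, not values confirmed by the commit.

```python
from __future__ import annotations

from urllib.parse import urlparse


def repo_name_from_uri(uri: str) -> str | None:
    """Mirror how validate_uri_repo derives "<netloc>/<first path part>"."""
    parsed = urlparse(uri)
    if not parsed.scheme or not parsed.netloc:
        return None  # bare paths carry no repository to check
    path_parts = parsed.path.strip("/").split("/")
    if not path_parts or not path_parts[0]:
        return None
    return f"{parsed.netloc}/{path_parts[0]}"


# Assumed URI shape: the organization lands in netloc, the repo in the path.
staged_uri = "hf://example-org/example-data/staging/1.2.0rc1-run-42/states/AL.h5"
print(repo_name_from_uri(staged_uri))  # example-org/example-data

# A bare relative path has neither scheme nor netloc, so nothing is derived.
print(repo_name_from_uri("states/AL.h5"))  # None
```

Under this derivation, a web URL such as `https://huggingface.co/datasets/...` would yield `huggingface.co/datasets`, which would only pass the check if `context.hf_repo_name` were set to that same string. This suggests the helper expects URIs whose authority component is the organization name.
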
