
Commit f8b27ce

Introduce Stage 5 release candidate schemas
1 parent 49c22a5 commit f8b27ce

10 files changed

Lines changed: 2791 additions & 2 deletions


changelog.d/1022.added

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Added typed Stage 5 release candidate schemas and Stage 4 candidate bundle readers.

docs/engineering/skills/README.md

Lines changed: 7 additions & 0 deletions
@@ -24,3 +24,10 @@ Current skills:
 Stage-specific AI-facing engineering guides live under `docs/engineering/stages/`.
 Use them alongside these cross-cutting skills when modifying a stage-specific
 pipeline path.
+
+Current stage guides:
+
+- `build_outputs.md`: Stage 4 output-build library boundaries and test
+  expectations.
+- `release_promotion.md`: Stage 5 release candidate identity, validation-report
+  schema, rerun comparison material, and side-effect boundaries.
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# Release Promotion Stage AI Guide
+
+This guide is for AI agents and maintainers modifying Stage 5
+(`5_validate_and_promote_release`) code. Stage 5 validates a staged release
+candidate, promotes the exact candidate to public Hugging Face and GCS
+destinations, writes release/version/completion metadata, and cleans staging
+only after completion is certified.
+
+## Candidate Identity
+
+Use `policyengine_us_data.release_promotion.ReleasePromotionContext` as the
+typed Stage 5 identity boundary. The context must keep these values distinct:
+
+- `run_id`: the canonical publication run correlation key.
+- `candidate_version`: the candidate staging scope used in Hugging Face staging
+  paths such as `staging/{candidate_version}-{run_id}/...`.
+- `release_version`: the final stable public release version.
+- `base_release_version` and `release_bump`: optional provenance for how the
+  candidate scope was chosen.
+
+Do not resolve a different run ID from the environment inside lower-level
+release-promotion logic. Environment resolution belongs at orchestration edges;
+Stage 5 library code should receive explicit context.
+
+## Release Candidate Bundles
+
+Use `ReleaseCandidateInputBundle` to describe the artifacts Stage 5 is allowed
+to validate and promote. Each artifact should be represented by a
+`ReleaseArtifactSpec` with a production-relative path, artifact family, source
+stage, and optional checksum/size metadata.
+
+The current compatibility path may build a bundle from the legacy staged path
+set produced by Modal orchestration. Mark that reader as compatibility-only and
+keep it retirable.
+
+The Stage 4 contract/inventory reader API now exists for migration work:
+`build_release_candidate_bundle_from_stage4_contract()` accepts an in-memory
+Stage 4 contract plus inventory records, and
+`read_stage4_release_candidate_bundle()` reads the same shape from files.
+Production Stage 5 code should not depend on Stage 4 contracts until the
+contract and inventory are canonical, complete, and populated with semantic
+artifact identity plus checksum/size material.
+
+## Validation Reports
+
+Stage 5 must use the shared validation schema for durable validation output:
+
+- `policyengine_us_data.stage_contracts.ValidationReport`
+- `policyengine_us_data.stage_contracts.ValidationFinding`
+- `policyengine_us_data.stage_contracts.DiagnosticRef`
+
+Do not create a Stage 5-specific durable validation report, check, finding, or
+error schema for contracts, diagnostics, release candidates, status endpoints,
+or step manifests. Release-specific details such as missing staged artifacts,
+missing validation reports, finalized-release conflicts, version mismatches, or
+destination conflicts should live in canonical finding metadata.
+
+## Rerun Comparison Material
+
+Before public writes, rerun and reuse decisions should compare semantic
+candidate identity rather than only checking whether output files exist. The
+comparison material should include:
+
+- run ID, candidate version, release version, HF repository, and GCS bucket;
+- Stage 4 output contract fingerprint when available;
+- output inventory paths/checksums when available;
+- validation report paths and their identities when available;
+- expected production-relative artifact paths;
+- the Stage 5 candidate bundle fingerprint.
+
+When required artifacts only have paths and no checksum/size identity, treat
+the bundle as path-only and do not use its fingerprint for promotion reuse
+decisions.
+
+Already-finalized releases are an idempotency case, not a shortcut around
+candidate identity. A finalized release can be reused only when its completion
+marker is valid and it matches the requested candidate.
+
+## Side Effects
+
+Candidate builders, schema adapters, and rerun comparison helpers should not
+perform Hugging Face writes, GCS uploads, Modal calls, staging cleanup, or
+release-manifest publication. Keep those operations behind explicit adapters or
+services so tests can exercise candidate shape and validation logic without
+credentials or network access.
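
Illustrative sketch, not part of this commit: the rerun-comparison rules in the guide could be exercised roughly as follows. `ArtifactSketch`, `is_path_only`, `bundle_fingerprint`, and `can_reuse_promotion` are hypothetical names invented for this example; the real `ReleaseArtifactSpec` and `ReleaseCandidateInputBundle` fields are not shown in this diff and may differ. The sketch assumes every listed artifact is required and that a SHA-256 over canonical JSON is an acceptable fingerprint.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ArtifactSketch:
    """Hypothetical stand-in for one artifact entry (not the real ReleaseArtifactSpec)."""

    path: str
    sha256: Optional[str] = None
    size_bytes: Optional[int] = None

    @property
    def has_content_identity(self) -> bool:
        # Content identity here means both checksum and size are known.
        return self.sha256 is not None and self.size_bytes is not None


def is_path_only(artifacts: Sequence[ArtifactSketch]) -> bool:
    """Return True when any artifact lacks checksum/size identity."""
    return any(not a.has_content_identity for a in artifacts)


def bundle_fingerprint(
    run_id: str,
    candidate_version: str,
    release_version: str,
    artifacts: Sequence[ArtifactSketch],
) -> str:
    """Hash candidate identity plus artifact identity, independent of input order."""
    payload = {
        "run_id": run_id,
        "candidate_version": candidate_version,
        "release_version": release_version,
        "artifacts": [
            [a.path, a.sha256, a.size_bytes]
            for a in sorted(artifacts, key=lambda a: a.path)
        ],
    }
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def can_reuse_promotion(
    requested_fingerprint: str,
    finalized_fingerprint: str,
    artifacts: Sequence[ArtifactSketch],
    completion_marker_valid: bool,
) -> bool:
    """Allow reuse only for a valid, matching, fully identified candidate."""
    if is_path_only(artifacts):
        # Path-only bundles must not drive reuse decisions.
        return False
    return completion_marker_valid and requested_fingerprint == finalized_fingerprint
```

Sorting by path before hashing keeps the fingerprint stable when the same artifacts arrive in a different order, which is one plausible reading of "semantic candidate identity"; the actual implementation may include more material, such as the HF repository and GCS bucket.
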

docs/pipeline_map.yaml

Lines changed: 2 additions & 2 deletions
@@ -759,7 +759,7 @@ stages:
   node_type: artifact
   description: Policy target database copied into the pipeline volume
 - id: hf_staging_base_s1g
-  label: HuggingFace staging/{candidate_version}/{run_id}
+  label: HuggingFace staging/{candidate_version}-{run_id}
   node_type: external
   description: Run-scoped staging prefix for base datasets
 - id: stage_base_datasets
@@ -1504,7 +1504,7 @@ stages:
   node_type: artifact
   description: Output set from substage 5a
 - id: hf_staging_s5b
-  label: HuggingFace staging/{candidate_version}/{run_id}
+  label: HuggingFace staging/{candidate_version}-{run_id}
   node_type: external
   description: Run-scoped staging prefix containing validated artifacts
 - id: out_hf_prod
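
Illustrative sketch, not part of this commit: both label changes above replace the nested `staging/{candidate_version}/{run_id}` form with the hyphen-joined `staging/{candidate_version}-{run_id}` form. A typed identity object could derive that prefix in one place, roughly like this. `StagingScopeSketch` and its validation rule are hypothetical; the real `ReleasePromotionContext` is not shown in this diff and may expose this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagingScopeSketch:
    """Hypothetical holder for the two values that scope a staging prefix."""

    run_id: str
    candidate_version: str

    def __post_init__(self) -> None:
        # Assumption for this sketch: neither value may be empty or contain a
        # slash, so the prefix always has exactly one path segment after "staging/".
        for name in ("run_id", "candidate_version"):
            value = getattr(self, name)
            if not value or "/" in value:
                raise ValueError(f"{name} must be a non-empty string without '/'")

    @property
    def staging_prefix(self) -> str:
        # Hyphen-joined form, matching the updated pipeline map labels.
        return f"staging/{self.candidate_version}-{self.run_id}"


scope = StagingScopeSketch(run_id="run42", candidate_version="1.2.3")
print(scope.staging_prefix)  # staging/1.2.3-run42
```

Deriving the prefix from explicit fields, rather than reading a run ID from the environment, follows the guide's rule that lower-level code receives explicit context.
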
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+"""Typed Stage 5 release promotion boundaries.
+
+This package starts with release-candidate identity and candidate-bundle
+schemas. Promotion side effects still live in the existing transaction engine
+until later Stage 5 migration slices move them behind typed services.
+"""
+
+from .artifacts import (
+    BASE_RELEASE_ARTIFACT_PATHS,
+    ReleaseArtifactSpec,
+    dedupe_normalized_release_paths,
+    infer_artifact_identity,
+    infer_release_artifact_spec,
+    logical_name_for_release_path,
+    normalize_release_path,
+    strip_staging_prefix,
+)
+from .candidate import (
+    ReleaseCandidateInputBundle,
+    build_legacy_release_candidate_bundle,
+    build_release_candidate_bundle_from_stage4_contract,
+    read_stage4_release_candidate_bundle,
+)
+from .context import ReleasePromotionContext
+from .validation import build_release_candidate_shape_report
+
+__all__ = [
+    "BASE_RELEASE_ARTIFACT_PATHS",
+    "ReleaseArtifactSpec",
+    "ReleaseCandidateInputBundle",
+    "ReleasePromotionContext",
+    "build_legacy_release_candidate_bundle",
+    "build_release_candidate_bundle_from_stage4_contract",
+    "build_release_candidate_shape_report",
+    "dedupe_normalized_release_paths",
+    "infer_artifact_identity",
+    "infer_release_artifact_spec",
+    "logical_name_for_release_path",
+    "normalize_release_path",
+    "read_stage4_release_candidate_bundle",
+    "strip_staging_prefix",
+]
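
Illustrative sketch, not part of this commit: the package docstring says promotion side effects will later move behind typed services. One common way to draw that boundary in Python is a `typing.Protocol` with an in-memory fake for tests. `ArtifactPublisher`, `InMemoryPublisher`, and `promote_artifacts` are hypothetical names for this example only and do not describe the existing transaction engine.

```python
from typing import Dict, Protocol


class ArtifactPublisher(Protocol):
    """Hypothetical side-effect boundary for writing one promoted artifact."""

    def publish(self, relative_path: str, content: bytes) -> None:
        ...


class InMemoryPublisher:
    """Test double that records writes instead of touching the network."""

    def __init__(self) -> None:
        self.published: Dict[str, bytes] = {}

    def publish(self, relative_path: str, content: bytes) -> None:
        self.published[relative_path] = content


def promote_artifacts(
    artifacts: Dict[str, bytes], publisher: ArtifactPublisher
) -> int:
    """Publish every artifact in path order and return how many were written."""
    for relative_path in sorted(artifacts):
        publisher.publish(relative_path, artifacts[relative_path])
    return len(artifacts)
```

Because `promote_artifacts` only depends on the protocol, a test can pass `InMemoryPublisher()` and assert on `published` without credentials or network access, while production code would supply an adapter that performs the real uploads.
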
