|
| 1 | +--- |
| 2 | +navigation_title: Changelog bundle registry |
| 3 | +--- |
| 4 | + |
| 5 | +# Changelog bundle registry and CDN delivery |
| 6 | + |
| 7 | +This page describes how changelog **bundles** are published to a public, CDN-fronted |
| 8 | +S3 bucket, how the per-product `registry-index.json` manifest is produced, and the |
| 9 | +planned `cdn:` mode for the [`{changelog}` directive](/syntax/changelog.md) that will |
| 10 | +consume bundles directly from the CDN instead of from a local folder. |
| 11 | + |
| 12 | +:::{note} |
| 13 | +The **producer** side (manifest generation + scrubber pass-through) is implemented. |
| 14 | +The **consumer** side (`{changelog}` directive `cdn:` mode) is **planned** and is |
| 15 | +documented here as a design, not as shipped behavior. |
| 16 | +::: |
| 17 | + |
| 18 | +## Motivation |
| 19 | + |
| 20 | +Today the `{changelog}` directive only renders bundles that live in a folder inside the |
| 21 | +docset (default `changelog/bundles/`). That requires every consuming repository to vendor |
| 22 | +a copy of the bundle YAML it wants to render. |
| 23 | + |
| 24 | +The link service ([building block](/building-blocks/link-service.md)) already demonstrates |
| 25 | +the pattern we want: an S3 bucket fronted by CloudFront, publicly readable, with a small |
| 26 | +JSON index at a well-known key. We apply the same approach to changelog bundles so a docset |
| 27 | +can render another product's release notes by pointing the directive at the CDN — no vendored |
| 28 | +copies, no cross-repo file syncing. |
| 29 | + |
| 30 | +## Architecture |
| 31 | + |
| 32 | +``` |
| 33 | +┌──────────────┐ changelog upload ┌────────────────────┐ s3:ObjectCreated ┌───────────────────┐ |
| 34 | +│ Client CI │ --artifact-type │ Private bundles │ ───────────────────▶ │ Changelog scrubber │ |
| 35 | +│ (docs-actions)│ bundle ───────────▶ │ S3 bucket │ │ Lambda │ |
| 36 | +└──────────────┘ │ │ └─────────┬─────────┘ |
| 37 | + │ │ {product}/bundles/*.yaml │ scrub + copy |
| 38 | + │ also refreshes │ {product}/registry-index.json │ (pass-through for |
| 39 | + └──────────────────────────────▶ │ │ registry-index.json) |
| 40 | + └────────────────────┘ ▼ |
| 41 | + ┌───────────────────┐ |
| 42 | + {changelog} directive (planned) │ Public bundles │ |
| 43 | + reads via CDN ◀─────────────── │ S3 bucket + CDN │ |
| 44 | + └───────────────────┘ |
| 45 | +``` |
| 46 | + |
| 47 | +1. **Producer** — `changelog upload --artifact-type bundle --target s3` (invoked by the |
| 48 | + docs-actions changelog upload workflow) uploads each bundle to |
| 49 | + `{product}/bundles/{file}` in the **private** bucket, then refreshes |
| 50 | + `{product}/registry-index.json` for every product the run touched. |
| 51 | +2. **Scrubber Lambda** — triggered by `s3:ObjectCreated` on the private bucket, it scrubs |
| 52 | + private repository references out of bundle YAML and writes the sanitized copy to the |
| 53 | + **public** bucket. The `registry-index.json` object is copied through **verbatim**. |
| 54 | +3. **Consumer (planned)** — the `{changelog}` directive in `cdn:` mode reads |
| 55 | + `{product}/registry-index.json` from the CDN, then fetches each listed bundle. |
| 56 | + |
| 57 | +### Why a registry instead of an S3 listing |
| 58 | + |
| 59 | +The public surface is a CDN (CloudFront) in front of S3. CloudFront does not expose bucket |
| 60 | +listing, so the consumer cannot enumerate `{product}/bundles/`. The registry is a stable, |
| 61 | +cacheable manifest at a predictable key that lists exactly which bundles exist for a product. |
| 62 | + |
| 63 | +## `registry-index.json` format |
| 64 | + |
| 65 | +Stored at `{product}/registry-index.json`. Serialized with `snake_case` keys. |
| 66 | + |
| 67 | +```json |
| 68 | +{ |
| 69 | + "schema_version": 1, |
| 70 | + "product": "elasticsearch", |
| 71 | + "generated_at": "2026-05-06T12:00:00+00:00", |
| 72 | + "bundles": [ |
| 73 | + { "file": "9.4.0.yaml", "target": "9.4.0", "etag": "…" }, |
| 74 | + { "file": "9.3.0.yaml", "target": "9.3.0", "etag": "…" } |
| 75 | + ] |
| 76 | +} |
| 77 | +``` |
| 78 | + |
| 79 | +| Field | Meaning | |
| 80 | +|---|---| |
| 81 | +| `schema_version` | Bumped when consumers must change their parser. | |
| 82 | +| `product` | Product identifier; matches the first S3 key segment. | |
| 83 | +| `generated_at` | UTC timestamp of the last regeneration. | |
| 84 | +| `bundles[].file` | Bundle file name, resolved at `{product}/bundles/{file}`. | |
| 85 | +| `bundles[].target` | Target version/date from the bundle's declaration of **this** product (may be null). | |
| 86 | +| `bundles[].etag` | See the ETag caveat below. | |
| 87 | + |
| 88 | +Bundles are sorted by `target` descending (newest first) with a deterministic tiebreak on |
| 89 | +`file`, so the JSON is stable across reruns. |
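
The ordering above can be sketched in Python. This is a sketch under assumptions, not the producer's code: the page does not say how `target` values are compared or where null targets go, so the sketch compares the numeric parts of the target, places null targets last, and breaks ties on `file` ascending. `target_key` and `sort_bundles` are hypothetical names.

```python
import re


def target_key(target):
    """Assumed comparison: the numeric parts of a version or date string."""
    return tuple(int(part) for part in re.findall(r"\d+", target))


def sort_bundles(bundles):
    """Newest target first; ties broken on file name; null targets last."""
    with_target = [b for b in bundles if b.get("target") is not None]
    without_target = [b for b in bundles if b.get("target") is None]
    # Two stable passes: file ascending, then target descending.
    with_target.sort(key=lambda b: b["file"])
    with_target.sort(key=lambda b: target_key(b["target"]), reverse=True)
    without_target.sort(key=lambda b: b["file"])
    return with_target + without_target
```

Comparing numeric parts rather than raw strings keeps `9.10.0` ahead of `9.3.0`, which a plain string sort would get wrong.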
| 90 | + |
| 91 | +### ETag caveat |
| 92 | + |
| 93 | +`bundles[].etag` is the ETag of the bundle object **as uploaded to the private bucket** |
| 94 | +(pre-scrub). The scrubber rewrites any bundle that contains private references, so for |
| 95 | +scrubbed bundles this value **will not match** the public (CDN) object's ETag. |
| 96 | + |
| 97 | +Consumers **must not** use it for integrity checks or HTTP cache validation against the |
| 98 | +public bucket — use the CDN response's own `ETag`/`Last-Modified` for caching. The field is |
| 99 | +only a best-effort change hint (e.g. detecting whether a bundle changed between two manifest |
| 100 | +reads of the same bucket). |
| 101 | + |
| 102 | +## Producer details (implemented) |
| 103 | + |
| 104 | +The refresh runs inside `ChangelogUploadService` after a successful **bundle** upload (it is |
| 105 | +skipped for `--artifact-type changelog`). `RegistryIndexBuilder`: |
| 106 | + |
| 107 | +- Groups the run's upload targets by product (from the `{product}/bundles/{file}` key). |
| 108 | +- For each product, derives one `registry-index` entry per bundle (file name, that product's |
| 109 | + target, locally-computed S3 ETag). |
| 110 | +- Reads the existing manifest from S3, merges by file name (re-uploads replace their entry; |
| 111 | + others are preserved), and writes the merged manifest back. |
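
The grouping and merge steps above can be sketched in Python. This is illustrative only and is not the `RegistryIndexBuilder` implementation; `group_by_product` and `merge_entries` are hypothetical names, and entries are assumed to be plain dictionaries shaped like the `bundles[]` items shown earlier.

```python
from collections import defaultdict


def group_by_product(keys):
    """Group upload keys shaped like {product}/bundles/{file} by product."""
    groups = defaultdict(list)
    for key in keys:
        product, _, file_name = key.partition("/bundles/")
        groups[product].append(file_name)
    return dict(groups)


def merge_entries(existing, uploaded):
    """Merge by file name: re-uploads replace their entry, others are kept."""
    merged = {entry["file"]: entry for entry in existing}
    for entry in uploaded:
        merged[entry["file"]] = entry
    return list(merged.values())
```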
| 112 | + |
| 113 | +### Concurrency: optimistic, conditional writes |
| 114 | + |
| 115 | +Two uploads that touch the same product (for example two repositories that both map to one |
| 116 | +product, or parallel CI) could otherwise clobber each other's index via a naive |
| 117 | +read-modify-write. The writer instead uses **S3 conditional PUT**: |
| 118 | + |
| 119 | +- On **update**: `If-Match: <etag-from-read>` — only succeeds if the object hasn't changed. |
| 120 | +- On **create**: `If-None-Match: *` — only succeeds if the object still doesn't exist. |
| 121 | + |
| 122 | +A `412 Precondition Failed` means another writer won the race; the builder re-reads, |
| 123 | +re-merges, and retries a bounded number of times. This mirrors the link-index writer |
| 124 | +(`AwsS3LinkIndexReaderWriter.SaveRegistry`). If the merge result already equals what's |
| 125 | +published, the write is skipped so re-uploads stay idempotent. |
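
The read, merge, conditional write, and retry loop can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the actual writer: `InMemoryStore` is a hypothetical single-object stand-in for S3 that enforces `If-Match` and `If-None-Match: *` the way the text describes, `PreconditionFailed` stands in for the `412` response, and the retry bound of 3 is arbitrary.

```python
class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 from a conditional PUT."""


class InMemoryStore:
    """Hypothetical single-object stand-in for S3, for illustration only."""

    def __init__(self):
        self.body = None
        self.etag = None
        self._version = 0

    def get(self):
        return self.body, self.etag  # (None, None) when the object is absent

    def put(self, body, if_match=None, if_none_match=None):
        if if_none_match == "*" and self.body is not None:
            raise PreconditionFailed()
        if if_match is not None and if_match != self.etag:
            raise PreconditionFailed()
        self._version += 1
        self.body, self.etag = body, f"v{self._version}"


def refresh(store, merge, max_attempts=3):
    """Read, merge, and conditionally write; True if the manifest is current."""
    for _ in range(max_attempts):
        current, etag = store.get()
        merged = merge(current)
        if merged == current:
            return True  # already published, so skip the write
        try:
            if etag is None:
                store.put(merged, if_none_match="*")  # create
            else:
                store.put(merged, if_match=etag)  # update
            return True
        except PreconditionFailed:
            continue  # another writer won the race: re-read and re-merge
    return False
```

Because the merge is recomputed from a fresh read on every attempt, a writer that loses the race folds the winner's entries into its next attempt instead of overwriting them.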
| 126 | + |
| 127 | +The refresh is **best-effort**: any failure is logged and surfaced as a warning but never |
| 128 | +fails the upload, because the bundle objects themselves are already in S3. |
| 129 | + |
| 130 | +### Buckets and infrastructure |
| 131 | + |
| 132 | +The registry is written to the **private** bucket |
| 133 | +(`elastic-docs-v3-changelog-bundles-private`) — the same bucket and key space as the bundles |
| 134 | +themselves — and reaches the **public** bucket (`elastic-docs-v3-changelog-bundles`, served |
| 135 | +only via CloudFront + OAC) through the scrubber's verbatim pass-through. The uploader never |
| 136 | +writes to the public bucket; the scrubber Lambda is the sole writer there, which preserves the |
| 137 | +invariant that everything on the public surface has been vetted. |
| 138 | + |
| 139 | +The required infrastructure already exists in `docs-infra` |
| 140 | +(`aws/elastic-web/us-east-1/elastic-docs-v3-changelog-bundles/`) — **no infra change is needed |
| 141 | +for the producer**: |
| 142 | + |
| 143 | +- The private bucket's S3 → SQS notification fires on `s3:ObjectCreated:*` / `s3:ObjectRemoved:*` |
| 144 | + with **no suffix filter**, so registry `.json` events already reach the scrubber. |
| 145 | +- The uploader (GitHub Actions OIDC) role already has `s3:GetObject`/`s3:PutObject`/`s3:ListBucket` |
| 146 | + on the private bucket, so the producer's read and conditional PUT both work. Conditional |
| 147 | + (`If-Match`/`If-None-Match`) writes need no extra permission. |
| 148 | +- The scrubber role has `s3:GetObject` on private and `s3:PutObject`/`s3:DeleteObject` on public, |
| 149 | + covering the registry `CopyObject` pass-through and the `ObjectRemoved` delete. |
| 150 | +- A CloudFront cache policy tuned for the manifest already exists (default TTL 1h, min 60s). |
| 151 | + |
| 152 | +The scrubber only passes through keys accepted by `RegistryIndexKey.IsRegistryIndex` (exactly |
| 153 | +`{product}/registry-index.json`, one product segment), so arbitrary JSON cannot reach the public surface. |
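
The key shape that the pass-through accepts can be illustrated with a short Python check. This is not the `RegistryIndexKey.IsRegistryIndex` implementation, whose exact rules are not shown on this page; the sketch assumes a product segment is any non-empty run of characters without `/`, and `is_registry_index` is a hypothetical name.

```python
import re

# Assumption: a product segment is a non-empty run of characters without "/".
_REGISTRY_KEY = re.compile(r"[^/]+/registry-index\.json")


def is_registry_index(key):
    """True only for keys shaped exactly like {product}/registry-index.json."""
    return _REGISTRY_KEY.fullmatch(key) is not None
```

Matching the whole key, rather than only its suffix, is what keeps nested or differently named JSON objects out of the pass-through.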
| 154 | + |
| 155 | +**No new docs-actions workflow logic is required** for the producer either: the refresh is a |
| 156 | +side-effect of the existing `changelog upload` step; docs-actions only needs a docs-builder |
| 157 | +build that includes this feature. |
| 158 | + |
| 159 | +:::{note} |
| 160 | +The CDN cache policy comment refers to `registry.json` while the implementation uses |
| 161 | +`registry-index.json`. This is only a comment (there is no path-based cache behavior), so it is |
| 162 | +harmless, but the two should be aligned to avoid confusion. |
| 163 | +::: |
| 164 | + |
| 165 | +### Consistency notes the consumer must tolerate |
| 166 | + |
| 167 | +- The manifest pass-through and the per-bundle scrub are independent S3 events, so the index |
| 168 | + may briefly reference a bundle that is not yet on the public bucket. |
| 169 | +- A bundle that fails scrubbing (private references that cannot be allowlisted) is never |
| 170 | + written to the public bucket, even though the index may list it. |
| 171 | + |
| 172 | +Consumers must therefore treat a missing bundle as non-fatal (skip + warn), not an error. |
| 173 | + |
| 174 | +## Consumer: `{changelog}` directive `cdn:` mode (planned) |
| 175 | + |
| 176 | +### Proposed syntax |
| 177 | + |
| 178 | +```markdown |
| 179 | +:::{changelog} |
| 180 | +:cdn: elasticsearch |
| 181 | +::: |
| 182 | +``` |
| 183 | + |
| 184 | +The directive would accept a `:cdn:` option naming the **product** to fetch. The CDN base |
| 185 | +URL is environment configuration (not authored per page), defaulting to the public changelog |
| 186 | +bundles distribution and overridable for staging/local. |
| 187 | + |
| 188 | +When `:cdn:` is set, the local-folder argument is ignored and the directive sources bundles |
| 189 | +from the CDN instead. |
| 190 | + |
| 191 | +### Fetch flow |
| 192 | + |
| 193 | +1. `GET {cdnBase}/{product}/registry-index.json`. |
| 194 | +2. Parse it; for each `bundles[].file`, `GET {cdnBase}/{product}/bundles/{file}`. |
| 195 | +3. Feed the downloaded YAML into the existing `BundleLoader` → `MergeBundlesByTarget` → |
| 196 | + render pipeline. **Rendering is unchanged**; only the source of the bundle bytes differs. |
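
Since the consumer is still a design, the fetch flow can only be sketched. The Python below assumes a hypothetical `fetch(url)` callable that returns bytes, or `None` when the object is missing, and it treats any `schema_version` other than 1 as unsupported; both are illustration choices, not decided behavior. Missing bundles are skipped with a warning, as the consistency notes require.

```python
import json

SUPPORTED_SCHEMA_VERSION = 1  # assumption: the only version this sketch understands


def load_bundles(cdn_base, product, fetch, warn=print):
    """Fetch the manifest, then each listed bundle; skip and warn on gaps.

    `fetch(url)` is a hypothetical callable returning bytes, or None when the
    object is not available on the CDN.
    """
    manifest_bytes = fetch(f"{cdn_base}/{product}/registry-index.json")
    if manifest_bytes is None:
        warn(f"no registry index for {product}")
        return []
    manifest = json.loads(manifest_bytes)
    version = manifest.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version}")
    bundles = []
    for entry in manifest.get("bundles", []):
        url = f"{cdn_base}/{product}/bundles/{entry['file']}"
        body = fetch(url)
        if body is None:
            warn(f"bundle listed but not available: {url}")  # non-fatal
            continue
        bundles.append((entry["file"], body))
    return bundles
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP client and leaves the open caching question (on-disk cache versus a separate fetch step) undecided.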
| 197 | + |
| 198 | +Because public bundles are already scrubbed and resolved, the existing private-repo link and |
| 199 | +description visibility logic still applies via `assembler.yml`, exactly as for local bundles. |
| 200 | + |
| 201 | +### Open design decisions |
| 202 | + |
| 203 | +- **Build-time network access.** Fetching at build time makes builds depend on the CDN. |
| 204 | + Options: (a) fetch during the build with an on-disk cache under the docs-builder app-data |
| 205 | + directory (mirrors `CrossLinkFetcher`/link-index); (b) a separate fetch step that |
| 206 | + materializes bundles into the working tree before the build. Caching + ETag revalidation |
| 207 | + against the CDN is the likely answer. |
| 208 | +- **Local/offline development.** The directive must degrade gracefully when the CDN is |
| 209 | + unreachable (use cache; otherwise emit a clear, actionable diagnostic) so local builds and |
| 210 | + PR previews don't hard-fail on transient network issues. |
| 211 | +- **Missing/partial bundles.** Skip-and-warn per the consistency notes above; never fail the |
| 212 | + whole page on a single missing bundle. |
| 213 | +- **Schema evolution.** Honor `schema_version`; a version newer than the consumer understands |
| 214 | + should produce a clear error rather than a silent mis-parse. |
| 215 | +- **Filtering.** `:type:`, `:link-visibility:`, `:description-visibility:`, `:dropdowns:` and |
| 216 | + `hide-features` apply identically to CDN-sourced bundles. |
| 217 | +- **Caching key.** Use the CDN response ETag (not the registry `etag` field) for revalidation. |
| 218 | +- **CDN staleness.** The distribution caches the manifest with a 1h default TTL (60s min), so a |
| 219 | + freshly uploaded bundle may not appear in the CDN-served `registry-index.json` for up to an |
| 220 | + hour. Acceptable for release notes, but if faster propagation is needed the producer (or a |
| 221 | + docs-actions step) would have to issue a CloudFront invalidation on registry write. Out of |
| 222 | + scope for the first iteration. |
| 223 | + |
| 224 | +### Out of scope for the first iteration |
| 225 | + |
| 226 | +- Cross-product aggregation in a single directive block (start with one product per block). |
| 227 | +- Authenticated/private CDN access (the public bucket is anonymous-read by design). |
| 228 | + |
| 229 | +## Related |
| 230 | + |
| 231 | +- [Changelog directive](/syntax/changelog.md) — current (local-folder) behavior. |
| 232 | +- [Publish changelogs](/contribute/publish-changelogs.md) — the upload workflow. |
| 233 | +- [Link service](/building-blocks/link-service.md) — the S3 + CloudFront pattern this reuses. |