Commit 3a316ab

cotti, claude, and cursoragent committed
changelog: publish per-product registry-index for bundle uploads
Bundle uploads now refresh a per-product registry-index.json so consumers (the upcoming changelog directive cdn: mode) can enumerate bundles without an S3 listing. The builder does a read-merge-conditional-PUT (If-Match/If-None-Match) with bounded retries to stay safe under concurrent uploads, writes the manifest to the private bucket, and relies on the scrubber's existing pass-through to mirror it to the public CDN bucket. Refresh failures are non-fatal: bundle objects are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 4856b14 commit 3a316ab

10 files changed

Lines changed: 1207 additions & 2 deletions

docs/development/changelog-bundle-registry.md

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
---
navigation_title: Changelog bundle registry
---

# Changelog bundle registry and CDN delivery

This page describes how changelog **bundles** are published to a public, CDN-fronted
S3 bucket, how the per-product `registry-index.json` manifest is produced, and the
planned `cdn:` mode for the [`{changelog}` directive](/syntax/changelog.md) that will
consume bundles directly from the CDN instead of from a local folder.

:::{note}
The **producer** side (manifest generation + scrubber pass-through) is implemented.
The **consumer** side (`{changelog}` directive `cdn:` mode) is **planned** and is
documented here as a design, not as shipped behavior.
:::

## Motivation

Today the `{changelog}` directive only renders bundles that live in a folder inside the
docset (default `changelog/bundles/`). That requires every consuming repository to vendor
a copy of the bundle YAML it wants to render.

The link service ([building block](/building-blocks/link-service.md)) already demonstrates
the pattern we want: an S3 bucket fronted by CloudFront, publicly readable, with a small
JSON index at a well-known key. We apply the same approach to changelog bundles so a docset
can render another product's release notes by pointing the directive at the CDN — no vendored
copies, no cross-repo file syncing.

## Architecture

```
┌────────────────┐  changelog upload      ┌───────────────────────────────┐   s3:ObjectCreated    ┌────────────────────┐
│ Client CI      │  --artifact-type       │ Private bundles               │ ────────────────────▶ │ Changelog scrubber │
│ (docs-actions) │  bundle ─────────────▶ │ S3 bucket                     │                       │ Lambda             │
└────────────────┘                        │                               │                       └─────────┬──────────┘
        │                                 │ {product}/bundles/*.yaml      │                                 │ scrub + copy
        │ also refreshes                  │ {product}/registry-index.json │                                 │ (pass-through for
        └───────────────────────────────▶ │                               │                                 │  registry-index.json)
                                          └───────────────────────────────┘                                 ▼
                                                                                                  ┌────────────────────┐
                                                  {changelog} directive (planned)                 │ Public bundles     │
                                                  reads via CDN ◀──────────────────────────────── │ S3 bucket + CDN    │
                                                                                                  └────────────────────┘
```

1. **Producer** — `changelog upload --artifact-type bundle --target s3` (invoked by the
   docs-actions changelog upload workflow) uploads each bundle to
   `{product}/bundles/{file}` in the **private** bucket, then refreshes
   `{product}/registry-index.json` for every product the run touched.
2. **Scrubber Lambda** — triggered by `s3:ObjectCreated` on the private bucket, it scrubs
   private repository references out of bundle YAML and writes the sanitized copy to the
   **public** bucket. The `registry-index.json` object is copied through **verbatim**.
3. **Consumer (planned)** — the `{changelog}` directive in `cdn:` mode reads
   `{product}/registry-index.json` from the CDN, then fetches each listed bundle.

### Why a registry instead of an S3 listing

The public surface is a CDN (CloudFront) in front of S3. CloudFront does not expose bucket
listing, so the consumer cannot enumerate `{product}/bundles/`. The registry is a stable,
cacheable manifest at a predictable key that lists exactly which bundles exist for a product.

## `registry-index.json` format

Stored at `{product}/registry-index.json`. Serialized with `snake_case` keys.

```json
{
  "schema_version": 1,
  "product": "elasticsearch",
  "generated_at": "2026-05-06T12:00:00+00:00",
  "bundles": [
    { "file": "9.4.0.yaml", "target": "9.4.0", "etag": "" },
    { "file": "9.3.0.yaml", "target": "9.3.0", "etag": "" }
  ]
}
```

| Field | Meaning |
|---|---|
| `schema_version` | Bumped when consumers must change their parser. |
| `product` | Product identifier; matches the first S3 key segment. |
| `generated_at` | UTC timestamp of the last regeneration. |
| `bundles[].file` | Bundle file name, resolved at `{product}/bundles/{file}`. |
| `bundles[].target` | Target version/date from the bundle's declaration of **this** product (may be null). |
| `bundles[].etag` | See the ETag caveat below. |

Bundles are sorted by `target` descending (newest first) with a deterministic tiebreak on
`file`, so the JSON is stable across reruns.
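
One way to express that ordering is sketched below. The source does not show how targets are compared, so the comparison here is an assumption: targets that parse as `System.Version` are compared numerically and rank above everything else, while the remaining targets (dates, missing values) fall back to ordinal string comparison. `CompareTargets` and the tuple shape are hypothetical names used only for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var bundles = new List<(string File, string? Target)>
{
    ("9.4.0.yaml", "9.4.0"),
    ("unreleased.yaml", null),
    ("9.10.0.yaml", "9.10.0"),
    ("9.4.0-extra.yaml", "9.4.0"),
};

// Target descending (arguments swapped), then file name ascending as the tiebreak.
bundles.Sort((x, y) =>
{
    var byTarget = CompareTargets(y.Target, x.Target);
    return byTarget != 0 ? byTarget : string.CompareOrdinal(x.File, y.File);
});

foreach (var bundle in bundles)
    Console.WriteLine(bundle.File);

// Hypothetical comparison (an assumption, not the builder's actual rule):
// parseable versions compare numerically and sort above non-versions;
// everything else, including null, compares ordinally.
static int CompareTargets(string? a, string? b)
{
    var aIsVersion = Version.TryParse(a, out var va);
    var bIsVersion = Version.TryParse(b, out var vb);
    if (aIsVersion && bIsVersion)
        return va!.CompareTo(vb);
    if (aIsVersion != bIsVersion)
        return aIsVersion ? 1 : -1;
    return string.CompareOrdinal(a, b);
}
```

With this comparison `9.10.0` sorts ahead of `9.4.0`, which a plain string sort would get wrong, and entries without a target land at the end. Ranking versions above non-versions keeps the comparison transitive when a product mixes both styles.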

### ETag caveat

`bundles[].etag` is the ETag of the bundle object **as uploaded to the private bucket**
(pre-scrub). The scrubber rewrites any bundle that contains private references, so for
scrubbed bundles this value **will not match** the public (CDN) object's ETag.

Consumers **must not** use it for integrity checks or HTTP cache validation against the
public bucket — use the CDN response's own `ETag`/`Last-Modified` for caching. The field is
only a best-effort change hint (e.g. detecting whether a bundle changed between two manifest
reads of the same bucket).

## Producer details (implemented)

The refresh runs inside `ChangelogUploadService` after a successful **bundle** upload (it is
skipped for `--artifact-type changelog`). `RegistryIndexBuilder`:

- Groups the run's upload targets by product (from the `{product}/bundles/{file}` key).
- For each product, derives one `registry-index` entry per bundle (file name, that product's
  target, locally-computed S3 ETag).
- Reads the existing manifest from S3, merges by file name (re-uploads replace their entry;
  others are preserved), and writes the merged manifest back.
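
The merge-by-file-name step above can be sketched as a dictionary keyed on `file`. This is a minimal illustration and not the actual `RegistryIndexBuilder` code: `MergeByFile`, the tuple shape, and the ETag values are all hypothetical, and the `target` field is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Entries already published in the manifest (placeholder ETag values).
var existing = new List<(string File, string ETag)>
{
    ("9.3.0.yaml", "etag-930-old"),
    ("9.2.0.yaml", "etag-920"),
};

// Entries derived from the current upload run.
var uploaded = new List<(string File, string ETag)>
{
    ("9.4.0.yaml", "etag-940"),
    ("9.3.0.yaml", "etag-930-new"), // re-upload of an existing bundle
};

var merged = MergeByFile(existing, uploaded);
Console.WriteLine(merged.Count);              // 3
Console.WriteLine(merged["9.3.0.yaml"].ETag); // etag-930-new

// Hypothetical helper: an uploaded entry replaces the existing entry with the
// same file name; every other existing entry is preserved.
static Dictionary<string, (string File, string ETag)> MergeByFile(
    IEnumerable<(string File, string ETag)> existing,
    IEnumerable<(string File, string ETag)> uploaded)
{
    var byFile = new Dictionary<string, (string File, string ETag)>(StringComparer.Ordinal);
    foreach (var entry in existing)
        byFile[entry.File] = entry;
    foreach (var entry in uploaded)
        byFile[entry.File] = entry;
    return byFile;
}
```

Because untouched entries survive the merge, a run that uploads a single bundle never drops bundles that earlier runs registered.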

### Concurrency: optimistic, conditional writes

Two uploads that touch the same product (for example two repositories that both map to one
product, or parallel CI) could otherwise clobber each other's index via a naive
read-modify-write. The writer instead uses **S3 conditional PUT**:

- On **update**: `If-Match: <etag-from-read>` — only succeeds if the object hasn't changed.
- On **create**: `If-None-Match: *` — only succeeds if the object still doesn't exist.

A `412 Precondition Failed` means another writer won the race; the builder re-reads,
re-merges, and retries (bounded retries). This mirrors the link-index writer
(`AwsS3LinkIndexReaderWriter.SaveRegistry`). If the merge result already equals what's
published, the write is skipped so re-uploads stay idempotent.
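
The read, merge, conditional write, and retry loop can be sketched against an in-memory stand-in for a single S3 object. Nothing below calls the AWS SDK: `Get`, `TryPut`, `SaveWithRetry`, and `AddFile` are hypothetical helpers, the manifest is reduced to a comma-separated list of file names, and the `beforeWrite` hook exists only to simulate a competing writer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// In-memory stand-in for one S3 object; the "ETag" changes on every write.
string? storedBody = null;
var storedVersion = 0;
var raced = false;

var ok = SaveWithRetry(
    body => AddFile(body, "9.4.0.yaml"),
    maxAttempts: 3,
    beforeWrite: () =>
    {
        if (raced) return;
        raced = true;
        TryPut("9.3.0.yaml", ifMatch: null); // a concurrent writer creates the object first
    });

Console.WriteLine($"{ok}: {storedBody}"); // True: 9.3.0.yaml,9.4.0.yaml

(string? Body, string? ETag) Get()
{
    if (storedBody is null) return (null, null);
    return (storedBody, $"v{storedVersion}");
}

// ifMatch == null models `If-None-Match: *` (create only); otherwise `If-Match: <etag>`.
// Returning false models a 412 Precondition Failed.
bool TryPut(string body, string? ifMatch)
{
    var currentETag = storedBody is null ? null : $"v{storedVersion}";
    if (ifMatch != currentETag) return false;
    storedBody = body;
    storedVersion++;
    return true;
}

bool SaveWithRetry(Func<string?, string> merge, int maxAttempts, Action? beforeWrite = null)
{
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        var (body, etag) = Get();
        var merged = merge(body);
        if (merged == body) return true;       // already published: skip the write
        beforeWrite?.Invoke();                  // simulation hook only
        if (TryPut(merged, etag)) return true;  // precondition held
        // Precondition failed: another writer won, so re-read and re-merge.
    }
    return false;
}

static string AddFile(string? body, string file)
{
    var files = new SortedSet<string>(StringComparer.Ordinal);
    if (!string.IsNullOrEmpty(body)) files.UnionWith(body.Split(','));
    files.Add(file);
    return string.Join(",", files);
}
```

In this run the first attempt loses the race, and the second attempt re-reads the competing entry and merges with it, so both file names end up in the stored manifest.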

The refresh is **best-effort**: any failure is logged and surfaced as a warning but never
fails the upload, because the bundle objects themselves are already in S3.

### Buckets and infrastructure

The registry is written to the **private** bucket
(`elastic-docs-v3-changelog-bundles-private`) — the same bucket and key space as the bundles
themselves — and reaches the **public** bucket (`elastic-docs-v3-changelog-bundles`, served
only via CloudFront + OAC) through the scrubber's verbatim pass-through. The uploader never
writes to the public bucket; the scrubber Lambda is the sole writer there, which preserves the
invariant that everything on the public surface has been vetted.

The required infrastructure already exists in `docs-infra`
(`aws/elastic-web/us-east-1/elastic-docs-v3-changelog-bundles/`) — **no infra change is needed
for the producer**:

- The private bucket's S3 → SQS notification fires on `s3:ObjectCreated:*` / `s3:ObjectRemoved:*`
  with **no suffix filter**, so registry `.json` events already reach the scrubber.
- The uploader (GitHub Actions OIDC) role already has `s3:GetObject`/`s3:PutObject`/`s3:ListBucket`
  on the private bucket, so the producer's conditional GET + PUT work. Conditional
  (`If-Match`/`If-None-Match`) writes need no extra permission.
- The scrubber role has `s3:GetObject` on private and `s3:PutObject`/`s3:DeleteObject` on public,
  covering the registry `CopyObject` pass-through and the `ObjectRemoved` delete.
- A CloudFront cache policy tuned for the manifest already exists (default TTL 1h, min 60s).

The scrubber only passes through keys accepted by `RegistryIndexKey.IsRegistryIndex` (a single
`{product}/registry-index.json` segment), so arbitrary JSON cannot reach the public surface.
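
A key check of that shape might look like the sketch below. The body of `RegistryIndexKey.IsRegistryIndex` is not part of this page, so `LooksLikeRegistryIndexKey` is a hypothetical stand-in, and the case-sensitive comparison is an assumption.

```csharp
using System;

Console.WriteLine(LooksLikeRegistryIndexKey("elasticsearch/registry-index.json"));         // True
Console.WriteLine(LooksLikeRegistryIndexKey("elasticsearch/bundles/registry-index.json")); // False
Console.WriteLine(LooksLikeRegistryIndexKey("registry-index.json"));                       // False

// Hypothetical check: exactly one non-empty product segment followed by the
// well-known manifest file name.
static bool LooksLikeRegistryIndexKey(string key)
{
    var parts = key.Split('/');
    return parts.Length == 2
        && parts[0].Length > 0
        && string.Equals(parts[1], "registry-index.json", StringComparison.Ordinal);
}
```

Matching on the whole key rather than only the file name rejects nested paths such as `{product}/bundles/registry-index.json`, which a file-name-only test would let through.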

**No new docs-actions workflow logic is required** for the producer either: the refresh is a
side-effect of the existing `changelog upload` step; docs-actions only needs a docs-builder
build that includes this feature.

:::{note}
The CDN cache policy comment refers to `registry.json` while the implementation uses
`registry-index.json`. This is only a comment (there is no path-based cache behavior), so it is
harmless, but the two should be aligned to avoid confusion.
:::

### Consistency notes the consumer must tolerate

- The manifest pass-through and the per-bundle scrub are independent S3 events, so the index
  may briefly reference a bundle that is not yet on the public bucket.
- A bundle that fails scrubbing (private references that cannot be allowlisted) is never
  written to the public bucket, even though the index may list it.

Consumers must therefore treat a missing bundle as non-fatal (skip + warn), not an error.

## Consumer: `{changelog}` directive `cdn:` mode (planned)

### Proposed syntax

```markdown
:::{changelog}
:cdn: elasticsearch
:::
```

The directive would accept a `:cdn:` option naming the **product** to fetch. The CDN base
URL is environment configuration (not authored per page), defaulting to the public changelog
bundles distribution and overridable for staging/local.

When `:cdn:` is set, the local-folder argument is ignored and the directive sources bundles
from the CDN instead.

### Fetch flow

1. `GET {cdnBase}/{product}/registry-index.json`.
2. Parse it; for each `bundles[].file`, `GET {cdnBase}/{product}/bundles/{file}`.
3. Feed the downloaded YAML into the existing `BundleLoader` → `MergeBundlesByTarget`
   render pipeline. **Rendering is unchanged**; only the source of the bundle bytes differs.
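
Since the consumer is still a design, the following is only a sketch of steps 1 and 2 together with the skip-and-warn rule for missing bundles. HTTP access is replaced by an injected `fetch` delegate that returns `null` for a missing object; `FetchBundles`, the delegate shape, and the sample data are hypothetical, and the manifest is read with `JsonDocument` rather than the real `RegistryIndex` type.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

const string manifest = """
{ "schema_version": 1, "product": "elasticsearch", "bundles": [
  { "file": "9.4.0.yaml", "target": "9.4.0", "etag": "placeholder" },
  { "file": "9.3.0.yaml", "target": "9.3.0", "etag": "placeholder" } ] }
""";

// Fake CDN: 9.4.0.yaml is listed in the manifest but has not been published yet.
var cdn = new Dictionary<string, string>
{
    ["elasticsearch/registry-index.json"] = manifest,
    ["elasticsearch/bundles/9.3.0.yaml"] = "placeholder bundle YAML",
};

var warnings = new List<string>();
var bundles = FetchBundles(
    "elasticsearch",
    path => cdn.TryGetValue(path, out var body) ? body : null,
    warnings.Add);

Console.WriteLine($"{bundles.Count} bundle(s) loaded, {warnings.Count} warning(s)");

static List<(string File, string Yaml)> FetchBundles(
    string product, Func<string, string?> fetch, Action<string> warn)
{
    var manifestJson = fetch($"{product}/registry-index.json")
        ?? throw new InvalidOperationException($"No registry-index.json found for '{product}'");

    using var doc = JsonDocument.Parse(manifestJson);
    var version = doc.RootElement.GetProperty("schema_version").GetInt32();
    if (version > 1)
        throw new NotSupportedException($"Unsupported registry-index schema_version {version}");

    var loaded = new List<(string File, string Yaml)>();
    foreach (var entry in doc.RootElement.GetProperty("bundles").EnumerateArray())
    {
        var file = entry.GetProperty("file").GetString();
        if (string.IsNullOrEmpty(file)) continue;

        var yaml = fetch($"{product}/bundles/{file}");
        if (yaml is null)
        {
            // Listed but not (yet) public: skip and warn instead of failing the page.
            warn($"Bundle '{file}' is listed in the registry but missing from the CDN; skipping");
            continue;
        }
        loaded.Add((file, yaml));
    }
    return loaded;
}
```

Keeping the transport behind a delegate makes the same loop usable with an on-disk cache or a live HTTP client, whichever the open design decisions settle on.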

Because public bundles are already scrubbed and resolved, the existing private-repo link and
description visibility logic still applies via `assembler.yml`, exactly as for local bundles.

### Open design decisions

- **Build-time network access.** Fetching at build time makes builds depend on the CDN.
  Options: (a) fetch during the build with an on-disk cache under the docs-builder app-data
  directory (mirrors `CrossLinkFetcher`/link-index); (b) a separate fetch step that
  materializes bundles into the working tree before the build. Caching + ETag revalidation
  against the CDN is the likely answer.
- **Local/offline development.** The directive must degrade gracefully when the CDN is
  unreachable (use cache; otherwise emit a clear, actionable diagnostic) so local builds and
  PR previews don't hard-fail on transient network issues.
- **Missing/partial bundles.** Skip-and-warn per the consistency notes above; never fail the
  whole page on a single missing bundle.
- **Schema evolution.** Honor `schema_version`; a version newer than the consumer understands
  should produce a clear error rather than a silent mis-parse.
- **Filtering.** `:type:`, `:link-visibility:`, `:description-visibility:`, `:dropdowns:` and
  `hide-features` apply identically to CDN-sourced bundles.
- **Caching key.** Use the CDN response ETag (not the registry `etag` field) for revalidation.
- **CDN staleness.** The distribution caches the manifest with a 1h default TTL (60s min), so a
  freshly uploaded bundle may not appear in the CDN-served `registry-index.json` for up to an
  hour. Acceptable for release notes, but if faster propagation is needed the producer (or a
  docs-actions step) would have to issue a CloudFront invalidation on registry write. Out of
  scope for the first iteration.

### Out of scope for the first iteration

- Cross-product aggregation in a single directive block (start with one product per block).
- Authenticated/private CDN access (the public bucket is anonymous-read by design).

## Related

- [Changelog directive](/syntax/changelog.md) — current (local-folder) behavior.
- [Publish changelogs](/contribute/publish-changelogs.md) — the upload workflow.
- [Link service](/building-blocks/link-service.md) — the S3 + CloudFront pattern this reuses.

docs/development/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 toc:
   - file: index.md
+  - file: changelog-bundle-registry.md
   - folder: ingest
     children:
       - file: index.md

src/infra/docs-lambda-changelog-scrubber/Program.cs

Lines changed: 2 additions & 2 deletions
@@ -13,6 +13,7 @@
 using Amazon.S3.Model;
 using Amazon.S3.Util;
 using Elastic.Changelog.Bundling;
+using Elastic.Changelog.Uploading;
 using Elastic.Documentation.Configuration.Assembler;
 using Elastic.Documentation.Configuration.ReleaseNotes;
 using Elastic.Documentation.Diagnostics;
@@ -121,8 +122,7 @@ async Task ScrubAndCopyToPublicBucket(IAmazonS3 s3Client, string sourceBucket, s
 {
     context.Logger.LogDebug("Scrubbing {Key} to public bucket", key);

-    var fileName = Path.GetFileName(key);
-    if (string.Equals(fileName, "registry-index.json", StringComparison.OrdinalIgnoreCase))
+    if (RegistryIndexKey.IsRegistryIndex(key))
     {
         await CopyPassThrough(s3Client, sourceBucket, key, context);
         return;

src/services/Elastic.Changelog/Uploading/ChangelogUploadService.cs

Lines changed: 29 additions & 0 deletions
@@ -94,9 +94,38 @@ public async Task<bool> Upload(IDiagnosticsCollector collector, ChangelogUploadA
         if (result.Failed > 0)
             collector.EmitError(string.Empty, $"{result.Failed} file(s) failed to upload");

+        // On a successful bundle upload, refresh the per-product registry-index.json so consumers
+        // (e.g. the changelog directive in cdn: mode) can enumerate bundles without an S3 listing.
+        // Failures here are logged but don't fail the upload — the bundles themselves are already in S3.
+        if (result.Failed == 0 && args.ArtifactType == ArtifactType.Bundle && targets.Count > 0)
+            await RefreshRegistryIndexes(collector, client, etagCalculator, args, targets, ctx);
+
         return result.Failed == 0;
     }

+    private async Task RefreshRegistryIndexes(
+        IDiagnosticsCollector collector,
+        IAmazonS3 client,
+        IS3EtagCalculator etagCalculator,
+        ChangelogUploadArguments args,
+        IReadOnlyList<UploadTarget> bundleTargets,
+        Cancel ctx)
+    {
+        try
+        {
+            var builder = new RegistryIndexBuilder(logFactory, _fileSystem, client, etagCalculator, args.S3BucketName);
+            var result = await builder.RefreshAsync(collector, bundleTargets, ctx);
+            _logger.LogInformation("Registry-index refresh: {Updated} updated, {Unchanged} unchanged, {Failed} failed",
+                result.Updated, result.Unchanged, result.Failed);
+        }
+        catch (Exception ex) when (ex is not OperationCanceledException)
+        {
+            // Leaving the manifest stale is non-fatal — bundle objects are unaffected.
+            _logger.LogWarning(ex, "Registry-index refresh failed; bundles uploaded successfully but manifests may be stale");
+            collector.EmitWarning(string.Empty, $"Failed to refresh registry-index manifest(s): {ex.Message}");
+        }
+    }
+
     internal IReadOnlyList<UploadTarget> DiscoverUploadTargets(IDiagnosticsCollector collector, string changelogDir)
     {
         var rootDir = _fileSystem.DirectoryInfo.New(changelogDir);
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
// Licensed to Elasticsearch B.V under one or more agreements.
// Elasticsearch B.V licenses this file to you under the Apache 2.0 License.
// See the LICENSE file in the project root for more information

using System.Text.Json.Serialization;

namespace Elastic.Changelog.Uploading;

/// <summary>
/// Per-product manifest published alongside scrubbed changelog bundles.
/// Lets consumers (e.g. the <c>changelog</c> directive in <c>cdn:</c> mode) enumerate
/// bundle files without an S3 listing call.
/// </summary>
/// <remarks>
/// Stored at <c>{product}/registry-index.json</c> in the changelog bundles bucket.
/// The scrubber Lambda mirrors it verbatim to the public bucket (pass-through).
/// </remarks>
public sealed record RegistryIndex
{
    /// <summary>
    /// Manifest schema version. Incremented when consumers must change their parser.
    /// </summary>
    public int SchemaVersion { get; init; } = 1;

    /// <summary>
    /// Product identifier (matches the first segment of the S3 key).
    /// </summary>
    public required string Product { get; init; }

    /// <summary>
    /// Time the manifest was last regenerated, in UTC.
    /// </summary>
    public required DateTimeOffset GeneratedAt { get; init; }

    /// <summary>
    /// Bundles currently known for this product, sorted by <see cref="RegistryBundle.Target"/>
    /// descending (newest first), with a deterministic tiebreak on <see cref="RegistryBundle.File"/>.
    /// </summary>
    public required IReadOnlyList<RegistryBundle> Bundles { get; init; }
}

/// <summary>
/// One entry in <see cref="RegistryIndex.Bundles"/>.
/// </summary>
public sealed record RegistryBundle
{
    /// <summary>
    /// Bundle file name (e.g. <c>9.3.0.yaml</c> or <c>2025-11.yaml</c>),
    /// resolved at <c>{product}/bundles/{file}</c>.
    /// </summary>
    public required string File { get; init; }

    /// <summary>
    /// Target version or release date as declared in the bundle's first product
    /// (e.g. <c>9.3.0</c> or <c>2025-11-01</c>). May be null if the bundle declares no products.
    /// </summary>
    public string? Target { get; init; }

    /// <summary>
    /// S3 ETag of the bundle object as uploaded to the <em>private</em> bundles bucket (pre-scrub).
    /// For single-part uploads smaller than
    /// <see cref="Elastic.Documentation.Integrations.S3.S3EtagCalculator.PartSize"/> this is the MD5 of the body.
    /// </summary>
    /// <remarks>
    /// Best-effort identity / change hint only. The public (CDN) object is produced by the changelog
    /// scrubber Lambda, which rewrites any bundle that contains private references — so for scrubbed
    /// bundles this value will <em>not</em> match the public object's ETag. Consumers MUST NOT use it
    /// for integrity checks or HTTP cache validation against the public bucket; use the CDN response's
    /// own ETag for that. It is safe to use to detect whether a bundle changed between manifest reads.
    /// </remarks>
    public required string ETag { get; init; }
}

[JsonSourceGenerationOptions(
    WriteIndented = true,
    PropertyNamingPolicy = JsonKnownNamingPolicy.SnakeCaseLower,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
)]
[JsonSerializable(typeof(RegistryIndex))]
[JsonSerializable(typeof(RegistryBundle))]
public sealed partial class RegistryIndexJsonContext : JsonSerializerContext;
