Commit 654fcce

cotti and cursoragent authored
changelog: source bundle entries from the CDN with a per-product registry (#3470)
* changelog: source bundle entries from the CDN with a per-product registry

  Source the individual changelog entries that make up a bundle from the public CDN by default, scoped to the bundle's product(s), instead of the local folder. Because the bundle filter is content-based (an entry's products/prs/issues live inside the YAML, not its name) and CloudFront has no ListObjects, a per-product entry index ({product}/changelog/registry.json) is published on upload so the client can enumerate then fetch+filter.

  - Add bundle.use_local_changelogs opt-out, plus automatic local fallback when no concrete product can scope the per-product CDN fetch.
  - Extend RegistryBuilder/RegistryKey to write and pass-through the entry index (scrubber recognizes {product}/changelog/registry.json).
  - Add CdnChangelogEntryFetcher and centralize CDN base resolution in ChangelogCdn (shared with the changelog directive's cdn: mode).
  - Emit cdn_url from `changelog bundle --plan` so CI can poll for the scrubbed bundle ({base}/{product}/bundle/{file}).
  - Harden entry sourcing: a registry-listed entry that has not yet propagated to the CDN is retried with short backoff and cache-busting; a persistent miss fails the bundle instead of silently shipping an incomplete release.

  dotnet format, AOT publish (0 trim/AOT warnings), and the affected unit tests all pass; cli-schema.json regenerated for the new --plan output.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* changelog: gate CDN bundle sourcing on docset.yml release_notes

  Make `changelog bundle` source entries from the CDN only when every in-scope product is declared under `release_notes` in docset.yml, matching the directive's opt-in model. Undeclared products (and `use_local_changelogs` or `--directory`) fall back to local sourcing. The same declared-gate drives the `--plan` `needs_network` output so the planning step and bundle run agree.

  Also make CdnChangelogEntryFetcher IDisposable and fully async (shared HttpClient in production, owned client only for injected test handlers, async retry backoff), addressing the HttpClient review feedback. Updates bundle configuration/registry docs and tests.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 86a8ec4 commit 654fcce

22 files changed

Lines changed: 1415 additions & 61 deletions

docs/cli-schema.json

Lines changed: 1 addition & 1 deletion
@@ -2784,7 +2784,7 @@
 "name": "plan",
 "type": "boolean",
 "required": false,
-"summary": "Emit GitHub Actions step outputs (needs_network, needs_github_token, output_path) describing network requirements and the resolved output path, then exit without generating the bundle. Intended for CI actions.",
+"summary": "Emit GitHub Actions step outputs (needs_network, needs_github_token, output_path, and cdn_url when a product is resolvable) describing network requirements, the resolved output path, and the public CDN URL of the scrubbed bundle, then exit without generating the bundle. Intended for CI actions.",
 "defaultValue": "false"
 },
 {
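
The commit message gives the layout of the new `cdn_url` output as `{base}/{product}/bundle/{file}`. A minimal sketch of composing such a URL follows, assuming that layout; `BuildBundleCdnUrl`, the `cdn.example.com` base, and the `9.3.0.yaml` file name are hypothetical placeholders, not values taken from this commit.

```csharp
using System;

// Hypothetical helper mirroring the {base}/{product}/bundle/{file} layout
// described in the commit message. It is not part of the diff.
Uri BuildBundleCdnUrl(Uri baseUri, string product, string file)
{
    // Trim a trailing slash so the joined path never contains "//".
    var basePath = baseUri.AbsoluteUri.TrimEnd('/');
    // Escape each segment so an unusual product or file name cannot alter the path.
    return new Uri($"{basePath}/{Uri.EscapeDataString(product)}/bundle/{Uri.EscapeDataString(file)}");
}

// Placeholder base URL: the real base is resolved by ChangelogCdn, which this diff does not show.
var url = BuildBundleCdnUrl(new Uri("https://cdn.example.com/"), "elasticsearch", "9.3.0.yaml");
Console.WriteLine(url.AbsoluteUri);
```

A CI step could poll a URL of this shape until the scrubbed bundle is served, which is the use the commit message describes for `cdn_url`.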

docs/contribute/configure-changelogs-ref.md

Lines changed: 18 additions & 0 deletions
@@ -53,6 +53,7 @@ These settings are relevant to one or all of the `changelog bundle`, `changelog
 | `bundle.release_dates` | When `true`, bundles include a `release-date` field (default: true). |
 | `bundle.repo` | Default GitHub repository name (for example, `elasticsearch`). Used by the `{changelog}` directive to generate correct PR and issue links. Only needed when the product ID doesn't match the GitHub repository name. |
 | `bundle.resolve` | When `true`, changelog contents are copied into bundle (default: `true`). |
+| `bundle.use_local_changelogs` | When `true`, always source entries from the local folder and never from the CDN (default: `false`). Refer to [Entry sourcing](#bundle-entry-sourcing). |
 
 :::
 
@@ -63,6 +64,23 @@ When `bundle.link_allow_repos` is omitted, no link filtering occurs.
 - For public repos, add your `owner/repo` to the list at a minimum.
 :::
 
+### Entry sourcing [bundle-entry-sourcing]
+
+`changelog bundle` reads the individual changelog entries it aggregates either from the local folder or from the public CDN. CDN sourcing is **opt-in per product** (declared-gate): an entry is pulled from the CDN only when its product is declared under `release_notes` in the docset's `docset.yml`.
+
+```yaml
+# docset.yml
+release_notes:
+- product: elasticsearch
+```
+
+Sourcing is decided per run:
+
+- **Local folder (default).** Used when no product in scope is declared under `release_notes`, when `bundle.use_local_changelogs: true`, when `--directory` is passed, or when the filter resolves no concrete product (for example, `--all` or PR/issue-only filters). The folder must contain the changelog files.
+- **CDN.** Used only when every product in scope is declared under `release_notes` and none of the local-sourcing conditions above apply. The product must also exist in [products.yml](https://github.com/elastic/docs-builder/blob/main/config/products.yml) with the `release-notes` feature enabled.
+
+The product ID under `release_notes` matches the product format described in [](/cli/changelog/bundle.md#product-format). This is the same declaration the `{changelog}` directive's `:cdn:` mode consumes, so a repository that opts into CDN-sourced bundling and CDN-rendered release notes declares each product once.
+
 ### Bundle descriptions [bundle-descriptions]
 
 You can add introductory text to bundles using the `description` field. This text appears at the top of rendered changelogs, after the release heading but before the entry sections.
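
The entry-sourcing rules documented in this file can be summarized as one decision function. The sketch below is illustrative only and is not the logic in `ChangelogBundlingService`; `ShouldUseCdn` and its parameter names are hypothetical, and the `products.yml` feature check is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns true only when every documented condition for CDN sourcing holds.
bool ShouldUseCdn(
    bool useLocalChangelogs,                      // bundle.use_local_changelogs
    bool directoryPassed,                         // --directory given on the CLI
    IReadOnlyCollection<string> productsInScope,  // empty for --all or PR/issue-only filters
    HashSet<string> declaredProducts)             // products under release_notes in docset.yml
{
    // Explicit opt-outs always force local sourcing.
    if (useLocalChangelogs || directoryPassed)
        return false;
    // Without a concrete product there is nothing to scope a per-product CDN fetch.
    if (productsInScope.Count == 0)
        return false;
    // CDN only when every in-scope product is declared.
    return productsInScope.All(p => declaredProducts.Contains(p));
}

var declared = new HashSet<string> { "elasticsearch" };
Console.WriteLine(ShouldUseCdn(false, false, new[] { "elasticsearch" }, declared));
Console.WriteLine(ShouldUseCdn(false, false, new[] { "elasticsearch", "kibana" }, declared));
```

The second call mixes a declared and an undeclared product, so under the documented rules the whole run falls back to the local folder.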

docs/development/changelog-bundle-registry.md

Lines changed: 22 additions & 0 deletions
@@ -166,6 +166,28 @@ build that includes this feature.
 
 Consumers must therefore treat a missing bundle as non-fatal (skip + warn), not an error.
 
+## `changelog bundle` entry sourcing (declared-gate)
+
+The `changelog bundle` command aggregates individual changelog **entries**. It can read those
+entries from the local folder or fetch a product's published entries from the CDN
+(`{product}/changelog/registry.json` → `{product}/changelog/{file}`, via `CdnChangelogEntryFetcher`).
+
+CDN entry sourcing is **opt-in per product** (a *declared-gate*): a product's entries are pulled from
+the CDN only when that product is declared under `release_notes` in the repo's `docset.yml`, the same
+declaration the directive consumes. The decision is made per run by `ChangelogBundlingService`:
+
+- **Local folder** when `bundle.use_local_changelogs: true`, when `--directory` is passed, when no
+  concrete product is in scope (for example `--all` or PR/issue-only filters), or when any in-scope
+  product is **not** declared under `release_notes`.
+- **CDN** only when every in-scope product is declared. The declared set is read with
+  `DocumentationSetFile.LoadMetadata` from the known docset locations (repo root or `docs/`).
+
+The same gate drives the `--plan` `needs_network` output, so a planning step and the actual bundle
+run agree on whether the Docker bundle needs network access. The registry-fetch is fail-fast and an
+entry still missing after its retry budget fails the bundle (an incomplete release would otherwise
+ship silently). `CdnChangelogEntryFetcher` reuses a shared `HttpClient` in production and disposes an
+owned client only when a test injects a handler, mirroring `CdnChangelogFetcher`.
+
 ## Consumer: `{changelog}` directive `cdn:` mode (implemented)
 
 ### Syntax

src/Elastic.Documentation.Configuration/Changelog/BundleConfiguration.cs

Lines changed: 8 additions & 0 deletions
@@ -21,6 +21,14 @@ public record BundleConfiguration
     /// </summary>
     public string? OutputDirectory { get; init; }
 
+    /// <summary>
+    /// When true, the individual changelog entries that make up a bundle are sourced from the local
+    /// <see cref="Directory"/>. When false (the default), they are fetched from the public changelog
+    /// CDN, scoped to the bundle's products. An explicit <c>--directory</c> on the CLI always forces
+    /// local sourcing regardless of this setting.
+    /// </summary>
+    public bool UseLocalChangelogs { get; init; }
+
     /// <summary>
     /// Whether to resolve (copy contents of each changelog file into the entries array).
     /// Defaults to true

src/Elastic.Documentation.Configuration/Changelog/ChangelogConfigurationLoader.cs

Lines changed: 1 addition & 0 deletions
@@ -529,6 +529,7 @@ private static PivotConfiguration ConvertPivot(PivotConfigurationYaml yamlPivot)
 {
     Directory = yaml.Directory,
     OutputDirectory = yaml.OutputDirectory,
+    UseLocalChangelogs = yaml.UseLocalChangelogs ?? false,
     Resolve = yaml.Resolve ?? true,
     Description = yaml.Description,
     Repo = yaml.Repo,

src/Elastic.Documentation.Configuration/Changelog/ChangelogConfigurationYaml.cs

Lines changed: 6 additions & 0 deletions
@@ -275,6 +275,12 @@ internal sealed record BundleConfigurationYaml
     /// </summary>
     public string? OutputDirectory { get; set; }
 
+    /// <summary>
+    /// When true, source the individual changelog entries that make up a bundle from the local
+    /// <see cref="Directory"/> instead of the public CDN. Defaults to false (CDN sourcing).
+    /// </summary>
+    public bool? UseLocalChangelogs { get; set; }
+
     /// <summary>
     /// Whether to resolve (copy contents) by default.
     /// </summary>
Lines changed: 269 additions & 0 deletions
@@ -0,0 +1,269 @@
+// Licensed to Elasticsearch B.V under one or more agreements.
+// Elasticsearch B.V licenses this file to you under the Apache 2.0 License.
+// See the LICENSE file in the project root for more information
+
+using System.Net;
+using System.Text.Json;
+using Microsoft.Extensions.Logging;
+
+namespace Elastic.Documentation.Configuration.ReleaseNotes;
+
+/// <summary>
+/// One downloaded changelog entry: the registry file name and its raw YAML content.
+/// </summary>
+public readonly record struct CdnChangelogEntry(string FileName, string Content);
+
+/// <summary>
+/// Fetches the individual (scrubbed) changelog entries for a single product from the public CDN, for
+/// the <c>changelog bundle</c> command when sourcing entries from S3 rather than a local folder. It
+/// reads <c>{base}/{product}/changelog/registry.json</c> to enumerate entries and downloads each
+/// <c>{base}/{product}/changelog/{file}</c> as raw YAML; the bundle command then applies its usual
+/// filter (products / prs / issues) to the downloaded set.
+/// </summary>
+/// <remarks>
+/// <para>
+/// A registry that cannot be fetched or parsed is a hard error (the caller gets an empty list and an
+/// emitted error). An individual entry that the registry lists but the CDN does not yet serve is
+/// retried a few times with short backoff (and cache-busting, to defeat any CloudFront negative-cache)
+/// to ride out the brief upload→scrub→propagate window. If it still cannot be fetched after the retry
+/// budget it is escalated to an error, not skipped: the registry asserts the entry exists (uploads
+/// never prune) and scrubbing is sub-second, so a persistent miss is a real pipeline problem and
+/// silently shipping an incomplete release bundle is worse than failing the run.
+/// </para>
+/// </remarks>
+public sealed class CdnChangelogEntryFetcher : IDisposable
+{
+    private const int SupportedSchemaVersion = 1;
+
+    /// <summary>Total GET attempts per entry (1 initial + retries). ~3.5s budget at the default backoff.</summary>
+    private const int DefaultMaxAttempts = 4;
+    private const int BaseRetryDelayMs = 500;
+    private const int MaxRetryDelayMs = 2000;
+
+    /// <summary>
+    /// Bounds an individual registry/entry HTTP request so a stalled CDN connection cannot hang a bundle run.
+    /// </summary>
+    private static readonly TimeSpan FetchTimeout = TimeSpan.FromSeconds(30);
+
+    /// <summary>
+    /// Process-wide client shared by every fetcher built for the production (no injected handler) path.
+    /// <see cref="HttpClient"/> is thread-safe and intended to be long-lived; a single static instance avoids
+    /// leaking a socket handle per fetch, and <see cref="SocketsHttpHandler.PooledConnectionLifetime"/>
+    /// bounds DNS staleness. It is intentionally never disposed: it lives for the lifetime of the process.
+    /// </summary>
+    private static readonly HttpClient SharedHttpClient = new(
+        new SocketsHttpHandler
+        {
+            AutomaticDecompression = DecompressionMethods.All,
+            PooledConnectionLifetime = TimeSpan.FromMinutes(5)
+        })
+    { Timeout = FetchTimeout };
+
+    private readonly ILogger _logger;
+    private readonly HttpClient _httpClient;
+    private readonly int _maxAttempts;
+    private readonly Func<TimeSpan, Cancel, Task> _sleep;
+
+    /// <summary>
+    /// Non-null only when a caller injects its own <see cref="HttpMessageHandler"/> (tests): in that case we
+    /// own a per-instance client and must dispose it. On the production path <see cref="_httpClient"/> points
+    /// at <see cref="SharedHttpClient"/>, which is never disposed.
+    /// </summary>
+    private readonly HttpClient? _ownedHttpClient;
+
+    public CdnChangelogEntryFetcher(
+        ILoggerFactory logFactory,
+        HttpMessageHandler? handler = null,
+        int maxAttempts = DefaultMaxAttempts,
+        Func<TimeSpan, Cancel, Task>? sleep = null)
+    {
+        _logger = logFactory.CreateLogger<CdnChangelogEntryFetcher>();
+        _maxAttempts = maxAttempts < 1 ? DefaultMaxAttempts : maxAttempts;
+        _sleep = sleep ?? DefaultSleepAsync;
+
+        if (handler is null)
+            _httpClient = SharedHttpClient;
+        else
+        {
+            // disposeHandler: false, because the injected handler is owned by the caller (tests), not by us.
+            _ownedHttpClient = new HttpClient(handler, disposeHandler: false) { Timeout = FetchTimeout };
+            _httpClient = _ownedHttpClient;
+        }
+    }
+
+    /// <summary>
+    /// Downloads the changelog entries for <paramref name="product"/> from the CDN at
+    /// <paramref name="baseUri"/>. Returns an empty list after emitting an error when the registry cannot
+    /// be read or when a registry-listed entry cannot be fetched within the retry budget. Entries are
+    /// returned in registry order; the caller owns filtering and de-duplication.
+    /// </summary>
+    public async Task<IReadOnlyList<CdnChangelogEntry>> FetchAsync(
+        Uri baseUri,
+        string product,
+        Action<string> emitError,
+        Action<string> emitWarning,
+        Cancel ctx)
+    {
+        var registryUri = Combine(baseUri, product, "changelog", "registry.json");
+
+        ChangelogRegistry? registry;
+        try
+        {
+            registry = await FetchRegistryAsync(registryUri, ctx).ConfigureAwait(false);
+        }
+        catch (Exception ex) when (ex is not OperationCanceledException)
+        {
+            emitError($"Could not fetch changelog entry registry for product '{product}' from {registryUri}: {ex.Message}");
+            return [];
+        }
+
+        if (registry is null)
+        {
+            emitError($"Changelog entry registry for product '{product}' at {registryUri} was empty or unparseable.");
+            return [];
+        }
+
+        if (registry.SchemaVersion > SupportedSchemaVersion)
+        {
+            emitError(
+                $"Changelog entry registry for product '{product}' uses schema version {registry.SchemaVersion}, but this build only understands version {SupportedSchemaVersion}. Update docs-builder.");
+            return [];
+        }
+
+        var entries = new List<CdnChangelogEntry>(registry.Bundles.Count);
+        foreach (var entry in registry.Bundles)
+        {
+            ctx.ThrowIfCancellationRequested();
+
+            var fileName = entry.File;
+            if (string.IsNullOrWhiteSpace(fileName) || !IsSafeFileName(fileName))
+            {
+                emitWarning($"Changelog entry registry for '{product}' lists an invalid file name '{fileName}'; skipping.");
+                continue;
+            }
+
+            var entryUri = Combine(baseUri, product, "changelog", fileName);
+            var (fetched, content, lastError) = await TryFetchEntryAsync(entryUri, fileName, product, ctx).ConfigureAwait(false);
+            if (fetched)
+            {
+                entries.Add(new CdnChangelogEntry(fileName, content));
+                continue;
+            }
+
+            // The registry lists this entry, so it exists in the private bucket and should have been
+            // scrubbed to the public one within milliseconds. Still missing after the retry budget means
+            // a genuine propagation/scrub failure, so fail rather than ship a bundle missing this entry.
+            emitError(
+                $"Changelog entry '{fileName}' for product '{product}' is listed in the registry but could not be fetched from {entryUri} after {_maxAttempts} attempt(s): {lastError}. " +
+                "The scrubbed copy may not have propagated to the CDN yet; retry shortly, and if it persists check the changelog scrubber pipeline.");
+            return [];
+        }
+
+        _logger.LogInformation("Fetched {Count} changelog entry(ies) for {Product} from {BaseUri}", entries.Count, product, baseUri);
+        return entries;
+    }
+
+    /// <summary>
+    /// Fetches a single entry, retrying transient failures (most importantly a not-yet-propagated 404)
+    /// up to <see cref="_maxAttempts"/> times with exponential backoff. Retry requests are cache-busted
+    /// so a CloudFront-cached 404 cannot pin the result for the whole window.
+    /// </summary>
+    private async Task<(bool Fetched, string Content, string? LastError)> TryFetchEntryAsync(Uri uri, string fileName, string product, Cancel ctx)
+    {
+        string? lastError = null;
+
+        for (var attempt = 1; attempt <= _maxAttempts; attempt++)
+        {
+            ctx.ThrowIfCancellationRequested();
+            try
+            {
+                var content = await FetchTextAsync(uri, attempt, ctx).ConfigureAwait(false);
+                if (attempt > 1)
+                    _logger.LogInformation("Fetched changelog entry '{File}' for {Product} on attempt {Attempt}/{Max}", fileName, product, attempt, _maxAttempts);
+                return (true, content, null);
+            }
+            catch (Exception ex) when (ex is not OperationCanceledException)
+            {
+                lastError = ex.Message;
+                if (attempt >= _maxAttempts)
+                    break;
+
+                var delay = RetryDelay(attempt);
+                _logger.LogDebug(
+                    "Changelog entry '{File}' for {Product} not yet available (attempt {Attempt}/{Max}: {Error}); retrying in {Delay}",
+                    fileName, product, attempt, _maxAttempts, ex.Message, delay);
+                await _sleep(delay, ctx).ConfigureAwait(false);
+            }
+        }
+
+        return (false, string.Empty, lastError);
+    }
+
+    private async Task<ChangelogRegistry?> FetchRegistryAsync(Uri registryUri, Cancel ctx)
+    {
+        _logger.LogInformation("Fetching changelog entry registry {RegistryUri}", registryUri);
+        using var request = new HttpRequestMessage(HttpMethod.Get, registryUri);
+        using var response = await _httpClient.SendAsync(request, ctx).ConfigureAwait(false);
+        _ = response.EnsureSuccessStatusCode();
+        await using var stream = await response.Content.ReadAsStreamAsync(ctx).ConfigureAwait(false);
+        return await JsonSerializer.DeserializeAsync(stream, ChangelogRegistryJsonContext.Default.ChangelogRegistry, ctx).ConfigureAwait(false);
+    }
+
+    private async Task<string> FetchTextAsync(Uri uri, int attempt, Cancel ctx)
+    {
+        // Only bust the cache on retries: the first hit should use the CDN cache normally (the common,
+        // already-propagated case); retries explicitly want to bypass any cached 404.
+        var requestUri = attempt > 1 ? WithCacheBuster(uri) : uri;
+        using var request = new HttpRequestMessage(HttpMethod.Get, requestUri);
+        if (attempt > 1)
+            _ = request.Headers.TryAddWithoutValidation("Cache-Control", "no-cache");
+        using var response = await _httpClient.SendAsync(request, ctx).ConfigureAwait(false);
+        _ = response.EnsureSuccessStatusCode();
+        return await response.Content.ReadAsStringAsync(ctx).ConfigureAwait(false);
+    }
+
+    private static TimeSpan RetryDelay(int attempt)
+    {
+        // attempt is 1-based; first retry waits BaseRetryDelayMs, doubling up to the cap.
+        var ms = Math.Min(BaseRetryDelayMs * (1L << (attempt - 1)), MaxRetryDelayMs);
+        return TimeSpan.FromMilliseconds(ms);
+    }
+
+    private static async Task DefaultSleepAsync(TimeSpan delay, Cancel ctx)
+    {
+        if (delay > TimeSpan.Zero)
+            await Task.Delay(delay, ctx).ConfigureAwait(false);
+    }
+
+    private static Uri WithCacheBuster(Uri uri)
+    {
+        var separator = string.IsNullOrEmpty(uri.Query) ? "?" : "&";
+        return new Uri($"{uri.AbsoluteUri}{separator}_={DateTimeOffset.UtcNow.Ticks:x}");
+    }
+
+    private static Uri Combine(Uri baseUri, params string[] segments)
+    {
+        var basePath = baseUri.AbsoluteUri.TrimEnd('/');
+        var suffix = string.Join('/', segments.Select(Uri.EscapeDataString));
+        return new Uri($"{basePath}/{suffix}");
+    }
+
+    /// <summary>
+    /// Guards against path traversal or nested keys sneaking in via the registry: an entry file name
+    /// must be a single path segment (the producer always writes <c>{product}/changelog/{file}</c>).
+    /// </summary>
+    private static bool IsSafeFileName(string fileName) =>
+        !fileName.Contains('/', StringComparison.Ordinal)
+        && !fileName.Contains('\\', StringComparison.Ordinal)
+        && fileName is not ("." or "..");
+
+    /// <summary>
+    /// Disposes the per-instance <see cref="HttpClient"/> created for an injected handler. The shared
+    /// production client (<see cref="SharedHttpClient"/>) is process-lived and intentionally not disposed.
+    /// </summary>
+    public void Dispose()
+    {
+        _ownedHttpClient?.Dispose();
+        GC.SuppressFinalize(this);
+    }
+}
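
The doc comment on `DefaultMaxAttempts` quotes a "~3.5s budget at the default backoff". The sketch below restates the `RetryDelay` formula as a standalone function, with the constants copied from the diff, and sums the waits to show where that figure comes from. The `schedule` and `total` variables are illustrative only and do not appear in the commit.

```csharp
using System;
using System.Linq;

// Constants copied from CdnChangelogEntryFetcher in the diff.
const int BaseRetryDelayMs = 500;
const int MaxRetryDelayMs = 2000;
const int MaxAttempts = 4;

// Same formula as RetryDelay in the diff: attempt is 1-based and the delay doubles up to the cap.
TimeSpan RetryDelay(int attempt) =>
    TimeSpan.FromMilliseconds(Math.Min(BaseRetryDelayMs * (1L << (attempt - 1)), MaxRetryDelayMs));

// A wait follows every failed attempt except the last, so 4 attempts produce 3 waits.
var schedule = Enumerable.Range(1, MaxAttempts - 1).Select(a => RetryDelay(a)).ToArray();
var total = TimeSpan.FromMilliseconds(schedule.Sum(d => d.TotalMilliseconds));

Console.WriteLine(string.Join(", ", schedule.Select(d => $"{d.TotalMilliseconds} ms")));
Console.WriteLine($"total wait in ms: {total.TotalMilliseconds}");
```

With the default constants the waits are 500 ms, 1000 ms, and 2000 ms, which sum to 3.5 s. The 30 s `FetchTimeout` applies per request and is separate from this backoff total.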
