2 changes: 1 addition & 1 deletion docs/cli-schema.json
@@ -2784,7 +2784,7 @@
"name": "plan",
"type": "boolean",
"required": false,
"summary": "Emit GitHub Actions step outputs (needs_network, needs_github_token, output_path) describing network requirements and the resolved output path, then exit without generating the bundle. Intended for CI actions.",
"summary": "Emit GitHub Actions step outputs (needs_network, needs_github_token, output_path, and cdn_url when a product is resolvable) describing network requirements, the resolved output path, and the public CDN URL of the scrubbed bundle, then exit without generating the bundle. Intended for CI actions.",
"defaultValue": "false"
},
{
18 changes: 18 additions & 0 deletions docs/contribute/configure-changelogs-ref.md
@@ -53,6 +53,7 @@ These settings are relevant to one or all of the `changelog bundle`, `changelog
| `bundle.release_dates` | When `true`, bundles include a `release-date` field (default: true). |
| `bundle.repo` | Default GitHub repository name (for example, `elasticsearch`). Used by the `{changelog}` directive to generate correct PR and issue links. Only needed when the product ID doesn't match the GitHub repository name. |
| `bundle.resolve` | When `true`, changelog contents are copied into bundle (default: `true`). |
| `bundle.use_local_changelogs` | When `true`, always source entries from the local folder and never from the CDN (default: `false`). Refer to [Entry sourcing](#bundle-entry-sourcing). |

:::

@@ -63,6 +64,23 @@ When `bundle.link_allow_repos` is omitted, no link filtering occurs.
- For public repos, add your `owner/repo` to the list at a minimum.
:::

### Entry sourcing [bundle-entry-sourcing]

`changelog bundle` reads the individual changelog entries it aggregates either from the local folder or from the public CDN. CDN sourcing is **opt-in per product** (declared-gate): an entry is pulled from the CDN only when its product is declared under `release_notes` in the docset's `docset.yml`.

```yaml
# docset.yml
release_notes:
- product: elasticsearch
```

Sourcing is decided per run:

- **Local folder (default).** Used when no product in scope is declared under `release_notes`, when `bundle.use_local_changelogs: true`, when `--directory` is passed, or when the filter resolves no concrete product (for example, `--all` or PR/issue-only filters). The folder must contain the changelog files.
- **CDN.** Used only when every product in scope is declared under `release_notes` and none of the local-sourcing conditions above apply. The product must also exist in [products.yml](https://github.com/elastic/docs-builder/blob/main/config/products.yml) with the `release-notes` feature enabled.

The product ID under `release_notes` matches the product format described in [](/cli/changelog/bundle.md#product-format). This is the same declaration the `{changelog}` directive's `:cdn:` mode consumes, so a repository that opts into CDN-sourced bundling and CDN-rendered release notes declares each product once.

### Bundle descriptions [bundle-descriptions]

You can add introductory text to bundles using the `description` field. This text appears at the top of rendered changelogs, after the release heading but before the entry sections.
22 changes: 22 additions & 0 deletions docs/development/changelog-bundle-registry.md
@@ -166,6 +166,28 @@ build that includes this feature.

Consumers must therefore treat a missing bundle as non-fatal (skip + warn), not an error.

## `changelog bundle` entry sourcing (declared-gate)

The `changelog bundle` command aggregates individual changelog **entries**. It can read those
entries from the local folder or fetch a product's published entries from the CDN
(`{product}/changelog/registry.json` → `{product}/changelog/{file}`, via `CdnChangelogEntryFetcher`).

CDN entry sourcing is **opt-in per product** (a *declared-gate*): a product's entries are pulled from
the CDN only when that product is declared under `release_notes` in the repo's `docset.yml` — the same
declaration the directive consumes. The decision is made per run by `ChangelogBundlingService`:

- **Local folder** when `bundle.use_local_changelogs: true`, when `--directory` is passed, when no
concrete product is in scope (for example `--all` or PR/issue-only filters), or when any in-scope
product is **not** declared under `release_notes`.
- **CDN** only when every in-scope product is declared. The declared set is read with
`DocumentationSetFile.LoadMetadata` from the known docset locations (repo root or `docs/`).

The same gate drives the `--plan` `needs_network` output, so a planning step and the actual bundle
run agree on whether the Docker bundle needs network access. The registry fetch is fail-fast: an
unreadable registry is an error, and an entry still missing after its retry budget fails the bundle,
because an incomplete release would otherwise ship silently. `CdnChangelogEntryFetcher` reuses a shared `HttpClient` in production and disposes an
owned client only when a test injects a handler, mirroring `CdnChangelogFetcher`.
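
As an illustration of the gate described above, the following is a minimal C# sketch rather than the actual `ChangelogBundlingService` code. The `UseCdn` helper, its parameters, and the ordinal (case-sensitive) product comparison are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the real decision lives in ChangelogBundlingService and may be shaped differently.
static bool UseCdn(
    bool useLocalChangelogs,                      // bundle.use_local_changelogs
    bool directoryPassed,                         // --directory was given on the CLI
    IReadOnlyCollection<string> productsInScope,  // empty for --all or PR/issue-only filters
    ISet<string> declaredProducts)                // products under release_notes in docset.yml
{
    // Either explicit local-sourcing switch wins outright.
    if (useLocalChangelogs || directoryPassed)
        return false;

    // No concrete product in scope: nothing to scope a CDN fetch to.
    if (productsInScope.Count == 0)
        return false;

    // CDN only when every in-scope product is declared.
    return productsInScope.All(declaredProducts.Contains);
}

// Assumption: product IDs are compared case-sensitively.
var declared = new HashSet<string>(StringComparer.Ordinal) { "elasticsearch" };

Console.WriteLine(UseCdn(false, false, new[] { "elasticsearch" }, declared));           // True
Console.WriteLine(UseCdn(false, false, new[] { "elasticsearch", "kibana" }, declared)); // False
Console.WriteLine(UseCdn(true, false, new[] { "elasticsearch" }, declared));            // False
Console.WriteLine(UseCdn(false, true, new[] { "elasticsearch" }, declared));            // False
Console.WriteLine(UseCdn(false, false, Array.Empty<string>(), declared));               // False
```

Under this reading of the rules, a single undeclared product switches the whole run to the local folder, so one run never mixes the two sources.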

## Consumer: `{changelog}` directive `cdn:` mode (implemented)

### Syntax
@@ -21,6 +21,14 @@ public record BundleConfiguration
/// </summary>
public string? OutputDirectory { get; init; }

/// <summary>
/// When true, the individual changelog entries that make up a bundle are always sourced from the local
/// <see cref="Directory"/>. When false (the default), entries may instead be fetched from the public
/// changelog CDN, but only when every in-scope product is declared under <c>release_notes</c>. An explicit
/// <c>--directory</c> on the CLI always forces local sourcing regardless of this setting.
/// </summary>
public bool UseLocalChangelogs { get; init; }

/// <summary>
/// Whether to resolve (copy contents of each changelog file into the entries array).
/// Defaults to true
@@ -529,6 +529,7 @@ private static PivotConfiguration ConvertPivot(PivotConfigurationYaml yamlPivot)
{
Directory = yaml.Directory,
OutputDirectory = yaml.OutputDirectory,
UseLocalChangelogs = yaml.UseLocalChangelogs ?? false,
Resolve = yaml.Resolve ?? true,
Description = yaml.Description,
Repo = yaml.Repo,
@@ -275,6 +275,12 @@ internal sealed record BundleConfigurationYaml
/// </summary>
public string? OutputDirectory { get; set; }

/// <summary>
/// When true, always source the individual changelog entries that make up a bundle from the local
/// <see cref="Directory"/> instead of the public CDN. Defaults to false, which permits CDN sourcing
/// for products declared under <c>release_notes</c>.
/// </summary>
public bool? UseLocalChangelogs { get; set; }

/// <summary>
/// Whether to resolve (copy contents) by default.
/// </summary>
@@ -0,0 +1,269 @@
// Licensed to Elasticsearch B.V under one or more agreements.
// Elasticsearch B.V licenses this file to you under the Apache 2.0 License.
// See the LICENSE file in the project root for more information

using System.Net;
using System.Text.Json;
using Microsoft.Extensions.Logging;

namespace Elastic.Documentation.Configuration.ReleaseNotes;

/// <summary>
/// One downloaded changelog entry: the registry file name and its raw YAML content.
/// </summary>
public readonly record struct CdnChangelogEntry(string FileName, string Content);

/// <summary>
/// Fetches the individual (scrubbed) changelog entries for a single product from the public CDN, for
/// the <c>changelog bundle</c> command when sourcing entries from S3 rather than a local folder. It
/// reads <c>{base}/{product}/changelog/registry.json</c> to enumerate entries and downloads each
/// <c>{base}/{product}/changelog/{file}</c> as raw YAML; the bundle command then applies its usual
/// filter (products / prs / issues) to the downloaded set.
/// </summary>
/// <remarks>
/// <para>
/// A registry that cannot be fetched or parsed is a hard error (the caller gets an empty list and an
/// emitted error). An individual entry that the registry lists but the CDN does not yet serve is
/// retried a few times with short backoff (and cache-busting, to defeat any CloudFront negative-cache)
/// to ride out the brief upload→scrub→propagate window. If it still cannot be fetched after the retry
/// budget it is escalated to an error, not skipped: the registry asserts the entry exists (uploads
/// never prune) and scrubbing is sub-second, so a persistent miss is a real pipeline problem and
/// silently shipping an incomplete release bundle is worse than failing the run.
/// </para>
/// </remarks>
public sealed class CdnChangelogEntryFetcher : IDisposable
{
private const int SupportedSchemaVersion = 1;

/// <summary>Total GET attempts per entry (1 initial + retries). ~3.5s budget at the default backoff.</summary>
private const int DefaultMaxAttempts = 4;
private const int BaseRetryDelayMs = 500;
private const int MaxRetryDelayMs = 2000;

/// <summary>
/// Bounds an individual registry/entry HTTP request so a stalled CDN connection cannot hang a bundle run.
/// </summary>
private static readonly TimeSpan FetchTimeout = TimeSpan.FromSeconds(30);

/// <summary>
/// Process-wide client shared by every fetcher built for the production (no injected handler) path.
/// <see cref="HttpClient"/> is thread-safe and intended to be long-lived; a single static instance avoids
/// leaking a socket handle per fetch, and <see cref="SocketsHttpHandler.PooledConnectionLifetime"/>
/// bounds DNS staleness. It is intentionally never disposed — it lives for the lifetime of the process.
/// </summary>
private static readonly HttpClient SharedHttpClient = new(
new SocketsHttpHandler
{
AutomaticDecompression = DecompressionMethods.All,
PooledConnectionLifetime = TimeSpan.FromMinutes(5)
})
{ Timeout = FetchTimeout };

private readonly ILogger _logger;
private readonly HttpClient _httpClient;
private readonly int _maxAttempts;
private readonly Func<TimeSpan, Cancel, Task> _sleep;

/// <summary>
/// Non-null only when a caller injects its own <see cref="HttpMessageHandler"/> (tests): in that case we
/// own a per-instance client and must dispose it. On the production path <see cref="_httpClient"/> points
/// at <see cref="SharedHttpClient"/>, which is never disposed.
/// </summary>
private readonly HttpClient? _ownedHttpClient;

public CdnChangelogEntryFetcher(
ILoggerFactory logFactory,
HttpMessageHandler? handler = null,
int maxAttempts = DefaultMaxAttempts,
Func<TimeSpan, Cancel, Task>? sleep = null)
{
_logger = logFactory.CreateLogger<CdnChangelogEntryFetcher>();
_maxAttempts = maxAttempts < 1 ? DefaultMaxAttempts : maxAttempts;
_sleep = sleep ?? DefaultSleepAsync;

if (handler is null)
_httpClient = SharedHttpClient;
else
{
// disposeHandler: false — the injected handler is owned by the caller (tests), not by us.
_ownedHttpClient = new HttpClient(handler, disposeHandler: false) { Timeout = FetchTimeout };
_httpClient = _ownedHttpClient;
}
}

/// <summary>
/// Downloads the changelog entries for <paramref name="product"/> from the CDN at
/// <paramref name="baseUri"/>. Returns an empty list after emitting an error when the registry cannot
/// be read or when a registry-listed entry cannot be fetched within the retry budget. Entries are
/// returned in registry order; the caller owns filtering and de-duplication.
/// </summary>
public async Task<IReadOnlyList<CdnChangelogEntry>> FetchAsync(
Uri baseUri,
string product,
Action<string> emitError,
Action<string> emitWarning,
Cancel ctx)
{
var registryUri = Combine(baseUri, product, "changelog", "registry.json");

ChangelogRegistry? registry;
try
{
registry = await FetchRegistryAsync(registryUri, ctx).ConfigureAwait(false);
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
emitError($"Could not fetch changelog entry registry for product '{product}' from {registryUri}: {ex.Message}");
return [];
}

if (registry is null)
{
emitError($"Changelog entry registry for product '{product}' at {registryUri} was empty or unparseable.");
return [];
}

if (registry.SchemaVersion > SupportedSchemaVersion)
{
emitError(
$"Changelog entry registry for product '{product}' uses schema version {registry.SchemaVersion}, but this build only understands version {SupportedSchemaVersion}. Update docs-builder.");
return [];
}

var entries = new List<CdnChangelogEntry>(registry.Bundles.Count);
foreach (var entry in registry.Bundles)
{
ctx.ThrowIfCancellationRequested();

var fileName = entry.File;
if (string.IsNullOrWhiteSpace(fileName) || !IsSafeFileName(fileName))
{
emitWarning($"Changelog entry registry for '{product}' lists an invalid file name '{fileName}'; skipping.");
continue;
}

var entryUri = Combine(baseUri, product, "changelog", fileName);
var (fetched, content, lastError) = await TryFetchEntryAsync(entryUri, fileName, product, ctx).ConfigureAwait(false);
if (fetched)
{
entries.Add(new CdnChangelogEntry(fileName, content));
continue;
}

// The registry lists this entry, so it exists in the private bucket and should have been
// scrubbed to the public one within milliseconds. Still missing after the retry budget means
// a genuine propagation/scrub failure — fail rather than ship a bundle missing this entry.
emitError(
$"Changelog entry '{fileName}' for product '{product}' is listed in the registry but could not be fetched from {entryUri} after {_maxAttempts} attempt(s): {lastError}. " +
"The scrubbed copy may not have propagated to the CDN yet; retry shortly, and if it persists check the changelog scrubber pipeline.");
return [];
}

_logger.LogInformation("Fetched {Count} changelog entry(ies) for {Product} from {BaseUri}", entries.Count, product, baseUri);
return entries;
}

/// <summary>
/// Fetches a single entry, retrying transient failures (most importantly a not-yet-propagated 404)
/// up to <see cref="_maxAttempts"/> times with exponential backoff. Retry requests are cache-busted
/// so a CloudFront-cached 404 cannot pin the result for the whole window.
/// </summary>
private async Task<(bool Fetched, string Content, string? LastError)> TryFetchEntryAsync(Uri uri, string fileName, string product, Cancel ctx)
{
string? lastError = null;

for (var attempt = 1; attempt <= _maxAttempts; attempt++)
{
ctx.ThrowIfCancellationRequested();
try
{
var content = await FetchTextAsync(uri, attempt, ctx).ConfigureAwait(false);
if (attempt > 1)
_logger.LogInformation("Fetched changelog entry '{File}' for {Product} on attempt {Attempt}/{Max}", fileName, product, attempt, _maxAttempts);
return (true, content, null);
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
lastError = ex.Message;
if (attempt >= _maxAttempts)
break;

var delay = RetryDelay(attempt);
_logger.LogDebug(
"Changelog entry '{File}' for {Product} not yet available (attempt {Attempt}/{Max}: {Error}); retrying in {Delay}",
fileName, product, attempt, _maxAttempts, ex.Message, delay);
await _sleep(delay, ctx).ConfigureAwait(false);
}
}

return (false, string.Empty, lastError);
}

private async Task<ChangelogRegistry?> FetchRegistryAsync(Uri registryUri, Cancel ctx)
{
_logger.LogInformation("Fetching changelog entry registry {RegistryUri}", registryUri);
using var request = new HttpRequestMessage(HttpMethod.Get, registryUri);
using var response = await _httpClient.SendAsync(request, ctx).ConfigureAwait(false);
_ = response.EnsureSuccessStatusCode();
await using var stream = await response.Content.ReadAsStreamAsync(ctx).ConfigureAwait(false);
return await JsonSerializer.DeserializeAsync(stream, ChangelogRegistryJsonContext.Default.ChangelogRegistry, ctx).ConfigureAwait(false);
}

private async Task<string> FetchTextAsync(Uri uri, int attempt, Cancel ctx)
{
// Only bust the cache on retries: the first hit should use the CDN cache normally (the common,
// already-propagated case); retries explicitly want to bypass any cached 404.
var requestUri = attempt > 1 ? WithCacheBuster(uri) : uri;
using var request = new HttpRequestMessage(HttpMethod.Get, requestUri);
if (attempt > 1)
_ = request.Headers.TryAddWithoutValidation("Cache-Control", "no-cache");
using var response = await _httpClient.SendAsync(request, ctx).ConfigureAwait(false);
_ = response.EnsureSuccessStatusCode();
return await response.Content.ReadAsStringAsync(ctx).ConfigureAwait(false);
}

private static TimeSpan RetryDelay(int attempt)
{
// attempt is 1-based; first retry waits BaseRetryDelayMs, doubling up to the cap.
var ms = Math.Min(BaseRetryDelayMs * (1L << (attempt - 1)), MaxRetryDelayMs);
return TimeSpan.FromMilliseconds(ms);
}

private static async Task DefaultSleepAsync(TimeSpan delay, Cancel ctx)
{
if (delay > TimeSpan.Zero)
await Task.Delay(delay, ctx).ConfigureAwait(false);
}

private static Uri WithCacheBuster(Uri uri)
{
var separator = string.IsNullOrEmpty(uri.Query) ? "?" : "&";
return new Uri($"{uri.AbsoluteUri}{separator}_={DateTimeOffset.UtcNow.Ticks:x}");
}

private static Uri Combine(Uri baseUri, params string[] segments)
{
var basePath = baseUri.AbsoluteUri.TrimEnd('/');
var suffix = string.Join('/', segments.Select(Uri.EscapeDataString));
return new Uri($"{basePath}/{suffix}");
}

/// <summary>
/// Guards against path traversal or nested keys sneaking in via the registry: an entry file name
/// must be a single path segment (the producer always writes <c>{product}/changelog/{file}</c>).
/// </summary>
private static bool IsSafeFileName(string fileName) =>
!fileName.Contains('/', StringComparison.Ordinal)
&& !fileName.Contains('\\', StringComparison.Ordinal)
&& fileName is not ("." or "..");

/// <summary>
/// Disposes the per-instance <see cref="HttpClient"/> created for an injected handler. The shared
/// production client (<see cref="SharedHttpClient"/>) is process-lived and intentionally not disposed.
/// </summary>
public void Dispose()
{
_ownedHttpClient?.Dispose();
GC.SuppressFinalize(this);
}
}
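
The retry budget quoted in the `DefaultMaxAttempts` comment can be checked with a small standalone sketch. It copies the constants and the `RetryDelay` formula from the class above into a top-level C# program; the `RetryDelayMs` and `TotalBackoffMs` helpers are hypothetical and exist only for this example.

```csharp
using System;

// Constants copied from CdnChangelogEntryFetcher for illustration.
const int BaseRetryDelayMs = 500;
const int MaxRetryDelayMs = 2000;
const int DefaultMaxAttempts = 4;

// Same formula as RetryDelay: attempt is 1-based, and the delay doubles per attempt up to the cap.
long RetryDelayMs(int attempt) =>
    Math.Min(BaseRetryDelayMs * (1L << (attempt - 1)), MaxRetryDelayMs);

// Hypothetical helper: sums the sleeps between attempts. No sleep follows the final attempt.
long TotalBackoffMs(int maxAttempts)
{
    long total = 0;
    for (var attempt = 1; attempt < maxAttempts; attempt++)
        total += RetryDelayMs(attempt);
    return total;
}

for (var n = 1; n < DefaultMaxAttempts; n++)
    Console.WriteLine($"after attempt {n}: wait {RetryDelayMs(n)} ms");   // 500, 1000, 2000

Console.WriteLine($"total backoff: {TotalBackoffMs(DefaultMaxAttempts)} ms"); // 3500 ms
```

The total covers only the sleeps between attempts. Time spent in the requests themselves, each bounded by the 30-second `FetchTimeout`, comes on top of it.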