
changelog: source bundle entries from the CDN with a per-product registry #3470

Open

cotti wants to merge 1 commit into changelog_directive_s3 from changelog_bundle_s3_source


cotti commented Jun 5, 2026

Contributor

Note

Stacked on changelog_directive_s3. This PR is based on that branch, which needs to merge first; the diff shown here is only this change.

Summary

Source the individual changelog entries that make up a bundle from the public CDN by default (scoped to the bundle's product), instead of the local docs/changelog/ folder. This makes bundles reflect what was actually published to S3 — i.e. with private references scrubbed — and decouples bundle creation from a checkout that happens to hold every entry.

The bundle filter is content-based: whether an entry belongs in a bundle depends on the products / prs / issues fields inside the YAML, not on the file name. CloudFront has no ListObjects, so we publish a small per-product index on upload and the client enumerates → fetches → filters.
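
For readers who want the enumerate → fetch → filter flow in concrete terms, here is a minimal Python sketch. It is an illustration only, not the C# code in this PR: the `products` / `prs` / `issues` field names come from the description above, while `collect_entries`, `matches`, the two callables, and the "any field overlaps" rule are assumptions made for the example.

```python
from typing import Callable, Iterable

# Hypothetical shape: a parsed entry such as
# {"products": ["kibana"], "prs": ["101"], "issues": []}.
Entry = dict


def matches(entry: Entry, products: set, prs: set, issues: set) -> bool:
    """Content-based filter: look at fields inside the entry, never at its file name.

    Assumption: an entry matches when any requested product, PR, or issue overlaps.
    """
    return bool(
        products & set(entry.get("products", []))
        or prs & set(entry.get("prs", []))
        or issues & set(entry.get("issues", []))
    )


def collect_entries(
    product: str,
    fetch_index: Callable[[str], Iterable[str]],   # stands in for reading registry.json
    fetch_entry: Callable[[str, str], Entry],      # stands in for downloading one entry
    products: set,
    prs: set = frozenset(),
    issues: set = frozenset(),
) -> list:
    names = list(fetch_index(product))                           # enumerate
    entries = [fetch_entry(product, name) for name in names]     # fetch
    return [e for e in entries if matches(e, products, prs, issues)]  # filter


# Usage with an in-memory stand-in for the CDN.
fake_cdn = {
    "kibana": {
        "a.yaml": {"products": ["kibana"], "prs": ["101"]},
        "b.yaml": {"products": ["elasticsearch"], "prs": ["102"]},
    }
}
selected = collect_entries(
    "kibana",
    fetch_index=lambda p: fake_cdn[p].keys(),
    fetch_entry=lambda p, name: fake_cdn[p][name],
    products={"kibana"},
)
```

The index is needed only for the first step: without a listing operation, the client has no other way to learn which entry files exist before it can inspect their contents.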

What changed

  • Per-product entry registry: upload now writes {product}/changelog/registry.json alongside the existing bundle registry (RegistryBuilder / RegistryKey); the scrubber key allow-list recognizes it.
  • CDN entry sourcing: new CdnChangelogEntryFetcher downloads the entry index and each entry; CDN base resolution is centralized in ChangelogCdn (shared with the changelog directive's cdn: mode).
  • Opt-out + safe fallback: bundle.use_local_changelogs: true forces the local folder, and we fall back to local automatically when no concrete product can scope the per-product CDN fetch.
  • --plan cdn_url: changelog bundle --plan now emits cdn_url ({base}/{product}/bundle/{file}) so CI can poll for the scrubbed bundle. docs/cli-schema.json regenerated.
  • No silent gaps: a registry-listed entry that hasn't propagated to the CDN yet is retried with short backoff + cache-busting (defeats a CloudFront-cached 404); a persistent miss fails the bundle rather than shipping an incomplete release.
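
The opt-out, the automatic fallback, and the `cdn_url` shape from the list above can be summarized in a short Python sketch. This is not the PR's implementation: `choose_source` and `bundle_cdn_url` are hypothetical helper names, and treating `None` or `"*"` as "no concrete product" is an assumption, since the exact rule is not spelled out here.

```python
from typing import Optional


def choose_source(use_local_changelogs: bool, product: Optional[str]) -> str:
    """Return "local" or "cdn" for entry sourcing."""
    if use_local_changelogs:
        return "local"  # explicit opt-out via bundle.use_local_changelogs: true
    if not product or product == "*":
        # Assumption: a missing or wildcard product cannot scope a per-product fetch.
        return "local"
    return "cdn"


def bundle_cdn_url(base: str, product: str, file: str) -> str:
    """Build {base}/{product}/bundle/{file}, tolerating a trailing slash on base."""
    return f"{base.rstrip('/')}/{product}/bundle/{file}"


# Usage; the base URL and file name are placeholders, not real endpoints.
source = choose_source(use_local_changelogs=False, product="kibana")
url = bundle_cdn_url("https://cdn.example.test/", "kibana", "9.1.0.yaml")
```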

Verification

  • dotnet format --verify-no-changes: clean
  • dotnet publish -c Release (docs-builder): 0 trim/AOT warnings
  • Elastic.Documentation.Configuration.Tests (409/409) and Elastic.Changelog.Tests (732/732): pass
  • cli-schema.json matches -- __schema output

Test plan

  • upload publishes {product}/changelog/registry.json and individual entries
  • bundle (default) fetches scrubbed entries from the CDN and applies the content filter
  • bundle with use_local_changelogs: true uses the local folder
  • bundle --plan surfaces a correct cdn_url
  • A not-yet-propagated entry recovers via retry; a persistently missing entry fails the run
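
The retry behaviour exercised by the last test-plan item could look roughly like the Python sketch below. It is a sketch under stated assumptions rather than the code in this PR: `fetch_with_retry`, `MissingEntryError`, the `cb` query parameter, the attempt count, and the delays are all hypothetical, and a varying query string only bypasses a cached 404 if the CDN includes query strings in its cache key.

```python
import time
from typing import Callable, Optional


class MissingEntryError(RuntimeError):
    """Raised when a registry-listed entry never becomes available."""


def fetch_with_retry(
    url: str,
    get: Callable[[str], Optional[str]],  # hypothetical: returns None on a 404
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    for attempt in range(attempts):
        candidate = url
        if attempt > 0:
            # Cache-busting: change the query string on retries so a cached
            # 404 for the plain URL is not served again.
            sep = "&" if "?" in url else "?"
            candidate = f"{url}{sep}cb={attempt}"
        body = get(candidate)
        if body is not None:
            return body
        if attempt < attempts - 1:
            sleep(base_delay * 2**attempt)  # short exponential backoff
    # Persistent miss: fail loudly instead of building an incomplete bundle.
    raise MissingEntryError(f"{url} still missing after {attempts} attempts")
```

Passing `sleep` as a parameter keeps the function testable without real waiting; a test can substitute a recorder and assert on the delays.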

Made with Cursor

…stry

Source the individual changelog entries that make up a bundle from the
public CDN by default, scoped to the bundle's product(s), instead of the
local folder. Because the bundle filter is content-based (an entry's
products/prs/issues live inside the YAML, not its name) and CloudFront has
no ListObjects, a per-product entry index ({product}/changelog/registry.json)
is published on upload so the client can enumerate then fetch+filter.

- Add bundle.use_local_changelogs opt-out, plus automatic local fallback
  when no concrete product can scope the per-product CDN fetch.
- Extend RegistryBuilder/RegistryKey to write and pass-through the entry
  index (scrubber recognizes {product}/changelog/registry.json).
- Add CdnChangelogEntryFetcher and centralize CDN base resolution in
  ChangelogCdn (shared with the changelog directive's cdn: mode).
- Emit cdn_url from `changelog bundle --plan` so CI can poll for the
  scrubbed bundle ({base}/{product}/bundle/{file}).
- Harden entry sourcing: a registry-listed entry that has not yet
  propagated to the CDN is retried with short backoff and cache-busting;
  a persistent miss fails the bundle instead of silently shipping an
  incomplete release.

dotnet format, AOT publish (0 trim/AOT warnings), and the affected unit
tests all pass; cli-schema.json regenerated for the new --plan output.

Co-authored-by: Cursor <cursoragent@cursor.com>

Mpdreamz left a comment

Member

A few code-quality notes, then a broader design question worth discussing before this merges.

Code issues

HttpClient is never disposed in CdnChangelogEntryFetcher.
The class allocates an HttpClient in a field initializer but doesn't implement IDisposable, so the connection pool leaks in any reuse scenario (tests, future service mode). Should implement IDisposable and dispose the client, or accept an IHttpClientFactory.

Fetch() blocks the calling thread for all HTTP I/O.
FetchRegistry and FetchText use the synchronous HttpClient.Send() overload. Everything else in the service layer is async; this blocks a thread-pool thread for the full CDN round-trip per product.

Multi-paragraph XML doc comments on private methods.
FetchCdnEntries, ResolveCdnBundleUrl, and ResolvePrimaryProduct all have verbose <summary> blocks. Project style is one short line max (or none) on private methods.

PlanBundleAsync duplicates the CDN-producibility check.
The plan path and the bundle path compute "is a product resolvable?" with different expressions at different pipeline stages (pre- vs post-ApplyConfigDefaults). Currently correct, but there is no comment explaining the difference, so they are likely to diverge as the code evolves.


Design question — should this be declarative in docset.yml?

The current approach fetches changelog entries imperatively inside changelog bundle, driven by a use_local_changelogs escape hatch in changelog.yml. That works, but it means:

  • The fetch happens deep in the bundle command where concurrency is ad-hoc (synchronous loop over entries per product).
  • The build cannot validate or prefetch CDN requirements up-front — it discovers the dependency at execution time.
  • The changelog directive has nowhere to point the user on a fetch error; it can only say "fetch failed" rather than "declare this in your config".

Compare how crosslinks work: repos are declared in docset.yml under cross_links, docs-builder knows about the external dependency at startup, can fail fast if the index is unreachable, and can fetch all link indexes concurrently before any page is rendered.

A similar pattern for changelog sourcing might look like:

# docset.yml
release_notes:
  - repository: elastic/elasticsearch
  - repository: elastic/kibana

With that declaration:

  • docs-builder knows at startup which CDN products to pull, can fetch all registries and entries concurrently, and can fail fast before any directive tries to render.
  • The changelog directive just references an already-loaded in-memory set — no per-directive HTTP.
  • If a directive references a product not declared in docset.yml, it emits a clear error pointing the user at the config key rather than a raw CDN failure at render time.
  • changelog bundle reads the same declaration to know which products to source from the CDN, keeping the config in one place.
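
As a rough illustration of the startup flow described in the list above, here is a Python sketch (not docs-builder code). The names `prefetch`, `entries_for`, and `UndeclaredProductError` are hypothetical, the fetcher is a stand-in for real CDN access, and keying the loaded set by the `repository` string is an assumption, since the mapping from repository to CDN product is not defined here.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class UndeclaredProductError(LookupError):
    """Raised when a directive references something missing from docset.yml."""


async def prefetch(
    repositories: List[str],
    fetch_registry: Callable[[str], Awaitable[List[str]]],
) -> Dict[str, List[str]]:
    # Fetch every declared registry concurrently. asyncio.gather propagates
    # the first exception, so an unreachable registry fails startup early.
    results = await asyncio.gather(*(fetch_registry(r) for r in repositories))
    return dict(zip(repositories, results))


def entries_for(loaded: Dict[str, List[str]], repository: str) -> List[str]:
    # Directives read from the already-loaded set; no per-directive HTTP.
    try:
        return loaded[repository]
    except KeyError:
        raise UndeclaredProductError(
            f"'{repository}' is not declared under release_notes in docset.yml"
        ) from None


# Usage with a stub fetcher in place of real CDN calls.
async def stub_fetch(repository: str) -> List[str]:
    return [f"{repository}/entry-1.yaml"]


loaded = asyncio.run(prefetch(["elastic/elasticsearch", "elastic/kibana"], stub_fetch))
```

The point of the split is that the error for an undeclared reference can name the config key, because the lookup knows what was declared.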

The use_local_changelogs flag would still make sense as an escape hatch, but the primary sourcing decision would be declared rather than implied by the bundle command's product filter.

Is this the direction you want to go, or is there a reason to keep it embedded in changelog.yml / the bundle command?
