changelog: make generated entry YAMLs self-describing for the scrubber#3417
Conversation
The changelog-upload Lambda (`docs-lambda-changelog-scrubber`) reads each
`{product}/changelogs/*.yaml` from the private S3 bucket and applies the
allowlist sanitizer before mirroring to the public bucket. Because each
per-entry YAML carries no embedded repo context, it calls
`LinkAllowlistSanitizer.TryApplyChangelogEntry(..., defaultOwner: "elastic",
defaultRepo: null, ...)`.
For an entry produced by the upload action's fork-PR re-derivation step —
docs-builder changelog add --concise --use-pr-number \
--owner elastic --repo cloud --prs 155500
— `BuildChangelogData` stored the bare PR number verbatim in the entry's
`Prs` list, so the written YAML contained `prs: ['155500']`. When the
Lambda later tried to scrub it, `ChangelogTextUtilities.TryGetGitHubRepo`
couldn't resolve a repo for the bare number (no `defaultRepo`), the
sanitizer emitted an error and aborted, and SQS retried the message
indefinitely with the same outcome:
System.InvalidOperationException: Failed to apply allowlist to
changelog entry; errors: 1
Two fixes here, with tests:
1. `ChangelogFileWriter.NormalizeReferences` — when `--owner` and `--repo`
are supplied and a `--prs` / `--issues` value is just a number, expand
it to `https://github.com/{owner}/{repo}/{pull|issues}/N` before
writing the YAML. Bundle-style multi-repo strings (`a+b`) and
pre-qualified `org/repo` values are left alone to avoid producing
wrong URLs. Full URLs and `owner/repo#N` short-forms pass through
unchanged. The validator already requires `--owner` and `--repo` when
`--prs` is bare, so this is a strict superset of accepted input.
2. `LinkAllowlistSanitizer.FilterReferenceList` (defense in depth) — if
we encounter a bare numeric reference and `defaultRepo` is unknown,
keep the reference as-is and emit a warning instead of failing the
entry. A bare number carries no repo identity, so it cannot leak a
private link on its own, and downstream rendering supplies owner/repo
from runtime context. Genuinely unparseable references (e.g.
`not-a-pr-ref`) still hit the original fail-closed error branch so
schema regressions remain visible.
Together these unblock the already-uploaded entries that the Lambda is
currently dead-lettering, and prevent the same shape of bad YAML from
ever being produced again.
Verified `dotnet format`, `./build.sh build --skip-dirty-check`, AOT
publish (`dotnet publish src/tooling/docs-builder -c Release`), the full
`Elastic.Changelog.Tests` suite (689 passing), and `docs/cli-schema.json`
regeneration shows no change (the CLI surface is unchanged).
Refs: elastic/cloud changelog-upload SQS DLQ for
`cloud-hosted/changelogs/155500.yaml`
Co-authored-by: Cursor <cursoragent@cursor.com>
|
Actionable comments posted: 0 |
|
Caution Review failedPull request was closed or merged during review No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR adds support for expanding bare numeric pull request and issue references into full GitHub URLs when owner and repo context is available. The core 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
The changelog-upload Lambda (
docs-lambda-changelog-scrubber) reads each{product}/changelogs/*.yamlfrom the private S3 bucket and applies the allowlist sanitizer before mirroring to the public bucket. Because each per-entry YAML carries no embedded repo context, it callsLinkAllowlistSanitizer.TryApplyChangelogEntry(..., defaultOwner: \"elastic\", defaultRepo: null, ...).For an entry produced by the upload action's fork-PR re-derivation step —
—
BuildChangelogDatastored the bare PR number verbatim in the entry'sPrslist, so the written YAML containedprs: ['155500']. When the Lambda later tried to scrub it,ChangelogTextUtilities.TryGetGitHubRepocouldn't resolve a repo for the bare number (nodefaultRepo), the sanitizer emitted an error and aborted, and SQS retried the message indefinitely with the same outcome:(Observed today in CloudWatch — an entry never reached the public CDN bucket.)
Changes
Two fixes here, with tests:
ChangelogFileWriter.NormalizeReferences— when--ownerand--repoare supplied and a--prs/--issuesvalue is just a number, expand it tohttps://github.com/{owner}/{repo}/{pull|issues}/Nbefore writing the YAML. Bundle-style multi-repo strings (a+b) and pre-qualifiedorg/repovalues are left alone to avoid producing wrong URLs. Full URLs andowner/repo#Nshort-forms pass through unchanged. The validator already requires--ownerand--repowhen--prsis bare, so this is a strict superset of accepted input.LinkAllowlistSanitizer.FilterReferenceList(defense in depth) — if we encounter a bare numeric reference anddefaultRepois unknown, keep the reference as-is and emit a warning instead of failing the entry. A bare number carries no repo identity, so it cannot leak a private link on its own, and downstream rendering supplies owner/repo from runtime context. Genuinely unparseable references (e.g.not-a-pr-ref) still hit the original fail-closed error branch so schema regressions remain visible.Together these unblock the already-uploaded entries that the Lambda is currently dead-lettering, and prevent the same shape of bad YAML from ever being produced again.
Test plan
NormalizeReferencesTests(10 cases) cover bare-number expansion, missing-context fall-through, full URL pass-through, short-form pass-through, bundle-style/pre-qualified repo guards, whitespace trimming, and mixed lists.TryApplyChangelogEntry_BarePrNumberWithoutDefaultRepo_KeptWithWarning/_BareIssueNumberWithoutDefaultRepo_KeptWithWarning/_UnparseableNonNumericRef_StillErrorscases inLinkAllowlistSanitizerTests.PrIntegrationTests.CreateChangelog_WithBareNumberPrAndOwnerRepo_WritesFullUrlIntoYamlmirrors the exact upload-action invocation.Elastic.Changelog.Testssuite: 689 passing.dotnet format,./build.sh build --skip-dirty-check,dotnet publish src/tooling/docs-builder -c Release(AOT) all clean.docs/cli-schema.json— no diff (CLI surface unchanged).Roll-out
docs-builder changelog add --owner X --repo Y --prs Nwill be self-describing immediately, so even an old Lambda would scrub them correctly.Made with Cursor