Commit d5647c3

Merge pull request #2834 from Hack23/copilot/improve-executive-briefs-generation
Localize executive-brief lead in news HTML + sharpen brief headline tradecraft
2 parents a4ba603 + 76bb9a8 commit d5647c3

7 files changed

Lines changed: 448 additions & 2 deletions

Article-Generation.md

Lines changed: 10 additions & 0 deletions
@@ -664,6 +664,16 @@ For each of the 14 supported languages, the renderer resolves the `<title>` and

 Authoritative editorial rule: **localized title and description are highlights of the localized executive brief**. The runtime enforcement of this rule lives in [`scripts/render-lib/aggregator/seo/localized-brief.ts`](scripts/render-lib/aggregator/seo/localized-brief.ts), a pure-function bounded context whose `extractLocalizedBriefSeo({ briefMarkdown, subfolder })` returns `{ title, description }` candidates derived from `executive-brief_<lang>.md` H1 + BLUF. Banned-phrase H1s (`REPLACE THIS H1`, `Executive Brief Template`, `AI_MUST_REPLACE`, `AI-generated political intelligence`) and bare boilerplate `Executive Brief` are rejected in lock-step with `scripts/agentic/analysis-gate.ts § checkExecutiveBrief`, so a translator stub cannot leak into the SERP `<title>` via the localized cascade. The merger in [`scripts/render-lib/article-merge.ts`](scripts/render-lib/article-merge.ts) overlays the brief-derived fields on top of the per-type agent's `article.<lang>.md` front-matter; each field is independent, so a clean BLUF localizes the description even when the brief H1 is rejected. This is enforced by `scripts/validate-executive-brief-translations.ts` for the brief itself and by `tests/localized-brief-seo.test.ts` + `tests/article-merge.test.ts` for the rendered HTML.

+#### On-page lead localization: the body opens in the reader's language
+
+The `<title>`/`<meta description>` cascade above governs **metadata**. The same "localized brief if it exists, English otherwise" rule is applied to the **on-page lead** (the first section a reader actually sees) by [`scripts/render-lib/article-brief-lead.ts`](scripts/render-lib/article-brief-lead.ts), a pure (no-I/O) markdown transform that `renderArticleHtml` runs once per target language:
+
+1. **Carrier stripping (all languages).** Because the aggregator splices *every* `.md` sibling into `article.md`, the 13 localized briefs are embedded as trailing `## Executive Brief Sv`, `## Executive Brief De`, … carrier sections. `stripEmbeddedLocalizedBriefSections()` removes **all** of them for every language so no page ships 13 foreign-language brief copies inline (this dropped a representative proposition page from ~428 KB to ~307 KB). The carriers are functionally dead weight in the HTML: the SEO cascade reads `executive-brief_<lang>.md` from disk, not from these embedded copies.
+2. **Lead swap (non-English with a localized brief).** For a non-English target whose `executive-brief_<lang>.md` exists, the body of the first `<h2>` lead section (`## What Happened`) is replaced with the cleaned localized brief (same `cleanArtifactBody()` + `rewriteRelativeLinks()` pipeline the aggregator uses; `normalizeNarrativeTerminology()` is deliberately **not** run so English first-use glosses never leak into localized prose). The lead **heading** stays the language-stable English `## What Happened` (the in-article TOC localizes its label separately, per `section-title-i18n.ts`); only the lead **body** becomes localized. The lead's provenance comment is repointed from `executive-brief.md` to `executive-brief_<lang>.md`.
+3. **English / missing / empty fallback.** English keeps its canonical `## What Happened` lead verbatim. A non-English target with no localized brief (or a whitespace-only one) also keeps the English lead: the page renders English lead content under a non-English `<html lang>` until the next `news-translate` run produces the localized brief, exactly mirroring metadata fallback layer #4.
+
+Localized briefs are also excluded from the artifact list (`resolveArtifactList()` in `scripts/render-articles.ts` and `isReaderGuideEligible()` in `aggregator/reader-guide.ts`) so they no longer appear as Reader Intelligence Guide navigation rows (which would dangle now that the carrier sections are stripped), Article Sources provenance cards, or JSON-LD `isBasedOn`. They are translations of `executive-brief.md`, not independent analytical artifacts. This is covered by `tests/article-brief-lead.test.ts`.
+
 #### `article.md` front-matter (canonical English source)

 `scripts/render-lib/aggregator/aggregate.ts` writes the front-matter that the renderer subsequently consumes:
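
The three on-page lead rules added in this hunk (carrier stripping aside) reduce to a small decision table: English always keeps the English lead, and a non-English page switches to its localized brief only when that brief is present and non-blank. The sketch below models only that decision. `LeadSource` and `chooseLeadSource` are hypothetical names invented for illustration; the repository's actual logic lives in `scripts/render-lib/article-brief-lead.ts`.

```typescript
// Hypothetical model of the "localized if it exists, English otherwise" rule.
// Nothing here is imported from the repository; all names are assumptions.
type LeadSource =
  | { kind: 'english'; provenance: 'executive-brief.md' }
  | { kind: 'localized'; provenance: string };

function chooseLeadSource(lang: string, localizedBrief?: string): LeadSource {
  const english: LeadSource = { kind: 'english', provenance: 'executive-brief.md' };

  // Rule 3a: English keeps its canonical lead verbatim.
  if (lang === 'en') return english;

  // Rule 3b: a missing or whitespace-only localized brief falls back to English.
  if (localizedBrief === undefined || localizedBrief.trim().length === 0) {
    return english;
  }

  // Rule 2: the lead body comes from the localized brief, and the
  // provenance comment is repointed at executive-brief_<lang>.md.
  return { kind: 'localized', provenance: `executive-brief_${lang}.md` };
}
```

For example, `chooseLeadSource('sv', '# Rubrik')` selects the localized brief with provenance `executive-brief_sv.md`, while `chooseLeadSource('sv', '  ')` keeps the English lead.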

analysis/methodologies/ai-driven-analysis-guide.md

Lines changed: 16 additions & 1 deletion
@@ -208,6 +208,21 @@ Before article aggregation, every workflow must seed the story metadata that wil
 3. Add `## 🌐 14-Language SEO Metadata Seeds` to the executive brief when the workflow has enough evidence. Fill all 14 rows with short localized title angles, description angles and keyword seeds. If a language cannot be human-quality localized, write the English story topic plus the language label and mark it `[machine-assisted — verify]`; do not leave it blank.
 4. Pass 2 must read every language row back and confirm that the title/description is **contextual** (policy object, actor, consequence) rather than a date-stuffed duplicate.

+> **The H1 is doing triple duty.** The `executive-brief.md` H1 and BLUF are not "just SEO". The renderer derives the page `<title>`, `<meta description>`, OG/Twitter cards and Schema.org `headline` from them **and** the brief body is now the **on-page lead** a reader meets first. For a localized page the renderer uses `executive-brief_<lang>.md` as the lead (localized if it exists, English otherwise; the same cascade as `<title>`/`<meta description>`), so the localized H1/BLUF must read as a finished, reader-attracting opening in that language, not a literal calque of the English. Write each H1 to earn the click and the first 10 seconds of reading attention.
+
+#### 🪝 Headline tradecraft (write the title to be clicked, not just indexed)
+
+Pass-1 may write a serviceable H1; Pass-2 must sharpen it. A strong title is **specific, consequential and curiosity-opening** without being clickbait:
+
+- **Lead with the named actor and an active, concrete verb**: "Busch government tightens migration detention" beats "New propositions submitted".
+- **Name the stake / so-what**: quantify or sharpen the consequence ("…risking ECHR Article 8 review", "…shifting 12% of the housing budget").
+- **Front-load the highest-DIW finding**, never the document count or the date. The title must match the #1 finding the lede and significance scoring agree on.
+- **Stay inside the SERP-safe envelope**: a title of 50–70 characters renders without truncation (the renderer trims long H1s at a word boundary and strips trailing connectors, so a title that *needs* its tail to make sense will lose meaning at ~70 chars).
+- **No date-stuffing, no admin metadata, no template scaffolding** (`REPLACE THIS H1`, `Executive Brief`, classification badges); those are stripped/flagged and produce duplicate-looking SERP entries.
+- **Localize the angle, not the words**: each non-English title row should carry the same actor/verb/stake in idiomatic phrasing; a reader scanning the SV or JA page should feel the headline was written for them.
+
+Mini self-check (apply to every language row in Pass-2): *Does it name an actor? An active verb? A concrete stake? Would a non-expert understand the so-what in one read? Is it ≤ 70 chars and free of dates/admin text?* Any "no" forces a rewrite.
+
 Minimum row schema:

 ```markdown
@@ -406,7 +421,7 @@ Score your own output against this rubric before commit:

 #### Article and SEO handoff

-Before running `scripts/aggregate-analysis.ts`, ensure `executive-brief.md` has a publishable H1 and BLUF that can become `<title>` and `<meta description>` without repair: actor-first, active verb, no literal date, no admin metadata, 55–70 character title target and 140–200 character one-sentence description target. `synthesis-summary.md §Narrative Direction & Article Decision` should agree with that H1/BLUF so `article.md` reads as one coherent intelligence article.
+Before running `scripts/aggregate-analysis.ts`, ensure `executive-brief.md` has a publishable H1 and BLUF that can become `<title>` and `<meta description>` without repair: actor-first, active verb, no literal date, no admin metadata, 55–70 character title target and 140–200 character one-sentence description target. `synthesis-summary.md §Narrative Direction & Article Decision` should agree with that H1/BLUF so `article.md` reads as one coherent intelligence article. Apply the same bar to every `executive-brief_<lang>.md`: its H1/BLUF is the localized page's **on-page lead** (the first thing a SV/DE/JA/AR reader sees) as well as that page's localized `<title>`/`<meta description>`, so each localized brief must open with the same reader-attracting, actor-and-stake headline in idiomatic prose, never a literal calque. Run the 🪝 Headline-tradecraft self-check (Step 2B) against every language before aggregation.


 Read every file you produced in Steps 3–5. For each one, **improve every section**:
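
Part of the headline self-check added to this guide is mechanical (length, literal dates, template scaffolding) and could be pre-screened before a human or Pass-2 judges actor, verb and stake. The following is a minimal sketch under stated assumptions: `headlineIssues` and `BANNED_FRAGMENTS` are hypothetical names, the date test only looks for ISO-style `YYYY-MM-DD` strings, and length is counted in Unicode code points. It is not the repository's `checkExecutiveBrief` gate.

```typescript
// Hypothetical pre-screen for the mechanical half of the self-check.
// The banned fragments are the ones quoted in the documentation above.
const BANNED_FRAGMENTS: readonly string[] = [
  'REPLACE THIS H1',
  'Executive Brief Template',
  'AI_MUST_REPLACE',
  'AI-generated political intelligence',
];

function headlineIssues(headline: string): string[] {
  const issues: string[] = [];
  const text = headline.trim();
  const lower = text.toLowerCase();

  // Assumption: the 70-character ceiling is measured in code points.
  if (Array.from(text).length > 70) issues.push('longer than 70 characters');

  // Assumption: only ISO-style dates are detected here.
  if (/\b\d{4}-\d{2}-\d{2}\b/.test(text)) issues.push('contains a literal date');

  if (lower === 'executive brief') issues.push('bare boilerplate heading');

  for (const fragment of BANNED_FRAGMENTS) {
    if (lower.includes(fragment.toLowerCase())) {
      issues.push(`template scaffolding: ${fragment}`);
    }
  }
  return issues;
}
```

An empty result only means the title cleared the mechanical checks; whether it names an actor, an active verb and a concrete stake still needs editorial judgment.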

scripts/render-articles.ts

Lines changed: 8 additions & 0 deletions
@@ -230,6 +230,14 @@ function resolveArtifactList(rc: RenderCase): readonly string[] {
         if (rel === 'pass1' || rel.startsWith('pass1/')) continue;
         walk(full, rel);
       } else if (/\.(md|json)$/i.test(e.name) && !/^article(?:\.[a-z-]+)?\.md$/i.test(e.name)) {
+        // Skip localized executive-brief translation carriers
+        // (`executive-brief_<lang>.md`). They are translations of the
+        // English `executive-brief.md` (consumed by the SEO cascade and
+        // the localized on-page lead), not independent analytical
+        // artifacts, so they must not appear in the Reader Intelligence
+        // Guide, the Article Sources provenance grid, or JSON-LD
+        // `isBasedOn`.
+        if (/^executive-brief_[a-z-]+\.md$/i.test(e.name)) continue;
         out.push(rel);
       }
     }

scripts/render-lib/aggregator/reader-guide.ts

Lines changed: 5 additions & 0 deletions
@@ -148,6 +148,11 @@ export function selectReaderGuideArtifacts(available: ReadonlySet<string> | read
     const base = file.includes('/') ? file.slice(file.lastIndexOf('/') + 1) : file;
     if (base === 'README.md') return false;
     if (/^article(?:\.[a-z-]+)?\.md$/i.test(base)) return false;
+    // Localized executive-brief translation carriers are derivative of
+    // `executive-brief.md` (which already represents the lead). They are
+    // not independent analytical sections, so they must never appear as
+    // Reader Intelligence Guide navigation rows.
+    if (/^executive-brief_[a-z-]+\.md$/i.test(base)) return false;
     return true;
   };

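
Both filters above (in `scripts/render-articles.ts` and `aggregator/reader-guide.ts`) rely on the same filename pattern. The sketch below isolates that pattern to show which names it catches. `isLocalizedBriefCarrier` is a hypothetical helper written for this illustration; the commit inlines the regex at each call site instead.

```typescript
// The pattern is copied from the diff; the wrapper function is an assumption.
const LOCALIZED_BRIEF_FILE_RE = /^executive-brief_[a-z-]+\.md$/i;

function isLocalizedBriefCarrier(filePath: string): boolean {
  // Reduce a relative path to its basename, as reader-guide.ts does.
  const base = filePath.includes('/')
    ? filePath.slice(filePath.lastIndexOf('/') + 1)
    : filePath;
  return LOCALIZED_BRIEF_FILE_RE.test(base);
}

// The English brief has no `_<lang>` suffix, so it stays in the artifact list.
const keptAsArtifact = !isLocalizedBriefCarrier('executive-brief.md');
// A translated brief is skipped, even when nested in a subfolder.
const skippedCarrier = isLocalizedBriefCarrier('nested/executive-brief_de.md');
```

Because the pattern requires the `.md` extension and the underscore separator, a sibling such as `executive-brief_sv.json` is not treated as a carrier.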

scripts/render-lib/article-brief-lead.ts

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
+/**
+ * @module Infrastructure/RenderLib/ArticleBriefLead
+ * @category Intelligence Operations / Supporting Infrastructure
+ * @name Localized executive-brief lead substitution + carrier stripping
+ *
+ * @description
+ * The aggregated `analysis/daily/$DATE/$SUB/article.md` is an
+ * English-canonical document. It opens with the English executive brief
+ * (the `## What Happened` lead section) and, because the aggregator
+ * splices *every* `.md` sibling into the body, also embeds the 13
+ * localized briefs (`executive-brief_<lang>.md`) as trailing
+ * `## Executive Brief Sv`, `## Executive Brief De`, … carrier sections.
+ *
+ * Those carrier sections were never meant to render inline: they bloat
+ * every published page (each carries the full brief in a foreign language)
+ * and they leave a non-English reader meeting the *English* lead before
+ * their own-language summary. The SEO cascade already localizes the
+ * `<title>` / `<meta description>` from `executive-brief_<lang>.md`
+ * (see `aggregator/seo/localized-brief.ts`); this module brings the
+ * on-page **lead** into lock-step with that cascade.
+ *
+ * {@link localizeExecutiveBriefLead} is a pure (no-I/O) string transform
+ * applied by `renderArticleHtml` to the article-markdown body per target
+ * language. It:
+ *
+ *   1. removes every embedded `## Executive Brief <Lang>` carrier section
+ *      for **all** languages (English included); and
+ *   2. for a non-English target with a localized brief, replaces the body
+ *      of the first `<h2>` lead section (`## What Happened`) with the
+ *      cleaned `executive-brief_<lang>.md` content so the reader's first
+ *      screen is entirely in their own language. When the localized brief
+ *      is absent, the English lead is left in place (the same "localized
+ *      if exists, English otherwise" rule the SEO cascade follows).
+ *
+ * The localized body is cleaned with the **same** pipeline the aggregator
+ * uses for the carrier sections: `cleanArtifactBody` (front-matter / H1 /
+ * admin-byline strip + `##` → `###` heading demotion) followed by
+ * `rewriteRelativeLinks`, so the swapped-in lead is byte-identical to
+ * what the aggregator would have embedded. Crucially it does **not** run
+ * `normalizeNarrativeTerminology`, whose English first-use annotations
+ * (`Riksdag document #… (HD…)`, `Lede`, confidence glosses) must never be
+ * injected into localized prose.
+ *
+ * @author Hack23 AB (Infrastructure Team)
+ * @license Apache-2.0
+ */
+
+import type { Language } from '../types/language.js';
+import { LANGUAGES } from './constants.js';
+import { buildGithubBlobUrl } from './url-helpers.js';
+import {
+  cleanArtifactBody,
+  rewriteRelativeLinks,
+} from './aggregator/cleaning/structural.js';
+
+/**
+ * Title-cased single-segment language codes for the 13 non-English
+ * locales, matching `prettifyFallbackTitle('executive-brief_<lang>.md')`
+ * in `aggregator/order.ts` (e.g. `sv` → `Sv`, `no` → `No`, `zh` → `Zh`).
+ * English is excluded: its brief renders as `## What Happened`, never as
+ * a `## Executive Brief <Lang>` carrier.
+ */
+const LOCALIZED_BRIEF_TITLE_SUFFIXES: readonly string[] = LANGUAGES
+  .filter((l) => l !== 'en')
+  .map((l) => l.charAt(0).toUpperCase() + l.slice(1));
+
+/**
+ * Matches an embedded `## Executive Brief <Lang>` carrier section: the
+ * heading line through every following line up to (but excluding) the
+ * next `<h2>`. Mirrors the line-anchored sweep used by
+ * `stripBodyDuplicateSections` so `###`/`# `/code-fence lines inside the
+ * section are consumed while the next `## ` boundary stops the match.
+ */
+const EMBEDDED_BRIEF_SECTION_RE = new RegExp(
+  String.raw`^##\s+Executive Brief (?:${LOCALIZED_BRIEF_TITLE_SUFFIXES.join('|')})\b[^\n]*\n(?:(?!^##\s)[^\n]*\n?)*`,
+  'gim',
+);
+
+/**
+ * Strip all embedded `## Executive Brief <Lang>` carrier sections from an
+ * article-markdown body. Applied for every language, English included.
+ */
+export function stripEmbeddedLocalizedBriefSections(content: string): string {
+  const stripped = content.replace(EMBEDDED_BRIEF_SECTION_RE, '');
+  // Collapse the blank-line run left where the carrier block used to sit.
+  return `${stripped.replace(/\n{3,}/g, '\n\n').trimEnd()}\n`;
+}
+
+interface LeadBounds {
+  readonly headingLine: string;
+  readonly firstH2: number;
+  readonly secondH2: number;
+}
+
+/** Locate the first and second `## ` (h2) line indices in a markdown body. */
+function findLeadBounds(lines: readonly string[]): LeadBounds | null {
+  let firstH2 = -1;
+  let secondH2 = -1;
+  for (let i = 0; i < lines.length; i += 1) {
+    if (/^##\s/.test(lines[i]!)) {
+      if (firstH2 === -1) {
+        firstH2 = i;
+      } else {
+        secondH2 = i;
+        break;
+      }
+    }
+  }
+  if (firstH2 === -1) return null;
+  return { headingLine: lines[firstH2]!, firstH2, secondH2 };
+}
+
+/**
+ * Replace the body of the first `<h2>` lead section with `localizedBody`,
+ * keeping the original heading and repointing the provenance comment at
+ * `executive-brief_<lang>.md`.
+ */
+function replaceLeadSectionBody(
+  content: string,
+  lang: Language,
+  localizedBody: string,
+  subfolderRepoRelPath: string,
+): string {
+  const lines = content.split('\n');
+  const bounds = findLeadBounds(lines);
+  if (!bounds) return content;
+
+  const sourceRel = `executive-brief_${lang}.md`;
+  const sourceUrl = subfolderRepoRelPath
+    ? buildGithubBlobUrl(`${subfolderRepoRelPath}/${sourceRel}`)
+    : sourceRel;
+
+  const before = lines.slice(0, bounds.firstH2);
+  const after = bounds.secondH2 === -1 ? [] : lines.slice(bounds.secondH2);
+
+  const leadBlock = [
+    bounds.headingLine,
+    `<!-- source: ${sourceRel} :: ${sourceUrl} -->`,
+    '',
+    localizedBody.trim(),
+    '',
+  ];
+
+  return [...before, ...leadBlock, ...after].join('\n');
+}
+
+export interface LocalizeExecutiveBriefLeadInput {
+  /** Article-markdown body (front-matter already removed). */
+  readonly content: string;
+  /** Target language. */
+  readonly lang: Language;
+  /** Raw `executive-brief_<lang>.md` markdown when one exists on disk. */
+  readonly localizedBriefMarkdown?: string;
+  /** Repo-relative analysis folder, used to rewrite relative links. */
+  readonly subfolderRepoRelPath?: string;
+}
+
+/**
+ * Localize the on-page executive-brief lead and strip embedded carrier
+ * sections. See the module JSDoc for the full contract.
+ */
+export function localizeExecutiveBriefLead(
+  input: LocalizeExecutiveBriefLeadInput,
+): string {
+  const stripped = stripEmbeddedLocalizedBriefSections(input.content);
+
+  // English keeps the canonical `## What Happened` lead verbatim.
+  if (input.lang === 'en') return stripped;
+
+  const brief = input.localizedBriefMarkdown;
+  if (!brief || brief.trim().length === 0) return stripped;
+
+  const cleaned = rewriteRelativeLinks(
+    cleanArtifactBody(brief),
+    input.subfolderRepoRelPath ?? '',
+  );
+  if (cleaned.trim().length === 0) return stripped;
+
+  return replaceLeadSectionBody(
+    stripped,
+    input.lang,
+    cleaned,
+    input.subfolderRepoRelPath ?? '',
+  );
+}
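
To see how the carrier-stripping regex in this file behaves, it can be exercised in isolation. The sketch below reuses the pattern from the diff but makes two assumptions so that it runs standalone: the language suffix list is cut down to a hypothetical two entries (`Sv`, `De`) instead of being derived from `LANGUAGES`, and trailing whitespace is removed with a regex rather than `trimEnd()`. `stripCarriers` and `sampleArticle` are illustrative names, not repository identifiers.

```typescript
// Assumption: a two-language list stands in for the 13 non-English locales.
const DEMO_SUFFIXES: readonly string[] = ['Sv', 'De'];

// Same shape as EMBEDDED_BRIEF_SECTION_RE in the diff: the carrier heading,
// then every following line until the next line that starts with "## ".
const DEMO_CARRIER_RE = new RegExp(
  String.raw`^##\s+Executive Brief (?:${DEMO_SUFFIXES.join('|')})\b[^\n]*\n(?:(?!^##\s)[^\n]*\n?)*`,
  'gim',
);

function stripCarriers(markdown: string): string {
  const stripped = markdown.replace(DEMO_CARRIER_RE, '');
  // Collapse leftover blank-line runs and normalise to one trailing newline.
  return `${stripped.replace(/\n{3,}/g, '\n\n').replace(/\s+$/, '')}\n`;
}

const sampleArticle = [
  '## What Happened',
  'English lead.',
  '',
  '## Executive Brief Sv',
  '### Underrubrik',
  'Svensk text.',
  '',
  '## Sources',
  'Links.',
  '',
].join('\n');

const strippedArticle = stripCarriers(sampleArticle);
```

Here `strippedArticle` keeps the `## What Happened` and `## Sources` sections and drops the whole `## Executive Brief Sv` block, including its `###` subheading. The `\b` after the suffix matters: a heading such as `## Executive Brief Development` is left alone because `De` is not followed by a word boundary.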

scripts/render-lib/article.ts

Lines changed: 7 additions & 1 deletion
@@ -77,6 +77,7 @@ import {
 } from './article-aside.js';
 import { enrichArticleMarkdownWithPoliticalContext } from './political-context.js';
 import { applyScannabilityTransforms, transformProgressiveDisclosure } from './article-scannability.js';
+import { localizeExecutiveBriefLead } from './article-brief-lead.js';

 /**
  * CSS selectors identifying the voice-assistant TTS-readable regions of
@@ -544,7 +545,12 @@ export async function renderArticleHtml(input: RenderArticleInput): Promise<stri
   const modifiedIso = new Date().toISOString();
   const articleType = { type: articleTypeId, label: localizedArticleTypeLabel };

-  const cleanedContent = stripBodyDuplicateSections(parsed.content);
+  const cleanedContent = localizeExecutiveBriefLead({
+    content: stripBodyDuplicateSections(parsed.content),
+    lang: input.lang,
+    localizedBriefMarkdown: input.localizedBriefMarkdown,
+    subfolderRepoRelPath: input.subfolderRepoRelPath,
+  });

   const enrichedContent = enrichArticleMarkdownWithPoliticalContext(cleanedContent, input.lang);

