
Commit 7be28fd

Merge pull request #2529 from Hack23/copilot/analyze-workflows-and-fix-issues
Enforce English-only analysis artifacts; render non-EN via executive-brief cascade
2 parents 76b1af9 + 94c3ef9 commit 7be28fd

39 files changed

Lines changed: 1078 additions & 388 deletions

.github/prompts/00-base-contract.md

Lines changed: 7 additions & 0 deletions
@@ -73,6 +73,13 @@ Do not add per-phase checkpoint PRs or repo-memory push steps.
7373

7474
## Language & formatting
7575

76+
### Output language — English only
77+
78+
- **All analysis artifacts under `analysis/daily/**/` MUST be authored in English prose**, including all 23 always-on artifacts (Family A/B/C/D), `documents/{dok_id}-analysis.md` (Family E) and any supplementary `*.md` file the aggregator concatenates into `article.md`.
79+
- Swedish-source quotes, document titles, party/agency names and other proper nouns are preserved verbatim with attribution (`Riksdagen`, `Regeringen`, `Skatteverket`, party acronyms, `dok_id` URLs, etc.). Native UTF-8 (`ö ä å`) is required for those tokens.
80+
- The **only translated artifacts** are `analysis/daily/$DATE/$SUB/executive-brief_<lang>.md` for the 13 non-English target languages. They are produced exclusively by the dedicated `news-translate` workflow and consumed at render-time via the localized-brief cascade in `scripts/render-lib/article-merge.ts` (`mergeLocalizedWithEnglish`) + `scripts/render-lib/aggregator/seo/localized-brief.ts`. Per-type workflows MUST NOT write `executive-brief_<lang>.md` and MUST NOT write `article.<lang>.md` (the latter is now forbidden — see below).
81+
- Non-English HTML pages (`news/$DATE-$SUB-<lang>.html`) are rendered by composing the English `article.md` body with the localized executive-brief overlay; no per-language article-body translation is performed any more.
82+
7683
- Native UTF-8 throughout (`ö`, `ä`, `å`). Never use HTML entities.
7784
- Author byline: `James Pether Sörling`.
7885
- Mermaid diagrams in analysis `.md` files must include colour-coded `style` directives.

.github/prompts/03-data-download.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ echo "IMPROVEMENT_MODE=$IMPROVEMENT_MODE (required artifacts: $PRESENT present
8787
| `IMPROVEMENT_MODE` | Behaviour |
8888
|--------------------|-----------|
8989
| `false` | First generation for this `$ARTICLE_DATE` + `$SUBFOLDER` (the full 23 required artifacts are **not** all present **and** no `synthesis-summary.md` baseline exists). Some required artifacts may still already be on disk from a partial prior run; that still remains first-generation unless either all 23 artifacts are present **or** `synthesis-summary.md` exists. Continue with the full pipeline below → `04-analysis-pipeline.md` (Pass 1 + Pass 2) → `05-analysis-gate.md` → `06-article-generation.md` (aggregate + render) → `07-commit-and-pr.md`. |
90-
| `true` | Prior analysis exists — either all 23 required artifacts are present, **or** at least `synthesis-summary.md` is on disk as a usable baseline from a partial prior run. **Do not skip and do not no-op.** Re-run the download script to pick up any new `dok_id`s, then enter **improvement mode** in `04-analysis-pipeline.md` — read every existing artifact back, fill any missing required artifacts, extend the rest with new evidence / new documents / sharper judgments / closed gaps, run a mandatory Pass 2 read-back, then **always** re-aggregate, re-translate any non-English `article.<lang>.md` whose source `article.md` changed, and re-render `news/$ARTICLE_DATE-$SUBFOLDER-{en,sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html` (all 14 languages). The run still produces exactly one PR. |
90+
| `true` | Prior analysis exists — either all 23 required artifacts are present, **or** at least `synthesis-summary.md` is on disk as a usable baseline from a partial prior run. **Do not skip and do not no-op.** Re-run the download script to pick up any new `dok_id`s, then enter **improvement mode** in `04-analysis-pipeline.md` — read every existing artifact back, fill any missing required artifacts, extend the rest with new evidence / new documents / sharper judgments / closed gaps, run a mandatory Pass 2 read-back, then **always** re-aggregate `article.md` (English only) and re-render `news/$ARTICLE_DATE-$SUBFOLDER-{en,sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html` (all 14 languages) via the localized executive-brief cascade. Per-language Markdown `article.<lang>.md` files MUST NOT be produced — they are forbidden by `scripts/validate-file-ownership.ts` (see `00-base-contract.md §Output language — English only` and `06-article-generation.md §Step 2`). The run still produces exactly one PR. |
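
The routing rule in the table above reduces to a single boolean condition. The following is a minimal sketch of that condition only; `BaselineState` and `isImprovementMode` are hypothetical names chosen for illustration, and the real pre-flight step computes `IMPROVEMENT_MODE` in bash, not with this helper.

```typescript
// Hypothetical helper (not part of the repo): mirrors the IMPROVEMENT_MODE table.
// The count of 23 required artifacts comes from the table itself.
const REQUIRED_ARTIFACT_COUNT = 23;

interface BaselineState {
  /** How many of the required artifacts are already on disk. */
  presentRequiredArtifacts: number;
  /** Whether synthesis-summary.md exists as a usable baseline. */
  hasSynthesisSummary: boolean;
}

function isImprovementMode(state: BaselineState): boolean {
  const allPresent = state.presentRequiredArtifacts >= REQUIRED_ARTIFACT_COUNT;
  // Either condition alone is enough to enter improvement mode.
  return allPresent || state.hasSynthesisSummary;
}
```

A partial prior run without `synthesis-summary.md` stays first-generation under this rule, while a partial run that did write `synthesis-summary.md` is treated as a baseline to improve.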
9191

9292
> **Folder reuse rule**: the same `$ANALYSIS_DIR` is always reused across runs for the same `$ARTICLE_DATE` + `$SUBFOLDER` when `force_generation=false`. The legacy auto-suffix behaviour (`propositions-2`, `propositions-3`, …) is retained **only** as an explicit escape hatch when `force_generation=true`, so that a forced rerun on a merged day can produce a fresh parallel analysis without trampling the existing one.
9393

.github/prompts/04-analysis-pipeline.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
22

33
Analysis is the **primary product**. Articles are derived from analysis. Never write an article before analysis is complete.
44

5+
## Output language
6+
7+
Every artifact you produce in this module is authored in **English prose**. Swedish proper nouns, source titles, party/agency names and direct quotes from `Riksdagen`/`Regeringen` documents are preserved verbatim with attribution — but headings, body paragraphs, table cells, bullet text, Mermaid node labels and analytical commentary are written in English. There is no per-language analysis. The single localized output channel is `executive-brief_<lang>.md`, produced by the separate `news-translate` workflow and consumed at render-time by the existing localized-brief cascade — see `00-base-contract.md §Output language — English only`.
8+
59
Authoritative methodology & templates:
610

711
- **Read-me-first** → [`analysis/methodologies/artifact-catalog.md`](../../analysis/methodologies/artifact-catalog.md) (single source of truth for every artifact — family, template, depth floor, Mermaid type, MCP data source, gate check) and [`analysis/methodologies/per-artifact-methodologies.md`](../../analysis/methodologies/per-artifact-methodologies.md) (Inputs / Analytic-moves / Evidence-rules / Anti-patterns per artifact). Open these before any framework-specific methodology.

.github/prompts/05-analysis-gate.md

Lines changed: 31 additions & 2 deletions
@@ -22,10 +22,11 @@ This is the **only** gate separating analysis from article generation. If it fai
2222
10. **Top-2 full-text availability** — when `data-download-manifest.md` contains a `## Full-Text Fetch Outcomes` table, ≥ 2 top documents must have `full_text_available=true`. Add `<!-- full-text-fallback: <reason> -->` to bypass.
2323
11. **Supplementary artifacts** — see §Supplementary checks (blocking for aggregation/Tier-C/multi-run).
2424
12. **Editorial QA gate** — after aggregation, run `npx tsx scripts/validate-article.ts $ANALYSIS_DIR/article.md` (enforces banned-phrase scan, citation density per `reference-quality-thresholds.json → aiFirst.citationDensity.perArticle`, and `economicProvenance` ≤ 6-month vintage unless wrapped in `<!-- stale-vintage: reason -->`). See `validate-article.ts` checks 7–9.
25+
13. **Analysis language** — all analysis artifacts (excluding `executive-brief_<lang>.md`) must be authored in English. Run `npx tsx scripts/check-analysis-language.ts $ANALYSIS_DIR`; fails when Swedish-marker density > 5 % AND ≥ 5 markers.
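
The two-part threshold in check 13 (density above 5 % AND at least 5 markers) can be sketched as follows. This is an illustration under stated assumptions, not the contents of `scripts/check-analysis-language.ts`: the marker list is invented for the example, and density is assumed to mean marker words divided by total words, which the source does not specify.

```typescript
// ASSUMPTION: illustrative marker list; the real script's markers are not shown in the source.
const SWEDISH_MARKERS = new Set(["och", "att", "är", "för", "inte", "som", "det", "av", "på", "med"]);

interface LanguageCheck {
  totalWords: number;
  markerCount: number;
  density: number;
  fails: boolean;
}

function checkSwedishDensity(text: string): LanguageCheck {
  // Lowercase, then split on anything that is not a basic Latin or Swedish letter.
  const words = text.toLowerCase().split(/[^a-zåäö]+/).filter((w) => w.length > 0);
  const markerCount = words.filter((w) => SWEDISH_MARKERS.has(w)).length;
  // ASSUMPTION: density = marker words / total words.
  const density = words.length === 0 ? 0 : markerCount / words.length;
  // Both conditions must hold, so a short verbatim Swedish quote does not trip the gate.
  const fails = density > 0.05 && markerCount >= 5;
  return { totalWords: words.length, markerCount, density, fails };
}
```

Requiring both conditions means a short file with a couple of quoted Swedish words passes even when its density is high, which matches the rule that Swedish quotes and proper nouns are preserved verbatim.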
2526

2627
## Implementation
2728

28-
No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–11, plus conditional check 9b where applicable):
29+
No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–13, plus conditional check 9b where applicable). Check 12 invokes `scripts/validate-article.ts` when `article.md` is already present (after aggregation); Check 13 invokes `scripts/check-analysis-language.ts`:
2930

3031
```bash
3132
set -Eeuo pipefail
@@ -340,6 +341,30 @@ if [ -s "$MANIFEST" ] && grep -q "## Full-Text Fetch Outcomes" "$MANIFEST" \
340341
|| { echo "❌ data-download-manifest.md: Full-Text Fetch Outcomes table present but fewer than 2 top documents have full_text_available=true (found ${FT_SUCCESS:-0}). Add <!-- full-text-fallback: <reason> --> to bypass."; FAIL=1; }
341342
fi
342343

344+
# Check 12 — Editorial QA gate (validate-article.ts: banned phrases, citation density, vintage discipline).
345+
# Runs against the aggregated article.md when present; if the aggregator hasn't run yet the
346+
# article gate is informational (logged), because the editorial validator's domain is post-aggregation.
347+
ART_MD_GATE="$ANALYSIS_DIR/article.md"
348+
if [ -s "$ART_MD_GATE" ]; then
349+
if command -v npx >/dev/null 2>&1; then
350+
npx tsx scripts/validate-article.ts "$ART_MD_GATE" || FAIL=1
351+
else
352+
echo "⚠️ Check 12 (editorial QA): npx not found — skipping (non-blocking)"
353+
fi
354+
else
355+
echo "ℹ️ Check 12 (editorial QA): $ART_MD_GATE not yet produced — skipped (run after aggregator)"
356+
fi
357+
358+
# Check 13 — Analysis language (English-only)
359+
# Block when any analysis artifact (excluding executive-brief_<lang>.md translation siblings)
360+
# exceeds the Swedish-density threshold. The script exits 0 on pass and exits 1 with a
361+
# per-file violation list on fail.
362+
if command -v npx >/dev/null 2>&1; then
363+
npx tsx scripts/check-analysis-language.ts "$ANALYSIS_DIR" || FAIL=1
364+
else
365+
echo "⚠️ Check 13 (analysis language): npx not found — skipping (non-blocking)"
366+
fi
367+
343368
[ "$FAIL" -eq 0 ] || exit 1
344369
```
345370

@@ -353,7 +378,11 @@ Exit code 0 = pass, non-zero = fail with per-check report. Precondition for chec
353378

354379
## Re-run / deduplication note
355380

356-
Same-day re-runs are **improvement runs** (not skip runs) when `03-data-download.md §Pre-flight` detects a reusable baseline (all 23 artifacts present **or** at least `synthesis-summary.md` on disk) and sets `IMPROVEMENT_MODE=true`. Existing rendered HTML under `news/` does **not** establish improvement mode — the router keys off analysis baselines, not HTML. On improvement runs, the pipeline runs in extend-and-improve mode (`04-analysis-pipeline.md §Execution order`), the gate runs normally, and `06-article-generation.md` **always** regenerates `article.md` + `article.<lang>.md` × 13 + `news/$ARTICLE_DATE-$SUBFOLDER-{en,sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html`. There is still exactly one PR call. **Never** call `safeoutputs___noop` because today's HTML "already exists" — existing HTML is a reason to regenerate, not to exit early.
381+
Same-day re-runs are **improvement runs** (not skip runs) when `03-data-download.md §Pre-flight` detects a reusable baseline (all 23 artifacts present **or** at least `synthesis-summary.md` on disk) and sets `IMPROVEMENT_MODE=true`. Existing rendered HTML under `news/` does **not** establish improvement mode — the router keys off analysis baselines, not HTML. On improvement runs, the pipeline runs in extend-and-improve mode (`04-analysis-pipeline.md §Execution order`), the gate runs normally, and `06-article-generation.md` **always** regenerates `article.md` + `news/$ARTICLE_DATE-$SUBFOLDER-{en,sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html` (all 14 languages via the localized executive-brief cascade — see `TRANSLATION_GUIDE.md §News articles are translated out-of-band`). **Per-language Markdown `article.<lang>.md` files MUST NOT be produced** on improvement runs — they are rejected by `scripts/validate-file-ownership.ts` (forbidden artefact, see `06-article-generation.md §Step 2`). There is still exactly one PR call. **Never** call `safeoutputs___noop` because today's HTML "already exists" — existing HTML is a reason to regenerate, not to exit early.
382+
383+
### Check 12 ordering note
384+
385+
Check 12 (`scripts/validate-article.ts`) is the **editorial QA gate** and runs on the aggregated `article.md`. The blocking branch in §Implementation only fires when `article.md` is already on disk; the inline gate runs before aggregation, so on a first pass the article validator is **informational** (the gate logs `ℹ️ Check 12 (editorial QA): … skipped (run after aggregator)`). Workflows MUST re-invoke the gate (or call `npx tsx scripts/validate-article.ts $ANALYSIS_DIR/article.md` directly) **after** `scripts/aggregate-analysis.ts` writes `article.md` so the editorial checks (banned phrases, citation density, `economicProvenance` vintage) become blocking before staging. See `06-article-generation.md §Step 1b — Editorial QA re-check (post-aggregation)` for the post-aggregation invocation pattern.
357386

358387
## Supplementary checks
359388

.github/prompts/06-article-generation.md

Lines changed: 11 additions & 16 deletions
@@ -61,29 +61,24 @@ Any heading like `## Pass 2 …` / `### Pass 2 refinements` / `## 🔁 Pass 2 ad
6161

6262
If a required artifact is missing the aggregator aborts with a non-zero exit code — return to `04-analysis-pipeline.md` and produce the missing file; do **not** hand-edit `article.md`.
6363

64-
### Step 2 — Translate `article.md` to every non-English language
64+
#### Step 1b — Editorial QA re-check (post-aggregation)
6565

66-
Before rendering, the agent **SHOULD** produce a per-language Markdown sibling for every supported non-English language. The translation surface is the same canonical `article.md`; the renderer picks up `article.<lang>.md` automatically when it exists, and falls back to the English source otherwise — so any missing sibling temporarily degrades that language's HTML to English content under a non-English `<html lang>`. This fallback is acceptable as a **temporary** state within a single run's time budget. The `news-translate` workflow does **not** repair `article.<lang>.md` — its mission is the executive-brief markdown pipeline (`executive-brief.md` → `executive-brief_<lang>.md`). If `article.<lang>.md` is missing, the next scheduled per-type run regenerates the whole article (including translations) from fresh analysis.
66+
The inline analysis gate (`05-analysis-gate.md` Check 12) runs **before** aggregation, so the editorial validator is informational on the first pass. Immediately after `aggregate-analysis.ts` writes `article.md`, re-invoke the article validator to make the editorial checks (banned phrases, citation density per `reference-quality-thresholds.json → aiFirst.citationDensity.perArticle`, `economicProvenance` ≤ 6-month vintage) blocking before staging:
6767

68-
Target languages (13 — every supported language except `en`):
69-
70-
```text
71-
sv da no fi de fr es nl ar he ja ko zh
68+
```bash
69+
npx tsx scripts/validate-article.ts \
70+
"analysis/daily/$ARTICLE_DATE/$SUBFOLDER/article.md"
7271
```
7372

74-
For each target language the agent produces:
73+
Non-zero exit ⇒ fix the offending claims in the upstream analysis artifacts (never hand-edit `article.md`), re-run `aggregate-analysis.ts`, and re-validate.
7574

76-
`analysis/daily/$ARTICLE_DATE/$SUBFOLDER/article.<lang>.md`
75+
### Step 2 — (No-op) Per-language Markdown translation is no longer performed
7776

78-
Translation contract:
77+
Per-type workflows do **not** produce `article.<lang>.md` for any non-English language. The agent stops after writing the canonical English `article.md` from Step 1. Non-English HTML pages are produced by `scripts/render-articles.ts` via the localized executive-brief cascade — the renderer keeps the detailed **article body in English** and only swaps in the localized hero/SEO overlay from `executive-brief_<lang>.md` (H1, dek, BLUF, JSON-LD `headline`/`description`, `<title>`, `<meta name="description">`, `og:title`/`og:description`). The `<html lang>` / `JSON-LD inLanguage` attributes are forced to the target language even though the body prose remains English. See `scripts/render-lib/article-merge.ts` (`mergeLocalizedWithEnglish`) for the merge contract.
7978

80-
- Translate the body prose, headings and table cells.
81-
- **Preserve verbatim**: YAML front-matter values that are identifiers (`subfolder`, `slug`, `source_folder`, `dok_id` references, file paths, GitHub URLs), Mermaid code fences, JSON code blocks, numeric values, and Schema.org / dataflow / dataset identifiers. Update `language:` in the front-matter to the target language code.
82-
- Keep Swedish political terminology in Swedish where it is the proper noun (party names, committee names, document type acronyms, Riksdagsmonitor brand).
83-
- For Arabic (`ar`) and Hebrew (`he`) the chrome handles `dir="rtl"` automatically — do not add inline direction overrides.
84-
- Keep IMF / SCB / WB / Statskontoret citation blocks intact, including `economicProvenance` JSON.
79+
> ⚠️ **Workflow ordering**: per-type workflows render HTML during the same run that produces the English `executive-brief.md`. The dedicated `news-translate` workflow runs on a separate schedule and back-fills `executive-brief_<lang>.md` *after the fact*. On the first HTML render the cascade therefore falls through to the English brief title/description for every non-EN language (`language: <lang>` is still forced so `<html lang>` / JSON-LD `inLanguage` are correct). The newly translated briefs only appear in the localized HTML on the **next** per-type re-render of the same subfolder (e.g. the next scheduled run, a `force_generation=true` re-run, or an explicit `npm run render-articles`). This is intentional — `news-translate` is **forbidden from touching `news/*.html`** (see `validate-file-ownership.ts`) to keep the file-ownership contract free of merge conflicts.
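
The fall-through described in the ordering note can be sketched as a small pure function. This is a simplified illustration, assuming a brief reduces to a title and a description; `BriefOverlay`, `ComposedPage`, and `composePage` are hypothetical names and do not reproduce the actual `mergeLocalizedWithEnglish` contract.

```typescript
// Hypothetical types for illustration only; the real merge contract lives in
// scripts/render-lib/article-merge.ts and is not reproduced here.
interface BriefOverlay {
  title: string;
  description: string;
}

interface ComposedPage {
  lang: string; // always the target language, even on fallback
  title: string;
  description: string;
  body: string; // always the English article body
  overlaySource: "localized" | "english-fallback";
}

function composePage(
  lang: string,
  englishBody: string,
  englishBrief: BriefOverlay,
  localizedBriefs: Map<string, BriefOverlay>,
): ComposedPage {
  // English never looks up a localized brief.
  const localized = lang === "en" ? undefined : localizedBriefs.get(lang);
  // Cascade: localized overlay when it exists, otherwise the English brief.
  const overlay = localized ?? englishBrief;
  return {
    lang,
    title: overlay.title,
    description: overlay.description,
    body: englishBody,
    overlaySource: localized ? "localized" : "english-fallback",
  };
}
```

On a first render the map of localized briefs is typically empty, so every non-English page takes the `english-fallback` branch while still carrying its own `lang`; a later re-render picks up whatever `news-translate` has back-filled in the meantime.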
8580
86-
If the time budget is exhausted before every language is translated, ship whatever has been produced — temporary English fallback for missing languages is acceptable. The next scheduled per-type run regenerates the article (including translations) from fresh analysis; the `news-translate` workflow is **not** responsible for `article.<lang>.md` repair (its mission is `executive-brief_<lang>.md`). **Never commit a half-translated file** — either the language is fully translated or the renderer falls back to the English source for that slot. **Never stage `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/executive-brief_<lang>.md` from a per-type workflow** — those files are exclusively owned by `news-translate`.
81+
Any historical `article.<lang>.md` files left in the repo are treated as forbidden artifacts by `scripts/validate-file-ownership.ts` (category-independent reject) — per-type workflows must never recreate them.
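
A filename test for the forbidden per-language siblings might look like the sketch below. It is an assumption about the shape of the rule, not the logic of `scripts/validate-file-ownership.ts`; `isForbiddenArticleSibling` is a hypothetical name, and the language list is the 13 non-English codes used by the renderer.

```typescript
// The 13 non-English target languages listed in the prompts.
const NON_ENGLISH_LANGS = new Set<string>([
  "sv", "da", "no", "fi", "de", "fr", "es", "nl", "ar", "he", "ja", "ko", "zh",
]);

// Hypothetical check: true for article.<lang>.md, false for article.md and
// for executive-brief_<lang>.md (which news-translate is allowed to write).
function isForbiddenArticleSibling(path: string): boolean {
  const fileName = path.split("/").pop() ?? "";
  const match = /^article\.([a-z]{2})\.md$/.exec(fileName);
  return match !== null && NON_ENGLISH_LANGS.has(match[1]);
}
```

Matching on the file name alone keeps the check independent of which workflow category produced the file, in line with the category-independent reject described above.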
8782

8883
### Step 3 — Render
8984

@@ -96,7 +91,7 @@ npx tsx scripts/render-articles.ts \
9691

9792
What the renderer does:
9893

99-
1. Reads `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/article.md` and, for each requested language, prefers `article.<lang>.md` when it exists.
94+
1. Reads `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/article.md`. For each non-English language it composes the English body with the localized executive brief (`executive-brief_<lang>.md`) via `mergeLocalizedWithEnglish` when present; historical `article.<lang>.md` files are intentionally **ignored** by the renderer (forbidden artefact — see `validate-file-ownership.ts`).
10095
2. Parses it through the `unified` → `remark-parse` → `remark-gfm` → `remark-rehype` → `rehype-raw` → `rehype-slug` → `rehype-autolink-headings` → `rehype-sanitize` → `rehype-stringify` pipeline. Mermaid ```` ```mermaid ```` fences are preserved as `<pre class="mermaid">` and upgraded to SVG client-side by `js/lib/mermaid-init.mjs`.
10196
3. Wraps the sanitised body in the shared site chrome (`scripts/render-lib/index.ts:buildChrome`): full `<head>` with hreflang × all 14 supported languages, Open Graph / Twitter / JSON-LD `NewsArticle` (with `isBasedOn` citing every source artifact), cyberpunk header with skip-link + language switcher, article dek + provenance badges, footer with "Analysis sources" block linking every source `.md` / `.json` artifact under the source folder back to GitHub. Generated `article.md`, translated `article.<lang>.md`, and `pass1/` snapshots are excluded from the public source list.
10297
4. Writes one HTML file per supported language — **always all 14**:
