The aggregator (see [`scripts/render-lib/aggregator.ts`](scripts/render-lib/aggregator.ts) `cleanArtifactBody`):

- Requires `executive-brief.md`.
- Inserts a `Reader Intelligence Guide` before artifact sections so public readers can find high-value analysis such as media framing and forward indicators without scanning every audit artifact.
- Strips YAML front matter from each artifact.
- Removes the first H1 from each artifact and injects its own consistent `## Section Title` heading.
- **Demotes every internal heading by one level** (`##` → `###`, `###` → `####`, …, capped at H6) before concatenation. Without this, every artifact's own H2s become siblings of the wrapper-injected `## Section Title` and the rendered article ends up with ~170 H2s and a flat outline that violates WCAG 2.4.6 ("Headings and Labels"). Headings inside fenced code blocks are not affected. **Tested by** [`tests/render-lib.test.ts > demoteHeadings`](tests/render-lib.test.ts).
- **Strips legacy `_Source: file.md_` italic preamble lines** that some artifact templates author at the top of their body. Source attribution now lives in the auto-generated [Reader Intelligence Guide](#-reader-intelligence-guide-deterministic-navigation-layer) and the [`## Article Sources` appendix](#-article-sources-appendix-canonical-source-list); repeating it under every heading reads like a folder listing, not journalism. Inline prose mentions like *"primary source: data.riksdagen.se/…"* are preserved.
- **Normalises heading slugs** to drop the leading hyphens that `github-slugger` emits when a heading starts with a stripped character. For example, the emoji in `## 🎯 BLUF` slugs to `-bluf`, which would otherwise become `id="rm--bluf"` once the `rm-` prefix is applied. Both [`markdown.ts#rehypeSlugWithPrefix`](scripts/render-lib/markdown.ts) and [`aggregator.ts#anchorForTitle`](scripts/render-lib/aggregator.ts) collapse leading/trailing hyphens to keep heading IDs and Reader Intelligence Guide anchors in lock-step.
- Removes leading admin bylines such as `Author`, `Run ID`, `Classification`, `Confidence`, `Prepared by`, `Methodology` and similar metadata fields.
- Rewrites relative Markdown links to absolute GitHub blob URLs.
- Keeps Mermaid fences untouched so the renderer can preserve them.
- Annotates each section heading with an HTML comment of shape `<!-- source: <file> :: <github-blob-url> -->` for offline auditors. The comment is dropped by `rehype-sanitize` so it never reaches rendered HTML.
- Builds front matter with `title`, `description`, `date`, `subfolder`, `slug`, `source_folder`, `generated_at`, `language` and `layout`.
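
The heading-demotion rule in the list above can be sketched in a few lines. This is a minimal illustration, not the repository's `demoteHeadings`: the name `demoteHeadingsSketch` and the simplified fence detection are assumptions made for the example.

```typescript
// Hypothetical sketch; the real implementation in scripts/render-lib may differ.
export function demoteHeadingsSketch(markdown: string): string {
  let fence: string | null = null; // marker of the code fence currently open, if any
  return markdown
    .split("\n")
    .map((line) => {
      const fenceMatch = /^\s{0,3}(`{3,}|~{3,})/.exec(line);
      if (fenceMatch) {
        const marker = fenceMatch[1];
        if (fence === null) {
          fence = marker; // opening fence
        } else if (marker[0] === fence[0] && marker.length >= fence.length) {
          fence = null; // same character, at least as long: treat as closing fence
        }
        return line;
      }
      if (fence !== null) return line; // inside fenced code: leave untouched
      const heading = /^(#{1,6})(\s+.*)$/.exec(line);
      if (!heading) return line;
      const level = Math.min(heading[1].length + 1, 6); // demote by one, capped at H6
      return "#".repeat(level) + heading[2];
    })
    .join("\n");
}
```

Tracking the open fence line by line is what keeps a `## ...` line inside a code sample from being rewritten, which matches the "not affected" guarantee stated above.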

After all artifact sections, the aggregator emits a single `## Article Sources` H2 at the very end of the article. Each entry is a markdown list link to the artifact on GitHub:

```markdown
## Article Sources

Each section above projects one analysis artifact. The full audited markdown is available on GitHub:
```

This replaces the legacy per-section `_Source: file.md_` italics. Auditors get one canonical list; readers see clean prose; SEO crawlers see one trustworthy `<ul>` of primary-source links instead of 25+ duplicated italics.
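
A minimal sketch of how such an appendix could be assembled is shown here. The `ArtifactSource` shape, the function name and the use of the section title as link text are assumptions for illustration; only the heading and the introductory sentence come from the example above.

```typescript
// Hypothetical input shape; field names are illustrative, not the aggregator's real types.
interface ArtifactSource {
  title: string; // section title shown to readers
  blobUrl: string; // absolute GitHub blob URL of the artifact
}

export function renderArticleSourcesSketch(sources: ArtifactSource[]): string {
  const lines: string[] = [
    "## Article Sources",
    "",
    "Each section above projects one analysis artifact. The full audited markdown is available on GitHub:",
    "",
  ];
  for (const source of sources) {
    // One markdown list link per artifact (link text is an assumption).
    lines.push(`- [${source.title}](${source.blobUrl})`);
  }
  return lines.join("\n") + "\n";
}
```

Because the appendix is built from the same ordered artifact list as the sections, the output stays deterministic for a given input.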
### Title and description extraction
`article.md` metadata comes from `executive-brief.md`:

```markdown
…
layout: article
---
```

It then emits deterministic sections such as `## Executive Brief`, `## Synthesis Summary`, `## Intelligence Assessment — Key Judgments`, `## Significance Scoring`, and so on. Source attribution is provided by the auto-generated `## Reader Intelligence Guide` (top of article) and `## Article Sources` appendix (bottom of article); the per-section heading carries an HTML comment for offline auditors:

```markdown
…artifact body content, with all internal headings demoted by one level so the outline stays semantically nested…
```

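As a rough sketch of how one section could be assembled, the following combines the heading, the auditor comment and the cleaned body. The function names, the placement of the comment on the line after the heading, and the preamble-matching regex are all assumptions; only the comment shape `<!-- source: <file> :: <github-blob-url> -->` and the `_Source: file.md_` pattern come from the text.

```typescript
// Hypothetical sketch: drops leading blank lines and legacy `_Source: file.md_` lines
// from the top of an artifact body, leaving inline mentions further down untouched.
export function stripSourcePreambleSketch(body: string): string {
  const lines = body.split("\n");
  let index = 0;
  while (
    index < lines.length &&
    (lines[index].trim() === "" || /^_Source:\s.*_\s*$/.test(lines[index]))
  ) {
    index++;
  }
  return lines.slice(index).join("\n");
}

// Hypothetical sketch: emits the section heading, the auditor comment, then the body.
export function renderSectionSketch(
  title: string,
  file: string,
  blobUrl: string,
  body: string,
): string {
  const heading = `## ${title}`;
  const comment = `<!-- source: ${file} :: ${blobUrl} -->`;
  return [heading, comment, "", stripSourcePreambleSketch(body)].join("\n");
}
```

Only lines at the very top of the body are candidates for removal, so prose that mentions a source mid-paragraph survives unchanged.
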
The generated first body section is `## Reader Intelligence Guide`, which is intentionally not sourced to a single artifact because it is a deterministic navigation projection of the artifact set.
Every aggregated `analysis/daily/$DATE/$SUBFOLDER/article.md` is checked by [`scripts/validate-article.ts`](scripts/validate-article.ts) — a hard, scripted CI gate that fails the build on any of the following violations:

| Rule code | What it blocks | Why it matters |
|---|---|---|
| `unresolved-placeholder` | `[REQUIRED:…]`, `AI_MUST_REPLACE`, `<insert …>`, `TBD:`, `FILL IN` strings surviving Pass-2 | Templates carry these markers on disk; if they reach `article.md` the AI agent skipped a substitution. Article is not publishable. |
| `missing-reader-guide` | Article missing `## Reader Intelligence Guide` | Aggregator-generated; if missing, the aggregator broke. |
| `bluf-too-short` | BLUF prose < 80 chars | Stub BLUFs (e.g. `TODO`, `pending`) escape Pass-2. A publishable BLUF needs actor + active verb + object + when + so-what. |
| `bluf-too-long` | BLUF prose > 1200 chars | Runaway dumps belong in Synthesis Summary or Intelligence Assessment, not the 60-second read. |
| `empty-heading-slug` | Any heading whose permissive slug is empty (e.g. emoji-only) | Empty `#anchor` would break the Reader Intelligence Guide and SERP deep-links. |
| `per-doc-missing-dok_id` | Any `### HD…`/`### FiU…` per-document subsection lacking at least one dok_id-style code in its body | Every per-document subsection must trace to a primary-source identifier; orphan sections are blocked. |

The validator is wired into `npm run validate-all` and runs as a hard CI gate after aggregation. It is **content-only** — structural projections (heading demotion, source-preamble stripping, slug normalisation) are unit-tested in [`tests/render-lib.test.ts`](tests/render-lib.test.ts); this script guards the AI-authored contribution: the artifact contents that the aggregator concatenates.
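
To make a few of these rules concrete, here is a minimal sketch of how they might be checked. It is not the code in `scripts/validate-article.ts`: the function names, the case-sensitive placeholder regex, the way BLUF prose is passed in as a separate argument, and the definition of a "permissive" slug are all assumptions. Only the rule codes and thresholds come from the table.

```typescript
// Hypothetical sketch; rule codes come from the table, everything else is assumed.
type RuleCode =
  | "unresolved-placeholder"
  | "missing-reader-guide"
  | "bluf-too-short"
  | "bluf-too-long"
  | "empty-heading-slug";

// Placeholder markers listed in the table; case-sensitive matching is an assumption.
const PLACEHOLDER = /\[REQUIRED:|AI_MUST_REPLACE|<insert\b|TBD:|FILL IN/;

// Assumed "permissive" slug: keeps ASCII alphanumerics and Latin-range characters,
// drops emoji and punctuation, and trims leading/trailing hyphens.
function permissiveSlugSketch(heading: string): string {
  return heading
    .toLowerCase()
    .replace(/[^a-z0-9\u00c0-\u024f\s-]/g, "")
    .trim()
    .replace(/\s+/g, "-")
    .replace(/^-+|-+$/g, "");
}

export function validateArticleSketch(article: string, blufProse: string): RuleCode[] {
  const violations: RuleCode[] = [];
  if (PLACEHOLDER.test(article)) violations.push("unresolved-placeholder");
  if (!/^## Reader Intelligence Guide\s*$/m.test(article)) {
    violations.push("missing-reader-guide");
  }
  const blufLength = blufProse.trim().length;
  if (blufLength < 80) violations.push("bluf-too-short");
  if (blufLength > 1200) violations.push("bluf-too-long");
  // Simplification: headings inside fenced code blocks are not skipped here.
  const hasEmptySlug = article.split("\n").some((line) => {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    return heading !== null && permissiveSlugSketch(heading[1]) === "";
  });
  if (hasEmptySlug) violations.push("empty-heading-slug");
  return violations;
}
```

Returning rule codes rather than throwing on the first failure lets a CI step report every violation in one run.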

---

## 🌐 How `article.md` Becomes HTML
The rendering path is:

2. [`scripts/render-lib/markdown.ts`](scripts/render-lib/markdown.ts) rewrites them to `<pre class="mermaid">` before Markdown parsing.
3. `rehype-sanitize` allows the `pre.mermaid` class.
4. [`scripts/render-lib/chrome.ts`](scripts/render-lib/chrome.ts) includes `js/lib/mermaid-init.mjs`.
5. [`js/lib/mermaid-init.mjs`](js/lib/mermaid-init.mjs) dynamically imports Mermaid `11.4.1` from the **same-origin vendored copy under `js/lib/mermaid/`**, initializes a dark theme and renders all Mermaid blocks after page load.
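
The fence-rewriting step in the list above could look roughly like the sketch below. It is an illustration under assumptions, not the code in `scripts/render-lib/markdown.ts`: the function names, the regex, and the decision to HTML-escape the diagram source are all hypothetical.

```typescript
// Built programmatically so this example contains no literal fence marker.
const FENCE = "`".repeat(3);

// Escape characters that would otherwise be parsed as HTML inside <pre>.
function escapeHtmlSketch(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical sketch: replaces each mermaid fence with a <pre class="mermaid"> block
// before the Markdown parser sees it.
export function rewriteMermaidFencesSketch(markdown: string): string {
  const pattern = new RegExp(
    `^${FENCE}mermaid[ \\t]*\\n([\\s\\S]*?)\\n${FENCE}[ \\t]*$`,
    "gm",
  );
  return markdown.replace(
    pattern,
    (_match: string, diagram: string) =>
      `<pre class="mermaid">${escapeHtmlSketch(diagram)}</pre>`,
  );
}
```

Doing the rewrite before parsing means the diagram source is never treated as an ordinary code block, so the only thing the sanitizer has to allow is the `mermaid` class on `pre`.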

The Mermaid distribution is vendored at build time:

| Step | Location | What it does |
|---|---|---|
| **Pin** | [`package.json`](package.json) `devDependencies` | `mermaid` is pinned (currently `11.4.1`) and is supply-chain audited like every other dependency in the npm SBOM. |
| **Copy** | [`scripts/copy-vendor-mermaid.ts`](scripts/copy-vendor-mermaid.ts) | Runs as the first step of `prebuild` (and `predev`). Copies `node_modules/mermaid/dist/mermaid.esm.min.mjs` and its required `chunks/mermaid.esm.min/*.mjs` into `js/lib/mermaid/` (≈2.6 MB, 64 files). Sourcemaps, type declarations, mocks and other ESM variants are excluded. |
| **Gitignore** | [`.gitignore`](.gitignore) | `js/lib/mermaid/` is intentionally ignored: the directory is reproducible from the pinned dependency, so we don't commit duplicates of `node_modules` content. |
| **Bundle** | [`.github/workflows/deploy-s3.yml`](.github/workflows/deploy-s3.yml) | The "Copy JS libraries to build output" step merges the full `js/` tree (including `js/lib/mermaid/`) into `dist/js/` after the Vite build, alongside `chart.umd.4.4.1.js`, `d3.7.9.0.min.js`, etc. |
| **Deploy** | [`scripts/deploy-s3.sh`](scripts/deploy-s3.sh) | `*.mjs` files are uploaded with `Content-Type: application/javascript` and `Cache-Control: public, max-age=31536000, immutable`, the same long-cache treatment as every other vendored asset. |
| **Guard** | [`tests/no-external-cdn.test.ts`](tests/no-external-cdn.test.ts) | Vitest test that fails CI if any runtime file under `js/` or any rendered article under `news/` references `cdn.jsdelivr.net`, `cdnjs.cloudflare.com`, `unpkg.com`, `esm.sh`, `cdn.skypack.dev`, or `ajax.googleapis.com`. Riksdagsmonitor serves all JavaScript from its own S3/CloudFront origin; no external CDN is allowed. |

CSP impact: scripts can be allowed with `script-src 'self'` only — no third-party host needs to be added to the policy. SRI hashes for every Mermaid `.mjs` chunk are produced by `vite-plugin-sri-gen` because the files now live under the build output.
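
The core of the Guard step can be sketched as a plain substring scan. This is an assumption about how such a check might work, not the contents of `tests/no-external-cdn.test.ts`: the function name is hypothetical and the real test also walks the `js/` and `news/` trees, which is omitted here.

```typescript
// Hosts listed in the Guard row of the vendoring table.
const FORBIDDEN_CDN_HOSTS: string[] = [
  "cdn.jsdelivr.net",
  "cdnjs.cloudflare.com",
  "unpkg.com",
  "esm.sh",
  "cdn.skypack.dev",
  "ajax.googleapis.com",
];

// Hypothetical helper: returns every forbidden host mentioned in a file's content.
// Plain substring matching is a simplification and can over-match longer host names.
export function findExternalCdnReferences(content: string): string[] {
  return FORBIDDEN_CDN_HOSTS.filter((host) => content.includes(host));
}
```

A test runner would call this for each file and fail when the returned list is non-empty, which keeps the `script-src 'self'` policy honest.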

The analysis gate requires color-coded Mermaid through `style` directives or Mermaid `themeVariables` / `%%{init}` blocks.

**Note:** CSP includes `'unsafe-inline'` for Chart.js/D3.js inline styles and a large inline dashboard script (946 lines). The `connect-src` directive includes `https://raw.githubusercontent.com` to allow fetching CIA CSV data from the cia repository. Security headers are configured via an AWS CloudFront Response Headers Policy for the primary deployment. The GitHub Pages disaster-recovery deployment inherits the default GitHub Pages security headers. Future enhancement: nonce-based CSP for stricter inline script control (roadmap: 2027). Chart.js, D3.js, chartjs-plugin-annotation **and Mermaid** are hosted locally on CloudFront (`js/lib/`) rather than via external CDN, eliminating external script dependencies (CI-enforced by [`tests/no-external-cdn.test.ts`](tests/no-external-cdn.test.ts)).