9 changes: 9 additions & 0 deletions .gitignore
@@ -84,9 +84,18 @@ builds/
# Pass-1 snapshots used by the analysis gate (see .github/prompts/05-analysis-gate.md)
analysis/daily/*/*/pass1/

# SEO metadata backfill diff reports (see analysis/metadata-backfill/README.md and
# .github/prompts/seo-metadata-contract.md §6). Intentionally NOT ignored —
# the dry-run CSV is committed so PRs 3 / 4 / 5 can consume it deterministically
# and reviewers can inspect tier classification + violation codes in-place.
!analysis/metadata-backfill/*.csv
!analysis/metadata-backfill/README.md

# Vendored Mermaid distribution copied from node_modules by
# scripts/copy-vendor-mermaid.ts during `prebuild`. Reproducible from the
# pinned `mermaid` devDependency in package.json — do not commit.
js/lib/mermaid/
74 changes: 70 additions & 4 deletions Article-Generation.md
@@ -454,18 +454,38 @@ npx tsx scripts/aggregate-analysis.ts --all

### Cleaning and transformation rules

The aggregator (see `cleanArtifactBody` in [`scripts/render-lib/aggregator.ts`](scripts/render-lib/aggregator.ts)):

- Requires `executive-brief.md`.
- Inserts a `Reader Intelligence Guide` before artifact sections so public readers can find high-value analysis such as media framing and forward indicators without scanning every audit artifact.
- Strips YAML front matter from each artifact.
- Removes the first H1 from each artifact and injects its own consistent `## Section Title` heading.
- **Demotes every internal heading by one level** (`##` → `###`, `###` → `####`, …, capped at H6) before concatenation. Without this, every artifact's own H2s become siblings of the wrapper-injected `## Section Title` and the rendered article ends up with ~170 H2s and a flat outline that violates WCAG 2.4.6 ("Headings and Labels"). Headings inside fenced code blocks are not affected. **Tested by** [`tests/render-lib.test.ts > demoteHeadings`](tests/render-lib.test.ts).
- **Strips legacy `_Source: file.md_` italic preamble lines** that some artifact templates author at the top of their body. Source attribution now lives in the auto-generated [Reader Intelligence Guide](#-reader-intelligence-guide-deterministic-navigation-layer) and the [`## Article Sources` appendix](#-article-sources-appendix-canonical-source-list) — repeating it under every heading reads like a folder listing, not journalism. Inline prose mentions like *"primary source: data.riksdagen.se/…"* are preserved.
- **Normalises heading slugs** to drop the leading hyphen that `github-slugger` emits when a heading starts with a stripped character. For example, the emoji in `## 🎯 BLUF` is stripped, so the heading slugs to `-bluf`, which would otherwise become `id="rm--bluf"` once the `rm-` prefix is applied. Both [`markdown.ts#rehypeSlugWithPrefix`](scripts/render-lib/markdown.ts) and [`aggregator.ts#anchorForTitle`](scripts/render-lib/aggregator.ts) collapse leading/trailing hyphens to keep heading IDs and Reader Intelligence Guide anchors in lock-step.
- Removes leading admin bylines such as `Author`, `Run ID`, `Classification`, `Confidence`, `Prepared by`, `Methodology` and similar metadata fields.
- Removes trailing `Document control`, `Audit trail`, `Generated by`, template footer and `Pass 2` self-audit sections.
- Rewrites relative Markdown links to absolute GitHub blob URLs.
- Keeps Mermaid fences untouched so the renderer can preserve them.
- Annotates each section heading with an HTML comment of shape `<!-- source: <file> :: <github-blob-url> -->` for offline auditors. The comment is dropped by `rehype-sanitize` so it never reaches rendered HTML.
- Builds front matter with `title`, `description`, `date`, `subfolder`, `slug`, `source_folder`, `generated_at`, `language` and `layout`.
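The two structural rules that are easiest to get subtly wrong are heading demotion and slug normalisation. Below is a minimal sketch of both, assuming line-oriented Markdown with ATX headings (`#`) and backtick or tilde fences. The names `demoteHeadings`, `normaliseSlug` and `prefixedId` are illustrative only; this is not the code in `aggregator.ts` or `markdown.ts`.

```typescript
// Hypothetical sketch of two aggregator rules; not the repository's implementation.

/** Demote every ATX heading one level (## -> ###), capped at H6, leaving fenced code untouched. */
export function demoteHeadings(markdown: string): string {
  let fence: string | null = null; // opening fence marker while inside a code block
  return markdown
    .split("\n")
    .map((line) => {
      const fenceMatch = /^\s{0,3}(`{3,}|~{3,})(.*)$/.exec(line);
      if (fenceMatch) {
        const [, marker, rest] = fenceMatch;
        if (fence === null) {
          fence = marker; // entering a fenced block
          return line;
        }
        // A closing fence uses the same character, is at least as long, and has no info string.
        if (marker[0] === fence[0] && marker.length >= fence.length && rest.trim() === "") {
          fence = null;
        }
        return line;
      }
      if (fence !== null) return line; // "## ..." inside code is not a heading
      const heading = /^(#{1,6})(\s.*|)$/.exec(line);
      if (!heading) return line;
      return "#".repeat(Math.min(heading[1].length + 1, 6)) + heading[2];
    })
    .join("\n");
}

/** Strip leading/trailing hyphens left behind when the slugger drops a character such as an emoji. */
export function normaliseSlug(slug: string): string {
  return slug.replace(/^-+|-+$/g, "");
}

/** Apply the prefix only after normalising, so "-bluf" becomes "rm-bluf", not "rm--bluf". */
export function prefixedId(slug: string, prefix = "rm-"): string {
  return prefix + normaliseSlug(slug);
}
```

Tracking the opening fence marker, rather than toggling on any line of backticks, is what keeps a `##` line inside a code sample from being demoted.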

### 📚 Article Sources appendix (canonical source list)

After the last artifact section, the aggregator emits a single `## Article Sources` H2 at the very end of the article. Each entry is a Markdown list item linking to the artifact on GitHub:

```markdown
## Article Sources

Each section above projects one analysis artifact. The full audited markdown is available on GitHub:

- [`executive-brief.md`](https://github.com/Hack23/riksdagsmonitor/blob/main/analysis/daily/.../executive-brief.md)
- [`synthesis-summary.md`](https://github.com/.../synthesis-summary.md)
- …
```

This replaces the legacy per-section `_Source: file.md_` italics. Auditors get one canonical list; readers see clean prose; SEO crawlers see one trustworthy `<ul>` of primary-source links instead of 25+ duplicated italics.
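Because the appendix is a pure projection of the artifact list, it can be sketched as a small function. This is illustrative only: `buildSourcesAppendix`, its de-duplication step and the `blobBase` parameter (assumed to be the GitHub blob URL of the run folder) are assumptions, not the aggregator's actual API.

```typescript
// Hypothetical sketch of the "## Article Sources" appendix; the real aggregator may differ.
export function buildSourcesAppendix(files: readonly string[], blobBase: string): string {
  const base = blobBase.replace(/\/+$/, ""); // tolerate a trailing slash on the base URL
  const seen = new Set<string>();
  const items: string[] = [];
  for (const file of files) {
    if (seen.has(file)) continue; // one canonical entry per artifact
    seen.add(file);
    items.push(`- [\`${file}\`](${base}/${file})`);
  }
  return [
    "## Article Sources",
    "",
    "Each section above projects one analysis artifact. The full audited markdown is available on GitHub:",
    "",
    ...items,
  ].join("\n");
}
```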

### Title and description extraction

`article.md` metadata comes from `executive-brief.md`:
@@ -495,14 +515,47 @@ layout: article
---
```

It then emits deterministic sections such as `## Executive Brief`, `## Synthesis Summary`, `## Intelligence Assessment — Key Judgments`, `## Significance Scoring`, and so on. Source attribution is provided by the auto-generated `## Reader Intelligence Guide` (top of article) and `## Article Sources` appendix (bottom of article); the per-section heading carries an HTML comment for offline auditors:

```markdown
## Executive Brief
<!-- source: executive-brief.md :: https://github.com/Hack23/riksdagsmonitor/blob/main/analysis/daily/2026-04-24/interpellations/executive-brief.md -->

### 🎯 BLUF

…artifact body content, with all internal headings demoted by one level so the outline stays semantically nested…
```

The generated first body section is `## Reader Intelligence Guide`, which is intentionally not sourced to a single artifact because it is a deterministic navigation projection of the artifact set.
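The per-section annotation, `<!-- source: <file> :: <github-blob-url> -->`, is simple enough to emit and to read back. Below is a hypothetical sketch of both directions for an offline auditor. `sectionHeader` and `listSourceAnnotations` are illustrative names, and rejecting `-->` or whitespace in the file name or URL is an added safety assumption so a value can neither close the comment early nor break the parse.

```typescript
// Hypothetical sketch; not the aggregator's actual annotation code.
export function sectionHeader(title: string, file: string, blobUrl: string): string {
  for (const part of [file, blobUrl]) {
    // "-->" would end the HTML comment early; whitespace would break the \S+ parse below.
    if (part.includes("-->") || /\s/.test(part)) {
      throw new Error(`unsafe source annotation: ${part}`);
    }
  }
  return `## ${title}\n<!-- source: ${file} :: ${blobUrl} -->`;
}

const SOURCE_COMMENT = /<!-- source: (\S+) :: (\S+) -->/g;

/** Recover every (file, url) pair from an article.md, e.g. to diff against the artifact folder. */
export function listSourceAnnotations(md: string): Array<{ file: string; url: string }> {
  return [...md.matchAll(SOURCE_COMMENT)].map((m) => ({ file: m[1], url: m[2] }));
}
```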

### ✅ Article minimum-content validator (`scripts/validate-article.ts`)

Every aggregated `analysis/daily/$DATE/$SUBFOLDER/article.md` is checked by [`scripts/validate-article.ts`](scripts/validate-article.ts) — a hard, scripted CI gate that fails the build on any of the following violations:

| Rule code | What it blocks | Why it matters |
|---|---|---|
| `unresolved-placeholder` | `[REQUIRED:…]`, `AI_MUST_REPLACE`, `<insert …>`, `TBD:`, `FILL IN` strings surviving Pass-2 | Templates carry these markers on disk; if they reach `article.md` the AI agent skipped a substitution. Article is not publishable. |
| `missing-reader-guide` | Article missing `## Reader Intelligence Guide` | Aggregator-generated; if missing, the aggregator broke. |
| `missing-executive-brief` | Article missing `## Executive Brief` H2 | Required artifact malformed. |
| `missing-bluf` | No `BLUF` heading anywhere | Editorial product cannot ship without a Bottom-Line-Up-Front. |
| `missing-sources-appendix` | Article missing `## Article Sources` | Aggregator-generated; if missing, re-aggregate. |
| `bluf-too-short` | BLUF prose < 80 chars | Stub BLUFs (e.g. `TODO`, `pending`) escape Pass-2. A publishable BLUF needs actor + active verb + object + when + so-what. |
| `bluf-too-long` | BLUF prose > 1200 chars | Runaway dumps belong in Synthesis Summary or Intelligence Assessment, not the 60-second read. |
| `empty-heading-slug` | Any heading whose permissive slug is empty (e.g. emoji-only) | Empty `#anchor` would break the Reader Intelligence Guide and SERP deep-links. |
| `per-doc-missing-dok_id` | Any `### HD…`/`### FiU…` per-document subsection lacking at least one dok_id-style code in its body | Every per-document subsection must trace to a primary-source identifier; orphan sections are blocked. |
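Several rules in the table reduce to plain string checks. The sketch below is a hypothetical subset, not `scripts/validate-article.ts`: the rule codes and the 80/1200 character thresholds come from the table, while the placeholder regexes, the `validateArticle` name, exact-match H2 titles and the assumption that BLUF prose runs until the next heading are illustrative.

```typescript
// Hypothetical subset of the validator rules; regexes and helpers are assumptions.
export interface Violation { code: string; detail: string }

const PLACEHOLDERS: readonly RegExp[] = [
  /\[REQUIRED:[^\]]*\]/,
  /AI_MUST_REPLACE/,
  /<insert [^>]*>/i,
  /\bTBD:/,
  /\bFILL IN\b/,
];

function hasH2(md: string, title: string): boolean {
  return md.split("\n").some((l) => l.trim() === `## ${title}`);
}

/** Prose between the first heading mentioning BLUF and the next heading of any level. */
function blufProse(md: string): string | null {
  const lines = md.split("\n");
  const start = lines.findIndex((l) => /^#{1,6}\s.*\bBLUF\b/.test(l));
  if (start === -1) return null;
  const body: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s/.test(line)) break;
    body.push(line);
  }
  return body.join(" ").replace(/\s+/g, " ").trim();
}

export function validateArticle(md: string): Violation[] {
  const out: Violation[] = [];
  for (const re of PLACEHOLDERS) {
    const m = re.exec(md);
    if (m) out.push({ code: "unresolved-placeholder", detail: m[0] });
  }
  if (!hasH2(md, "Reader Intelligence Guide")) out.push({ code: "missing-reader-guide", detail: "" });
  if (!hasH2(md, "Executive Brief")) out.push({ code: "missing-executive-brief", detail: "" });
  if (!hasH2(md, "Article Sources")) out.push({ code: "missing-sources-appendix", detail: "" });
  const bluf = blufProse(md);
  if (bluf === null) out.push({ code: "missing-bluf", detail: "" });
  else if (bluf.length < 80) out.push({ code: "bluf-too-short", detail: `${bluf.length} chars` });
  else if (bluf.length > 1200) out.push({ code: "bluf-too-long", detail: `${bluf.length} chars` });
  return out;
}
```

Returning every violation, rather than failing on the first, lets a single CI run report all problems in an article at once.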

**Run locally:**

```bash
# Validate every aggregated article in the repo:
npm run validate-article

# Validate a single article:
npx tsx scripts/validate-article.ts analysis/daily/2026-04-24/interpellations/article.md
```

The validator is wired into `npm run validate-all` and runs as a hard CI gate after aggregation. It is **content-only** — structural projections (heading demotion, source-preamble stripping, slug normalisation) are unit-tested in [`tests/render-lib.test.ts`](tests/render-lib.test.ts); this script guards the AI-authored contribution: the artifact contents that the aggregator concatenates.

---

## 🌐 How `article.md` Becomes HTML
@@ -659,7 +712,20 @@ The rendering path is:
2. [`scripts/render-lib/markdown.ts`](scripts/render-lib/markdown.ts) rewrites them to `<pre class="mermaid">` before Markdown parsing.
3. `rehype-sanitize` allows the `pre.mermaid` class.
4. [`scripts/render-lib/chrome.ts`](scripts/render-lib/chrome.ts) includes `js/lib/mermaid-init.mjs`.
5. [`js/lib/mermaid-init.mjs`](js/lib/mermaid-init.mjs) dynamically imports Mermaid `11.4.1` from the **same-origin vendored copy under `js/lib/mermaid/`**, initializes a dark theme and renders all Mermaid blocks after page load.
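Step 2 can be sketched as a string rewrite that runs before the Markdown parser. This is a minimal sketch, not the code in `markdown.ts`. The names `rewriteMermaidFences` and `escapeHtml` are hypothetical, and the sketch assumes diagrams use plain triple-backtick fences tagged `mermaid`. HTML-escaping the body keeps arrows such as `-->` from being read as markup, so the page carries the diagram source as text.

```typescript
// Hypothetical sketch of step 2; not the repository's markdown.ts implementation.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

export function rewriteMermaidFences(markdown: string): string {
  // Lazily match from a ```mermaid opening line to the next bare ``` line.
  return markdown.replace(
    /^```mermaid[^\n]*\n([\s\S]*?)^```[ \t]*$/gm,
    (_match, body: string) => `<pre class="mermaid">${escapeHtml(body.replace(/\n$/, ""))}</pre>`,
  );
}
```

Other fences (`ts`, `bash`, …) do not match the `mermaid` info string and pass through to the normal code-block path.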

The Mermaid distribution is vendored at build time:

| Step | Location | What it does |
|---|---|---|
| **Pin** | [`package.json`](package.json) `devDependencies` | `mermaid` is pinned (currently `11.4.1`) — supply-chain audited like every other dependency, in the npm SBOM. |
| **Copy** | [`scripts/copy-vendor-mermaid.ts`](scripts/copy-vendor-mermaid.ts) | Run as the first step of `prebuild` (and `predev`). Copies `node_modules/mermaid/dist/mermaid.esm.min.mjs` and its required `chunks/mermaid.esm.min/*.mjs` into `js/lib/mermaid/` (≈2.6 MB, 64 files). Sourcemaps, type declarations, mocks and other ESM variants are excluded. |
| **Gitignore** | [`.gitignore`](.gitignore) | `js/lib/mermaid/` is intentionally ignored — the directory is reproducible from the pinned dependency, so we don't commit duplicates of `node_modules` content. |
| **Bundle** | [`.github/workflows/deploy-s3.yml`](.github/workflows/deploy-s3.yml) | The "Copy JS libraries to build output" step merges the full `js/` tree (including `js/lib/mermaid/`) into `dist/js/` after the Vite build, alongside `chart.umd.4.4.1.js`, `d3.7.9.0.min.js`, etc. |
| **Deploy** | [`scripts/deploy-s3.sh`](scripts/deploy-s3.sh) | `*.mjs` files are uploaded with `Content-Type: application/javascript` and `Cache-Control: public, max-age=31536000, immutable` — same long-cache treatment as every other vendored asset. |
| **Guard** | [`tests/no-external-cdn.test.ts`](tests/no-external-cdn.test.ts) | Vitest test that fails CI if any runtime file under `js/` or any rendered article under `news/` references `cdn.jsdelivr.net`, `cdnjs.cloudflare.com`, `unpkg.com`, `esm.sh`, `cdn.skypack.dev`, or `ajax.googleapis.com`. Riksdagsmonitor serves all JavaScript from its own S3/CloudFront origin — no external CDN allowed. |
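The **Copy** row can be pictured as the sketch below. Only the two source paths (`mermaid.esm.min.mjs` and `chunks/mermaid.esm.min/*.mjs`) come from the table; the function name `copyVendorMermaid`, the clean-before-copy step and the `.mjs` suffix filter are assumptions for illustration, and the real `scripts/copy-vendor-mermaid.ts` may differ.

```typescript
// Hypothetical sketch of the vendoring copy step; not scripts/copy-vendor-mermaid.ts itself.
import { copyFileSync, existsSync, mkdirSync, readdirSync, rmSync } from "node:fs";
import { join } from "node:path";

const ENTRY = "mermaid.esm.min.mjs";
const CHUNK_DIR = join("chunks", "mermaid.esm.min");

export function copyVendorMermaid(distDir: string, outDir: string): string[] {
  const entry = join(distDir, ENTRY);
  if (!existsSync(entry)) throw new Error(`missing ${entry}; install dependencies first`);

  // Start clean so chunks left over from an older pinned version do not linger.
  rmSync(outDir, { recursive: true, force: true });
  mkdirSync(join(outDir, CHUNK_DIR), { recursive: true });

  const copied: string[] = [ENTRY];
  copyFileSync(entry, join(outDir, ENTRY));
  for (const name of readdirSync(join(distDir, CHUNK_DIR))) {
    if (!name.endsWith(".mjs")) continue; // skip sourcemaps and anything that is not an ES module
    copyFileSync(join(distDir, CHUNK_DIR, name), join(outDir, CHUNK_DIR, name));
    copied.push(join(CHUNK_DIR, name));
  }
  return copied;
}

// Example call from a prebuild step:
// copyVendorMermaid("node_modules/mermaid/dist", "js/lib/mermaid");
```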

CSP impact: scripts can be allowed with `script-src 'self'` only — no third-party host needs to be added to the policy. SRI hashes for every Mermaid `.mjs` chunk are produced by `vite-plugin-sri-gen` because the files now live under the build output.
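The `script-src 'self'` policy only holds as long as nothing reintroduces an external CDN URL. The sketch below shows the kind of check the **Guard** row in the table describes, using that row's host list. `findCdnReferences` and `scanTree` are hypothetical names, and the plain substring match is a simplifying assumption: it would also flag a host mentioned only in a comment.

```typescript
// Hypothetical sketch in the spirit of tests/no-external-cdn.test.ts; not its actual code.
import { readdirSync, readFileSync, statSync } from "node:fs";
import { extname, join } from "node:path";

const FORBIDDEN_HOSTS = [
  "cdn.jsdelivr.net", "cdnjs.cloudflare.com", "unpkg.com",
  "esm.sh", "cdn.skypack.dev", "ajax.googleapis.com",
] as const;

/** Return every forbidden host that appears anywhere in the text (substring match). */
export function findCdnReferences(text: string): string[] {
  return FORBIDDEN_HOSTS.filter((host) => text.includes(host));
}

function* walk(dir: string): Generator<string> {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) yield* walk(path);
    else yield path;
  }
}

/** Scan files with the given extensions under root and report which hosts each one references. */
export function scanTree(root: string, exts: readonly string[]): Array<{ file: string; hosts: string[] }> {
  const hits: Array<{ file: string; hosts: string[] }> = [];
  for (const file of walk(root)) {
    if (!exts.includes(extname(file))) continue;
    const hosts = findCdnReferences(readFileSync(file, "utf8"));
    if (hosts.length > 0) hits.push({ file, hosts });
  }
  return hits;
}
```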

The analysis gate requires color-coded Mermaid through `style` directives or Mermaid `themeVariables` / `%%{init}` blocks.

2 changes: 1 addition & 1 deletion SECURITY_ARCHITECTURE.md
@@ -385,7 +385,7 @@ Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: geolocation=(), microphone=(), camera=()
```

**Note:** CSP includes `'unsafe-inline'` for Chart.js/D3.js inline styles and large inline dashboard script (946 lines). The `connect-src` directive includes `https://raw.githubusercontent.com` to allow fetching CIA CSV data from the cia repository. Security headers are configured via AWS CloudFront Response Headers Policy for the primary deployment. GitHub Pages disaster recovery inherits default GitHub Pages security headers. Future enhancement: nonce-based CSP for stricter inline script control (roadmap: 2027). Chart.js, D3.js, chartjs-plugin-annotation **and Mermaid** are hosted locally on CloudFront (`js/lib/`) rather than via external CDN, eliminating external script dependencies (CI-enforced by [`tests/no-external-cdn.test.ts`](tests/no-external-cdn.test.ts)).

**Control Mapping:**
- ISO 27001: A.13.1 Network Security Management