Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/d7def0ce-1b8f-42f7-a36a-fd1e5f2f175c Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…les re-rendered Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/76443d89-98d7-4768-9f63-db54a990e8b8 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
🏷️ Automatic Labeling Summary
This PR has been automatically labeled based on the files changed and PR metadata.
Applied Labels: documentation, dependencies, security, html-css, javascript, translation, isms, iso-27001, nist-csf, cis-controls, performance, testing, refactor, size-xl, news
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
@copilot analyse and fix issues in Quality Checks / html-validation (pull_request)
Pull request overview
This PR hardens the rendering and validation pipeline for generated political-intelligence articles by self-hosting Mermaid (no external CDN), improving aggregator-produced HTML structure (heading hierarchy, anchors, sources), and introducing a scripted minimum-content CI gate for aggregated article.md outputs.
Changes:
- Vendor Mermaid from `node_modules/` into `js/lib/mermaid/` and add CI coverage to prevent external CDN references in runtime JS and rendered articles.
- Improve aggregation/render quality by demoting in-body headings, stripping `_Source: …_` preambles, normalizing slugs, and appending a single `## Article Sources` section.
- Add `scripts/validate-article.ts` and wire it into `validate-all` to enforce required article landmarks and placeholder/BLUF/per-doc constraints.
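The heading demotion mentioned in the second bullet can be pictured with a small sketch. This is not the code in `scripts/render-lib/aggregator.ts`; the function name `demoteHeadings`, the `levels` parameter, and the choice to skip fenced code blocks are all assumptions made for illustration.

```typescript
// Hypothetical helper (not from the PR): push every ATX heading in a
// Markdown body down by `levels`, capping at H6. Lines inside fenced
// code blocks are left untouched so that `# comment` lines survive.
function demoteHeadings(body: string, levels: number): string {
  let inFence = false;
  return body
    .split('\n')
    .map((line) => {
      if (/^\s*(```|~~~)/.test(line)) {
        inFence = !inFence; // toggle on every fence marker
        return line;
      }
      if (inFence) return line;
      const m = /^(#{1,6})(\s+.*)$/.exec(line);
      if (!m) return line;
      const hashes = m[1] ?? '';
      const rest = m[2] ?? '';
      const depth = Math.min(6, hashes.length + levels);
      return new Array(depth + 1).join('#') + rest;
    })
    .join('\n');
}
```

Under these assumptions, a per-document `# Title` nested below an aggregator-emitted `### <dok_id>` heading would be demoted so that it no longer competes with the article's own top-level headings.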
Reviewed changes
Copilot reviewed 16 out of 91 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `tests/render-lib.test.ts` | Adds unit tests for heading demotion, source-preamble stripping, and stable anchor generation. |
| `tests/no-external-cdn.test.ts` | New CI guard scanning `js/**` and `news/*.html` for forbidden CDN hosts; validates Mermaid loader is same-origin. |
| `scripts/validate-article.ts` | New hard validator enforcing minimum content/structure rules for aggregated `article.md`. |
| `scripts/render-lib/markdown.ts` | Normalizes heading slug IDs to avoid `rm--*` anchors when headings start with stripped characters. |
| `scripts/render-lib/aggregator.ts` | Implements heading demotion + source preamble stripping + per-section source comments + appends `## Article Sources`. |
| `scripts/copy-vendor-mermaid.ts` | New script to copy Mermaid runtime `.mjs` assets into `js/lib/mermaid/`. |
| `package.json` | Pins `mermaid@11.4.1`, adds `predev`/`copy-vendor`/`validate-article`, and wires validation into `validate-all`. |
| `package-lock.json` | Lockfile updates reflecting Mermaid and its dependency tree. |
| `js/lib/mermaid-init.mjs` | Switches Mermaid import from jsDelivr to local vendored path via `import.meta.url`. |
| `analysis/templates/README.md` | Documents the script-enforced reader-facing output contract and validator rule codes. |
| `analysis/daily/2026-04-21/realtime-1353/article.md` | Regenerated aggregated article reflecting new heading/source/appendix rules. |
| `analysis/daily/2026-04-21/evening-analysis/article.md` | Regenerated aggregated article reflecting new heading/source/appendix rules. |
| `analysis/daily/2026-04-20/evening-analysis/article.md` | Regenerated aggregated article reflecting new heading/source/appendix rules. |
| `SECURITY_ARCHITECTURE.md` | Updates CSP/local-hosting note to include Mermaid and references the new CI guard. |
| `Article-Generation.md` | Updates documentation to describe the new cleaning rules, sources appendix, validator, and vendored Mermaid flow. |
| `.gitignore` | Ignores vendored Mermaid output directory (and includes an updated comment section). |
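
As a rough illustration of what a guard like `tests/no-external-cdn.test.ts` might check, the sketch below scans a piece of text for forbidden CDN hosts. The deny-list, the `CdnHit` shape, and the function name `findExternalCdnRefs` are assumptions; the PR only states that jsDelivr was the previous Mermaid source, and the actual test's host list and file walking are not shown here.

```typescript
// Assumed deny-list for illustration only; the hosts the real test
// forbids are not visible in this conversation.
const FORBIDDEN_CDN_HOSTS: string[] = [
  'cdn.jsdelivr.net',
  'unpkg.com',
  'cdnjs.cloudflare.com',
];

interface CdnHit {
  host: string;
  line: number; // 1-based line number within the scanned text
}

// Hypothetical scanner: report every line that mentions a forbidden host.
function findExternalCdnRefs(
  text: string,
  hosts: string[] = FORBIDDEN_CDN_HOSTS,
): CdnHit[] {
  const hits: CdnHit[] = [];
  text.split('\n').forEach((raw, index) => {
    const lower = raw.toLowerCase();
    hosts.forEach((host) => {
      if (lower.indexOf(host) !== -1) hits.push({ host, line: index + 1 });
    });
  });
  return hits;
}
```

A test runner would then read each file under `js/` and `news/`, call the scanner, and fail when the returned array is non-empty.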
    function stripSourcePreamble(body: string): string {
      return body
        .replace(/^_\s*Source:\s*\[?`[^\n]*?\n+/gim, '')
        .replace(/^_\s*Source:\s*[^\n]*_\s*$\n?/gim, '');
    }
`stripSourcePreamble()` leaves a leading blank line when stripping the bare `_Source: file.md_` form because the second regex only consumes a single newline (`\n?`). This makes the unit test case (`_Source: synthesis-summary.md_\n\nbody`) return `\nbody` and will fail CI; consider consuming all following newlines (`\n+`) or trimming the start after removal.
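
One way to act on this suggestion is sketched below. It keeps the first pattern from the diff and rewrites only the second so that trailing spaces are matched with `[ \t]*` and every following newline is consumed with `\n*`. This is an assumed variant for illustration, not necessarily the change that was committed.

```typescript
// Sketch of the reviewer's suggestion (assumed variant, not the merged code).
function stripSourcePreamble(body: string): string {
  return body
    // Linked form, e.g. _Source: [`file.md`](...)_ ; unchanged from the diff.
    .replace(/^_\s*Source:\s*\[?`[^\n]*?\n+/gim, '')
    // Bare form, e.g. _Source: file.md_ ; now swallows all following newlines.
    .replace(/^_\s*Source:\s*[^\n]*_[ \t]*\n*/gim, '');
}
```

Blank lines before a preamble are untouched, so a preamble sitting between two paragraphs still leaves them separated after removal.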
    // The aggregator emits one `### <dok_id>` per per-document analysis,
    // where `<dok_id>` is a riksdagen identifier such as `HD12345` or
    // `FiU17`. After in-body heading demotion (`### Document summary`,
    // `### Classification`, …) every other `### …` heading inside the
    // section body is *content*, not a new per-document boundary.
    // We therefore split only on H3 headings whose text matches a
    // dok_id-shaped token; everything between two such headings (or
    // from the last one to end-of-region) is one section's body.
    const DOK_ID_HEADING = /^###\s+(H[A-Z0-9]{6,10}|[A-ZÅÄÖ]{1,4}\d{4,8})\s*$/m;
    let cursor = region;
    // First pass: anchor to the first dok_id heading.
    let m = cursor.match(DOK_ID_HEADING);
    while (m && m.index !== undefined) {
      const id = m[1]!;
      const after = cursor.slice(m.index + m[0].length);
      const next = after.match(DOK_ID_HEADING);
      const body = next && next.index !== undefined ? after.slice(0, next.index) : after;
      sections.push({ id, body });
      if (!next || next.index === undefined) break;
      cursor = after.slice(next.index);
      m = cursor.match(DOK_ID_HEADING);
`extractPerDocumentSections()` only treats H3 headings that are a single dok_id token as section boundaries. The corpus already contains composite per-document headings like `### HD01CU27\-CU28` (e.g. `analysis/daily/2026-04-17/realtime-1434/article.md`), which this regex won’t match, so those sections won’t be validated for dok_id citations. Consider broadening `DOK_ID_HEADING` to accept composite IDs (hyphen/slash separated, and optional markdown escapes) or splitting on any H3 that starts with a dok_id token.
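
A broadened pattern along the lines of this comment could look like the sketch below. The base token is copied from the diff; the suffix group (an optional backslash, then `-` or `/`, then an alphanumeric run) and the helper `parseDokIdHeading` are assumptions introduced for illustration.

```typescript
// Base dok_id token, copied from the DOK_ID_HEADING pattern in the diff.
const DOK_ID_TOKEN = '(?:H[A-Z0-9]{6,10}|[A-ZÅÄÖ]{1,4}\\d{4,8})';

// Assumed extension: zero or more "-X" or "/X" parts, each optionally
// preceded by a markdown escape backslash (as in `HD01CU27\-CU28`).
const COMPOSITE_DOK_ID_HEADING = new RegExp(
  '^###\\s+(' + DOK_ID_TOKEN + '(?:\\\\?[-/][A-Za-zÅÄÖ0-9]+)*)\\s*$',
  'm',
);

// Hypothetical helper: return the parts of a (possibly composite) dok_id
// heading, or null when the text has no dok_id-shaped H3 heading.
function parseDokIdHeading(text: string): string[] | null {
  const m = COMPOSITE_DOK_ID_HEADING.exec(text);
  if (!m) return null;
  const raw = (m[1] ?? '').replace(/\\/g, ''); // drop markdown escapes
  return raw.split(/[-/]/);
}
```

Content headings such as `### Document summary` still fail the base token, so they would remain part of the section body rather than becoming boundaries.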
… to keep dedup state consistent Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/36e3fe38-a1d5-4826-9bb8-dc66c0df589b Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Root cause: my Phase-1 post-trim of leading hyphens in heading slugs desynchronised the slugger's internal dedup state.
Fix in c5d7d6d: pre-clean the heading text (strip leading non-letter/number chars via `/^[^\p{L}\p{N}]+/u`) before passing it to `slugger.slug()`.
Local results:
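
The dedup desynchronisation can be reproduced with a toy slugger. `TinySlugger` below is a minimal stand-in written for this illustration and is not the `github-slugger` implementation; `slugPostTrim` and `slugPreClean` are likewise hypothetical names for the old and new call order.

```typescript
// Built with the RegExp constructor so Unicode property escapes do not
// depend on the compile target.
const NON_SLUG_CHARS = new RegExp('[^\\p{L}\\p{N}\\s\\-]', 'gu');
const LEADING_NON_ALNUM = new RegExp('^[^\\p{L}\\p{N}]+', 'u');

// Toy deduplicating slugger (assumption: only approximates github-slugger).
class TinySlugger {
  private seen: Record<string, number> = Object.create(null);

  slug(text: string): string {
    const base = text.toLowerCase().trim()
      .replace(NON_SLUG_CHARS, '') // drop emoji and punctuation
      .replace(/\s/g, '-');        // whitespace becomes hyphens
    const count = this.seen[base];
    if (count === undefined) {
      this.seen[base] = 0;
      return base;
    }
    this.seen[base] = count + 1;
    const deduped = base + '-' + (count + 1);
    this.seen[deduped] = 0;
    return deduped;
  }
}

// Old order: slug first, trim leading hyphens afterwards. The slugger
// records "-sources" while the caller emits "sources".
function slugPostTrim(s: TinySlugger, heading: string): string {
  return s.slug(heading).replace(/^-+/, '');
}

// New order: strip leading non-letter/number chars before slugging, so
// the recorded slug and the emitted slug are the same string.
function slugPreClean(s: TinySlugger, heading: string): string {
  return s.slug(heading.replace(LEADING_NON_ALNUM, ''));
}
```

With one slugger instance per path, `📜 Sources` followed by `Sources` yields two identical slugs in the post-trim order, while the pre-clean order gives the second heading a `-1` suffix.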
Mermaid self-host (already shipped on this branch)
- Mermaid pinned in `package.json` (`mermaid@11.4.1`), vendored to `js/lib/mermaid/` by `scripts/copy-vendor-mermaid.ts` during `prebuild` / `predev`
- Loader (`js/lib/mermaid-init.mjs`) imports from same-origin relative path
- `tests/no-external-cdn.test.ts` blocks any external CDN host in `js/` or `news/*.html`
- `Article-Generation.md` + `SECURITY_ARCHITECTURE.md` updated to reflect self-hosted Mermaid

Article quality: Phase 1
- Aggregator cleaning (heading demotion, `_Source:` preamble strip, `## Article Sources` appendix, slug normalisation)
- `scripts/validate-article.ts` minimum-content validator (9 rule codes) wired into `npm run validate-all`
- Documented in `Article-Generation.md` + `analysis/templates/README.md`

CI fix: HTMLHint `id-unique` regression (this commit)
The Phase-1 heading demotion exposed a latent slugger-dedup bug:
- Root cause: `### 📜 Sources` slugged via `github-slugger` to `-sources`; my post-trim collapsed it to `sources` but the slugger's internal state still recorded `-sources`. A later `### Sources` therefore got `sources` (not `sources-1`) and the rendered HTML had two `id="rm-sources"` attributes, failing HTMLHint's `id-unique` rule on 10 files / 16 errors.
- Fix: pre-clean the heading text (strip leading non-letter/number chars via `/^[^\p{L}\p{N}]+/u`) before passing to `slugger.slug()`, so the slugger sees the cleaned form and its dedup-suffix state stays consistent. Same fix applied to `aggregator.ts#anchorForTitle` so Reader Intelligence Guide anchors agree.
- Regression tests in `tests/render-lib.test.ts` reproducing the exact CI failure (`📜 Sources` + `Sources`, `🔒 Confidence Profile` + `Confidence Profile`).
- `htmlhint *.html news/*.html`: 2798 files, 0 errors (was 2784 files, 16 errors in 10 files)
- `npm run validate-article`: 27 / 27 pass

Out of scope (follow-up phases)
- `📰 Reader Lede`, `🧾 Key Facts Strip`, pull quotes, timeline JSON
- `news-*.md` workflows