diff --git a/.github/prompts/05-analysis-gate.md b/.github/prompts/05-analysis-gate.md
index 5c5a99befa..994c013577 100644
--- a/.github/prompts/05-analysis-gate.md
+++ b/.github/prompts/05-analysis-gate.md
@@ -30,10 +30,11 @@ This is the **only** gate separating analysis from article generation. If it fai
 8. **Family D structure checks**:
    - `forward-indicators.md` declares **≥ 10 dated indicators** (bullet or table rows matching a date pattern across the four horizon sections).
    - `coalition-mathematics.md` contains a seat-count table (≥ 1 table row with `Ja`/`Nej`/`Avstår` or a party-to-seats mapping).
+   - `implementation-feasibility.md`: when it names a recognised agency (Kriminalvården, Polismyndigheten, Försäkringskassan, Skatteverket, Migrationsverket, Arbetsförmedlingen, Socialstyrelsen, Transportstyrelsen, Trafikverket, Naturvårdsverket, Energimyndigheten), the `Statskontoret relevance` row contains either a `statskontoret.se` URL **or** the literal phrase `none found`.
 
 ## Implementation
 
-No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–9, with check 9 conditional where applicable):
+No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–9, plus conditional check 9b where applicable):
 
 ```bash
 set -Eeuo pipefail
@@ -232,6 +233,18 @@ if [ -s "$ANALYSIS_DIR/coalition-mathematics.md" ]; then
     || { echo "❌ coalition-mathematics.md: missing seat-count / vote-breakdown table"; FAIL=1; }
 fi
 
+# Check 9b — Statskontoret evidence in implementation-feasibility.md
+# When implementation-feasibility.md names a recognised agency, the file MUST
+# populate the `| **Statskontoret relevance** | ... |` row with either a
+# statskontoret.se URL or the literal `none found` when no relevant coverage exists.
+AGENCY_RE='Kriminalvård(en)?|Polismyndigheten|Försäkringskassan|Skatteverket|Migrationsverket|Arbetsförmedlingen|Socialstyrelsen|Transportstyrelsen|Trafikverket|Naturvårdsverket|Energimyndigheten'
+if [ -s "$ANALYSIS_DIR/implementation-feasibility.md" ]; then
+  if grep -qE "$AGENCY_RE" "$ANALYSIS_DIR/implementation-feasibility.md"; then
+    grep -qiE '^\|[[:space:]]*\*\*Statskontoret relevance\*\*[[:space:]]*\|[[:space:]]*([^|]*statskontoret\.se[^|]*|[^|]*none found[^|]*)\|' "$ANALYSIS_DIR/implementation-feasibility.md" \
+      || { echo "❌ implementation-feasibility.md: names a recognised agency but the Statskontoret relevance row lacks a statskontoret.se URL or 'none found'"; FAIL=1; }
+  fi
+fi
+
 # Check 9 — PIR status sidecar (`pir-status.json`)
 # A valid pir-status.json must be present after every analysis run so that
 # open PIRs can be automatically rolled forward to the next cycle.
diff --git a/.github/skills/myndigheter-monitoring/SKILL.md b/.github/skills/myndigheter-monitoring/SKILL.md
index e01b8d5b42..5a16d78fc8 100644
--- a/.github/skills/myndigheter-monitoring/SKILL.md
+++ b/.github/skills/myndigheter-monitoring/SKILL.md
@@ -224,6 +224,55 @@ interviews (5 labor economists), stakeholder statements*
 - **Courts** - Administrative law challenges
 - **Media** - Investigative reporting (that's you!)
 
+## Statskontoret Enrichment Layer
+
+The **Statskontoret enrichment layer** provides empirical agency-capacity evidence beneath document-level analysis. Use it whenever an `implementation-feasibility.md` artifact names a specific agency (Kriminalvården, Polismyndigheten, Försäkringskassan, etc.) and a feasibility claim needs grounding in published capacity data.
+
+### Index
+
+The seed index is at [`data/statskontoret/index.json`](../../../data/statskontoret/index.json). It contains the following fields per entry:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `title` | string | Full Swedish report/dataset title |
+| `year` | number | Publication or reference year |
+| `agency` | string | Named agency or `"*"` for cross-agency |
+| `summary` | string | One-sentence abstract |
+| `url` | string | Canonical Statskontoret URL |
+| `admiralty_grade` | string | Source reliability (Admiralty scale A–F / 1–6) |
+| `cached_at` | ISO-8601 | When the entry was last verified (TTL 30 days) |
+
+### How to use in implementation-feasibility.md
+
+1. **Look up** the agency in `data/statskontoret/index.json` (or via a `bash` search on `www.statskontoret.se`).
+2. **Populate** the `Statskontoret relevance` row in the Feasibility Context table with the matched entry's URL and title.
+3. **Cite** the entry in the 🏛️ Administrative feasibility section, following the established "Statskontoret overlay" pattern.
+4. If no entry matches, search `https://www.statskontoret.se/publikationer/`; cite any relevant report you find there, otherwise record `"none found"`.
+
+### CLI (fetch & persist)
+
+```bash
+# Discover downloadable links for the agency register
+tsx scripts/statskontoret-fetch.ts discover --source myndighetsforteckning
+
+# Fetch agency headcount workbook (once a URL is discovered)
+tsx scripts/statskontoret-fetch.ts headcount --url <URL> --persist
+
+# Budget outturn
+tsx scripts/statskontoret-fetch.ts budget-outturn --url <URL> --source arsutfall --persist
+```
+
+### Cache TTL
+
+Statskontoret reports are slow-moving; refresh the index at most once every **30 days**. The `cached_at` timestamp in each entry tracks the last verification.
+
+### Required behaviour for implementation-feasibility
+
+When an agency is named in `implementation-feasibility.md`:
+- The **Feasibility Context** table MUST include a populated `Statskontoret relevance` row (URL or `"none found"`).
+- The **Administrative feasibility** section MUST cite the Statskontoret entry or explicitly state no relevant report was found.
+- The `Statskontoret relevance` row is enforced by the analysis gate (`05-analysis-gate.md` Check 9b); that check does not inspect the Administrative feasibility section.
+
 ## Remember
 
 - **Agencies matter** - They implement policy, affect daily life directly
diff --git a/data/statskontoret/index.json b/data/statskontoret/index.json
new file mode 100644
index 0000000000..51193778bc
--- /dev/null
+++ b/data/statskontoret/index.json
@@ -0,0 +1,55 @@
+{
+  "version": "1.0",
+  "source": "Statskontoret",
+  "classification": "Public",
+  "cache_ttl_days": 30,
+  "description": "Agency-capacity evidence index sourced from Statskontoret public reports. Used by implementation-feasibility analysis to cite empirical capacity data for named agencies (Kriminalvården, Polismyndigheten, Försäkringskassan, etc.).",
+  "generated_at": "2026-04-27T00:00:00Z",
+  "entries": [
+    {
+      "title": "Statskontorets myndighetsförteckning",
+      "year": 2025,
+      "agency": "*",
+      "summary": "Annual register of all Swedish central-government authorities: headcount by department, organisational form and appropriation codes. Primary source for agency headcount time series.",
+      "url": "https://www.statskontoret.se/om-statskontoret/publika-register/myndighetsforteckning/",
+      "admiralty_grade": "A2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Polisens förmåga att utreda brott — en uppföljning",
+      "year": 2023,
+      "agency": "Polismyndigheten",
+      "summary": "Follow-up study on the Swedish Police Authority's capacity to investigate crime. Analyses investigative backlog, clear-up rates and resource allocation. Relevant for implementation-feasibility of criminal-justice legislation. Note: URL points to the publications landing page; search for the specific report title to retrieve the direct PDF/HTML link.",
+      "url": "https://www.statskontoret.se/publikationer/",
+      "admiralty_grade": "C2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Kriminalvårdens kapacitetsutmaningar",
+      "year": 2022,
+      "agency": "Kriminalvården",
+      "summary": "Assessment of the Swedish Prison and Probation Service capacity constraints: cell utilisation, staffing shortfalls, expansion plans, and implementation risks for new prison construction programmes. Note: URL points to the publications landing page; search for the specific report title to retrieve the direct PDF/HTML link.",
+      "url": "https://www.statskontoret.se/publikationer/",
+      "admiralty_grade": "C2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Försäkringskassans administration av ersättningar",
+      "year": 2021,
+      "agency": "Försäkringskassan",
+      "summary": "Review of the Social Insurance Agency's administrative capacity for benefit administration, processing times, IT-system constraints, and implementation risk for new benefit schemes. Note: URL points to the publications landing page; search for the specific report title to retrieve the direct PDF/HTML link.",
+      "url": "https://www.statskontoret.se/publikationer/",
+      "admiralty_grade": "C2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Statskontoret årsutfall — statsbudgeten",
+      "year": 2025,
+      "agency": "*",
+      "summary": "Annual budget outturn for the entire central-government budget. Contains expenditure by appropriation area enabling cross-agency fiscal feasibility benchmarking.",
+      "url": "https://www.statskontoret.se/om-statskontoret/publika-register/arsutfall/",
+      "admiralty_grade": "A1",
+      "cached_at": "2026-04-27T00:00:00Z"
+    }
+  ]
+}
diff --git a/scripts/download-parliamentary-data.ts b/scripts/download-parliamentary-data.ts
index 0721bdb2df..b023681a0c 100644
--- a/scripts/download-parliamentary-data.ts
+++ b/scripts/download-parliamentary-data.ts
@@ -15,6 +15,7 @@
  * Usage:
  *   npx tsx scripts/download-parliamentary-data.ts [--date YYYY-MM-DD] [--limit N]
  *   npx tsx scripts/download-parliamentary-data.ts --aggregate weekly [--date YYYY-WNN]
+ *   npx tsx scripts/download-parliamentary-data.ts --auto-full-text-top-n 2
  *
  * @see analysis/methodologies/ai-driven-analysis-guide.md
  * @author Hack23 AB
@@ -62,6 +63,7 @@ export function parseArgs(argv: string[]): {
   rm: string | null;
   docType: DocumentTypeKey | null;
   documentIds: string[];
+  autoFullTextTopN: number | null;
 } {
   const args = argv.slice(2);
   const get = (flag: string): string | null => {
@@ -146,7 +148,21 @@ export function parseArgs(argv: string[]): {
       })
     : [];
 
-  return { date: isoDate, aggregate, limit, weekLabel, rm, docType, documentIds };
+  // --auto-full-text-top-n: Override the per-type full-text enrichment limit.
+  // When set, only the top N documents per type receive fetchDocumentDetails
+  // (full-text) enrichment, enabling more targeted significance-scoring input.
+  // Defaults to MAX_ENRICHMENT_PER_TYPE when omitted (null → caller uses default).
+  const autoFullTextTopNArg = get('--auto-full-text-top-n');
+  let autoFullTextTopN: number | null = null;
+  if (autoFullTextTopNArg !== null) {
+    const parsed = Number(autoFullTextTopNArg);
+    if (!Number.isInteger(parsed) || parsed < 0) {
+      throw new Error(`Invalid --auto-full-text-top-n value: ${autoFullTextTopNArg}. Expected a non-negative integer.`);
+    }
+    autoFullTextTopN = parsed;
+  }
+
+  return { date: isoDate, aggregate, limit, weekLabel, rm, docType, documentIds, autoFullTextTopN };
 }
 
 function isoWeekNumber(date: Date): number {
@@ -372,8 +388,9 @@ async function runPreArticleAnalysis(opts: {
   rm: string | null;
   docType: DocumentTypeKey | null;
   documentIds: string[];
+  autoFullTextTopN: number | null;
 }): Promise<void> {
-  const { date, limit, aggregate, weekLabel, rm, docType, documentIds } = opts;
+  const { date, limit, aggregate, weekLabel, rm, docType, documentIds, autoFullTextTopN } = opts;
 
   if (aggregate && weekLabel) {
     console.log(`\n📅 Running weekly data summary for: ${weekLabel}`);
@@ -403,10 +420,17 @@ async function runPreArticleAnalysis(opts: {
 
   const client = new MCPClient();
   const resolvedRm = rm ?? riksMoteFromDate(date);
-  const downloadOpts: { limit: number; rm: string; docTypes?: DocumentTypeKey[] } = { limit, rm: resolvedRm };
+  const downloadOpts: { limit: number; rm: string; docTypes?: DocumentTypeKey[]; enrichLimit?: number } = { limit, rm: resolvedRm };
   if (docType) {
     downloadOpts.docTypes = [docType];
   }
+  // --auto-full-text-top-n wires the CLI flag into the per-type enrichment
+  // limit, enabling more targeted full-text fetching for significance scoring.
+  // When null, downloadAllDocuments uses MAX_ENRICHMENT_PER_TYPE (5) by default.
+  if (autoFullTextTopN !== null) {
+    downloadOpts.enrichLimit = autoFullTextTopN;
+    console.log(` 📝 Full-text enrichment: top ${autoFullTextTopN} documents per type (--auto-full-text-top-n=${autoFullTextTopN})`);
+  }
   const { data, manifest } = await downloadAllDocuments(client, downloadOpts);
   const flattenedDocs = flattenDocuments(data);
 
@@ -537,6 +561,11 @@ async function runPreArticleAnalysis(opts: {
   console.log(' - analysis/methodologies/ai-driven-analysis-guide.md');
   console.log(' - analysis/templates/ (per-file analysis templates)');
   console.log(' - npx tsx scripts/catalog-downloaded-data.ts --pending-only');
+  if (autoFullTextTopN !== null && autoFullTextTopN > 0) {
+    console.log(` ℹ️ Significance-scoring note: top-${autoFullTextTopN} documents per type have full text`);
+    console.log(' available (contentFetched=true) — AI significance-scoring step');
+    console.log(' should prioritise those documents for deeper analysis.');
+  }
 }
 
 // ---------------------------------------------------------------------------
diff --git a/tests/statskontoret-enrichment-contract.test.ts b/tests/statskontoret-enrichment-contract.test.ts
new file mode 100644
index 0000000000..234b6978a4
--- /dev/null
+++ b/tests/statskontoret-enrichment-contract.test.ts
@@ -0,0 +1,203 @@
+/**
+ * Statskontoret enrichment layer contract tests.
+ *
+ * Validates:
+ *   - data/statskontoret/index.json structure and required fields
+ *   - implementation-feasibility.md template contains required Statskontoret evidence hooks
+ *   - download-parliamentary-data parseArgs handles --auto-full-text-top-n
+ *   - graceful degradation when enrichment limit is 0
+ */
+
+import { describe, it, expect } from 'vitest';
+import { readFileSync } from 'node:fs';
+import { resolve, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { parseArgs } from '../scripts/download-parliamentary-data.js';
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = dirname(__filename);
+const repoRoot = resolve(__dirname, '..');
+
+// ---------------------------------------------------------------------------
+// data/statskontoret/index.json contract
+// ---------------------------------------------------------------------------
+
+interface StatskontoretIndexEntry {
+  title: string;
+  year: number;
+  agency: string;
+  summary: string;
+  url: string;
+  admiralty_grade: string;
+  cached_at: string;
+}
+
+interface StatskontoretIndex {
+  version: string;
+  source: string;
+  classification: string;
+  cache_ttl_days: number;
+  description: string;
+  generated_at: string;
+  entries: StatskontoretIndexEntry[];
+}
+
+function readStatskontoretIndex(): StatskontoretIndex {
+  return JSON.parse(
+    readFileSync(resolve(repoRoot, 'data/statskontoret/index.json'), 'utf-8'),
+  ) as StatskontoretIndex;
+}
+
+describe('data/statskontoret/index.json', () => {
+  const idx = readStatskontoretIndex();
+
+  it('declares source as Statskontoret with Public classification', () => {
+    expect(idx.source).toMatch(/Statskontoret/i);
+    expect(idx.classification).toBe('Public');
+    expect(idx.version).toBeTruthy();
+  });
+
+  it('specifies a 30-day cache TTL', () => {
+    expect(idx.cache_ttl_days).toBe(30);
+  });
+
+  it('contains at least one entry', () => {
+    expect(idx.entries.length).toBeGreaterThanOrEqual(1);
+  });
+
+  it('each entry has required fields with valid formats', () => {
+    for (const entry of idx.entries) {
+      expect(typeof entry.title).toBe('string');
+      expect(entry.title.length).toBeGreaterThan(0);
+      expect(typeof entry.year).toBe('number');
+      expect(entry.year).toBeGreaterThan(2000);
+      expect(typeof entry.agency).toBe('string');
+      expect(entry.agency.length).toBeGreaterThan(0);
+      expect(typeof entry.summary).toBe('string');
+      expect(entry.summary.length).toBeGreaterThan(10);
+      expect(entry.url).toMatch(/^https?:\/\/www\.statskontoret\.se\//);
+      expect(entry.admiralty_grade).toMatch(/^[A-F][1-6]$/);
+      expect(entry.cached_at).toMatch(/^\d{4}-\d{2}-\d{2}/);
+    }
+  });
+
+  it('includes a cross-agency entry and at least one named-agency entry', () => {
+    const agencies = idx.entries.map(e => e.agency);
+    // At least one catch-all cross-agency entry should be present
+    expect(agencies.some(a => a === '*')).toBe(true);
+    // At least one entry should target a specific named agency
+    expect(agencies.some(a => a !== '*')).toBe(true);
+  });
+
+  it('entries with named agencies target recognised Swedish authorities', () => {
+    const KNOWN_AGENCIES = new Set([
+      '*',
+      'Kriminalvården',
+      'Polismyndigheten',
+      'Försäkringskassan',
+      'Skatteverket',
+      'Migrationsverket',
+      'Arbetsförmedlingen',
+      'Socialstyrelsen',
+      'Transportstyrelsen',
+      'Trafikverket',
+      'Naturvårdsverket',
+      'Energimyndigheten',
+    ]);
+    for (const entry of idx.entries) {
+      expect(KNOWN_AGENCIES.has(entry.agency)).toBe(true);
+    }
+  });
+});
+
+// ---------------------------------------------------------------------------
+// implementation-feasibility.md template: Statskontoret hooks
+// ---------------------------------------------------------------------------
+
+describe('analysis/templates/implementation-feasibility.md', () => {
+  const templatePath = resolve(repoRoot, 'analysis/templates/implementation-feasibility.md');
+  const content = readFileSync(templatePath, 'utf-8');
+
+  it('includes a Statskontoret relevance field in the Feasibility Context table', () => {
+    expect(content).toMatch(/Statskontoret relevance/i);
+  });
+
+  it('includes a Statskontoret overlay instruction in the Administrative section', () => {
+    expect(content).toMatch(/Statskontoret overlay/i);
+  });
+
+  it('references statskontoret in at least one evidence guidance note', () => {
+    expect(content.toLowerCase()).toContain('statskontoret');
+  });
+});
+
+// ---------------------------------------------------------------------------
+// download-parliamentary-data: --auto-full-text-top-n parsing
+// ---------------------------------------------------------------------------
+
+describe('parseArgs --auto-full-text-top-n', () => {
+  const BASE_ARGV = ['node', 'download-parliamentary-data.ts'];
+
+  it('returns null when flag is absent', () => {
+    const args = parseArgs([...BASE_ARGV, '--date', '2026-04-27']);
+    expect(args.autoFullTextTopN).toBeNull();
+  });
+
+  it('parses integer value correctly', () => {
+    const args = parseArgs([...BASE_ARGV, '--date', '2026-04-27', '--auto-full-text-top-n', '2']);
+    expect(args.autoFullTextTopN).toBe(2);
+  });
+
+  it('parses 0 (graceful-degradation: disable enrichment)', () => {
+    const args = parseArgs([...BASE_ARGV, '--date', '2026-04-27', '--auto-full-text-top-n', '0']);
+    expect(args.autoFullTextTopN).toBe(0);
+  });
+
+  it('parses larger values', () => {
+    const args = parseArgs([...BASE_ARGV, '--auto-full-text-top-n', '10']);
+    expect(args.autoFullTextTopN).toBe(10);
+  });
+
+  it('throws for non-integer float value', () => {
+    expect(() =>
+      parseArgs([...BASE_ARGV, '--auto-full-text-top-n', '1.5']),
+    ).toThrow(/Invalid --auto-full-text-top-n/);
+  });
+
+  it('throws for negative value', () => {
+    expect(() =>
+      parseArgs([...BASE_ARGV, '--auto-full-text-top-n', '-1']),
+    ).toThrow(/Invalid --auto-full-text-top-n/);
+  });
+
+  it('throws for non-numeric string', () => {
+    expect(() =>
+      parseArgs([...BASE_ARGV, '--auto-full-text-top-n', 'abc']),
+    ).toThrow(/Invalid --auto-full-text-top-n/);
+  });
+
+  it('does not affect other parsed fields', () => {
+    const args = parseArgs([...BASE_ARGV, '--date', '2026-04-27', '--limit', '5', '--auto-full-text-top-n', '2']);
+    expect(args.limit).toBe(5);
+    expect(args.date).toBe('2026-04-27');
+    expect(args.autoFullTextTopN).toBe(2);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Graceful-degradation: --auto-full-text-top-n=0 disables enrichment
+// ---------------------------------------------------------------------------
+
+describe('graceful degradation: --auto-full-text-top-n=0', () => {
+  it('parseArgs returns autoFullTextTopN=0 which maps to enrichLimit=0 (no enrichment)', () => {
+    const args = parseArgs(['node', 'script', '--auto-full-text-top-n', '0']);
+    // enrichLimit=0 is the signal to downloadAllDocuments to skip all enrichment,
+    // providing graceful degradation when full-text fetch is unavailable.
+    expect(args.autoFullTextTopN).toBe(0);
+  });
+
+  it('default (no flag) leaves autoFullTextTopN null, meaning downloadAllDocuments uses MAX_ENRICHMENT_PER_TYPE', () => {
+    const args = parseArgs(['node', 'script']);
+    expect(args.autoFullTextTopN).toBeNull();
+  });
+});
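
Reviewer note: the index lookup described in the SKILL.md hunk and the Check 9b rule in the gate hunk can be sketched together in TypeScript, which may help when reasoning about what the bash gate accepts. This is a minimal sketch, not code from this PR: the names `relevanceRow`, `passesCheck9b`, `isStale` and the trimmed `IndexEntry` type are hypothetical, while the agency alternation, the row label, the `none found` literal and the 30-day TTL are taken from the diff. The sketch assumes the first matching index entry is the one to cite, and it skips the manual search of the publications page that the skill asks for before recording `none found`.

```typescript
// Sketch only: function and type names are hypothetical. Field names, the agency
// list, the row label and the 30-day TTL come from the diff above.
interface IndexEntry {
  title: string;
  agency: string;
  url: string;
  cached_at: string;
}

// Same alternation as AGENCY_RE in the bash gate (case-sensitive, like grep -E).
const AGENCY_PATTERN =
  /Kriminalvård(en)?|Polismyndigheten|Försäkringskassan|Skatteverket|Migrationsverket|Arbetsförmedlingen|Socialstyrelsen|Transportstyrelsen|Trafikverket|Naturvårdsverket|Energimyndigheten/;

// Same row pattern as the gate's grep -qiE call (case-insensitive, tested per line).
const RELEVANCE_ROW_PATTERN =
  /^\|\s*\*\*Statskontoret relevance\*\*\s*\|\s*([^|]*statskontoret\.se[^|]*|[^|]*none found[^|]*)\|/i;

// Build the table row from the first index entry for the agency,
// falling back to the literal "none found" when the index has no match.
function relevanceRow(agency: string, entries: IndexEntry[]): string {
  const match = entries.find((entry) => entry.agency === agency);
  const value = match ? `[${match.title}](${match.url})` : 'none found';
  return `| **Statskontoret relevance** | ${value} |`;
}

// Mirror of Check 9b: the rule only applies when a recognised agency is named;
// in that case at least one line must be a valid relevance row.
function passesCheck9b(markdown: string): boolean {
  if (!AGENCY_PATTERN.test(markdown)) return true;
  return markdown.split(/\r?\n/).some((line) => RELEVANCE_ROW_PATTERN.test(line));
}

// True when an entry's cached_at is older than the TTL and should be re-verified.
function isStale(cachedAt: string, now: Date, ttlDays = 30): boolean {
  const ageMs = now.getTime() - Date.parse(cachedAt);
  return ageMs > ttlDays * 24 * 60 * 60 * 1000;
}
```

Splitting the file into lines before matching keeps the sketch close to `grep`, which anchors `^` per line; testing the whole string at once would let `[^|]*` run across line breaks and accept rows the gate rejects.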