
Commit 1858f03

Merge pull request #2043 from Hack23/copilot/add-statskontoret-enrichment-layer
feat: Statskontoret agency-capacity enrichment layer
2 parents: c30d511 + 6c0fa13 · commit 1858f03

5 files changed

Lines changed: 353 additions & 4 deletions


.github/prompts/05-analysis-gate.md

Lines changed: 14 additions & 1 deletion
```diff
@@ -30,10 +30,11 @@ This is the **only** gate separating analysis from article generation. If it fai
 8. **Family D structure checks**:
    - `forward-indicators.md` declares **≥ 10 dated indicators** (bullet or table rows matching a date pattern across the four horizon sections).
    - `coalition-mathematics.md` contains a seat-count table (≥ 1 table row with `Ja`/`Nej`/`Avstår` or a party-to-seats mapping).
+   - `implementation-feasibility.md` — when it names a recognised agency (Kriminalvården, Polismyndigheten, Försäkringskassan, Skatteverket, Migrationsverket, Arbetsförmedlingen, Socialstyrelsen, Transportstyrelsen, Trafikverket, Naturvårdsverket, Energimyndigheten) — contains a `statskontoret.se` URL citation **or** the literal phrase `none found` in the `Statskontoret relevance` row.
 
 ## Implementation
 
-No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–9, with check 9 conditional where applicable):
+No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–9, plus conditional check 9b where applicable):
 
 ```bash
 set -Eeuo pipefail
```
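The two Family D checks that frame this hunk can be approximated in TypeScript. This is a hypothetical sketch, not the gate's bash: it assumes the "date pattern" is an ISO `YYYY-MM-DD` date, counts dated rows across the whole file rather than per horizon section, and implements only the `Ja`/`Nej`/`Avstår` variant of the coalition check (not the party-to-seats mapping). The function names are illustrative.

```typescript
// Assumption: a "date pattern" is an ISO date (YYYY-MM-DD). The real gate's
// pattern, and any per-horizon-section counting, are not shown in this diff.
const ISO_DATE = /\b\d{4}-\d{2}-\d{2}\b/;
const MIN_DATED_INDICATORS = 10;

/** Counts bullet rows ("- " or "* ") and table rows ("|") that contain an ISO date. */
export function countDatedIndicators(markdown: string): number {
  return markdown
    .split("\n")
    .filter((line) => /^\s*([-*]\s|\|)/.test(line) && ISO_DATE.test(line)).length;
}

/** Forward-indicators check: at least MIN_DATED_INDICATORS dated rows. */
export function passesForwardIndicatorCheck(markdown: string): boolean {
  return countDatedIndicators(markdown) >= MIN_DATED_INDICATORS;
}

/** Coalition check (vote-breakdown variant only): a table row naming Ja, Nej or Avstår. */
export function hasVoteBreakdownRow(markdown: string): boolean {
  return markdown
    .split("\n")
    .some((line) => /^\s*\|/.test(line) && /\b(Ja|Nej|Avstår)\b/.test(line));
}
```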
```diff
@@ -232,6 +233,18 @@ if [ -s "$ANALYSIS_DIR/coalition-mathematics.md" ]; then
     || { echo "❌ coalition-mathematics.md: missing seat-count / vote-breakdown table"; FAIL=1; }
 fi
 
+# Check 9b — Statskontoret evidence in implementation-feasibility.md
+# When implementation-feasibility.md names a recognised agency, the file MUST
+# populate the `| **Statskontoret relevance** | ... |` row with either a
+# statskontoret.se URL or the literal `none found` when no relevant coverage exists.
+AGENCY_RE='Kriminalvård(en)?|Polismyndigheten|Försäkringskassan|Skatteverket|Migrationsverket|Arbetsförmedlingen|Socialstyrelsen|Transportstyrelsen|Trafikverket|Naturvårdsverket|Energimyndigheten'
+if [ -s "$ANALYSIS_DIR/implementation-feasibility.md" ]; then
+  if grep -qE "$AGENCY_RE" "$ANALYSIS_DIR/implementation-feasibility.md"; then
+    grep -qiE '^\|[[:space:]]*\*\*Statskontoret relevance\*\*[[:space:]]*\|[[:space:]]*([^|]*statskontoret\.se[^|]*|[^|]*none found[^|]*)\|' "$ANALYSIS_DIR/implementation-feasibility.md" \
+      || { echo "❌ implementation-feasibility.md: names a recognised agency but the Statskontoret relevance row lacks a statskontoret.se URL or 'none found'"; FAIL=1; }
+  fi
+fi
+
 # Check 9 — PIR status sidecar (`pir-status.json`)
 # A valid pir-status.json must be present after every analysis run so that
 # open PIRs can be automatically rolled forward to the next cycle.
```
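For readers who want to unit-test Check 9b outside bash, the same two regexes translate to TypeScript roughly as below. This is an assumed-equivalent port, not code shipped in this commit: `checkStatskontoretEvidence` is a hypothetical name, `[[:space:]]` is narrowed to spaces and tabs, and the `m` flag stands in for grep's line-by-line matching.

```typescript
// Agency names copied from AGENCY_RE in Check 9b (case-sensitive, like grep -E).
const AGENCY_RE =
  /Kriminalvård(en)?|Polismyndigheten|Försäkringskassan|Skatteverket|Migrationsverket|Arbetsförmedlingen|Socialstyrelsen|Transportstyrelsen|Trafikverket|Naturvårdsverket|Energimyndigheten/;

// `m`: ^ matches at every line start (grep is line-oriented).
// `i`: mirrors grep -i. [^|\n] keeps each cell match on a single line.
const RELEVANCE_ROW_RE =
  /^\|[ \t]*\*\*Statskontoret relevance\*\*[ \t]*\|[ \t]*([^|\n]*statskontoret\.se[^|\n]*|[^|\n]*none found[^|\n]*)\|/im;

/** Returns true when the file passes Check 9b (or when the check does not apply). */
export function checkStatskontoretEvidence(markdown: string): boolean {
  if (!AGENCY_RE.test(markdown)) return true; // no recognised agency: check skipped
  return RELEVANCE_ROW_RE.test(markdown);
}
```

As in the bash gate, agency detection is case-sensitive while the relevance-row match is case-insensitive, so `None found` also passes.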

.github/skills/myndigheter-monitoring/SKILL.md

Lines changed: 49 additions & 0 deletions
```diff
@@ -224,6 +224,55 @@ interviews (5 labor economists), stakeholder statements*
 - **Courts** - Administrative law challenges
 - **Media** - Investigative reporting (that's you!)
 
+## Statskontoret Enrichment Layer
+
+The **Statskontoret enrichment layer** provides empirical agency-capacity evidence beneath document-level analysis. Use it whenever an `implementation-feasibility.md` artifact names a specific agency (Kriminalvården, Polismyndigheten, Försäkringskassan, etc.) and a feasibility claim needs grounding in published capacity data.
+
+### Index
+
+The seed index is at [`data/statskontoret/index.json`](../../../data/statskontoret/index.json). It contains the following fields per entry:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `title` | string | Full Swedish report/dataset title |
+| `year` | number | Publication or reference year |
+| `agency` | string | Named agency or `"*"` for cross-agency |
+| `summary` | string | One-sentence abstract |
+| `url` | string | Canonical Statskontoret URL |
+| `admiralty_grade` | string | Admiralty code: source reliability (A–F) followed by information credibility (1–6), e.g. `A2` |
+| `cached_at` | ISO-8601 | When the entry was last verified (TTL 30 days) |
+
+### How to use in implementation-feasibility.md
+
+1. **Look up** the agency in `data/statskontoret/index.json` (or search `www.statskontoret.se` directly).
+2. **Populate** the `Statskontoret relevance` row in the Feasibility Context table with the matched entry's URL and title.
+3. **Cite** the entry in the 🏛️ Administrative feasibility section, following the established "Statskontoret overlay" pattern.
+4. If no entry matches, search `https://www.statskontoret.se/publikationer/`; record `"none found"` only if that search also turns up no relevant report.
+
+### CLI (fetch & persist)
+
+```bash
+# Discover downloadable links for the agency register
+tsx scripts/statskontoret-fetch.ts discover --source myndighetsforteckning
+
+# Fetch agency headcount workbook (once a URL is discovered)
+tsx scripts/statskontoret-fetch.ts headcount --url <xlsx-url> --persist
+
+# Budget outturn
+tsx scripts/statskontoret-fetch.ts budget-outturn --url <xlsx-url> --source arsutfall --persist
+```
+
+### Cache TTL
+
+Statskontoret reports are slow-moving. Treat an entry as stale once its `cached_at` timestamp is more than **30 days** old, and do not refresh the index more often than that.
+
+### Required behaviour for implementation-feasibility
+
+When an agency is named in `implementation-feasibility.md`:
+- The **Feasibility Context** table MUST include a populated `Statskontoret relevance` row (URL or `"none found"`).
+- The **Administrative feasibility** section MUST cite the Statskontoret entry or explicitly state no relevant report was found.
+- Check 9b of the analysis gate (`05-analysis-gate.md`) enforces the `Statskontoret relevance` row; it does not verify the Administrative feasibility citation.
+
 ## Remember
 
 - **Agencies matter** - They implement policy, affect daily life directly
```
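Steps 1, 2 and 4 of "How to use" can be sketched as a lookup over the parsed index. The helper names, the choice to ignore cross-agency `"*"` entries, and the newest-first preference are assumptions for illustration; returning `null` signals that the manual search in step 4 is still required before writing `none found`.

```typescript
// Hypothetical entry shape: a subset of the index.json fields.
interface IndexEntry {
  title: string;
  year: number;
  agency: string; // "*" marks cross-agency entries
  url: string;
}

/**
 * Steps 1-2 sketch: returns the text for the `Statskontoret relevance` cell,
 * or null when the index has no agency-specific entry (step 4: search manually).
 */
export function relevanceCell(entries: IndexEntry[], agency: string): string | null {
  const [best] = entries
    .filter((e) => e.agency === agency) // "*" entries are not a specific match (assumption)
    .sort((a, b) => b.year - a.year); // prefer the newest report (assumption)
  return best ? `[${best.title}](${best.url})` : null;
}

/** Formats the table row in the shape Check 9b looks for. */
export function relevanceRow(cell: string): string {
  return `| **Statskontoret relevance** | ${cell} |`;
}
```

Because the seed entries for Polismyndigheten, Kriminalvården and Försäkringskassan point at the `statskontoret.se/publikationer/` landing page, a row built this way contains a `statskontoret.se` URL and therefore satisfies Check 9b, even though the link is not yet report-specific.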

data/statskontoret/index.json

Lines changed: 55 additions & 0 deletions
```diff
@@ -0,0 +1,55 @@
+{
+  "version": "1.0",
+  "source": "Statskontoret",
+  "classification": "Public",
+  "cache_ttl_days": 30,
+  "description": "Agency-capacity evidence index sourced from Statskontoret public reports. Used by implementation-feasibility analysis to cite empirical capacity data for named agencies (Kriminalvården, Polismyndigheten, Försäkringskassan, etc.).",
+  "generated_at": "2026-04-27T00:00:00Z",
+  "entries": [
+    {
+      "title": "Statskontorets myndighetsförteckning",
+      "year": 2025,
+      "agency": "*",
+      "summary": "Annual register of all Swedish central-government authorities: headcount by department, organisational form and appropriation codes. Primary source for agency headcount time series.",
+      "url": "https://www.statskontoret.se/om-statskontoret/publika-register/myndighetsforteckning/",
+      "admiralty_grade": "A2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Polisens förmåga att utreda brott — en uppföljning",
+      "year": 2023,
+      "agency": "Polismyndigheten",
+      "summary": "Follow-up study on the Swedish Police Authority's capacity to investigate crime. Analyses investigative backlog, clear-up rates and resource allocation. Relevant for implementation-feasibility of criminal-justice legislation. Note: URL points to the publications landing page; search for the specific report title to retrieve the direct PDF/HTML link.",
+      "url": "https://www.statskontoret.se/publikationer/",
+      "admiralty_grade": "C2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Kriminalvårdens kapacitetsutmaningar",
+      "year": 2022,
+      "agency": "Kriminalvården",
+      "summary": "Assessment of the Swedish Prison and Probation Service capacity constraints: cell utilisation, staffing shortfalls, expansion plans, and implementation risks for new prison construction programmes. Note: URL points to the publications landing page; search for the specific report title to retrieve the direct PDF/HTML link.",
+      "url": "https://www.statskontoret.se/publikationer/",
+      "admiralty_grade": "C2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Försäkringskassans administration av ersättningar",
+      "year": 2021,
+      "agency": "Försäkringskassan",
+      "summary": "Review of the Social Insurance Agency's administrative capacity for benefit administration, processing times, IT-system constraints, and implementation risk for new benefit schemes. Note: URL points to the publications landing page; search for the specific report title to retrieve the direct PDF/HTML link.",
+      "url": "https://www.statskontoret.se/publikationer/",
+      "admiralty_grade": "C2",
+      "cached_at": "2026-04-27T00:00:00Z"
+    },
+    {
+      "title": "Statskontoret årsutfall — statsbudgeten",
+      "year": 2025,
+      "agency": "*",
+      "summary": "Annual budget outturn for the entire central-government budget. Contains expenditure by appropriation area enabling cross-agency fiscal feasibility benchmarking.",
+      "url": "https://www.statskontoret.se/om-statskontoret/publika-register/arsutfall/",
+      "admiralty_grade": "A1",
+      "cached_at": "2026-04-27T00:00:00Z"
+    }
+  ]
+}
```
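The index schema and its 30-day TTL can be expressed as TypeScript types plus a staleness filter. This is a sketch assuming the field names shown in `index.json`; `staleEntries` and `isValidAdmiraltyGrade` are hypothetical helpers, and treating an unparseable `cached_at` as stale is a design choice, not documented behaviour.

```typescript
// Types mirroring the fields in data/statskontoret/index.json.
interface StatskontoretEntry {
  title: string;
  year: number;
  agency: string;
  summary: string;
  url: string;
  admiralty_grade: string;
  cached_at: string; // ISO-8601
}

interface StatskontoretIndex {
  version: string;
  source: string;
  classification: string;
  cache_ttl_days: number;
  description: string;
  generated_at: string;
  entries: StatskontoretEntry[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Returns entries whose cached_at is older than the index TTL at `now`. */
export function staleEntries(index: StatskontoretIndex, now: Date): StatskontoretEntry[] {
  const maxAgeMs = index.cache_ttl_days * DAY_MS;
  return index.entries.filter((e) => {
    const cachedMs = Date.parse(e.cached_at);
    // Unparseable timestamps are treated as stale (assumption).
    return Number.isNaN(cachedMs) || now.getTime() - cachedMs > maxAgeMs;
  });
}

/** Admiralty code: letter A-F (source reliability) + digit 1-6 (information credibility). */
export function isValidAdmiraltyGrade(grade: string): boolean {
  return /^[A-F][1-6]$/.test(grade);
}
```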

scripts/download-parliamentary-data.ts

Lines changed: 32 additions & 3 deletions
```diff
@@ -15,6 +15,7 @@
  * Usage:
  * npx tsx scripts/download-parliamentary-data.ts [--date YYYY-MM-DD] [--limit N]
  * npx tsx scripts/download-parliamentary-data.ts --aggregate weekly [--date YYYY-WNN]
+ * npx tsx scripts/download-parliamentary-data.ts --auto-full-text-top-n 2
  *
  * @see analysis/methodologies/ai-driven-analysis-guide.md
  * @author Hack23 AB
```
```diff
@@ -62,6 +63,7 @@ export function parseArgs(argv: string[]): {
   rm: string | null;
   docType: DocumentTypeKey | null;
   documentIds: string[];
+  autoFullTextTopN: number | null;
 } {
   const args = argv.slice(2);
   const get = (flag: string): string | null => {
```
```diff
@@ -146,7 +148,21 @@ export function parseArgs(argv: string[]): {
       })
     : [];
 
-  return { date: isoDate, aggregate, limit, weekLabel, rm, docType, documentIds };
+  // --auto-full-text-top-n: Override the per-type full-text enrichment limit.
+  // When set, only the top N documents per type receive fetchDocumentDetails
+  // (full-text) enrichment, enabling more targeted significance-scoring input.
+  // Defaults to MAX_ENRICHMENT_PER_TYPE when omitted (null → caller uses default).
+  const autoFullTextTopNArg = get('--auto-full-text-top-n');
+  let autoFullTextTopN: number | null = null;
+  if (autoFullTextTopNArg !== null) {
+    const parsed = Number(autoFullTextTopNArg);
+    if (!Number.isInteger(parsed) || parsed < 0) {
+      throw new Error(`Invalid --auto-full-text-top-n value: ${autoFullTextTopNArg}. Expected a non-negative integer.`);
+    }
+    autoFullTextTopN = parsed;
+  }
+
+  return { date: isoDate, aggregate, limit, weekLabel, rm, docType, documentIds, autoFullTextTopN };
 }
 
 function isoWeekNumber(date: Date): number {
```
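One detail of the `--auto-full-text-top-n` validation: `Number("")` evaluates to `0` and `Number("1e1")` to `10`, so `Number.isInteger` accepts inputs that are not plain digit strings. Whether `get` can ever return an empty string depends on code outside this diff. A stricter, hypothetical variant (`parseTopN` is an illustrative name) could check the raw text first:

```typescript
/**
 * Hypothetical stricter parser for --auto-full-text-top-n.
 * Accepts only plain decimal digits; null means the flag was omitted.
 */
export function parseTopN(raw: string | null): number | null {
  if (raw === null) return null; // caller falls back to its default limit
  if (!/^\d+$/.test(raw)) {
    throw new Error(`Invalid --auto-full-text-top-n value: ${raw}. Expected a non-negative integer.`);
  }
  const n = Number(raw);
  if (!Number.isSafeInteger(n)) {
    throw new Error(`--auto-full-text-top-n value too large: ${raw}`);
  }
  return n;
}
```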
```diff
@@ -372,8 +388,9 @@ async function runPreArticleAnalysis(opts: {
   rm: string | null;
   docType: DocumentTypeKey | null;
   documentIds: string[];
+  autoFullTextTopN: number | null;
 }): Promise<void> {
-  const { date, limit, aggregate, weekLabel, rm, docType, documentIds } = opts;
+  const { date, limit, aggregate, weekLabel, rm, docType, documentIds, autoFullTextTopN } = opts;
 
   if (aggregate && weekLabel) {
     console.log(`\n📅 Running weekly data summary for: ${weekLabel}`);
```
```diff
@@ -403,10 +420,17 @@ async function runPreArticleAnalysis(opts: {
   const client = new MCPClient();
   const resolvedRm = rm ?? riksMoteFromDate(date);
 
-  const downloadOpts: { limit: number; rm: string; docTypes?: DocumentTypeKey[] } = { limit, rm: resolvedRm };
+  const downloadOpts: { limit: number; rm: string; docTypes?: DocumentTypeKey[]; enrichLimit?: number } = { limit, rm: resolvedRm };
   if (docType) {
     downloadOpts.docTypes = [docType];
   }
+  // --auto-full-text-top-n wires the CLI flag into the per-type enrichment
+  // limit, enabling more targeted full-text fetching for significance scoring.
+  // When null, downloadAllDocuments uses MAX_ENRICHMENT_PER_TYPE (5) by default.
+  if (autoFullTextTopN !== null) {
+    downloadOpts.enrichLimit = autoFullTextTopN;
+    console.log(` 📝 Full-text enrichment: top ${autoFullTextTopN} documents per type (--auto-full-text-top-n=${autoFullTextTopN})`);
+  }
 
   const { data, manifest } = await downloadAllDocuments(client, downloadOpts);
   const flattenedDocs = flattenDocuments(data);
```
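`downloadAllDocuments` is not part of this diff, so how `enrichLimit` is applied is not shown. Under the assumption that "top N per type" means the first N documents of each type in the order they are received, the selection could look like this (`Doc`, `selectForEnrichment` and the local `MAX_ENRICHMENT_PER_TYPE` constant are hypothetical; the default of 5 comes from the comment in the hunk above):

```typescript
// Hypothetical document shape; the real type is not shown in this diff.
interface Doc {
  id: string;
  type: string;
}

const MAX_ENRICHMENT_PER_TYPE = 5; // default stated in the diff comment

/** Picks the first `enrichLimit` docs of each type, preserving input order. */
export function selectForEnrichment(docs: Doc[], enrichLimit?: number): Doc[] {
  const limit = enrichLimit ?? MAX_ENRICHMENT_PER_TYPE;
  const takenPerType = new Map<string, number>();
  const selected: Doc[] = [];
  for (const doc of docs) {
    const taken = takenPerType.get(doc.type) ?? 0;
    if (taken < limit) {
      selected.push(doc);
      takenPerType.set(doc.type, taken + 1);
    }
  }
  return selected;
}
```

Using `??` rather than `||` keeps an explicit `--auto-full-text-top-n 0` meaning "enrich nothing" instead of silently falling back to the default.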
```diff
@@ -537,6 +561,11 @@ async function runPreArticleAnalysis(opts: {
   console.log(' - analysis/methodologies/ai-driven-analysis-guide.md');
   console.log(' - analysis/templates/ (per-file analysis templates)');
   console.log(' - npx tsx scripts/catalog-downloaded-data.ts --pending-only');
+  if (autoFullTextTopN !== null && autoFullTextTopN > 0) {
+    console.log(` ℹ️ Significance-scoring note: top-${autoFullTextTopN} documents per type have full text`);
+    console.log(' available (contentFetched=true) — AI significance-scoring step');
+    console.log(' should prioritise those documents for deeper analysis.');
+  }
 }
 
 // ---------------------------------------------------------------------------
```
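The closing console hint asks the significance-scoring step to prioritise documents with `contentFetched=true`. A minimal, hypothetical way to do that ordering (`prioritiseFullText` is an illustrative name) is a stable partition that moves full-text documents to the front while keeping the original order inside each group:

```typescript
/** Stable partition: full-text documents first, original order kept within each group. */
export function prioritiseFullText<T extends { contentFetched?: boolean }>(docs: T[]): T[] {
  const withText = docs.filter((d) => d.contentFetched === true);
  const rest = docs.filter((d) => d.contentFetched !== true); // false or missing
  return [...withText, ...rest];
}
```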
