Commit 7246c68

Merge pull request #1993 from Hack23/copilot/add-data-support-for-statskontoret
Add Statskontoret data integration support
2 parents bea7119 + 1068acb commit 7246c68

20 files changed

Lines changed: 2478 additions & 2 deletions

ARCHITECTURE.md

Lines changed: 51 additions & 0 deletions
@@ -1856,3 +1856,54 @@ graph TB
</a>
</p>

---

## 🏛️ Statskontoret Integration — Current Architecture

> **Effective:** 2026-04-25 · **Classification:** Public · **Runtime:** Node.js 25 / TypeScript CLI · **MCP status:** intentionally **not** an MCP server.

Statskontoret is now the Swedish public-administration and central-government budget-execution context layer. It complements the existing provider split: IMF remains primary for macro/fiscal projections, SCB remains Swedish official-statistics ground truth, World Bank remains governance/environment/social residue, and Statskontoret supplies agency structure plus budget outturn detail that the other providers do not expose in the same operational form.

### Architectural placement

```mermaid
flowchart LR
Workflow[Agentic news workflow<br/>Node 25] --> CLI[statskontoret-fetch.ts<br/>list-sources · discover · headcount]
CLI --> Client[StatskontoretClient<br/>statskontoret-client.ts]
Client --> Source[www.statskontoret.se<br/>open data pages]
Source --> XLSX[Excel workbooks]
Source --> ZIP[CSV ZIP archives]
Client --> Parser[XLSX / CSV-ZIP parsers<br/>typed StatskontoretError]
Parser --> Derived[Derived artifacts<br/>headcount-by-department]
Derived --> Persist[analysis/data/statskontoret/<br/>JSON + .meta.json sidecars]
Derived --> Articles[Article and dashboard context]
```

### Provider responsibility matrix

| Need | Primary provider | Riksdagsmonitor surface |
|---|---|---|
| Agency count, department grouping, leadership form and government-body headcount | **Statskontoret Myndighetsförteckning** | `scripts/statskontoret-fetch.ts headcount`, `analysis/statskontoret/` |
| Annual central-government budget outturn | **Statskontoret Årsutfall** | Download discovery and persisted raw/derived artifacts |
| Monthly central-government budget execution | **Statskontoret Månadsutfall** | Download discovery for high-frequency budget monitoring |
| Macro/fiscal projections and cross-country methodology | **IMF WEO/FM/SDMX** | `scripts/imf-*` |
| Swedish regional/monthly official statistics | **SCB PxWeb** | `scb` MCP |
| Governance/environment/social residue | **World Bank** | `world-bank` MCP |

### Code and quality surfaces

| Surface | Responsibility |
|---|---|
| `scripts/statskontoret-client.ts` | Typed client, source catalogue, download discovery, HTML entity decoding, XLSX parsing, CSV ZIP parsing, numeric normalisation, department headcount aggregation. |
| `scripts/statskontoret-fetch.ts` | Import-safe CLI wrapper for workflows; exported argument parsing helpers for testability; exit code `2` for CLI contract errors. |
| `analysis/statskontoret/indicators-inventory.json` | Machine-readable dataset inventory and provider decision matrix. |
| `analysis/statskontoret/data-dictionary.md` | Field families, freshness discipline, persistence layout. |
| `tests/statskontoret-*.test.ts` | Inventory consistency, download-link extraction, workbook parsing, CSV ZIP parsing, CLI parsing and parser primitive coverage. |

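As a rough illustration of the CLI contract in the table above (not the code from this commit), the sketch below parses the three subcommands and the `--source`, `--url` and `--persist` flags that appear in the README quick commands, and maps contract violations to exit code `2`. The names `parseCliArgs`, `CliContractError` and `exitCodeFor` are hypothetical, and nothing runs at import time, which is one way to keep such a module import-safe.

```typescript
// Hypothetical sketch: names and error messages are assumptions, not the
// helpers exported by scripts/statskontoret-fetch.ts.
type Command = "list-sources" | "discover" | "headcount";

interface ParsedArgs {
  command: Command;
  source?: string;
  url?: string;
  persist: boolean;
}

class CliContractError extends Error {
  readonly exitCode = 2;
  constructor(message: string) {
    super(message);
    this.name = "CliContractError";
  }
}

const COMMANDS: readonly Command[] = ["list-sources", "discover", "headcount"];

function parseCliArgs(argv: readonly string[]): ParsedArgs {
  const [command, ...rest] = argv;
  if (command === undefined || !COMMANDS.includes(command as Command)) {
    throw new CliContractError(`unknown command: ${command ?? "(none)"}`);
  }
  const parsed: ParsedArgs = { command: command as Command, persist: false };
  for (let i = 0; i < rest.length; i += 1) {
    const flag = rest[i];
    if (flag === "--persist") {
      parsed.persist = true;
    } else if (flag === "--source" || flag === "--url") {
      const value = rest[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new CliContractError(`${flag} requires a value`);
      }
      if (flag === "--source") parsed.source = value;
      else parsed.url = value;
      i += 1; // skip the value that was just consumed
    } else {
      throw new CliContractError(`unknown flag: ${String(flag)}`);
    }
  }
  return parsed;
}

// Pure helper so the exit code can be unit-tested without spawning a process.
function exitCodeFor(argv: readonly string[]): number {
  try {
    parseCliArgs(argv);
    return 0;
  } catch (error) {
    if (error instanceof CliContractError) return error.exitCode;
    throw error;
  }
}
```

A thin entry point could then set `process.exitCode = exitCodeFor(process.argv.slice(2))` only when the file is executed directly, leaving imports from tests free of side effects.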
### Operational characteristics

- **Trust boundary:** one outbound HTTPS boundary to `www.statskontoret.se`; no credentials, no private data, no write-back to the source.
- **Persistence:** optional `--persist` writes raw or derived payloads to `analysis/data/statskontoret/{dataset}/{artifact}.json` with `.meta.json` provenance sidecars.
- **Failure mode:** optional enrichment semantics; article generation can fall back to cached artifacts or omit Statskontoret context rather than blocking publication.
- **Security posture:** Public classification, high-integrity provenance, dependency surface limited to existing npm SBOM (`jszip`) and in-repository TypeScript code.

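The failure mode described above can be sketched as a small wrapper that tries the live source, then the cache, and finally reports that the context was omitted. This is a minimal sketch under assumptions: `loadOptionalContext`, `Loader` and `EnrichmentResult` are hypothetical names, and the commit may order or implement the fallback differently.

```typescript
// Hypothetical sketch of optional-enrichment semantics; not code from the commit.
type Loader<T> = () => Promise<T>;

interface EnrichmentResult<T> {
  origin: "live" | "cache" | "omitted";
  data?: T;
}

async function loadOptionalContext<T>(
  fetchLive: Loader<T>,
  readCache: Loader<T>,
): Promise<EnrichmentResult<T>> {
  try {
    // Preferred path: fresh data from the upstream source.
    return { origin: "live", data: await fetchLive() };
  } catch {
    try {
      // Upstream failed: reuse a previously persisted artifact if one exists.
      return { origin: "cache", data: await readCache() };
    } catch {
      // Neither is available: the caller publishes without this context.
      return { origin: "omitted" };
    }
  }
}
```

Because the function never rejects, a caller can branch on `origin` and keep publishing even when both the upstream site and the cache are unavailable.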

DATA_MODEL.md

Lines changed: 24 additions & 0 deletions
@@ -2592,3 +2592,27 @@ This DATA_MODEL.md complements ARCHITECTURE.md:
**⏰ Next Review:** 2027-02-15
**🎯 Framework Compliance:** [![ISO 27001](https://img.shields.io/badge/ISO_27001-2022_Aligned-blue?style=flat-square&logo=iso&logoColor=white)](https://github.com/Hack23/ISMS-PUBLIC/blob/main/CLASSIFICATION.md) [![NIST CSF 2.0](https://img.shields.io/badge/NIST_CSF-2.0_Aligned-green?style=flat-square&logo=nist&logoColor=white)](https://github.com/Hack23/ISMS-PUBLIC/blob/main/CLASSIFICATION.md) [![CIS Controls](https://img.shields.io/badge/CIS_Controls-v8.1_Aligned-orange?style=flat-square&logo=cisecurity&logoColor=white)](https://github.com/Hack23/ISMS-PUBLIC/blob/main/CLASSIFICATION.md)

---

## 🏛️ Statskontoret Data Model Extension

Statskontoret adds a public Swedish-administration data domain under the economic/public-administration context layer.

### Source entities

| Entity | Key fields | Storage / source |
|---|---|---|
| `StatskontoretSourceDefinition` | `key`, `title`, `url`, `cadence`, `coverage`, `primaryUse` | Static catalogue in `scripts/statskontoret-client.ts`; mirrored by `analysis/statskontoret/indicators-inventory.json`. |
| `StatskontoretDownloadLink` | `source`, `sourcePage`, `url`, `resourceType`, `documentType`, `fileType`, `fileName`, `year`, `month`, `status`, `updatedAt` | Derived from Statskontoret HTML pages by `extractStatskontoretDownloadLinks`. |
| `StatskontoretWorkbook` / `StatskontoretSheet` | sheet name and row arrays | Parsed locally from XLSX ZIP parts. |
| `StatskontoretHeadcountRow` | `year`, `department`, `headcount`, `authorityCount` | Derived from Myndighetsförteckning rows. |

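To make the last row of the table concrete, here is a hedged sketch of how per-authority rows might be rolled up into `StatskontoretHeadcountRow` values. The four field names come from the table; the field types, the `AuthorityRow` input shape and the `aggregateHeadcount` function are assumptions for illustration and are not the declarations in `scripts/statskontoret-client.ts`.

```typescript
// Field names follow the table above; the types are assumed.
interface StatskontoretHeadcountRow {
  year: number;
  department: string;
  headcount: number;
  authorityCount: number;
}

// Hypothetical input: one row per authority and year.
interface AuthorityRow {
  year: number;
  department: string;
  authority: string;
  fullTimeEquivalents: number;
}

function aggregateHeadcount(rows: readonly AuthorityRow[]): StatskontoretHeadcountRow[] {
  const groups = new Map<string, { row: StatskontoretHeadcountRow; authorities: Set<string> }>();
  for (const r of rows) {
    // Group by (year, department); NUL cannot occur in a department name.
    const key = `${r.year}\u0000${r.department}`;
    let group = groups.get(key);
    if (group === undefined) {
      group = {
        row: { year: r.year, department: r.department, headcount: 0, authorityCount: 0 },
        authorities: new Set<string>(),
      };
      groups.set(key, group);
    }
    group.row.headcount += r.fullTimeEquivalents;
    group.authorities.add(r.authority); // distinct authorities only
    group.row.authorityCount = group.authorities.size;
  }
  return Array.from(groups.values())
    .map((g) => g.row)
    .sort((a, b) => a.year - b.year || a.department.localeCompare(b.department));
}
```

Counting distinct authority names rather than rows keeps `authorityCount` stable if the same authority appears on more than one line for a given year.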
### Persisted artifact contract

```text
analysis/data/statskontoret/{dataset}/{artifact}.json
analysis/data/statskontoret/{dataset}/{artifact}.meta.json
```

Sidecar metadata includes `fetchedAt`, `mcpTool: statskontoret-ts-client`, `dataset`, and `artifact`. The provider decision matrix in `analysis/statskontoret/indicators-inventory.json` maps government-body headcount and central-government budget outturn claims to Statskontoret, while macro/fiscal projections remain IMF-first.


FLOWCHART.md

Lines changed: 26 additions & 0 deletions
@@ -969,3 +969,29 @@ flowchart LR
- 24 indicators across 10 IMF dataflows (WEO / FM / IFS / BOP / DOTS / GFS_COFOG / PCPS / ER / MFS_IR / MFS_PR) catalogued in [`analysis/imf/indicators-inventory.json`](analysis/imf/indicators-inventory.json)
- Vintage discipline (>6 mo → annotation) enforced by `tests/imf-inventory.test.ts` (13 assertions) and `tests/economic-context-multi-provider.test.ts` (asserts IMF queried before WB)
- Egress allow-list: `www.imf.org`, `sdmxcentral.imf.org` pinned in every workflow `network:` block

---

## 🏛️ Statskontoret Data Flow (Current State)

```mermaid
flowchart TD
Start[News / analysis workflow needs agency or budget-execution context]
Decision{Context type?}
Start --> Decision
Decision -->|Agency structure / headcount| MF[Statskontoret Myndighetsförteckning]
Decision -->|Annual budget outturn| AU[Statskontoret Årsutfall]
Decision -->|Monthly budget outturn| MU[Statskontoret Månadsutfall]
Decision -->|Macro projection| IMF[IMF WEO/FM]
MF --> CLI[statskontoret-fetch.ts]
AU --> CLI
MU --> CLI
CLI --> Discover[discover: extract Excel / CSV ZIP links]
CLI --> Headcount[headcount: parse XLSX and aggregate department time series]
Discover --> Persist[analysis/data/statskontoret JSON + meta]
Headcount --> Persist
Persist --> Article[Article / dashboard context with source URL and freshness]
```

Key gates: HTTPS-only source, source catalogue validation, parser tests, provenance sidecars, and optional-enrichment fallback.


MINDMAP.md

Lines changed: 49 additions & 0 deletions
@@ -554,3 +554,52 @@ mindmap
Regional municipal
Budget execution
```

---

## 🏛️ Statskontoret Integration Branch (Current State)

```mermaid
mindmap
  root((Statskontoret Integration))
    Purpose
      Swedish agency structure
      Government-body headcount
      Central-government budget execution
    Sources
      Myndighetsforteckning
        Annual
        XLSX
        Headcount by department
      Arsutfall
        Annual
        XLSX
        CSV ZIP
      Manadsutfall
        Monthly
        XLSX
        CSV ZIP
      Budget time series
        Long-run state budget context
    Code
      statskontoret-client.ts
        Discovery
        XLSX parser
        CSV ZIP parser
        Typed StatskontoretError
      statskontoret-fetch.ts
        list-sources
        discover
        headcount
    Governance
      Public classification
      No MCP server
      No credentials
      www.statskontoret.se allowlist
      analysis/statskontoret inventory
    Tests
      client tests
      CLI parsing tests
      inventory tests
```


README.md

Lines changed: 32 additions & 0 deletions
@@ -1108,3 +1108,35 @@ Riksdagsmonitor uses a **provider-tiered** data architecture, with each provider
**Why this split** — IMF uses uniform SNA 2008 / GFSM 2014 / BPM6 methodology across countries (essential for cross-country comparison), publishes T+5 projections (essential for look-ahead workflows), and has fresher data than World Bank's economic indicators. World Bank remains the canonical source for the classes IMF does not publish (WGI governance, environment).

Authority: [`.github/aw/ECONOMIC_DATA_CONTRACT.md`](.github/aw/ECONOMIC_DATA_CONTRACT.md) v2.1 · hub: [`analysis/imf/`](analysis/imf/) · agent guide: [`AGENTS.md`](AGENTS.md) §IMF.

---

## 🏛️ Statskontoret Swedish Administration Integration

Riksdagsmonitor now includes a pure-TypeScript Statskontoret integration for Swedish government-body and central-government budget-execution context.

| Dataset | Use |
|---|---|
| Myndighetsförteckning | Authority count, department grouping, leadership form and årsarbetskrafter/headcount over time. |
| Årsutfall för statens budget | Annual central-government revenue and expenditure outturns. |
| Månadsutfall för statens budget | Monthly budget execution from 2006 onward. |
| Tidsserier, statens budget m.m. | Long-run Swedish budget context. |

Quick commands:

```bash
tsx scripts/statskontoret-fetch.ts list-sources
tsx scripts/statskontoret-fetch.ts discover --source arsutfall --persist
tsx scripts/statskontoret-fetch.ts headcount --url "https://www.statskontoret.se/...xlsx" --persist
```

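For readers wondering what the `discover` step involves, the following is a simplified, hypothetical sketch of extracting Excel and ZIP links from a page's HTML. It is not the repository's `extractStatskontoretDownloadLinks`: the function and type names, the regular expression and the reduced set of output fields are assumptions, and a production parser would likely handle more markup variations.

```typescript
// Hypothetical, simplified link discovery; the real client returns richer records.
interface DownloadLinkSketch {
  url: string;
  fileType: "xlsx" | "zip";
  fileName: string;
}

// Decode the handful of HTML entities that commonly appear inside href values.
function decodeBasicEntities(text: string): string {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" is not decoded twice
}

function extractDownloadLinksSketch(html: string, pageUrl: string): DownloadLinkSketch[] {
  const links: DownloadLinkSketch[] = [];
  const seen = new Set<string>();
  for (const match of html.matchAll(/href\s*=\s*"([^"]+)"/gi)) {
    const href = decodeBasicEntities(match[1] ?? "");
    let resolved: URL;
    try {
      resolved = new URL(href, pageUrl); // resolves relative links against the page
    } catch {
      continue; // skip hrefs that are not valid URLs
    }
    const extension = /\.(xlsx|zip)$/.exec(resolved.pathname.toLowerCase())?.[1];
    const fileType = extension === "xlsx" ? "xlsx" : extension === "zip" ? "zip" : null;
    if (fileType === null || seen.has(resolved.href)) continue;
    seen.add(resolved.href);
    links.push({
      url: resolved.href,
      fileType,
      fileName: resolved.pathname.split("/").pop() ?? "",
    });
  }
  return links;
}
```

Resolving every `href` through `new URL(href, pageUrl)` normalises relative links, and the `seen` set drops duplicates when the same file is linked more than once on a page.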
Architecture and governance references:

- `analysis/statskontoret/README.md` — integration hub.
- `analysis/statskontoret/indicators-inventory.json` — machine-readable source catalogue.
- `analysis/statskontoret/data-dictionary.md` — field and freshness rules.
- `scripts/statskontoret-client.ts` / `scripts/statskontoret-fetch.ts` — client and workflow CLI.
- `tests/statskontoret-client.test.ts`, `tests/statskontoret-fetch.test.ts`, `tests/statskontoret-inventory.test.ts` — regression coverage.

Provider rule: IMF remains primary for macro/fiscal projections, SCB remains Swedish statistical ground truth, World Bank remains governance/environment/social residue, and Statskontoret is authoritative for Swedish agency structure and central-government budget execution.


SECURITY_ARCHITECTURE.md

Lines changed: 19 additions & 0 deletions
@@ -3086,3 +3086,22 @@ flowchart LR
**Egress hosts** (allow-list): `www.imf.org` (Datamapper REST · WEO/FM), `sdmxcentral.imf.org` (SDMX 3.0 REST · IFS/BOP/DOTS/GFS/PCPS/ER/MFS_IR/MFS_PR). Both HTTPS-only, anonymous, public — no credentials required.

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🏛️ Statskontoret Security Architecture

Statskontoret is a read-only public-data integration using in-repository TypeScript code and the existing npm dependency graph. It is intentionally not configured as an MCP server; workflows invoke `tsx scripts/statskontoret-fetch.ts` via the bash tool.

| Control area | Statskontoret control |
|---|---|
| Network egress | Allow only HTTPS to `www.statskontoret.se` for this provider. |
| Authentication | None required; no tokens or secrets transmitted. |
| Input validation | Resource classification, URL normalisation, HTML entity decoding, XLSX workbook structure checks, CSV ZIP file filtering. |
| Integrity | Persisted JSON plus `.meta.json` provenance sidecars with source/dataset/artifact/fetch timestamp. |
| Availability | 15s client timeout and optional-enrichment fallback to cached artifacts. |
| Supply chain | Parser code is local TypeScript; ZIP/XLSX parsing uses `jszip` under npm lock/SBOM and advisory review. |
| Privacy | Public authority and aggregate budget records only; no private-person or credential data. |

Security classification: **PUBLIC / High Integrity / Medium-High Availability**. Mapped controls: ISO 27001 A.5.23 (cloud/service use), A.8.9 (configuration management), A.8.12 (data leakage prevention by design), A.8.20 (network security), NIST CSF 2.0 ID.IM / PR.DS / PR.PS, CIS Controls 4, 8, 12 and 16.

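The network-egress and availability controls above could be enforced in client code roughly as follows. This is a sketch under assumptions: `assertAllowedUrl` and `fetchAllowed` are hypothetical names, and the commit may enforce the allow-list at the workflow level instead of, or in addition to, the client. `AbortSignal.timeout` is the standard Web API available in recent Node.js versions.

```typescript
// Values taken from the control table above; the constant names are assumed.
const ALLOWED_HOST = "www.statskontoret.se";
const TIMEOUT_MS = 15_000;

// Reject anything that is not HTTPS to the single allow-listed host.
function assertAllowedUrl(rawUrl: string): URL {
  const url = new URL(rawUrl); // throws TypeError on malformed input
  if (url.protocol !== "https:") {
    throw new Error(`refusing non-HTTPS URL: ${rawUrl}`);
  }
  if (url.hostname !== ALLOWED_HOST) {
    throw new Error(`refusing host outside allow-list: ${url.hostname}`);
  }
  return url;
}

// fetchImpl is injectable so tests can run without live network calls.
async function fetchAllowed(rawUrl: string, fetchImpl: typeof fetch = fetch): Promise<ArrayBuffer> {
  const url = assertAllowedUrl(rawUrl);
  const response = await fetchImpl(url, { signal: AbortSignal.timeout(TIMEOUT_MS) });
  if (!response.ok) {
    throw new Error(`unexpected HTTP status ${response.status}`);
  }
  return response.arrayBuffer();
}
```

Comparing the parsed `hostname` rather than searching the raw string guards against lookalike URLs such as `https://www.statskontoret.se.evil.example/` or ones that hide the real host behind a userinfo prefix.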

TESTING.md

Lines changed: 21 additions & 0 deletions
@@ -687,3 +687,24 @@ IMF_LIVE_SMOKE=1 npm test -- imf-client.live
- `tests/imf-vintage-discipline.test.ts` — asserts cache filenames carry vintage tags

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🧪 Statskontoret Test Coverage

Statskontoret coverage is split across focused Vitest suites:

| Test file | Coverage |
|---|---|
| `tests/statskontoret-client.test.ts` | Download-link extraction, XLSX workbook parsing, CSV ZIP extraction, Swedish decimal handling, injected fetch client behavior. |
| `tests/statskontoret-fetch.test.ts` | Import-safe CLI parsing, typed CLI errors, source validation, resource classification, numeric parsing primitives. |
| `tests/statskontoret-inventory.test.ts` | Inventory metadata, dataset coverage parity with `STATSKONTORET_SOURCES`, provider-decision matrix, client/CLI/persistence declarations. |

Targeted validation command:

```bash
npx vitest run tests/statskontoret-client.test.ts tests/statskontoret-fetch.test.ts tests/statskontoret-inventory.test.ts
```

Quality expectation: no live network calls in tests; fixtures model Statskontoret workbook/ZIP assumptions and prevent workflow regressions without depending on upstream availability.

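To give a flavour of the "Swedish decimal handling" primitive that these suites cover, here is a minimal, hypothetical sketch. `parseSwedishNumber` is an illustrative name, and the exact rules in the repository (for example how unparseable cells are reported) may differ.

```typescript
// Hypothetical parser for Swedish-formatted numbers such as "1 234,5".
// Returns null instead of NaN so callers must handle unparseable cells explicitly.
function parseSwedishNumber(raw: string): number | null {
  const cleaned = raw
    .replace(/\s/g, "") // \s also matches the no-break spaces used as thousands separators
    .replace(/\u2212/g, "-") // typographic minus sign to ASCII hyphen-minus
    .replace(",", "."); // decimal comma to decimal point (first occurrence only)
  // Anything that still contains a comma or a second dot fails this check.
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) return null;
  return Number(cleaned);
}
```

Rejecting ambiguous strings outright, rather than guessing, fits the stated goal of preventing silent regressions in parsed figures.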

THREAT_MODEL.md

Lines changed: 24 additions & 0 deletions
@@ -3000,3 +3000,27 @@ All mitigations are codified in:
**Egress hosts** (allow-list): `www.imf.org` (Datamapper REST · WEO/FM), `sdmxcentral.imf.org` (SDMX 3.0 REST · IFS/BOP/DOTS/GFS/PCPS/ER/MFS_IR/MFS_PR). Both HTTPS-only, anonymous, public — no credentials required.

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🏛️ Statskontoret Integration — STRIDE Threats

> **Effective:** 2026-04-25 · **Classification:** Public · **Entry point:** `scripts/statskontoret-fetch.ts` · **Source:** `www.statskontoret.se`.

Statskontoret ingestion introduces a public-data trust boundary for Swedish agency structure and budget outturn files. The integration is unauthenticated, read-only and treated as optional enrichment, but the integrity of parsed figures matters for political-intelligence claims.

| ID | Asset / flow | STRIDE | Threat | Likelihood | Impact | Mitigations |
|---|---|---|---|---|---|---|
| T-STATS-01 | `www.statskontoret.se` page discovery | Spoofing | DNS/TLS interception or lookalike page returns false download links | LOW | MEDIUM | HTTPS-only egress, allow-list `www.statskontoret.se`, source URL recorded in payload and `.meta.json`, PR review of persisted diffs. |
| T-STATS-02 | Excel / CSV ZIP payload | Tampering | Workbook or archive content modified upstream or in transit | LOW | HIGH | TLS transport, local parser contract checks, typed `StatskontoretError`, persisted raw/derived artifacts with provenance sidecars, reviewer diff inspection. |
| T-STATS-03 | Headcount aggregation | Information integrity | Header drift maps wrong columns to `År`, `Departement`, `Myndighet`, or `Årsarbetskrafter` | MEDIUM | MEDIUM | Header-family matching documented in `analysis/statskontoret/data-dictionary.md`, unit tests for workbook parsing and Swedish number handling, fallback to no derived output if required fields cannot be resolved. |
| T-STATS-04 | CLI invocation | Repudiation | Article cites agency headcount or budget outturn without source page/year/status | MEDIUM | MEDIUM | `discover` captures source page, URL, year/month/status and `last-modified`; persisted sidecars include `dataset`, `artifact`, `fetchedAt`, and `mcpTool: statskontoret-ts-client`. |
| T-STATS-05 | Source availability | Denial of service | Statskontoret page unavailable or workbook fetch times out | MEDIUM | LOW | 15s timeout, optional-enrichment semantics, cache-first reuse of `analysis/data/statskontoret/`, article generation can omit context rather than fail. |
| T-STATS-06 | XLSX/ZIP parsing dependency | Elevation of privilege | Malicious archive attempts parser/resource abuse | LOW | HIGH | `jszip` pinned in npm lock/SBOM, GitHub Advisory Database reviewed, no dynamic eval, no script execution from workbooks, tests exercise parser edge cases. |

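The T-STATS-03 mitigation ("fallback to no derived output if required fields cannot be resolved") can be illustrated with a short sketch. The four canonical header labels come from the table above; the `HEADER_FAMILIES` structure, the function names and the normalisation rules are assumptions, and the real header families are the ones documented in `analysis/statskontoret/data-dictionary.md`.

```typescript
// Hypothetical header resolution; only the canonical labels are taken from the source.
type Field = "year" | "department" | "authority" | "fte";

// Each family lists accepted (normalised) labels; further aliases would be added here.
const HEADER_FAMILIES: Record<Field, readonly string[]> = {
  year: ["år"],
  department: ["departement"],
  authority: ["myndighet"],
  fte: ["årsarbetskrafter"],
};

function normaliseHeader(label: string): string {
  return label.normalize("NFC").trim().replace(/\s+/g, " ").toLowerCase();
}

// Returns a column index per field, or null when any field is missing or ambiguous.
function resolveHeaderColumns(headerRow: readonly string[]): Record<Field, number> | null {
  const normalised = headerRow.map(normaliseHeader);
  const resolved: Partial<Record<Field, number>> = {};
  for (const field of Object.keys(HEADER_FAMILIES) as Field[]) {
    const matches = normalised
      .map((label, index) => (HEADER_FAMILIES[field].includes(label) ? index : -1))
      .filter((index) => index >= 0);
    if (matches.length !== 1) return null; // zero or several candidates: refuse to guess
    resolved[field] = matches[0];
  }
  return resolved as Record<Field, number>;
}
```

Returning `null` on both missing and duplicated headers means header drift produces no derived artifact at all, which is easier to spot in review than a silently shifted column.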
### Residual risk and classification

- **Residual risk:** LOW-MEDIUM integrity risk due to upstream data or workbook-schema drift; handled by provenance, test coverage and human review.
- **Privacy:** no PII or credentials; public authority and aggregate budget data only.
- **CIA:** Public / High Integrity / Medium-High Availability for derived article context.

