Statskontoret is now the Swedish public-administration and central-government budget-execution context layer. It complements the existing provider split: IMF remains primary for macro/fiscal projections, SCB remains Swedish official-statistics ground truth, World Bank remains governance/environment/social residue, and Statskontoret supplies agency structure plus budget outturn detail that the other providers do not expose in the same operational form.
```mermaid
    Derived --> Articles[Article and dashboard context]
```

### Provider responsibility matrix

| Need | Primary provider | Riksdagsmonitor surface |
|---|---|---|
| Agency count, department grouping, leadership form and government-body headcount | **Statskontoret Myndighetsförteckning** | `scripts/statskontoret-fetch.ts headcount`, `analysis/statskontoret/` |

| `scripts/statskontoret-client.ts` | Typed client, source catalogue, download discovery, HTML entity decoding, XLSX parsing, CSV ZIP parsing, numeric normalisation, department headcount aggregation. |
| `scripts/statskontoret-fetch.ts` | Import-safe CLI wrapper for workflows; exported argument parsing helpers for testability; exit code `2` for CLI contract errors. |
| `analysis/statskontoret/indicators-inventory.json` | Machine-readable dataset inventory and provider decision matrix. |
| `analysis/statskontoret/data-dictionary.md` | Field families, freshness discipline, persistence layout. |
| `tests/statskontoret-*.test.ts` | Inventory consistency, download-link extraction, workbook parsing, CSV ZIP parsing, CLI parsing and parser primitive coverage. |
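
The CLI wrapper row above mentions exported argument parsing helpers and exit code `2` for contract errors. The following is a minimal sketch of how such a helper could be kept pure and testable. The names `parseArgs`, `CliContractError` and `exitCodeFor` are hypothetical and not taken from the repository; only the `discover` and `headcount` commands, the `--persist` flag and exit code `2` appear in the documentation. Exit code `1` for other failures is an assumption.

```typescript
// Hypothetical sketch: helper names are illustrative, not the repository's exports.
const COMMANDS = ["discover", "headcount"] as const;
type Command = (typeof COMMANDS)[number];

interface ParsedArgs {
  command: Command;
  persist: boolean;
}

/** Raised when the caller violates the CLI contract (unknown command or flag). */
class CliContractError extends Error {
  readonly exitCode = 2;

  constructor(message: string) {
    super(message);
    this.name = "CliContractError";
    // Keeps `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, CliContractError.prototype);
  }
}

function isCommand(value: string | undefined): value is Command {
  return COMMANDS.some((known) => known === value);
}

/** Pure function: takes argv without the node/script entries and touches no process state. */
function parseArgs(argv: readonly string[]): ParsedArgs {
  const [command, ...flags] = argv;
  if (!isCommand(command)) {
    throw new CliContractError(`unknown or missing command: ${command ?? "(none)"}`);
  }
  let persist = false;
  for (const flag of flags) {
    if (flag === "--persist") {
      persist = true;
    } else {
      throw new CliContractError(`unknown flag: ${flag}`);
    }
  }
  return { command, persist };
}

/** Maps a thrown value to an exit code: 2 for contract errors, 1 (assumed) otherwise. */
function exitCodeFor(error: unknown): number {
  return error instanceof CliContractError ? error.exitCode : 1;
}
```

Because `parseArgs` never calls `process.exit`, a test can import it without side effects, which is one plausible reading of "import-safe".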

### Operational characteristics

- **Trust boundary:** one outbound HTTPS boundary to `www.statskontoret.se`; no credentials, no private data, no write-back to the source.
- **Persistence:** optional `--persist` writes raw or derived payloads to `analysis/data/statskontoret/{dataset}/{artifact}.json` with `.meta.json` provenance sidecars.
- **Failure mode:** optional enrichment semantics; article generation can fall back to cached artifacts or omit Statskontoret context rather than blocking publication.
- **Security posture:** Public classification, high-integrity provenance, dependency surface limited to existing npm SBOM (`jszip`) and in-repository TypeScript code.

Sidecar metadata includes `fetchedAt`, `mcpTool: statskontoret-ts-client`, `dataset`, and `artifact`. The provider decision matrix in `analysis/statskontoret/indicators-inventory.json` maps government-body headcount and central-government budget outturn claims to Statskontoret, while macro/fiscal projections remain IMF-first.
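
The persistence layout and sidecar fields described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the sidecar file name `{artifact}.meta.json` is an assumed reading of "`.meta.json` sidecars", and the `dataset`/`artifact` values in the usage comment are made up.

```typescript
// Hypothetical sketch: only the four field names and the path layout come from the docs.
interface SidecarMeta {
  fetchedAt: string; // ISO 8601 timestamp of the fetch
  mcpTool: "statskontoret-ts-client";
  dataset: string;
  artifact: string;
}

interface PersistPlan {
  payloadPath: string;
  metaPath: string;
  meta: SidecarMeta;
}

/** Rejects separators and dot segments so a name cannot escape the data directory. */
function assertSafeSegment(segment: string): void {
  if (!/^[A-Za-z0-9][A-Za-z0-9_-]*$/.test(segment)) {
    throw new Error(`unsafe path segment: ${JSON.stringify(segment)}`);
  }
}

/** Computes where a payload and its provenance sidecar would be written. */
function planPersist(dataset: string, artifact: string, now: Date): PersistPlan {
  assertSafeSegment(dataset);
  assertSafeSegment(artifact);
  const base = `analysis/data/statskontoret/${dataset}/${artifact}`;
  return {
    payloadPath: `${base}.json`,
    metaPath: `${base}.meta.json`, // assumed sidecar naming
    meta: {
      fetchedAt: now.toISOString(),
      mcpTool: "statskontoret-ts-client",
      dataset,
      artifact,
    },
  };
}

// Usage (illustrative names): planPersist("myndigheter", "headcount", new Date())
```

Separating the path and metadata computation from the actual file write keeps the provenance contract testable without touching the file system.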
FLOWCHART.md: 26 lines changed (26 additions, 0 deletions)
@@ -969,3 +969,29 @@ flowchart LR
- 24 indicators across 10 IMF dataflows (WEO / FM / IFS / BOP / DOTS / GFS_COFOG / PCPS / ER / MFS_IR / MFS_PR) catalogued in [`analysis/imf/indicators-inventory.json`](analysis/imf/indicators-inventory.json)
- Vintage discipline (>6 mo → annotation) enforced by `tests/imf-inventory.test.ts` (13 assertions) and `tests/economic-context-multi-provider.test.ts` (asserts IMF queried before WB)
- Egress allow-list: `www.imf.org`, `sdmxcentral.imf.org` pinned in every workflow `network:` block

---

## 🏛️ Statskontoret Data Flow (Current State)

```mermaid
flowchart TD
    Start[News / analysis workflow needs agency or budget-execution context]
```
README.md: 32 lines changed (32 additions, 0 deletions)
@@ -1108,3 +1108,35 @@ Riksdagsmonitor uses a **provider-tiered** data architecture, with each provider
**Why this split** — IMF uses uniform SNA 2008 / GFSM 2014 / BPM6 methodology across countries (essential for cross-country comparison), publishes T+5 projections (essential for look-ahead workflows), and has fresher data than World Bank's economic indicators. World Bank remains the canonical source for the classes IMF does not publish (WGI governance, environment).
Provider rule: IMF remains primary for macro/fiscal projections, SCB remains Swedish statistical ground truth, World Bank remains governance/environment/social residue, and Statskontoret is authoritative for Swedish agency structure and central-government budget execution.
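
The provider rule above amounts to a lookup from claim type to primary provider. A minimal sketch follows, assuming hypothetical claim-kind labels; the real decision matrix lives in `analysis/statskontoret/indicators-inventory.json` and may be shaped differently.

```typescript
// Hypothetical sketch: the claim-kind labels are illustrative, not the inventory's keys.
type Provider = "IMF" | "SCB" | "World Bank" | "Statskontoret";

type ClaimKind =
  | "macro-fiscal-projection"
  | "swedish-official-statistics"
  | "governance-environment-social"
  | "agency-structure"
  | "budget-outturn";

// One primary provider per claim kind, mirroring the provider rule in prose.
const PRIMARY_PROVIDER: Record<ClaimKind, Provider> = {
  "macro-fiscal-projection": "IMF",
  "swedish-official-statistics": "SCB",
  "governance-environment-social": "World Bank",
  "agency-structure": "Statskontoret",
  "budget-outturn": "Statskontoret",
};

function primaryProviderFor(kind: ClaimKind): Provider {
  return PRIMARY_PROVIDER[kind];
}
```

Typing the table as `Record<ClaimKind, Provider>` makes the compiler flag any claim kind that is added without a provider assignment.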
SECURITY_ARCHITECTURE.md: 19 lines changed (19 additions, 0 deletions)
@@ -3086,3 +3086,22 @@ flowchart LR
**Egress hosts** (allow-list): `www.imf.org` (Datamapper REST · WEO/FM), `sdmxcentral.imf.org` (SDMX 3.0 REST · IFS/BOP/DOTS/GFS/PCPS/ER/MFS_IR/MFS_PR). Both HTTPS-only, anonymous, public; no credentials required.

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🏛️ Statskontoret Security Architecture

Statskontoret is a read-only public-data integration using in-repository TypeScript code and the existing npm dependency graph. It is intentionally not configured as an MCP server; workflows invoke `tsx scripts/statskontoret-fetch.ts` via the bash tool.

| Control area | Statskontoret control |
|---|---|
| Network egress | Allow only HTTPS to `www.statskontoret.se` for this provider. |
| Authentication | None required; no tokens or secrets transmitted. |
| Input validation | Resource classification, URL normalisation, HTML entity decoding, XLSX workbook structure checks, CSV ZIP file filtering. |
| Integrity | Persisted JSON plus `.meta.json` provenance sidecars with source/dataset/artifact/fetch timestamp. |
| Availability | 15s client timeout and optional-enrichment fallback to cached artifacts. |
| Supply chain | Parser code is local TypeScript; ZIP/XLSX parsing uses `jszip` under npm lock/SBOM and advisory review. |
| Privacy | Public authority and aggregate budget records only; no private-person or credential data. |

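
The network-egress and availability controls in the table above can be illustrated with a client-side guard. This is a sketch, not the repository's client: `assertAllowedUrl` and `fetchWithTimeout` are hypothetical names, and the workflow-level `network:` allow-list remains the actual enforcement point. Only the host name, the HTTPS-only rule and the 15-second timeout come from the documentation.

```typescript
// Hypothetical sketch of a client-side egress guard and request timeout.
const ALLOWED_HOST = "www.statskontoret.se";
const TIMEOUT_MS = 15000;

/** Parses the URL and rejects anything that is not HTTPS to the allowed host. */
function assertAllowedUrl(raw: string): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== "https:" || url.hostname !== ALLOWED_HOST) {
    throw new Error(`egress not allowed: ${url.protocol}//${url.hostname}`);
  }
  return url;
}

/**
 * Fetches an allowed URL and aborts if no response arrives within the timeout.
 * The fetch implementation is injected so tests can avoid live network calls.
 * Note: the timer is cleared once the response headers arrive, so reading a
 * slow body is not covered by this timeout.
 */
async function fetchWithTimeout(
  raw: string,
  fetchImpl: typeof fetch,
  timeoutMs: number = TIMEOUT_MS,
): Promise<Response> {
  const url = assertAllowedUrl(raw);
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url.toString(), { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```

Comparing the parsed `hostname` rather than matching on the raw string avoids lookalike inputs such as a trusted name used as a subdomain prefix or in the userinfo part of the URL.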
**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🧪 Statskontoret Test Coverage

Statskontoret coverage is split across focused Vitest suites:

| Test file | Coverage |
|---|---|
| `tests/statskontoret-client.test.ts` | Download-link extraction, XLSX workbook parsing, CSV ZIP extraction, Swedish decimal handling, injected fetch client behavior. |
```bash
npx vitest run tests/statskontoret-client.test.ts tests/statskontoret-fetch.test.ts tests/statskontoret-inventory.test.ts
```

Quality expectation: no live network calls in tests; fixtures model Statskontoret workbook/ZIP assumptions and prevent workflow regressions without depending on upstream availability.
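
The "Swedish decimal handling" covered by the client tests refers to figures written with a decimal comma and space-separated thousands. The following sketch shows one way such normalisation could work; `parseSwedishNumber` is a hypothetical name and the exact rules in `scripts/statskontoret-client.ts` may differ.

```typescript
// Hypothetical sketch of Swedish number normalisation (decimal comma, spaced thousands).

/** Returns the parsed number, or null when the cell is not a plain numeric value. */
function parseSwedishNumber(raw: string): number | null {
  const cleaned = raw
    .replace(/\s/g, "") // \s also matches no-break (U+00A0) and narrow no-break (U+202F) spaces
    .replace(/\u2212/g, "-") // typographic minus sign to ASCII hyphen-minus
    .replace(",", "."); // first decimal comma to decimal point
  // Anything left that is not an optionally signed decimal is rejected,
  // including values with more than one comma.
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) {
    return null;
  }
  return Number(cleaned);
}
```

Returning `null` instead of `NaN` forces callers to handle unparseable cells explicitly. One limitation of this sketch: a dot used as a thousands separator (for example `1.234`) would be read as a decimal point.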
THREAT_MODEL.md: 24 lines changed (24 additions, 0 deletions)
@@ -3000,3 +3000,27 @@ All mitigations are codified in:
**Egress hosts** (allow-list): `www.imf.org` (Datamapper REST · WEO/FM), `sdmxcentral.imf.org` (SDMX 3.0 REST · IFS/BOP/DOTS/GFS/PCPS/ER/MFS_IR/MFS_PR). Both HTTPS-only, anonymous, public; no credentials required.

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

Statskontoret ingestion introduces a public-data trust boundary for Swedish agency structure and budget outturn files. It is unauthenticated, read-only and optional enrichment, but the integrity of parsed figures matters for political-intelligence claims.

| T-STATS-01 | `www.statskontoret.se` page discovery | Spoofing | DNS/TLS interception or lookalike page returns false download links | LOW | MEDIUM | HTTPS-only egress, allow-list `www.statskontoret.se`, source URL recorded in payload and `.meta.json`, PR review of persisted diffs. |
| T-STATS-02 | Excel / CSV ZIP payload | Tampering | Workbook or archive content modified upstream or in transit | LOW | HIGH | TLS transport, local parser contract checks, typed `StatskontoretError`, persisted raw/derived artifacts with provenance sidecars, reviewer diff inspection. |
| T-STATS-03 | Headcount aggregation | Information integrity | Header drift maps wrong columns to `År`, `Departement`, `Myndighet`, or `Årsarbetskrafter` | MEDIUM | MEDIUM | Header-family matching documented in `analysis/statskontoret/data-dictionary.md`, unit tests for workbook parsing and Swedish number handling, fallback to no derived output if required fields cannot be resolved. |
| T-STATS-04 | CLI invocation | Repudiation | Article cites agency headcount or budget outturn without source page/year/status | MEDIUM | MEDIUM | `discover` captures source page, URL, year/month/status and `last-modified`; persisted sidecars include `dataset`, `artifact`, `fetchedAt`, and `mcpTool: statskontoret-ts-client`. |
| T-STATS-05 | Source availability | Denial of service | Statskontoret page unavailable or workbook fetch times out | MEDIUM | LOW | 15s timeout, optional-enrichment semantics, cache-first reuse of `analysis/data/statskontoret/`, article generation can omit context rather than fail. |
| T-STATS-06 | XLSX/ZIP parsing dependency | Elevation of privilege | Malicious archive attempts parser/resource abuse | LOW | HIGH | `jszip` pinned in npm lock/SBOM, GitHub Advisory Database reviewed, no dynamic eval, no script execution from workbooks, tests exercise parser edge cases. |
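
The T-STATS-03 mitigation (header-family matching with a fallback to no derived output) can be sketched as follows. This is an illustration only: `resolveColumns` and `HEADER_FAMILIES` are hypothetical names, and each family here holds just the canonical header from the table above, whereas the documented families in `analysis/statskontoret/data-dictionary.md` may list more aliases.

```typescript
// Hypothetical sketch: resolve required columns by header family, or give up entirely.
const HEADER_FAMILIES = {
  year: ["år"],
  department: ["departement"],
  agency: ["myndighet"],
  fte: ["årsarbetskrafter"],
} as const;

type Field = keyof typeof HEADER_FAMILIES;
type ColumnMap = Record<Field, number>;

/** Case- and whitespace-insensitive form used for matching. */
function normaliseHeader(header: string): string {
  return header.trim().toLowerCase().replace(/\s+/g, " ");
}

/** Returns column indexes for all required fields, or null if any is missing. */
function resolveColumns(headers: readonly string[]): ColumnMap | null {
  const normalised = headers.map(normaliseHeader);
  const find = (aliases: readonly string[]): number =>
    normalised.findIndex((header) => aliases.indexOf(header) !== -1);

  const map: ColumnMap = {
    year: find(HEADER_FAMILIES.year),
    department: find(HEADER_FAMILIES.department),
    agency: find(HEADER_FAMILIES.agency),
    fte: find(HEADER_FAMILIES.fte),
  };
  const complete =
    map.year >= 0 && map.department >= 0 && map.agency >= 0 && map.fte >= 0;
  // Fallback: no partial mapping, so no derived output is produced from a drifted sheet.
  return complete ? map : null;
}
```

Returning `null` rather than a partial map means a renamed or missing column stops aggregation outright instead of silently attributing figures to the wrong field.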

### Residual risk and classification

- **Residual risk:** LOW-MEDIUM integrity risk due to upstream data or workbook-schema drift; handled by provenance, test coverage and human review.
- **Privacy:** no PII or credentials; public authority and aggregate budget data only.
- **CIA:** Public / High Integrity / Medium-High Availability for derived article context.