51 changes: 51 additions & 0 deletions ARCHITECTURE.md
@@ -1856,3 +1856,54 @@ graph TB
</a>
</p>

---

## 🏛️ Statskontoret Integration — Current Architecture

> **Effective:** 2026-04-25 · **Classification:** Public · **Runtime:** Node.js 25 / TypeScript CLI · **MCP status:** intentionally **not** an MCP server.

Statskontoret is now the Swedish public-administration and central-government budget-execution context layer. It complements the existing provider split: IMF remains primary for macro/fiscal projections, SCB remains Swedish official-statistics ground truth, World Bank remains governance/environment/social residue, and Statskontoret supplies agency structure plus budget outturn detail that the other providers do not expose in the same operational form.

### Architectural placement

```mermaid
flowchart LR
    Workflow[Agentic news workflow<br/>Node 25] --> CLI[statskontoret-fetch.ts<br/>list-sources · discover · headcount]
    CLI --> Client[StatskontoretClient<br/>statskontoret-client.ts]
    Client --> Source[www.statskontoret.se<br/>open data pages]
    Source --> XLSX[Excel workbooks]
    Source --> ZIP[CSV ZIP archives]
    Client --> Parser[XLSX / CSV-ZIP parsers<br/>typed StatskontoretError]
    Parser --> Derived[Derived artifacts<br/>headcount-by-department]
    Derived --> Persist[analysis/data/statskontoret/<br/>JSON + .meta.json sidecars]
    Derived --> Articles[Article and dashboard context]
```

### Provider responsibility matrix

| Need | Primary provider | Riksdagsmonitor surface |
|---|---|---|
| Agency count, department grouping, leadership form and government-body headcount | **Statskontoret Myndighetsförteckning** | `scripts/statskontoret-fetch.ts headcount`, `analysis/statskontoret/` |
| Annual central-government budget outturn | **Statskontoret Årsutfall** | Download discovery and persisted raw/derived artifacts |
| Monthly central-government budget execution | **Statskontoret Månadsutfall** | Download discovery for high-frequency budget monitoring |
| Macro/fiscal projections and cross-country methodology | **IMF WEO/FM/SDMX** | `scripts/imf-*` |
| Swedish regional/monthly official statistics | **SCB PxWeb** | `scb` MCP |
| Governance/environment/social residue | **World Bank** | `world-bank` MCP |

### Code and quality surfaces

| Surface | Responsibility |
|---|---|
| `scripts/statskontoret-client.ts` | Typed client, source catalogue, download discovery, HTML entity decoding, XLSX parsing, CSV ZIP parsing, numeric normalisation, department headcount aggregation. |
| `scripts/statskontoret-fetch.ts` | Import-safe CLI wrapper for workflows; exported argument parsing helpers for testability; exit code `2` for CLI contract errors. |
| `analysis/statskontoret/indicators-inventory.json` | Machine-readable dataset inventory and provider decision matrix. |
| `analysis/statskontoret/data-dictionary.md` | Field families, freshness discipline, persistence layout. |
| `tests/statskontoret-*.test.ts` | Inventory consistency, download-link extraction, workbook parsing, CSV ZIP parsing, CLI parsing and parser primitive coverage. |
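
The CLI contract described in the table (three subcommands, exported parsing helpers, exit code `2` for contract errors) could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `CliContractError`, `parseCliArgs` and the `CliArgs` shape are hypothetical names, and only the subcommands and the `--source`, `--url` and `--persist` flags appear in the documented commands.

```typescript
// Hypothetical sketch: names and validation rules are assumptions,
// only the subcommands, flags and exit code 2 come from the documentation.
type Command = "list-sources" | "discover" | "headcount";

interface CliArgs {
  command: Command;
  source?: string;
  url?: string;
  persist: boolean;
}

class CliContractError extends Error {
  // Exit code a wrapper would pass to process.exit for contract violations.
  readonly exitCode = 2;
}

const COMMANDS: readonly Command[] = ["list-sources", "discover", "headcount"];

function parseCliArgs(argv: readonly string[]): CliArgs {
  const [command, ...rest] = argv;
  if (!COMMANDS.includes(command as Command)) {
    throw new CliContractError(`unknown command: ${command ?? "(none)"}`);
  }
  const args: CliArgs = { command: command as Command, persist: false };
  for (let i = 0; i < rest.length; i++) {
    const flag = rest[i];
    if (flag === "--persist") {
      args.persist = true;
    } else if (flag === "--source" || flag === "--url") {
      const value = rest[++i];
      if (value === undefined || value.startsWith("--")) {
        throw new CliContractError(`${flag} requires a value`);
      }
      if (flag === "--source") args.source = value;
      else args.url = value;
    } else {
      throw new CliContractError(`unknown flag: ${flag}`);
    }
  }
  return args;
}
```

Keeping the parser a pure function is what makes a script "import-safe": tests can import and call it directly, while only the entry-point wrapper translates a `CliContractError` into a process exit.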

### Operational characteristics

- **Trust boundary:** one outbound HTTPS boundary to `www.statskontoret.se`; no credentials, no private data, no write-back to the source.
- **Persistence:** optional `--persist` writes raw or derived payloads to `analysis/data/statskontoret/{dataset}/{artifact}.json` with `.meta.json` provenance sidecars.
- **Failure mode:** optional enrichment semantics; article generation can fall back to cached artifacts or omit Statskontoret context rather than blocking publication.
- **Security posture:** Public classification, high-integrity provenance, dependency surface limited to existing npm SBOM (`jszip`) and in-repository TypeScript code.
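
The optional-enrichment failure mode above can be illustrated with a small wrapper that tries a fresh fetch under a timeout, falls back to a cached artifact, and otherwise reports that the context should be omitted. This is a sketch under stated assumptions: `loadOptionalContext`, `Enrichment` and the callback signatures are hypothetical, and the 15-second default mirrors the client timeout quoted elsewhere in this change set.

```typescript
// Hypothetical sketch of optional-enrichment semantics; helper names are assumptions.
type Enrichment<T> =
  | { status: "fresh"; data: T }
  | { status: "cached"; data: T }
  | { status: "omitted" };

const DEFAULT_TIMEOUT_MS = 15_000;

async function loadOptionalContext<T>(
  // The callback must honour the signal (for example by passing it to fetch),
  // otherwise the timeout cannot interrupt it.
  fetchFresh: (signal: AbortSignal) => Promise<T>,
  readCached: () => Promise<T | undefined>,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<Enrichment<T>> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return { status: "fresh", data: await fetchFresh(controller.signal) };
  } catch {
    // Any failure (timeout, HTTP error, parse error) degrades instead of blocking.
    const cached = await readCached().catch(() => undefined);
    return cached === undefined ? { status: "omitted" } : { status: "cached", data: cached };
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would branch on `status` and simply skip the Statskontoret paragraph when it is `"omitted"`, so publication never depends on upstream availability.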

24 changes: 24 additions & 0 deletions DATA_MODEL.md
@@ -2592,3 +2592,27 @@ This DATA_MODEL.md complements ARCHITECTURE.md:
**⏰ Next Review:** 2027-02-15
**🎯 Framework Compliance:** [![ISO 27001](https://img.shields.io/badge/ISO_27001-2022_Aligned-blue?style=flat-square&logo=iso&logoColor=white)](https://github.com/Hack23/ISMS-PUBLIC/blob/main/CLASSIFICATION.md) [![NIST CSF 2.0](https://img.shields.io/badge/NIST_CSF-2.0_Aligned-green?style=flat-square&logo=nist&logoColor=white)](https://github.com/Hack23/ISMS-PUBLIC/blob/main/CLASSIFICATION.md) [![CIS Controls](https://img.shields.io/badge/CIS_Controls-v8.1_Aligned-orange?style=flat-square&logo=cisecurity&logoColor=white)](https://github.com/Hack23/ISMS-PUBLIC/blob/main/CLASSIFICATION.md)

---

## 🏛️ Statskontoret Data Model Extension

Statskontoret adds a public Swedish-administration data domain under the economic/public-administration context layer.

### Source entities

| Entity | Key fields | Storage / source |
|---|---|---|
| `StatskontoretSourceDefinition` | `key`, `title`, `url`, `cadence`, `coverage`, `primaryUse` | Static catalogue in `scripts/statskontoret-client.ts`; mirrored by `analysis/statskontoret/indicators-inventory.json`. |
| `StatskontoretDownloadLink` | `source`, `sourcePage`, `url`, `resourceType`, `documentType`, `fileType`, `fileName`, `year`, `month`, `status`, `updatedAt` | Derived from Statskontoret HTML pages by `extractStatskontoretDownloadLinks`. |
| `StatskontoretWorkbook` / `StatskontoretSheet` | sheet name and row arrays | Parsed locally from XLSX ZIP parts. |
| `StatskontoretHeadcountRow` | `year`, `department`, `headcount`, `authorityCount` | Derived from Myndighetsförteckning rows. |
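
For orientation, the entities in the table might map onto TypeScript shapes like the ones below. The field names are taken from the table; every type annotation, the optional markers, the `name`/`rows`/`sheets` property names and the `isHeadcountRow` guard are assumptions made for illustration rather than the definitions in `scripts/statskontoret-client.ts`.

```typescript
// Field names follow the entity table; all types here are assumptions.
interface StatskontoretSourceDefinition {
  key: string;
  title: string;
  url: string;
  cadence: string;
  coverage: string;
  primaryUse: string;
}

interface StatskontoretDownloadLink {
  source: string;
  sourcePage: string;
  url: string;
  resourceType: string;
  documentType: string;
  fileType: string;
  fileName: string;
  year?: number; // assumed optional: not every link carries a period
  month?: number;
  status?: string;
  updatedAt?: string;
}

interface StatskontoretSheet {
  name: string; // hypothetical property name for the sheet name
  rows: string[][]; // hypothetical property name for the row arrays
}

interface StatskontoretWorkbook {
  sheets: StatskontoretSheet[];
}

interface StatskontoretHeadcountRow {
  year: number;
  department: string;
  headcount: number;
  authorityCount: number;
}

// Hypothetical runtime guard for persisted JSON read back from disk.
function isHeadcountRow(value: unknown): value is StatskontoretHeadcountRow {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.year === "number" &&
    typeof row.department === "string" &&
    typeof row.headcount === "number" &&
    typeof row.authorityCount === "number"
  );
}
```

A guard of this kind is useful when derived artifacts are reloaded from `analysis/data/statskontoret/`, because persisted JSON carries no compile-time type information.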

### Persisted artifact contract

```text
analysis/data/statskontoret/{dataset}/{artifact}.json
analysis/data/statskontoret/{dataset}/{artifact}.meta.json
```

Sidecar metadata includes `fetchedAt`, `mcpTool: statskontoret-ts-client`, `dataset`, and `artifact`. The provider decision matrix in `analysis/statskontoret/indicators-inventory.json` maps government-body headcount and central-government budget outturn claims to Statskontoret, while macro/fiscal projections remain IMF-first.
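
As a rough illustration of the artifact contract, the following sketch derives the two paths and a sidecar object from a dataset and artifact name. `artifactPaths`, `buildSidecar` and the path-segment check are hypothetical; only the path layout and the four metadata fields come from the description above.

```typescript
// Hypothetical sketch: function names and the segment check are assumptions.
interface SidecarMeta {
  fetchedAt: string; // ISO 8601 timestamp
  mcpTool: "statskontoret-ts-client";
  dataset: string;
  artifact: string;
}

// Conservative rule so a dataset or artifact name cannot escape the data directory.
const SAFE_SEGMENT = /^[a-z0-9][a-z0-9._-]*$/i;

function artifactPaths(dataset: string, artifact: string): { data: string; meta: string } {
  for (const segment of [dataset, artifact]) {
    if (!SAFE_SEGMENT.test(segment)) throw new Error(`unsafe path segment: ${segment}`);
  }
  const base = `analysis/data/statskontoret/${dataset}/${artifact}`;
  return { data: `${base}.json`, meta: `${base}.meta.json` };
}

function buildSidecar(dataset: string, artifact: string, fetchedAt: Date = new Date()): SidecarMeta {
  return {
    fetchedAt: fetchedAt.toISOString(),
    mcpTool: "statskontoret-ts-client",
    dataset,
    artifact,
  };
}
```

Writing the payload and the sidecar as a pair keeps provenance next to the data, so a reviewer reading a persisted diff can see when and from which dataset each artifact was produced.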

26 changes: 26 additions & 0 deletions FLOWCHART.md
@@ -969,3 +969,29 @@ flowchart LR
- 24 indicators across 10 IMF dataflows (WEO / FM / IFS / BOP / DOTS / GFS_COFOG / PCPS / ER / MFS_IR / MFS_PR) catalogued in [`analysis/imf/indicators-inventory.json`](analysis/imf/indicators-inventory.json)
- Vintage discipline (>6 mo → annotation) enforced by `tests/imf-inventory.test.ts` (13 assertions) and `tests/economic-context-multi-provider.test.ts` (asserts IMF queried before WB)
- Egress allow-list: `www.imf.org`, `sdmxcentral.imf.org` pinned in every workflow `network:` block

---

## 🏛️ Statskontoret Data Flow (Current State)

```mermaid
flowchart TD
    Start[News / analysis workflow needs agency or budget-execution context]
    Decision{Context type?}
    Start --> Decision
    Decision -->|Agency structure / headcount| MF[Statskontoret Myndighetsförteckning]
    Decision -->|Annual budget outturn| AU[Statskontoret Årsutfall]
    Decision -->|Monthly budget outturn| MU[Statskontoret Månadsutfall]
    Decision -->|Macro projection| IMF[IMF WEO/FM]
    MF --> CLI[statskontoret-fetch.ts]
    AU --> CLI
    MU --> CLI
    CLI --> Discover[discover: extract Excel / CSV ZIP links]
    CLI --> Headcount[headcount: parse XLSX and aggregate department time series]
    Discover --> Persist[analysis/data/statskontoret JSON + meta]
    Headcount --> Persist
    Persist --> Article[Article / dashboard context with source URL and freshness]
```

Key gates: HTTPS-only source, source catalogue validation, parser tests, provenance sidecars, and optional-enrichment fallback.
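
The decision node in the flowchart amounts to a small routing table. A minimal sketch, assuming hypothetical type and function names: the `arsutfall` source key appears in the documented `discover` command, while `myndighetsforteckning` and `manadsutfall` are guessed keys that may differ from the real catalogue.

```typescript
// Hypothetical sketch of the "Context type?" decision; only "arsutfall" is a documented key.
type ContextType = "agency-structure" | "annual-outturn" | "monthly-outturn" | "macro-projection";

type Route =
  | { provider: "statskontoret"; source: string }
  | { provider: "imf" };

function routeContext(type: ContextType): Route {
  switch (type) {
    case "agency-structure":
      return { provider: "statskontoret", source: "myndighetsforteckning" }; // assumed key
    case "annual-outturn":
      return { provider: "statskontoret", source: "arsutfall" };
    case "monthly-outturn":
      return { provider: "statskontoret", source: "manadsutfall" }; // assumed key
    case "macro-projection":
      return { provider: "imf" };
  }
}
```

Encoding the split as an exhaustive `switch` over a union type means a newly added context type fails type-checking until someone decides which provider owns it.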

49 changes: 49 additions & 0 deletions MINDMAP.md
@@ -554,3 +554,52 @@ mindmap
Regional municipal
Budget execution
```

---

## 🏛️ Statskontoret Integration Branch (Current State)

```mermaid
mindmap
  root((Statskontoret Integration))
    Purpose
      Swedish agency structure
      Government-body headcount
      Central-government budget execution
    Sources
      Myndighetsforteckning
        Annual
        XLSX
        Headcount by department
      Arsutfall
        Annual
        XLSX
        CSV ZIP
      Manadsutfall
        Monthly
        XLSX
        CSV ZIP
      Budget time series
        Long-run state budget context
    Code
      statskontoret-client.ts
        Discovery
        XLSX parser
        CSV ZIP parser
        Typed StatskontoretError
      statskontoret-fetch.ts
        list-sources
        discover
        headcount
    Governance
      Public classification
      No MCP server
      No credentials
      www.statskontoret.se allowlist
      analysis/statskontoret inventory
    Tests
      client tests
      CLI parsing tests
      inventory tests
```

32 changes: 32 additions & 0 deletions README.md
@@ -1108,3 +1108,35 @@ Riksdagsmonitor uses a **provider-tiered** data architecture, with each provider
**Why this split** — IMF uses uniform SNA 2008 / GFSM 2014 / BPM6 methodology across countries (essential for cross-country comparison), publishes T+5 projections (essential for look-ahead workflows), and has fresher data than World Bank's economic indicators. World Bank remains the canonical source for the classes IMF does not publish (WGI governance, environment).

Authority: [`.github/aw/ECONOMIC_DATA_CONTRACT.md`](.github/aw/ECONOMIC_DATA_CONTRACT.md) v2.1 · hub: [`analysis/imf/`](analysis/imf/) · agent guide: [`AGENTS.md`](AGENTS.md) §IMF.

---

## 🏛️ Statskontoret Swedish Administration Integration

Riksdagsmonitor now includes a pure-TypeScript Statskontoret integration for Swedish government-body and central-government budget-execution context.

| Dataset | Use |
|---|---|
| Myndighetsförteckning | Authority count, department grouping, leadership form and årsarbetskrafter/headcount over time. |
| Årsutfall för statens budget | Annual central-government revenue and expenditure outturns. |
| Månadsutfall för statens budget | Monthly budget execution from 2006 onward. |
| Tidsserier, statens budget m.m. | Long-run Swedish budget context. |

Quick commands:

```bash
tsx scripts/statskontoret-fetch.ts list-sources
tsx scripts/statskontoret-fetch.ts discover --source arsutfall --persist
tsx scripts/statskontoret-fetch.ts headcount --url "https://www.statskontoret.se/...xlsx" --persist
```

Architecture and governance references:

- `analysis/statskontoret/README.md` — integration hub.
- `analysis/statskontoret/indicators-inventory.json` — machine-readable source catalogue.
- `analysis/statskontoret/data-dictionary.md` — field and freshness rules.
- `scripts/statskontoret-client.ts` / `scripts/statskontoret-fetch.ts` — client and workflow CLI.
- `tests/statskontoret-client.test.ts`, `tests/statskontoret-fetch.test.ts`, `tests/statskontoret-inventory.test.ts` — regression coverage.

Provider rule: IMF remains primary for macro/fiscal projections, SCB remains Swedish statistical ground truth, World Bank remains governance/environment/social residue, and Statskontoret is authoritative for Swedish agency structure and central-government budget execution.

19 changes: 19 additions & 0 deletions SECURITY_ARCHITECTURE.md
@@ -3086,3 +3086,22 @@ flowchart LR
**Egress hosts** (allow-list): `www.imf.org` (Datamapper REST · WEO/FM), `sdmxcentral.imf.org` (SDMX 3.0 REST · IFS/BOP/DOTS/GFS/PCPS/ER/MFS_IR/MFS_PR). Both HTTPS-only, anonymous, public — no credentials required.

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🏛️ Statskontoret Security Architecture

Statskontoret is a read-only public-data integration using in-repository TypeScript code and the existing npm dependency graph. It is intentionally not configured as an MCP server; workflows invoke `tsx scripts/statskontoret-fetch.ts` via the bash tool.

| Control area | Statskontoret control |
|---|---|
| Network egress | Allow only HTTPS to `www.statskontoret.se` for this provider. |
| Authentication | None required; no tokens or secrets transmitted. |
| Input validation | Resource classification, URL normalisation, HTML entity decoding, XLSX workbook structure checks, CSV ZIP file filtering. |
| Integrity | Persisted JSON plus `.meta.json` provenance sidecars with source/dataset/artifact/fetch timestamp. |
| Availability | 15s client timeout and optional-enrichment fallback to cached artifacts. |
| Supply chain | Parser code is local TypeScript; ZIP/XLSX parsing uses `jszip` under npm lock/SBOM and advisory review. |
| Privacy | Public authority and aggregate budget records only; no private-person or credential data. |

Security classification: **PUBLIC / High Integrity / Medium-High Availability**. Mapped controls: ISO 27001 A.5.23 (cloud/service use), A.8.9 (configuration management), A.8.12 (data leakage prevention by design), A.8.20 (network security), NIST CSF 2.0 ID.IM / PR.DS / PR.PS, CIS Controls 4, 8, 12 and 16.
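
The network-egress and URL-normalisation controls can be illustrated with a small validator that resolves a discovered link against the source host and rejects anything that is not plain HTTPS to `www.statskontoret.se`. This is a sketch only: `toAllowedUrl` is a hypothetical name, and the extra checks on port and embedded credentials are assumptions that go beyond what the table states.

```typescript
// Hypothetical sketch: the function name and the port/credential checks are assumptions.
const ALLOWED_HOST = "www.statskontoret.se";

function toAllowedUrl(raw: string): URL {
  // Relative links from HTML pages are resolved against the allowed origin.
  const url = new URL(raw, `https://${ALLOWED_HOST}/`);
  if (url.protocol !== "https:") {
    throw new Error(`non-HTTPS URL rejected: ${url.protocol}`);
  }
  if (url.hostname !== ALLOWED_HOST || url.port !== "") {
    throw new Error(`host not on allow-list: ${url.host}`);
  }
  if (url.username !== "" || url.password !== "") {
    throw new Error("URLs with embedded credentials are rejected");
  }
  return url;
}
```

Comparing the parsed `hostname` for equality, rather than testing the raw string with `includes` or `startsWith`, avoids lookalike hosts such as a domain that merely begins with the allowed name.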

21 changes: 21 additions & 0 deletions TESTING.md
@@ -687,3 +687,24 @@ IMF_LIVE_SMOKE=1 npm test -- imf-client.live
- `tests/imf-vintage-discipline.test.ts` — asserts cache filenames carry vintage tags

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🧪 Statskontoret Test Coverage

Statskontoret coverage is split across focused Vitest suites:

| Test file | Coverage |
|---|---|
| `tests/statskontoret-client.test.ts` | Download-link extraction, XLSX workbook parsing, CSV ZIP extraction, Swedish decimal handling, injected fetch client behavior. |
| `tests/statskontoret-fetch.test.ts` | Import-safe CLI parsing, typed CLI errors, source validation, resource classification, numeric parsing primitives. |
| `tests/statskontoret-inventory.test.ts` | Inventory metadata, dataset coverage parity with `STATSKONTORET_SOURCES`, provider-decision matrix, client/CLI/persistence declarations. |

Targeted validation command:

```bash
npx vitest run tests/statskontoret-client.test.ts tests/statskontoret-fetch.test.ts tests/statskontoret-inventory.test.ts
```

Quality expectation: no live network calls in tests; fixtures model Statskontoret workbook/ZIP assumptions and prevent workflow regressions without depending on upstream availability.
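
To make "Swedish decimal handling" concrete, a parser primitive of the kind these suites exercise might look like the sketch below. `parseSwedishNumber` is a hypothetical name and the accepted formats are assumptions: whitespace (including non-breaking spaces) as the thousands separator, a comma as the decimal separator, and an optional Unicode minus sign.

```typescript
// Hypothetical sketch of a numeric-normalisation primitive; accepted formats are assumptions.
function parseSwedishNumber(raw: string): number | null {
  const cleaned = raw
    .replace(/\s/g, "") // \s also matches U+00A0 and U+202F used as thousands separators
    .replace(/\u2212/g, "-") // normalise Unicode minus to ASCII hyphen-minus
    .replace(",", "."); // only the first comma: a second one makes the value invalid
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) return null;
  return Number(cleaned);
}
```

Returning `null` for anything that does not match, instead of `NaN` or `0`, lets the caller distinguish a missing or malformed cell from a genuine zero when aggregating budget or headcount figures.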

24 changes: 24 additions & 0 deletions THREAT_MODEL.md
@@ -3000,3 +3000,27 @@ All mitigations are codified in:
**Egress hosts** (allow-list): `www.imf.org` (Datamapper REST · WEO/FM), `sdmxcentral.imf.org` (SDMX 3.0 REST · IFS/BOP/DOTS/GFS/PCPS/ER/MFS_IR/MFS_PR). Both HTTPS-only, anonymous, public — no credentials required.

**Canonical rule.** Every economic claim in a Riksdagsmonitor article cites an IMF dataflow first; World Bank citations are reserved for governance, environment and social residue (the classes IMF does not publish). SCB is the Swedish-specific ground truth layer. See `ECONOMIC_DATA_CONTRACT.md` v2.1 for the banned-phrase list and vintage discipline (>6 mo → annotation).

---

## 🏛️ Statskontoret Integration — STRIDE Threats

> **Effective:** 2026-04-25 · **Classification:** Public · **Entry point:** `scripts/statskontoret-fetch.ts` · **Source:** `www.statskontoret.se`.

Statskontoret ingestion introduces a public-data trust boundary for Swedish agency structure and budget outturn files. The integration is unauthenticated, read-only and treated as optional enrichment, but the integrity of the parsed figures still matters because political-intelligence claims are built on them.

| ID | Asset / flow | STRIDE | Threat | Likelihood | Impact | Mitigations |
|---|---|---|---|---|---|---|
| T-STATS-01 | `www.statskontoret.se` page discovery | Spoofing | DNS/TLS interception or lookalike page returns false download links | LOW | MEDIUM | HTTPS-only egress, allow-list `www.statskontoret.se`, source URL recorded in payload and `.meta.json`, PR review of persisted diffs. |
| T-STATS-02 | Excel / CSV ZIP payload | Tampering | Workbook or archive content modified upstream or in transit | LOW | HIGH | TLS transport, local parser contract checks, typed `StatskontoretError`, persisted raw/derived artifacts with provenance sidecars, reviewer diff inspection. |
| T-STATS-03 | Headcount aggregation | Tampering | Header drift maps wrong columns to `År`, `Departement`, `Myndighet`, or `Årsarbetskrafter` | MEDIUM | MEDIUM | Header-family matching documented in `analysis/statskontoret/data-dictionary.md`, unit tests for workbook parsing and Swedish number handling, fallback to no derived output if required fields cannot be resolved. |
| T-STATS-04 | CLI invocation | Repudiation | Article cites agency headcount or budget outturn without source page/year/status | MEDIUM | MEDIUM | `discover` captures source page, URL, year/month/status and `last-modified`; persisted sidecars include `dataset`, `artifact`, `fetchedAt`, and `mcpTool: statskontoret-ts-client`. |
| T-STATS-05 | Source availability | Denial of service | Statskontoret page unavailable or workbook fetch times out | MEDIUM | LOW | 15s timeout, optional-enrichment semantics, cache-first reuse of `analysis/data/statskontoret/`, article generation can omit context rather than fail. |
| T-STATS-06 | XLSX/ZIP parsing dependency | Elevation of privilege | Malicious archive attempts parser/resource abuse | LOW | HIGH | `jszip` pinned in npm lock/SBOM, GitHub Advisory Database reviewed, no dynamic eval, no script execution from workbooks, tests exercise parser edge cases. |

### Residual risk and classification

- **Residual risk:** LOW-MEDIUM integrity risk due to upstream data or workbook-schema drift; handled by provenance, test coverage and human review.
- **Privacy:** no PII or credentials; public authority and aggregate budget data only.
- **CIA:** Public / High Integrity / Medium-High Availability for derived article context.
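
The T-STATS-03 mitigation (resolve required headers, and produce no derived output if any cannot be found) might be sketched as below. The alias lists, the function names and the rule that one data row equals one authority are assumptions for illustration; the real header families are documented in `analysis/statskontoret/data-dictionary.md`.

```typescript
// Hypothetical sketch: alias lists, names and the one-row-per-authority rule are assumptions.
type Row = readonly string[];
type Field = "year" | "department" | "authority" | "fte";

interface HeadcountRow {
  year: number;
  department: string;
  headcount: number;
  authorityCount: number;
}

const HEADER_FAMILIES: Record<Field, readonly string[]> = {
  year: ["år"],
  department: ["departement"],
  authority: ["myndighet"],
  fte: ["årsarbetskrafter"],
};

// Returns null as soon as one required column is missing, so header drift is detected early.
function resolveColumns(header: Row): Record<Field, number> | null {
  const normalised = header.map((cell) => cell.trim().toLowerCase());
  const columns = {} as Record<Field, number>;
  for (const field of Object.keys(HEADER_FAMILIES) as Field[]) {
    const index = normalised.findIndex((cell) => HEADER_FAMILIES[field].includes(cell));
    if (index === -1) return null;
    columns[field] = index;
  }
  return columns;
}

function aggregateHeadcount(rows: readonly Row[]): HeadcountRow[] | null {
  const [header, ...body] = rows;
  const columns = header ? resolveColumns(header) : null;
  if (!columns) return null; // fail closed: no derived output on unresolved headers
  const totals = new Map<string, HeadcountRow>();
  for (const row of body) {
    const yearText = (row[columns.year] ?? "").trim();
    const department = (row[columns.department] ?? "").trim();
    const fteText = (row[columns.fte] ?? "").trim();
    const headcount = Number(fteText);
    if (!/^\d{4}$/.test(yearText) || department === "" || fteText === "" || !Number.isFinite(headcount)) {
      continue; // skip rows that cannot be interpreted rather than guessing
    }
    const key = `${yearText}|${department}`;
    const entry = totals.get(key) ?? { year: Number(yearText), department, headcount: 0, authorityCount: 0 };
    entry.headcount += headcount;
    entry.authorityCount += 1; // assumes one row per authority and year
    totals.set(key, entry);
  }
  return Array.from(totals.values());
}
```

Failing closed keeps a renamed or reordered column from silently producing plausible but wrong totals, which is the integrity concern the residual-risk note describes.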
