Commit 0f24065

Copilot and pethers authored
Merge remote-tracking branch 'origin/main' into copilot/implement-riksdag-calendar-api-fallback
# Conflicts:
#   scripts/fetch-calendar.ts
#   tests/fetch-calendar.test.ts

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
2 parents 2c8ad28 + a63366f commit 0f24065

9 files changed

Lines changed: 1619 additions & 2 deletions

.github/skills/myndigheter-monitoring/SKILL.md

Lines changed: 78 additions & 0 deletions
@@ -286,6 +286,81 @@ When an agency is named in `implementation-feasibility.md`:
286286
- **Stakeholder voices** - Include citizens, experts, civil society
287287
- **Public interest** - Agencies serve citizens, not themselves
288288

289+
## Statskontoret Data Integration
290+
291+
Statskontoret (Swedish Agency for Public Management) publishes open data that provides
292+
authoritative, Admiralty-A1 ground truth for government-body context. Use this data
293+
**before** relying on estimates or secondary sources when writing about agency headcounts,
294+
organisational structures or central-government budget execution.
295+
296+
### Available Datasets
297+
298+
| Dataset key | Title | Cadence | Primary use |
299+
|-------------|-------|---------|-------------|
300+
| `myndighetsforteckning` | Myndighetsförteckning — öppna data | Annual | Headcount by department & leadership form (2007–present) |
301+
| `arsutfall` | Årsutfall för statens budget — öppna data | Annual | Annual budget outturn by appropriation & agency |
302+
| `manadsutfall` | Månadsutfall för statens budget — öppna data | Monthly | High-frequency budget-execution monitoring |
303+
| `budget-time-series` | Tidsserier, statens budget m.m. | Annual | Long-run central-government budget context (1995+) |
304+
305+
### How to Fetch (agentic workflows)
306+
307+
The cached library helper is invoked from TypeScript code (see "Cached Fetch Module"
308+
below). For ad-hoc CLI use, the `statskontoret-fetch.ts` wrapper is the entrypoint:
309+
310+
```bash
311+
# CLI: list every built-in Statskontoret source
312+
tsx scripts/statskontoret-fetch.ts list-sources
313+
314+
# CLI: discover downloadable files for a source
315+
tsx scripts/statskontoret-fetch.ts discover --source myndighetsforteckning
316+
317+
# CLI: fetch + parse headcount workbook
318+
tsx scripts/statskontoret-fetch.ts headcount --url <xlsx-url> --persist
319+
320+
# CLI: fetch + parse budget-outturn workbook
321+
tsx scripts/statskontoret-fetch.ts budget-outturn --source arsutfall --url <xlsx-url> --doc-type Inkomst --persist
322+
```
323+
324+
### Cached Fetch Module (`scripts/fetch-statskontoret.ts`)
325+
326+
The `fetch-statskontoret.ts` module provides a **30-day TTL cache layer** over the raw
327+
HTTP client. It caches the download links discovered for each source, so agentic
328+
workflows that run daily only re-query the origin once the cached entry is older than 30 days:
329+
330+
```typescript
331+
import { fetchStatskontoretCached, isStatskontoretCacheFresh } from './fetch-statskontoret.js';
332+
333+
// Check cache freshness without a network call
334+
if (!isStatskontoretCacheFresh('myndighetsforteckning')) {
335+
const payload = await fetchStatskontoretCached('myndighetsforteckning');
336+
// payload.fromCache === false → fresh download
337+
// payload.links → array of StatskontoretDownloadLink (Excel URLs)
338+
}
339+
```
340+
341+
On network failure the module automatically falls back to the most recent stale cache
342+
entry, ensuring workflows remain resilient to temporary outages.
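
The hit / refresh / stale-fallback behaviour described above can be illustrated with a minimal in-memory sketch. It is not the module's implementation: `cachedFetch`, `Entry`, `CachedResult` and the `stale` flag are hypothetical names introduced here, and the store is a `Map` rather than the file-system cache. In the module as shown in this commit, a stale fallback is returned with `fromCache: true`, so a caller would have to compare `cacheAgeMs` against the TTL to tell it apart from a fresh hit.

```typescript
// Sketch only: hypothetical names, in-memory store instead of files on disk.
interface Entry<T> {
  readonly fetchedAt: number; // epoch milliseconds
  readonly value: T;
}

interface CachedResult<T> {
  readonly value: T;
  readonly fromCache: boolean;
  readonly stale: boolean; // hypothetical flag, not part of the real payload
}

async function cachedFetch<T>(
  store: Map<string, Entry<T>>,
  key: string,
  ttlMs: number,
  load: () => Promise<T>,
  now: () => number = Date.now,
): Promise<CachedResult<T>> {
  const cached = store.get(key);

  // Fresh hit: serve from cache without calling the loader.
  if (cached !== undefined && now() - cached.fetchedAt < ttlMs) {
    return { value: cached.value, fromCache: true, stale: false };
  }

  try {
    // Miss or stale: reload, then stamp the entry after the load completes.
    const value = await load();
    store.set(key, { fetchedAt: now(), value });
    return { value, fromCache: false, stale: false };
  } catch (error) {
    // Loader failed: fall back to the stale entry if one exists.
    if (cached !== undefined) {
      return { value: cached.value, fromCache: true, stale: true };
    }
    throw error;
  }
}
```

Injecting the clock (`now`) and the loader keeps the sketch testable without network access, which mirrors the purpose of the `cacheTtlMs`, `cacheRoot` and `clientConfig` overrides in the real module.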
343+
344+
### Data Provenance Rule
345+
346+
Any implementation-feasibility or agency-context analysis that names a Swedish
347+
government body **must** annotate the headcount or budget figure with a
348+
Statskontoret source citation:
349+
350+
```markdown
351+
*Headcount source: Statskontoret Myndighetsförteckning 2025
352+
(analysis/data/statskontoret/myndighetsforteckning/) [A1]*
353+
```
354+
355+
Admiralty grade for Statskontoret's own publications: **A1** (official statistics,
356+
primary public record).
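
As a small illustration of the provenance rule, the citation line could be generated rather than typed by hand. This is a sketch under assumptions: `headcountCitation` is a hypothetical helper that does not exist in the repository, and the default grade simply mirrors the A1 grade stated above.

```typescript
// Hypothetical helper: formats the citation in the shape shown in the rule above.
function headcountCitation(year: number, dataPath: string, grade: string = 'A1'): string {
  return `*Headcount source: Statskontoret Myndighetsförteckning ${year}\n(${dataPath}) [${grade}]*`;
}

// Example call using the path from the citation template above.
const citation = headcountCitation(2025, 'analysis/data/statskontoret/myndighetsforteckning/');
console.log(citation);
```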
357+
358+
### Network Allowlist
359+
360+
`www.statskontoret.se` and `statskontoret.se` are included in the `network.allowed`
361+
list of all 11 `news-*.md` agentic workflow files. No additional configuration is
362+
required.
363+
289364
## References
290365

291366
- [Swedish Agency Directory](https://www.regeringen.se/regeringens-politik/myndigheter-under-regeringen/)
@@ -294,6 +369,9 @@ When an agency is named in `implementation-feasibility.md`:
294369
- [OECD Public Administration Reviews](https://www.oecd.org/governance/)
295370
- [Transparency International Sweden](https://www.transparency.se/)
296371
- [Swedish Agency for Public Management (Statskontoret)](https://www.statskontoret.se/)
372+
- [Statskontoret Indicators Inventory](../../../analysis/statskontoret/indicators-inventory.json)
373+
- [fetch-statskontoret.ts](../../../scripts/fetch-statskontoret.ts) — 30-day cache module
374+
- [statskontoret-client.ts](../../../scripts/statskontoret-client.ts) — HTTP client library
297375

298376
---
299377

analysis/methodologies/ai-driven-analysis-guide.md

Lines changed: 29 additions & 1 deletion
@@ -93,6 +93,28 @@ npx tsx scripts/download-parliamentary-data.ts \
9393

9494
**Write `data-download-manifest.md`** using the [manifest template](../templates/data-download-manifest.md). It records what arrived, from which MCP tools, with what data-depth distribution (FULL-TEXT / SUMMARY / METADATA-ONLY).
9595

96+
After `download-parliamentary-data.ts` completes for `committeeReports`, also run the voting-records script to capture party-level vote counts and defector detection for each betänkande:
97+
98+
```bash
99+
npx tsx scripts/fetch-voting-records.ts \
100+
--date ${ARTICLE_DATE} \
101+
--doc-type committeeReports \
102+
--persist
103+
```
104+
105+
This writes `data/voteringar/${ARTICLE_DATE}/{bet}.json` and injects voting-record summaries into `analysis/daily/${ARTICLE_DATE}/committeeReports/voting-records/`. Each record carries an explicit `status` field. `fetchVotingForBet` emits one of three statuses: `"fetched"` (full table available), `"not_found"` (MCP returned successfully with zero rows — e.g. referral, procedural decision, or committee item without a chamber vote), or `"error"` (transient MCP/network failure with `errorMessage`). Editorial tooling that *knows* a vote is upcoming may also persist `"vote_pending"` annotations manually. The script emits a matching injection template for every non-`fetched` status (`<!-- vote-not-found: {bet} -->`, `<!-- vote-fetch-error: {bet} -->`, `<!-- vote-pending: {bet} -->`), so the coalition-mathematics section can paste the template verbatim and rerun the script to upgrade `error` / `not_found` to `fetched` once data is available.
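
The status-to-annotation mapping described above can be sketched as a single exhaustive switch. The function name `voteAnnotation` and the `VoteStatus` type are assumptions made for illustration and are not taken from `fetch-voting-records.ts`; only the status strings and comment templates come from the paragraph above.

```typescript
// Hypothetical sketch: status values and templates as described in the text.
type VoteStatus = 'fetched' | 'not_found' | 'error' | 'vote_pending';

function voteAnnotation(status: VoteStatus, bet: string): string | undefined {
  switch (status) {
    case 'fetched':
      return undefined; // full table available, no placeholder needed
    case 'not_found':
      return `<!-- vote-not-found: ${bet} -->`;
    case 'error':
      return `<!-- vote-fetch-error: ${bet} -->`;
    case 'vote_pending':
      return `<!-- vote-pending: ${bet} -->`;
  }
}

// 'AU10' is a made-up betänkande identifier used only for the example.
console.log(voteAnnotation('error', 'AU10')); // prints "<!-- vote-fetch-error: AU10 -->"
```

Modelling the status as a string-literal union lets the compiler flag any status that is added later without a matching template.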
106+
107+
To fetch the parliamentary forward calendar for week-ahead or month-ahead forecasting, run:
108+
109+
```bash
110+
npx tsx scripts/fetch-calendar.ts \
111+
--from ${ARTICLE_DATE} \
112+
--tom ${TOM_DATE} \
113+
--persist
114+
```
115+
116+
This writes `analysis/data/calendar/${ARTICLE_DATE}_${TOM_DATE}.json` using the MCP `get_calendar_events` primary path with automatic fallback to HTML parsing of riksdagen.se/sv/kalendarium/.
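
The primary-path-with-fallback pattern can be sketched as follows. This is an illustration, not the logic of `fetch-calendar.ts`: `fetchWithFallback` and `CalendarResult` are hypothetical names, and the sketch assumes the fallback is triggered only when the primary call throws. Whether the real script also falls back on an empty MCP response is not stated in this commit.

```typescript
// Sketch only: the two loaders are injected, so no MCP or HTTP call is made here.
type CalendarSource = 'mcp' | 'web_fallback';

interface CalendarResult<E> {
  readonly source: CalendarSource;
  readonly events: E[];
}

async function fetchWithFallback<E>(
  primary: () => Promise<E[]>,
  fallback: () => Promise<E[]>,
): Promise<CalendarResult<E>> {
  try {
    // Primary path: tag the result so downstream files can cite its origin.
    return { source: 'mcp', events: await primary() };
  } catch {
    // Assumption: any primary failure routes to the fallback loader.
    return { source: 'web_fallback', events: await fallback() };
  }
}
```

Carrying the `source` tag on the result is what allows later analysis files to cite where the events came from and grade their reliability accordingly.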
117+
96118
If the date yields 0 documents, apply the **Empty-Day Protocol** (§ Empty-Day Handling below) — never publish a "0 documents" file.
97119

98120
---
@@ -168,7 +190,7 @@ Every run produces **all five Family C files and all seven Family D files**. The
168190
|`methodology-reflection.md` | **VITAL run-audit gate.** Evidence sufficiency, confidence distribution, source diversity, party-neutrality arithmetic, **ICD 203 compliance audit**, three concrete methodology improvements for the next cycle. Skipping it breaks the self-correction loop. | [`methodology-reflection.md`](../templates/methodology-reflection.md) | Key Assumptions Check, Quality of Information Check |
169191
| `election-2026-analysis.md` | Seat-projection deltas + coalition viability for every run through 2026-09; after the election it converts to a permanent "post-2026 government-formation context" file | [`election-2026-analysis.md`](../templates/election-2026-analysis.md) | Morphological |
170192
| `voter-segmentation.md` | Demographic / regional / ideological segment impact; when the day's docs are procedural, documents baseline segment positions | [`voter-segmentation.md`](../templates/voter-segmentation.md) | Outside-In Thinking |
171-
| `coalition-mathematics.md` | Current seat map + pivotal votes + Sainte-Laguë scenarios; stable structure regardless of daily contentiousness | [`coalition-mathematics.md`](../templates/coalition-mathematics.md) | Morphological |
193+
| `coalition-mathematics.md` | Current seat map + pivotal votes + Sainte-Laguë scenarios; stable structure regardless of daily contentiousness. **MUST** include a voting-record table sourced from `fetch-voting-records` output (`data/voteringar/{date}/{bet}.json`) for every betänkande cited, or one of the explicit annotations: `<!-- vote-not-found: {bet} -->` when `status: "not_found"` (MCP returned successfully with zero data — referral or procedural vote), `<!-- vote-fetch-error: {bet} -->` when `status: "error"` (transient MCP/network failure; rerun once the service is back), or — set manually by editorial tooling that knows a vote is upcoming — `<!-- vote-pending: {bet} -->`. | [`coalition-mathematics.md`](../templates/coalition-mathematics.md) | Morphological |
172194
| `historical-parallels.md` | Named precedent(s) ≤ 40 years with similarity score; when no obvious parallel exists, documents the "no-precedent" finding with reasoning | [`historical-parallels.md`](../templates/historical-parallels.md) | Outside-In Thinking |
173195
| `media-framing-analysis.md` | How each party, press quadrant, and platform frames the day; runs every cycle to build the longitudinal frame record | [`media-framing-analysis.md`](../templates/media-framing-analysis.md) | Outside-In Thinking |
174196
| `implementation-feasibility.md` | Delivery-risk view (budget / IT / regulatory / workforce); when no new bill lands, audits the backlog of in-flight commitments | [`implementation-feasibility.md`](../templates/implementation-feasibility.md) | Premortem Analysis |
@@ -281,6 +303,12 @@ graph TB
281303
|----------|:--------:|:--------:|:--------:|:--------:|:--------:|
282304
| Morning per-type (propositions, motions, betänkanden, interpellationer, frågor) | ✅ All 9 | ✅ Both | ✅ All 5 | ✅ All 7 | ✅ Every doc |
283305
| Midday week-ahead / month-ahead forecasts | ✅ All 9 | ✅ Both | ✅ All 5 | ✅ All 7 | ✅ Every forecast item |
306+
307+
> 📅 **Week-ahead / month-ahead calendar enrichment**: For midday forecasting runs, run `fetch-calendar.ts` **before** analysis to pre-populate forward events:
308+
> ```bash
309+
> npx tsx scripts/fetch-calendar.ts --from ${ARTICLE_DATE} --tom ${TOM_DATE} --persist
310+
> ```
311+
> The resulting `analysis/data/calendar/${ARTICLE_DATE}_${TOM_DATE}.json` feeds `forward-indicators.md` (horizon items) and `coalition-mathematics.md` (scheduled votes). Use the `source` field to cite whether events came from MCP (`"mcp"`) or the web fallback (`"web_fallback"`), and apply the appropriate Admiralty reliability code.
284312
| Evening analysis | ✅ All 9 | ✅ Both | ✅ All 5 | ✅ All 7 | ✅ Every doc |
285313
| Realtime monitor | ✅ All 9 | ✅ Both | ✅ All 5 | ✅ All 7 | ✅ Every doc |
286314
| Weekly review | ✅ All 9 | ✅ Both | ✅ All 5 | ✅ All 7 | Top 20 |

analysis/statskontoret/indicators-inventory.json

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@
88
"clients": {
99
"cli": "tsx scripts/statskontoret-fetch.ts (commands: list-sources, discover, headcount, budget-outturn)",
1010
"library": "scripts/statskontoret-client.ts (StatskontoretClient class)",
11+
"cachedFetch": "scripts/fetch-statskontoret.ts (fetchStatskontoretCached — 30-day TTL cache layer for agentic workflows)",
1112
"persistence": "scripts/parliamentary-data/data-persistence.ts (persistStatskontoretData)"
1213
},
1314
"notes": {

scripts/fetch-statskontoret.ts

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
1+
/**
2+
* @module scripts/fetch-statskontoret
3+
* @description Cached fetch module for Statskontoret open data, providing a
4+
* 30-day TTL cache layer over {@link StatskontoretClient}.
5+
*
6+
* This module is intended for use by agentic workflows that need Statskontoret
7+
* context (authority register, budget outturn) without re-downloading large
8+
* Excel/ZIP files on every run. It follows the same no-MCP client pattern as
9+
* `imf-context.ts` and `scb-context.ts`.
10+
*
11+
* ### Cache behaviour
12+
* - Cache root: `analysis/data/statskontoret/<sourceKey>/cache/`
13+
* - TTL: 30 days (configurable via the `cacheTtlMs` option)
14+
* - On hit: returns the cached payload with provenance metadata
15+
* - On miss or stale: invokes `StatskontoretClient.discoverDownloads()` and
16+
* persists the result before returning
17+
* - On fetch error: falls back to the most recent stale cache entry (resilience)
18+
*
19+
* ### Security
20+
* Fetch calls go only to `https://www.statskontoret.se` (enforced by
21+
* `assertStatskontoretFetchTarget` inside `StatskontoretClient`). No
22+
* credentials are required; all data is PUBLIC classification.
23+
*
24+
* @see analysis/statskontoret/indicators-inventory.json
25+
* @see scripts/statskontoret-client.ts (low-level HTTP + parse)
26+
* @see scripts/statskontoret-fetch.ts (CLI entry-point)
27+
* @author Hack23 AB
28+
* @license Apache-2.0
29+
*/
30+
31+
import fs from 'node:fs';
32+
import path from 'node:path';
33+
import { fileURLToPath } from 'node:url';
34+
35+
import {
36+
getStatskontoretSource,
37+
STATSKONTORET_SOURCES,
38+
StatskontoretClient,
39+
StatskontoretError,
40+
type StatskontoretClientConfig,
41+
type StatskontoretDownloadLink,
42+
type StatskontoretSourceKey,
43+
} from './statskontoret-client.js';
44+
45+
// ---------------------------------------------------------------------------
46+
// Constants
47+
// ---------------------------------------------------------------------------
48+
49+
const __filename = fileURLToPath(import.meta.url);
50+
const REPO_ROOT = path.resolve(path.dirname(__filename), '..');
51+
52+
/** Default 30-day cache TTL in milliseconds (30 days × 24 h × 60 min × 60 s × 1000 ms). */
53+
export const CACHE_TTL_MS = 30 * 24 * 60 * 60 * 1000;
54+
55+
/** Root directory for cached Statskontoret payloads. */
56+
export const STATSKONTORET_CACHE_ROOT = path.join(
57+
REPO_ROOT,
58+
'analysis',
59+
'data',
60+
'statskontoret',
61+
);
62+
63+
// ---------------------------------------------------------------------------
64+
// Types
65+
// ---------------------------------------------------------------------------
66+
67+
/** A cached Statskontoret downloads payload with provenance metadata. */
68+
export interface StatskontoretCachedPayload {
69+
readonly sourceKey: StatskontoretSourceKey;
70+
readonly sourceTitle: string;
71+
readonly sourceUrl: string;
72+
readonly links: readonly StatskontoretDownloadLink[];
73+
readonly cachedAt: string;
74+
readonly fetchedAt: string;
75+
readonly fromCache: boolean;
76+
readonly cacheAgeMs: number;
77+
}
78+
79+
/** Options for {@link fetchStatskontoretCached}. */
80+
export interface FetchStatskontoretCachedOptions {
81+
/** Override the 30-day TTL (milliseconds). Mainly for testing. */
82+
readonly cacheTtlMs?: number;
83+
/** Override the cache root directory. Mainly for testing. */
84+
readonly cacheRoot?: string;
85+
/** Override the `StatskontoretClient` configuration (e.g. inject a mock fetch). */
86+
readonly clientConfig?: StatskontoretClientConfig;
87+
}
88+
89+
/** Internal cache file format. */
90+
interface CacheEntry {
91+
readonly fetchedAt: string;
92+
readonly sourceKey: StatskontoretSourceKey;
93+
readonly links: StatskontoretDownloadLink[];
94+
}
95+
96+
// ---------------------------------------------------------------------------
97+
// Private helpers
98+
// ---------------------------------------------------------------------------
99+
100+
function cacheDir(sourceKey: StatskontoretSourceKey, cacheRoot: string): string {
101+
return path.join(cacheRoot, sourceKey, 'cache');
102+
}
103+
104+
function cacheFilePath(sourceKey: StatskontoretSourceKey, cacheRoot: string): string {
105+
return path.join(cacheDir(sourceKey, cacheRoot), 'downloads.json');
106+
}
107+
108+
function readCacheEntry(filePath: string): CacheEntry | undefined {
109+
try {
110+
const raw = fs.readFileSync(filePath, 'utf-8');
111+
return JSON.parse(raw) as CacheEntry;
112+
} catch {
113+
return undefined;
114+
}
115+
}
116+
117+
function writeCacheEntry(filePath: string, entry: CacheEntry): void {
118+
const dir = path.dirname(filePath);
119+
fs.mkdirSync(dir, { recursive: true });
120+
fs.writeFileSync(filePath, JSON.stringify(entry, null, 2), 'utf-8');
121+
}
122+
123+
function isCacheFresh(fetchedAt: string, ttlMs: number): boolean {
124+
const age = Date.now() - new Date(fetchedAt).getTime();
125+
return age < ttlMs;
126+
}
127+
128+
// ---------------------------------------------------------------------------
129+
// Public API
130+
// ---------------------------------------------------------------------------
131+
132+
/**
133+
* Fetch Statskontoret download links for a given source key, using a 30-day
134+
* file-system cache.
135+
*
136+
* @param sourceKey - The Statskontoret source to fetch
137+
* (`myndighetsforteckning`, `arsutfall`, `manadsutfall`, `budget-time-series`).
138+
* @param options - Optional TTL, cache-root and client overrides.
139+
* @returns A {@link StatskontoretCachedPayload} with links and provenance info.
140+
*
141+
* @example
142+
* ```ts
143+
* const payload = await fetchStatskontoretCached('myndighetsforteckning');
144+
* console.log(`Found ${payload.links.length} download links (fromCache=${payload.fromCache})`);
145+
* ```
146+
*/
147+
export async function fetchStatskontoretCached(
148+
sourceKey: StatskontoretSourceKey,
149+
options: FetchStatskontoretCachedOptions = {},
150+
): Promise<StatskontoretCachedPayload> {
151+
const {
152+
cacheTtlMs = CACHE_TTL_MS,
153+
cacheRoot = STATSKONTORET_CACHE_ROOT,
154+
clientConfig = {},
155+
} = options;
156+
157+
const source = getStatskontoretSource(sourceKey);
158+
const filePath = cacheFilePath(sourceKey, cacheRoot);
159+
160+
// --- Cache hit ---
161+
const cached = readCacheEntry(filePath);
162+
if (cached !== undefined && isCacheFresh(cached.fetchedAt, cacheTtlMs)) {
163+
const cacheAgeMs = Date.now() - new Date(cached.fetchedAt).getTime();
164+
return {
165+
sourceKey,
166+
sourceTitle: source.title,
167+
sourceUrl: source.url,
168+
links: cached.links,
169+
cachedAt: cached.fetchedAt,
170+
fetchedAt: cached.fetchedAt,
171+
fromCache: true,
172+
cacheAgeMs,
173+
};
174+
}
175+
176+
// --- Cache miss or stale: fetch from origin ---
177+
const client = new StatskontoretClient(clientConfig);
178+
let links: StatskontoretDownloadLink[];
179+
let fetchedAt: string;
180+
181+
try {
182+
links = await client.discoverDownloads(sourceKey);
183+
// Stamp provenance after the fetch completes so `fetchedAt` reflects when
184+
// the data was actually retrieved, not when the request was issued.
185+
fetchedAt = new Date().toISOString();
186+
writeCacheEntry(filePath, { fetchedAt, sourceKey, links });
187+
} catch (error) {
188+
// --- Resilience: return stale cache on fetch failure ---
189+
if (cached !== undefined) {
190+
const cacheAgeMs = Date.now() - new Date(cached.fetchedAt).getTime();
191+
return {
192+
sourceKey,
193+
sourceTitle: source.title,
194+
sourceUrl: source.url,
195+
links: cached.links,
196+
cachedAt: cached.fetchedAt,
197+
fetchedAt: cached.fetchedAt,
198+
fromCache: true,
199+
cacheAgeMs,
200+
};
201+
}
202+
const detail = error instanceof Error ? error.message : String(error);
203+
throw new StatskontoretError(
204+
`fetch-statskontoret: failed to fetch ${sourceKey} and no cache available: ${detail}`,
205+
'http',
206+
{ cause: error },
207+
);
208+
}
209+
210+
return {
211+
sourceKey,
212+
sourceTitle: source.title,
213+
sourceUrl: source.url,
214+
links,
215+
cachedAt: fetchedAt,
216+
fetchedAt,
217+
fromCache: false,
218+
cacheAgeMs: 0,
219+
};
220+
}
221+
222+
/**
223+
* Check whether a fresh cache entry exists for the given source key without
224+
* triggering a network fetch.
225+
*
226+
* @param sourceKey - The Statskontoret source to check.
227+
* @param options - Optional TTL and cache-root overrides.
228+
* @returns `true` if a fresh cache entry exists, `false` otherwise.
229+
*/
230+
export function isStatskontoretCacheFresh(
231+
sourceKey: StatskontoretSourceKey,
232+
options: Pick<FetchStatskontoretCachedOptions, 'cacheTtlMs' | 'cacheRoot'> = {},
233+
): boolean {
234+
const { cacheTtlMs = CACHE_TTL_MS, cacheRoot = STATSKONTORET_CACHE_ROOT } = options;
235+
const filePath = cacheFilePath(sourceKey, cacheRoot);
236+
const cached = readCacheEntry(filePath);
237+
return cached !== undefined && isCacheFresh(cached.fetchedAt, cacheTtlMs);
238+
}
239+
240+
/**
241+
* Return the list of all built-in Statskontoret source keys.
242+
* Useful for iterating over all sources in agentic workflows.
243+
*/
244+
export function statskontoretSourceKeys(): readonly StatskontoretSourceKey[] {
245+
return STATSKONTORET_SOURCES.map((s) => s.key);
246+
}
