diff --git a/.github/prompts/05-analysis-gate.md b/.github/prompts/05-analysis-gate.md
index 994c013577..c4f69320de 100644
--- a/.github/prompts/05-analysis-gate.md
+++ b/.github/prompts/05-analysis-gate.md
@@ -31,10 +31,13 @@ This is the **only** gate separating analysis from article generation. If it fai
- `forward-indicators.md` declares **≥ 10 dated indicators** (bullet or table rows matching a date pattern across the four horizon sections).
- `coalition-mathematics.md` contains a seat-count table (≥ 1 table row with `Ja`/`Nej`/`Avstår` or a party-to-seats mapping).
- `implementation-feasibility.md` — when it names a recognised agency (Kriminalvården, Polismyndigheten, Försäkringskassan, Skatteverket, Migrationsverket, Arbetsförmedlingen, Socialstyrelsen, Transportstyrelsen, Trafikverket, Naturvårdsverket, Energimyndigheten) — contains a `statskontoret.se` URL citation **or** the literal phrase `none found` in the `Statskontoret relevance` row.
+9. **PIR status sidecar** — `pir-status.json` is present and valid so open PIRs can roll forward to the next cycle.
+10. **Top-2 full-text availability** — when `data-download-manifest.md` contains a `## Full-Text Fetch Outcomes` table (written by `download-parliamentary-data.ts --auto-full-text-top-n`), at least 2 top documents must have `full_text_available=true`. Add `` to the manifest to bypass (e.g. when full text is genuinely unavailable from the MCP server or the flag was not used).
+11. **Supplementary artifacts** — see §Supplementary checks below (blocking for aggregation/Tier-C/multi-run).
## Implementation
-No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–9, plus conditional check 9b where applicable):
+No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–11, plus conditional check 9b where applicable):
```bash
set -Eeuo pipefail
@@ -238,9 +241,10 @@ fi
# populate the `| **Statskontoret relevance** | ... |` row with either a
# statskontoret.se URL or the literal `none found` when no relevant coverage exists.
AGENCY_RE='Kriminalvård(en)?|Polismyndigheten|Försäkringskassan|Skatteverket|Migrationsverket|Arbetsförmedlingen|Socialstyrelsen|Transportstyrelsen|Trafikverket|Naturvårdsverket|Energimyndigheten'
+STATSKONTORET_RELEVANCE_RE='^\|[[:space:]]*\*\*Statskontoret relevance\*\*[[:space:]]*\|[[:space:]]*([^|]*statskontoret\.se[^|]*|[^|]*none found[^|]*)\|'
if [ -s "$ANALYSIS_DIR/implementation-feasibility.md" ]; then
if grep -qE "$AGENCY_RE" "$ANALYSIS_DIR/implementation-feasibility.md"; then
- grep -qiE '^\|[[:space:]]*\*\*Statskontoret relevance\*\*[[:space:]]*\|[[:space:]]*([^|]*statskontoret\.se[^|]*|[^|]*none found[^|]*)\|' "$ANALYSIS_DIR/implementation-feasibility.md" \
+ grep -qiE "$STATSKONTORET_RELEVANCE_RE" "$ANALYSIS_DIR/implementation-feasibility.md" \
|| { echo "❌ implementation-feasibility.md: names a recognised agency but the Statskontoret relevance row lacks a statskontoret.se URL or 'none found'"; FAIL=1; }
fi
fi
@@ -319,6 +323,26 @@ except Exception as e:
" 2>&1 || FAIL=1
fi
+# Check 10 — top-2 full-text availability (auto-full-text-top-n gate)
+# When the manifest contains a "Full-Text Fetch Outcomes" table (written by
+# download-parliamentary-data.ts --auto-full-text-top-n), verify that at least
+# 2 top documents have full_text_available=true. A fallback annotation
+# anywhere in the manifest bypasses
+# this check so that runs without the flag, or runs where full text is
+# genuinely unavailable from the MCP server, are not blocked.
+if [ -s "$ANALYSIS_DIR/data-download-manifest.md" ]; then
+ if grep -q "## Full-Text Fetch Outcomes" "$ANALYSIS_DIR/data-download-manifest.md"; then
+ if grep -q "full-text-fallback:" "$ANALYSIS_DIR/data-download-manifest.md"; then
+ : # Fallback annotation present — bypass check
+ else
+ FT_SUCCESS=$(grep -cE '^\|[[:space:]]*[A-Za-z0-9_-]+[[:space:]]*\|[[:space:]]*true' \
+ "$ANALYSIS_DIR/data-download-manifest.md" || true)
+ [ "${FT_SUCCESS:-0}" -ge 2 ] \
+ || { echo "❌ data-download-manifest.md: Full-Text Fetch Outcomes table present but fewer than 2 top documents have full_text_available=true (found ${FT_SUCCESS:-0}). Add to the manifest to bypass."; FAIL=1; }
+ fi
+ fi
+fi
+
[ "$FAIL" -eq 0 ] || exit 1
```
@@ -351,7 +375,7 @@ Non-blocking for `standard` / `deep` runs; **blocking for `comprehensive` / Tier
Inline bash probe — append to the main block after `FAIL=0` bookkeeping completes. Supplementary artifacts have **three independent blocking triggers**, not a single tier-only rule: **aggregation article types** (`weekly-review`, `monthly-review`) require the aggregation artifacts; any run whose **tier** is `comprehensive` (the Tier-C run mode) requires the Tier-C supplementary set; and `cross-run-diff.md` is blocking whenever the workflow has **≥ 2 production runs** of the same article type, including `standard` and `deep` runs. `ARTICLE_TYPE` encodes the workflow family; `ANALYSIS_TIER` (when set) encodes the depth tier (`standard` | `deep` | `comprehensive`); `ANALYSIS_RUN_COUNT` (when set) is the numeric count of runs for the same article-generation cycle (if unset or non-numeric, treated as `1`).
```bash
-# Check 10 — supplementary artifacts (blocking for aggregation types, any Tier-C run, and S5 when run-count >= 2)
+# Check 11 — supplementary artifacts (blocking for aggregation types, any Tier-C run, and S5 when run-count >= 2)
IS_AGGREGATION=0
IS_TIER_C=0
IS_MULTI_RUN=0
diff --git a/analysis/methodologies/ai-driven-analysis-guide.md b/analysis/methodologies/ai-driven-analysis-guide.md
index 7da05a828e..2be0cc9f1c 100644
--- a/analysis/methodologies/ai-driven-analysis-guide.md
+++ b/analysis/methodologies/ai-driven-analysis-guide.md
@@ -16,7 +16,7 @@
-**📋 Document Owner:** CEO | **📄 Version:** 6.6 | **📅 Last Updated:** 2026-04-25 (UTC)
+**📋 Document Owner:** CEO | **📄 Version:** 6.7 | **📅 Last Updated:** 2026-04-27 (UTC)
**🔄 Review Cycle:** Quarterly | **⏰ Next Review:** 2026-07-21
**🏢 Owner:** Hack23 AB (Org.nr 5595347807) | **🏷️ Classification:** Public
@@ -87,11 +87,13 @@ Scripts run the download. Example:
```bash
npx tsx scripts/download-parliamentary-data.ts \
--date ${ARTICLE_DATE} \
- --scope ${DOC_TYPE} \
- --out analysis/daily/${ARTICLE_DATE}/${DOC_TYPE}/data/
+ --doc-type ${DOC_TYPE} \
+ --auto-full-text-top-n 2
```
-**Write `data-download-manifest.md`** using the [manifest template](../templates/data-download-manifest.md). It records what arrived, from which MCP tools, with what data-depth distribution (FULL-TEXT / SUMMARY / METADATA-ONLY).
+**`--auto-full-text-top-n 2`** (recommended for L2/L3 runs): after the bulk download, the script calls `get_dokument_innehall` with `include_full_text=true` for the top-2 documents (by order in the downloaded batch) and persists the retrieved content to `analysis/daily/${ARTICLE_DATE}/${DOC_TYPE}/full-text/{dok_id}.md`. Accept the extra 30–60 s as a documented quality investment. The manifest's `## Full-Text Fetch Outcomes` table records `full_text_available` per `dok_id`; the analysis gate (check 10) enforces that ≥ 2 succeed or a `` annotation is present.
+
+**Write `data-download-manifest.md`** using the [manifest template](../templates/data-download-manifest.md). It records what arrived, from which MCP tools, with what data-depth distribution (FULL-TEXT / SUMMARY / METADATA-ONLY) and — when `--auto-full-text-top-n` is used — the `## Full-Text Fetch Outcomes` table.
After `download-parliamentary-data.ts` completes for `committeeReports`, also run the voting-records script to capture party-level vote counts and defector detection for each betänkande:
diff --git a/scripts/download-parliamentary-data.ts b/scripts/download-parliamentary-data.ts
index b023681a0c..e5863a9b90 100644
--- a/scripts/download-parliamentary-data.ts
+++ b/scripts/download-parliamentary-data.ts
@@ -34,8 +34,9 @@ import {
flattenDocuments,
subtractBusinessDays,
MAX_LOOKBACK_BUSINESS_DAYS,
+ fetchFullTextForTopN,
} from './parliamentary-data/data-downloader.js';
-import type { DocumentTypeKey } from './parliamentary-data/data-downloader.js';
+import type { DocumentTypeKey, FullTextFetchOutcome } from './parliamentary-data/data-downloader.js';
import { persistDownloadedData, sanitizeDokId } from './parliamentary-data/data-persistence.js';
@@ -148,10 +149,11 @@ export function parseArgs(argv: string[]): {
})
: [];
- // --auto-full-text-top-n: Override the per-type full-text enrichment limit.
- // When set, only the top N documents per type receive fetchDocumentDetails
- // (full-text) enrichment, enabling more targeted significance-scoring input.
- // Defaults to MAX_ENRICHMENT_PER_TYPE when omitted (null → caller uses default).
+ // --auto-full-text-top-n: Override the per-type full-text enrichment limit and
+ // persist full text outcomes for the first N documents in the current filtered
+ // array order. Defaults to null when omitted so downloadAllDocuments uses
+ // MAX_ENRICHMENT_PER_TYPE; explicit 0 disables per-type enrichment and
+ // persisted full-text fetching. No DIW significance ranking is applied here.
const autoFullTextTopNArg = get('--auto-full-text-top-n');
let autoFullTextTopN: number | null = null;
if (autoFullTextTopNArg !== null) {
@@ -235,6 +237,7 @@ function serializeDataManifest(
docCounts: Record,
dateFilteredTotal: number,
dataFreshness: string | null,
+ fullTextOutcomes?: FullTextFetchOutcome[],
): string {
const totalDocs = Object.values(docCounts).reduce((a, b) => a + b, 0);
const lines: string[] = [
@@ -267,6 +270,21 @@ function serializeDataManifest(
lines.push(`Data sourced from ${dataFreshness} via lookback fallback — check freshness indicators.`);
}
+ // Append full-text fetch outcomes when --auto-full-text-top-n was used.
+ if (fullTextOutcomes && fullTextOutcomes.length > 0) {
+ lines.push('', '## Full-Text Fetch Outcomes', '');
+ lines.push('| dok_id | full_text_available | chars | notes |');
+ lines.push('|--------|--------------------:|------:|-------|');
+ for (const o of fullTextOutcomes) {
+ const available = o.success ? 'true' : 'false';
+ const chars = o.chars > 0 ? String(o.chars) : '0';
+ const notes = o.reason ?? (o.filePath ? `persisted: ${o.filePath}` : '');
+ lines.push(`| ${o.dokId} | ${available} | ${chars} | ${notes} |`);
+ }
+ const successCount = fullTextOutcomes.filter(o => o.success).length;
+ lines.push('', `**Full-text retrieved**: ${successCount}/${fullTextOutcomes.length} top documents`);
+ }
+
return lines.join('\n');
}
@@ -514,10 +532,27 @@ async function runPreArticleAnalysis(opts: {
const persistResult = persistDownloadedData(data, resolvedRm);
console.log(` 🗄️ Persisted data for ${persistResult.written} documents to ${path.relative(REPO_ROOT, persistResult.dataRoot)}/ (${persistResult.skipped} skipped)`);
+ // ── Step 2b: Auto-fetch full text for top-N documents ────────────────────
+ let fullTextOutcomes: FullTextFetchOutcome[] | undefined;
+ if (autoFullTextTopN !== null && autoFullTextTopN > 0 && allDocs.length > 0) {
+ console.log(`\n📄 Step 2b: Auto-fetching full text for top-${autoFullTextTopN} documents (--auto-full-text-top-n=${autoFullTextTopN})...`);
+ console.log(' ⏱️ This may take 30–60 s — documented quality investment for deep-analysis tiers.');
+ fullTextOutcomes = await fetchFullTextForTopN(client, allDocs, autoFullTextTopN, outputDir);
+ const successCount = fullTextOutcomes.filter(o => o.success).length;
+ console.log(` ✅ Full text retrieved for ${successCount}/${fullTextOutcomes.length} document(s)`);
+ for (const o of fullTextOutcomes) {
+ if (o.success) {
+ console.log(` ✅ ${o.dokId}: ${o.chars} chars → ${o.filePath}`);
+ } else {
+ console.warn(` ⚠️ ${o.dokId}: ${o.reason}`);
+ }
+ }
+ }
+
// Write data-download-manifest.md (factual download summary — NOT analysis)
const manifestContent = serializeDataManifest(
date, generatedAt, manifest.dataSources, manifest.docCounts,
- allDocs.length, dataFreshness,
+ allDocs.length, dataFreshness, fullTextOutcomes,
);
const manifestPath = path.join(outputDir, 'data-download-manifest.md');
fs.writeFileSync(manifestPath, manifestContent, 'utf8');
@@ -553,6 +588,11 @@ async function runPreArticleAnalysis(opts: {
console.log(`\n✅ Data download complete! Results in: ${path.relative(REPO_ROOT, outputDir)}/`);
console.log(` 📄 ${totalFiles} total files written (1 manifest + ${storedCount} documents)`);
console.log(` 📊 ${allDocs.length} documents available for AI analysis`);
+ if (autoFullTextTopN !== null && autoFullTextTopN > 0) {
+ const successCount = fullTextOutcomes?.filter(o => o.success).length ?? 0;
+ const attempted = fullTextOutcomes?.length ?? 0;
+ console.log(` 📄 Full text: ${successCount}/${attempted} top-${autoFullTextTopN} documents (see full-text/ sub-folder)`);
+ }
if (docType) {
console.log(` 📋 Scoped to: ${docType}`);
}
diff --git a/scripts/fetch-statskontoret.ts b/scripts/fetch-statskontoret.ts
index c6870a74b4..5fd43ff2e2 100644
--- a/scripts/fetch-statskontoret.ts
+++ b/scripts/fetch-statskontoret.ts
@@ -180,8 +180,8 @@ export async function fetchStatskontoretCached(
try {
links = await client.discoverDownloads(sourceKey);
- // Stamp provenance after the fetch completes so `fetchedAt` reflects when
- // the data was actually retrieved, not when the request was issued.
+ // Stamp provenance after discovery completes so `fetchedAt` reflects the
+ // cache completion time, not when the request was issued.
fetchedAt = new Date().toISOString();
writeCacheEntry(filePath, { fetchedAt, sourceKey, links });
} catch (error) {
diff --git a/scripts/parliamentary-data/data-downloader.ts b/scripts/parliamentary-data/data-downloader.ts
index 7e1f8d3837..533ad2fea6 100644
--- a/scripts/parliamentary-data/data-downloader.ts
+++ b/scripts/parliamentary-data/data-downloader.ts
@@ -9,13 +9,17 @@
* and `rm`). Date-specific filtering should be applied by the caller after download
* (e.g., filtering by the `datum` field on each `RawDocument`).
*
- * This module is intentionally side-effect-free with respect to the filesystem;
- * callers are responsible for writing any output.
+ * Most functions in this module are side-effect-free with respect to the filesystem;
+ * callers are responsible for writing any output. The exception is `fetchFullTextForTopN`,
+ * which writes persisted full-text files to `{outputDir}/full-text/` — see its JSDoc.
*
* @author Hack23 AB
* @license Apache-2.0
*/
+import fs from 'node:fs';
+import path from 'node:path';
+
import type { RawDocument } from '../data-transformers/types.js';
import { isPersonProfileText } from '../data-transformers/helpers.js';
import type { MCPClient } from '../mcp-client/client.js';
@@ -82,6 +86,24 @@ export interface DownloadResult {
/** Maximum number of documents to enrich with full-text content per type. */
export const MAX_ENRICHMENT_PER_TYPE = 5;
+/**
+ * Outcome record for a single document in a top-N full-text fetch.
+ * Used in the data-download-manifest and as the return value of
+ * `fetchFullTextForTopN`.
+ */
+export interface FullTextFetchOutcome {
+ /** Riksdag document identifier */
+ dokId: string;
+ /** Whether meaningful full-text content was retrieved and persisted */
+ success: boolean;
+ /** Length (chars) of the persisted content; 0 when success is false */
+ chars: number;
+ /** Relative path to the persisted `.md` file (undefined when success is false) */
+ filePath?: string;
+ /** Human-readable reason when success is false */
+ reason?: string;
+}
+
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
@@ -485,3 +507,150 @@ export function flattenDocuments(data: DownloadedData): RawDocument[] {
return true;
});
}
+
+// ---------------------------------------------------------------------------
+// Top-N full-text fetch (auto-full-text-top-n feature)
+// ---------------------------------------------------------------------------
+
+/**
+ * Fetch full-text content for the top-N documents in `docs` and persist each
+ * to `{outputDir}/full-text/{dok_id}.md`.
+ *
+ * This function has filesystem side effects: it creates `outputDir/full-text/`
+ * (including any missing parent directories) and writes one `.md` file per
+ * successfully fetched document.
+ *
+ * Documents that lack a resolvable `dok_id` are skipped. If the MCP call
+ * succeeds but returns no meaningful content (< FULL_TEXT_MIN_LENGTH chars),
+ * the outcome is recorded as `success: false` with an explanatory `reason` so
+ * the caller (and the analysis gate) can distinguish "not tried" from
+ * "tried but only metadata returned".
+ *
+ * @param client - MCPClient instance for calling get_dokument_innehall
+ * @param docs - Ordered list of documents; first `topN` will be attempted
+ * @param topN - Maximum number of documents to fetch full text for
+ * @param outputDir - Base directory; `full-text/` sub-folder is created here
+ * (including any missing parent directories)
+ * @returns - One outcome record per dok_id attempted
+ */
+export async function fetchFullTextForTopN(
+ client: MCPClient,
+ docs: RawDocument[],
+ topN: number,
+ outputDir: string,
+): Promise {
+ if (topN <= 0 || docs.length === 0) return [];
+
+ const fullTextDir = path.join(outputDir, 'full-text');
+ fs.mkdirSync(fullTextDir, { recursive: true });
+
+ // Resolve dok_id for each candidate in the same priority order as enrichment.
+ const candidates: Array<{ dokId: string; doc: RawDocument }> = [];
+ for (const doc of docs) {
+ if (candidates.length >= topN) break;
+ const record = doc as Record;
+ const dokId = [
+ record['dok_id'],
+ record['dokument_id'],
+ record['rel_dok_id'],
+ record['id'],
+ record['dokumentnamn'],
+ ]
+ .map((v) => (typeof v === 'string' ? v.trim() : ''))
+ .find((v) => v.length > 0);
+ if (!dokId) continue;
+ candidates.push({ dokId, doc });
+ }
+
+ const outcomes: FullTextFetchOutcome[] = [];
+
+ for (const { dokId, doc } of candidates) {
+ let outcome: FullTextFetchOutcome;
+ try {
+ const str = (v: unknown): string => (typeof v === 'string' ? v : '');
+ const sanitize = (v: unknown): string => {
+ const s = str(v).trim();
+ return isPersonProfileText(s) ? '' : s;
+ };
+ const selectContent = (source: Record): string => {
+ const rawText = str(source['text']).trim();
+ const rawFullContent = sanitize(source['fullContent']);
+ // fullText may contain MP profile/deceased-notice text — sanitize it.
+ // text and html fields are structural content from the Riksdag dump and
+ // do not contain person-profile text, so str().trim() is sufficient.
+ const rawFullText = sanitize(source['fullText']);
+ const rawHtml = str(source['html']).trim();
+
+ // Prefer normalized/enriched content already present on the document,
+ // then fall back to the raw HTML dump fields.
+ return rawText.length > FULL_TEXT_MIN_LENGTH
+ ? rawText
+ : rawFullContent.length > FULL_TEXT_MIN_LENGTH
+ ? rawFullContent
+ : rawFullText.length > FULL_TEXT_MIN_LENGTH
+ ? rawFullText
+ : rawHtml;
+ };
+
+ // Reuse already-enriched fields on the document (set by downloadAllDocuments)
+ // before issuing a duplicate MCP call. Only call fetchDocumentDetails when
+ // the document does not already carry meaningful content.
+ const docRecord = doc as Record;
+ let details: Record | null = null;
+ let content = selectContent(docRecord);
+
+ if (content.length <= FULL_TEXT_MIN_LENGTH) {
+ details = (await client.fetchDocumentDetails(dokId, true)) as Record;
+ content = selectContent(details);
+ }
+
+ if (content.length > FULL_TEXT_MIN_LENGTH) {
+ const filenameSafeDokId = dokId.replace(/[^A-Za-z0-9_-]/g, '_');
+ const filePath = path.join(fullTextDir, `${filenameSafeDokId}.md`);
+ const snippet =
+ sanitize(docRecord['snippet']) ||
+ sanitize(docRecord['summary']) ||
+ sanitize(details?.['snippet']) ||
+ sanitize(details?.['summary']) ||
+ '';
+ // Build header without filtering blank lines so Markdown structure is preserved.
+ // The array is joined with \n and must end with \n so content starts on a new line.
+ const headerLines = [
+ `# Full Text — ${dokId}`,
+ '',
+ ...(snippet ? [`> ${snippet}`, ''] : []),
+ '---',
+ '',
+ ];
+ const header = headerLines.join('\n');
+ fs.writeFileSync(filePath, header + content, 'utf8');
+ // Use outputDir as the stable base for the relative path so the manifest
+ // entry is consistent regardless of the caller's working directory.
+ // Normalize separators to POSIX form for byte-identical artifacts across OSes.
+ outcome = {
+ dokId,
+ success: true,
+ chars: content.length,
+ filePath: path.relative(outputDir, filePath).split(path.sep).join('/'),
+ };
+ } else {
+ outcome = {
+ dokId,
+ success: false,
+ chars: 0,
+ reason: `content below FULL_TEXT_MIN_LENGTH (${FULL_TEXT_MIN_LENGTH}) — metadata-only`,
+ };
+ }
+ } catch (err) {
+ outcome = {
+ dokId,
+ success: false,
+ chars: 0,
+ reason: `fetchDocumentDetails failed: ${err instanceof Error ? err.message : String(err)}`,
+ };
+ }
+ outcomes.push(outcome);
+ }
+
+ return outcomes;
+}
diff --git a/tests/auto-full-text-top-n.test.ts b/tests/auto-full-text-top-n.test.ts
new file mode 100644
index 0000000000..a79cdf8099
--- /dev/null
+++ b/tests/auto-full-text-top-n.test.ts
@@ -0,0 +1,433 @@
+/**
+ * @module tests/auto-full-text-top-n
+ * @description Tests for the --auto-full-text-top-n feature.
+ *
+ * Validates:
+ * - parseArgs correctly parses --auto-full-text-top-n flag
+ * - fetchFullTextForTopN fetches and persists full text for top-N documents
+ * - Graceful degradation when full text is unavailable (metadata-only)
+ * - Graceful degradation when fetchDocumentDetails rejects
+ * - Manifest serializeDataManifest records full-text outcomes table
+ * - Documents without resolvable dok_id are skipped
+ * - topN=0 returns empty array immediately
+ */
+
+import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as os from 'node:os';
+import * as path from 'node:path';
+
+import { parseArgs } from '../scripts/download-parliamentary-data.js';
+import {
+ fetchFullTextForTopN,
+ FULL_TEXT_MIN_LENGTH,
+} from '../scripts/parliamentary-data/data-downloader.js';
+import type { RawDocument } from '../scripts/data-transformers/types.js';
+import type { MCPClient } from '../scripts/mcp-client/client.js';
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+function makeDoc(overrides: Record = {}): RawDocument {
+ return {
+ dok_id: 'HD01FiU48',
+ titel: 'Test committee report',
+ doktyp: 'bet',
+ organ: 'FiU',
+ datum: '2026-04-26',
+ ...overrides,
+ };
+}
+
+function createMockClient(
+ fetchDetailsImpl?: (dokId: string, includeFullText: boolean) => Promise>,
+): MCPClient {
+ return {
+ fetchPropositions: vi.fn().mockResolvedValue([]),
+ fetchMotions: vi.fn().mockResolvedValue([]),
+ fetchCommitteeReports: vi.fn().mockResolvedValue([]),
+ fetchVotingRecords: vi.fn().mockResolvedValue([]),
+ searchSpeeches: vi.fn().mockResolvedValue([]),
+ fetchWrittenQuestions: vi.fn().mockResolvedValue([]),
+ fetchInterpellations: vi.fn().mockResolvedValue([]),
+ fetchDocumentDetails: fetchDetailsImpl
+ ? vi.fn().mockImplementation(fetchDetailsImpl)
+ : vi.fn().mockResolvedValue({}),
+ } as unknown as MCPClient;
+}
+
+let tmpDir: string;
+
+beforeEach(() => {
+ tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'test-auto-ft-'));
+});
+
+afterEach(() => {
+ fs.rmSync(tmpDir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// parseArgs — --auto-full-text-top-n flag
+// ---------------------------------------------------------------------------
+
+describe('parseArgs --auto-full-text-top-n', () => {
+ it('defaults to null when flag is absent', () => {
+ const result = parseArgs(['node', 'script.ts']);
+ expect(result.autoFullTextTopN).toBeNull();
+ });
+
+ it('parses integer value correctly', () => {
+ const result = parseArgs(['node', 'script.ts', '--auto-full-text-top-n', '2']);
+ expect(result.autoFullTextTopN).toBe(2);
+ });
+
+ it('accepts value 0 (explicit disable)', () => {
+ const result = parseArgs(['node', 'script.ts', '--auto-full-text-top-n', '0']);
+ expect(result.autoFullTextTopN).toBe(0);
+ });
+
+ it('accepts larger values', () => {
+ const result = parseArgs(['node', 'script.ts', '--auto-full-text-top-n', '5']);
+ expect(result.autoFullTextTopN).toBe(5);
+ });
+
+ it('throws on negative value', () => {
+ expect(() =>
+ parseArgs(['node', 'script.ts', '--auto-full-text-top-n', '-1']),
+ ).toThrow(/auto-full-text-top-n/);
+ });
+
+ it('throws on non-integer value', () => {
+ expect(() =>
+ parseArgs(['node', 'script.ts', '--auto-full-text-top-n', 'abc']),
+ ).toThrow(/auto-full-text-top-n/);
+ });
+
+ it('throws on fractional value', () => {
+ expect(() =>
+ parseArgs(['node', 'script.ts', '--auto-full-text-top-n', '2.5']),
+ ).toThrow(/auto-full-text-top-n/);
+ });
+
+ it('combines correctly with other flags', () => {
+ const result = parseArgs([
+ 'node', 'script.ts',
+ '--date', '2026-04-26',
+ '--limit', '10',
+ '--auto-full-text-top-n', '2',
+ ]);
+ expect(result.date).toBe('2026-04-26');
+ expect(result.limit).toBe(10);
+ expect(result.autoFullTextTopN).toBe(2);
+ });
+});
+
+// ---------------------------------------------------------------------------
+// fetchFullTextForTopN — core behaviour
+// ---------------------------------------------------------------------------
+
+describe('fetchFullTextForTopN', () => {
+ it('returns empty array when topN is 0', async () => {
+ const client = createMockClient();
+ const outcomes = await fetchFullTextForTopN(client, [makeDoc()], 0, tmpDir);
+ expect(outcomes).toHaveLength(0);
+ expect(client.fetchDocumentDetails).not.toHaveBeenCalled();
+ });
+
+ it('returns empty array when docs list is empty', async () => {
+ const client = createMockClient();
+ const outcomes = await fetchFullTextForTopN(client, [], 2, tmpDir);
+ expect(outcomes).toHaveLength(0);
+ });
+
+ it('fetches full text for top-N documents and persists to full-text/ dir', async () => {
+ const longContent = '' + 'X'.repeat(FULL_TEXT_MIN_LENGTH + 50) + '
';
+ const docs = [
+ makeDoc({ dok_id: 'HD01FiU48' }),
+ makeDoc({ dok_id: 'HD01CU25' }),
+ ];
+ const client = createMockClient(async (_dokId, _) => ({ text: longContent, snippet: 'Test snippet' }));
+
+ const outcomes = await fetchFullTextForTopN(client, docs, 2, tmpDir);
+
+ expect(outcomes).toHaveLength(2);
+ expect(outcomes[0]!.dokId).toBe('HD01FiU48');
+ expect(outcomes[0]!.success).toBe(true);
+ expect(outcomes[0]!.chars).toBeGreaterThan(FULL_TEXT_MIN_LENGTH);
+
+ expect(outcomes[1]!.dokId).toBe('HD01CU25');
+ expect(outcomes[1]!.success).toBe(true);
+
+ // Verify files were written to full-text/ subdirectory
+ const fullTextDir = path.join(tmpDir, 'full-text');
+ expect(fs.existsSync(path.join(fullTextDir, 'HD01FiU48.md'))).toBe(true);
+ expect(fs.existsSync(path.join(fullTextDir, 'HD01CU25.md'))).toBe(true);
+ });
+
+ it('only fetches top-N documents even when more docs provided', async () => {
+ const longContent = 'A'.repeat(FULL_TEXT_MIN_LENGTH + 10);
+ const docs = [
+ makeDoc({ dok_id: 'DOC1' }),
+ makeDoc({ dok_id: 'DOC2' }),
+ makeDoc({ dok_id: 'DOC3' }),
+ ];
+ const fetchDetails = vi.fn().mockResolvedValue({ text: longContent });
+ const client = createMockClient(fetchDetails);
+
+ await fetchFullTextForTopN(client, docs, 2, tmpDir);
+
+ expect(fetchDetails).toHaveBeenCalledTimes(2);
+ expect(fetchDetails).toHaveBeenCalledWith('DOC1', true);
+ expect(fetchDetails).toHaveBeenCalledWith('DOC2', true);
+ expect(fetchDetails).not.toHaveBeenCalledWith('DOC3', true);
+ });
+
+ it('creates full-text directory automatically', async () => {
+ const longContent = 'B'.repeat(FULL_TEXT_MIN_LENGTH + 1);
+ const client = createMockClient(async () => ({ text: longContent }));
+ const nestedDir = path.join(tmpDir, 'deeply', 'nested');
+
+ await fetchFullTextForTopN(client, [makeDoc({ dok_id: 'X1' })], 1, nestedDir);
+
+ expect(fs.existsSync(path.join(nestedDir, 'full-text', 'X1.md'))).toBe(true);
+ });
+
+ describe('graceful degradation — metadata-only response', () => {
+ it('records success=false when content is below FULL_TEXT_MIN_LENGTH', async () => {
+ const shortContent = 'short';
+ const client = createMockClient(async () => ({ text: shortContent, snippet: 'snippet' }));
+
+ const outcomes = await fetchFullTextForTopN(
+ client, [makeDoc({ dok_id: 'HD03104' })], 1, tmpDir,
+ );
+
+ expect(outcomes).toHaveLength(1);
+ expect(outcomes[0]!.success).toBe(false);
+ expect(outcomes[0]!.chars).toBe(0);
+ expect(outcomes[0]!.reason).toMatch(/metadata-only/);
+
+ // No file should be written for failed fetches
+ const fullTextDir = path.join(tmpDir, 'full-text');
+ expect(fs.existsSync(path.join(fullTextDir, 'HD03104.md'))).toBe(false);
+ });
+
+ it('falls back from text to fullText field when text is too short', async () => {
+ const longFullText = 'C'.repeat(FULL_TEXT_MIN_LENGTH + 20);
+ const client = createMockClient(async () => ({
+ text: 'short',
+ fullText: longFullText,
+ }));
+
+ const outcomes = await fetchFullTextForTopN(
+ client, [makeDoc({ dok_id: 'DOC_FT' })], 1, tmpDir,
+ );
+
+ expect(outcomes[0]!.success).toBe(true);
+ expect(outcomes[0]!.chars).toBe(longFullText.length);
+ });
+
+ it('falls back to html field when text and fullText are too short', async () => {
+ const longHtml = '' + 'D'.repeat(FULL_TEXT_MIN_LENGTH + 20) + '';
+ const client = createMockClient(async () => ({
+ text: 'x',
+ fullText: 'y',
+ html: longHtml,
+ }));
+
+ const outcomes = await fetchFullTextForTopN(
+ client, [makeDoc({ dok_id: 'DOC_HTML' })], 1, tmpDir,
+ );
+
+ expect(outcomes[0]!.success).toBe(true);
+ expect(outcomes[0]!.chars).toBe(longHtml.length);
+ });
+ });
+
+ describe('graceful degradation — fetchDocumentDetails throws', () => {
+ it('records success=false with reason when MCP call rejects', async () => {
+ const client = createMockClient(async () => {
+ throw new Error('MCP connection timeout');
+ });
+
+ const outcomes = await fetchFullTextForTopN(
+ client, [makeDoc({ dok_id: 'HD01CU24' })], 1, tmpDir,
+ );
+
+ expect(outcomes).toHaveLength(1);
+ expect(outcomes[0]!.success).toBe(false);
+ expect(outcomes[0]!.chars).toBe(0);
+ expect(outcomes[0]!.reason).toMatch(/fetchDocumentDetails failed/);
+ expect(outcomes[0]!.reason).toMatch(/MCP connection timeout/);
+ });
+
+ it('continues to next document after one fails', async () => {
+ const longContent = 'E'.repeat(FULL_TEXT_MIN_LENGTH + 5);
+ const client = createMockClient(async (dokId) => {
+ if (dokId === 'FAIL_DOC') throw new Error('timeout');
+ return { text: longContent };
+ });
+
+ const docs = [
+ makeDoc({ dok_id: 'FAIL_DOC' }),
+ makeDoc({ dok_id: 'OK_DOC' }),
+ ];
+
+ const outcomes = await fetchFullTextForTopN(client, docs, 2, tmpDir);
+
+ expect(outcomes).toHaveLength(2);
+ expect(outcomes[0]!.success).toBe(false);
+ expect(outcomes[1]!.success).toBe(true);
+ expect(outcomes[1]!.chars).toBeGreaterThan(0);
+ });
+ });
+
+ describe('dokId resolution', () => {
+ it('skips documents with no resolvable dok_id', async () => {
+ const fetchDetails = vi.fn().mockResolvedValue({});
+ const client = createMockClient(fetchDetails);
+ const docNoId = makeDoc({
+ dok_id: undefined,
+ dokument_id: undefined,
+ rel_dok_id: undefined,
+ id: undefined,
+ dokumentnamn: undefined,
+ });
+
+ const outcomes = await fetchFullTextForTopN(client, [docNoId], 1, tmpDir);
+
+ expect(outcomes).toHaveLength(0);
+ expect(fetchDetails).not.toHaveBeenCalled();
+ });
+
+ it('resolves dok_id from dokument_id when dok_id is absent', async () => {
+ const longContent = 'F'.repeat(FULL_TEXT_MIN_LENGTH + 5);
+ const fetchDetails = vi.fn().mockResolvedValue({ text: longContent });
+ const client = createMockClient(fetchDetails);
+ const doc = makeDoc({
+ dok_id: undefined,
+ dokument_id: 'DOKU1',
+ });
+
+ const outcomes = await fetchFullTextForTopN(client, [doc], 1, tmpDir);
+
+ expect(fetchDetails).toHaveBeenCalledWith('DOKU1', true);
+ expect(outcomes[0]!.dokId).toBe('DOKU1');
+ });
+ });
+
+ describe('file content', () => {
+ it('writes markdown file with header and content', async () => {
+ const content = 'G'.repeat(FULL_TEXT_MIN_LENGTH + 5);
+ const client = createMockClient(async () => ({
+ text: content,
+ snippet: 'A short summary',
+ }));
+
+ await fetchFullTextForTopN(client, [makeDoc({ dok_id: 'DOC99' })], 1, tmpDir);
+
+ const written = fs.readFileSync(path.join(tmpDir, 'full-text', 'DOC99.md'), 'utf8');
+ expect(written).toContain('# Full Text — DOC99');
+ expect(written).toContain('A short summary');
+ expect(written).toContain(content);
+ // Blank lines must be preserved so the horizontal rule renders correctly
+ expect(written).toContain('\n\n---\n');
+ });
+
+ it('filePath in outcome is relative to outputDir (not CWD)', async () => {
+ const content = 'Z'.repeat(FULL_TEXT_MIN_LENGTH + 5);
+ const client = createMockClient(async () => ({ text: content }));
+
+ const outcomes = await fetchFullTextForTopN(
+ client, [makeDoc({ dok_id: 'RELPATH' })], 1, tmpDir,
+ );
+
+ expect(outcomes[0]!.success).toBe(true);
+ // filePath should be relative to outputDir, not an absolute path or CWD-relative
+ expect(outcomes[0]!.filePath).toBe(path.join('full-text', 'RELPATH.md'));
+ });
+
+ it('sanitizes MP profile text (isPersonProfileText filter)', async () => {
+ const profileText = 'Tjänstgörande riksdagsledamot ' + 'A'.repeat(200);
+ const client = createMockClient(async () => ({
+ text: 'short',
+ fullText: profileText,
+ }));
+
+ const outcomes = await fetchFullTextForTopN(
+ client, [makeDoc({ dok_id: 'PROF_DOC' })], 1, tmpDir,
+ );
+
+ // Profile text is sanitized, falls back to short text which is below threshold
+ expect(outcomes[0]!.success).toBe(false);
+ });
+ });
+});
+
+// ---------------------------------------------------------------------------
+// serializeDataManifest — full-text outcomes section
+// ---------------------------------------------------------------------------
+
+// We import buildWeeklySummaryMarkdown as a smoke-test that the module loads
+// without errors after the changes to download-parliamentary-data.ts.
+import { buildWeeklySummaryMarkdown } from '../scripts/download-parliamentary-data.js';
+
+describe('serializeDataManifest (via buildWeeklySummaryMarkdown sanity)', () => {
+ it('buildWeeklySummaryMarkdown still works after refactor', () => {
+ const md = buildWeeklySummaryMarkdown({
+ weekLabel: '2026-W17',
+ generatedAt: '2026-04-26 06:00 UTC',
+ documentsDownloaded: 42,
+ daysIncluded: 5,
+ dayList: ['2026-04-22', '2026-04-23', '2026-04-24'],
+ });
+ expect(md).toContain('2026-W17');
+ expect(md).toContain('42');
+ });
+});
+
+// We test the manifest full-text section indirectly by checking parseArgs
+// correctly exposes autoFullTextTopN so the caller can pass outcomes to
+// serializeDataManifest. Direct testing of the private serialize function
+// is done via the integration path in the pipeline.
+describe('manifest full-text outcomes integration contract', () => {
+ it('parseArgs exposes autoFullTextTopN=2 when flag is set', () => {
+ const args = parseArgs(['node', 'script.ts', '--auto-full-text-top-n', '2']);
+ expect(args.autoFullTextTopN).toBe(2);
+ });
+
+ it('fetchFullTextForTopN returns outcome with filePath for successful fetch', async () => {
+ const longContent = 'H'.repeat(FULL_TEXT_MIN_LENGTH + 10);
+ const client = createMockClient(async () => ({ text: longContent }));
+
+ const outcomes = await fetchFullTextForTopN(
+ client,
+ [makeDoc({ dok_id: 'HD01FiU48' }), makeDoc({ dok_id: 'HD01CU25' })],
+ 2,
+ tmpDir,
+ );
+
+ for (const o of outcomes) {
+ if (o.success) {
+ expect(o.filePath).toBeDefined();
+ expect(o.chars).toBeGreaterThan(FULL_TEXT_MIN_LENGTH);
+ }
+ }
+ });
+
+ it('analysis-gate can determine if top-2 full texts are available from outcomes', async () => {
+ const longContent = 'I'.repeat(FULL_TEXT_MIN_LENGTH + 1);
+ const client = createMockClient(async () => ({ text: longContent }));
+
+ const docs = [
+ makeDoc({ dok_id: 'TOP1' }),
+ makeDoc({ dok_id: 'TOP2' }),
+ ];
+
+ const outcomes = await fetchFullTextForTopN(client, docs, 2, tmpDir);
+
+ const successCount = outcomes.filter(o => o.success).length;
+ // Gate can check: successCount >= 2 OR fallback annotation present
+ expect(successCount).toBe(2);
+ });
+});
diff --git a/tests/pir-status-contract.test.ts b/tests/pir-status-contract.test.ts
index df453f6156..929f38327e 100644
--- a/tests/pir-status-contract.test.ts
+++ b/tests/pir-status-contract.test.ts
@@ -850,7 +850,10 @@ describe('analysis-gate pir-status.json contract', () => {
});
it('05-analysis-gate.md keeps PIR and supplementary checks sequential', () => {
expect(gate).toContain('# Check 9 — PIR status sidecar');
- expect(gate).toContain('# Check 10 — supplementary artifacts');
+ // Check 10 is now "top-2 full-text availability" (added by --auto-full-text-top-n);
+ // supplementary artifacts shifted to Check 11.
+ expect(gate).toContain('# Check 10 — top-2 full-text availability');
+ expect(gate).toContain('# Check 11 — supplementary artifacts');
});
it('ai-driven-analysis-guide.md references pir-status.json', () => {
expect(guide).toContain('pir-status.json');
diff --git a/tests/statskontoret-feasibility-contract.test.ts b/tests/statskontoret-feasibility-contract.test.ts
index 617b9a4042..a5f099917f 100644
--- a/tests/statskontoret-feasibility-contract.test.ts
+++ b/tests/statskontoret-feasibility-contract.test.ts
@@ -125,8 +125,7 @@ describe('Statskontoret inventory → implementation-feasibility coverage contra
expect(inv.datasets['myndighetsforteckning'].admiralty).toBe('A1');
});
- it('at least one implementation-feasibility.md file in the analysis tree ' +
- 'mentions a known Swedish agency (otherwise the per-file coverage test is vacuous)', () => {
+ it('at least one implementation-feasibility.md file in the analysis tree mentions a known Swedish agency (otherwise the per-file coverage test is vacuous)', () => {
const filesWithMentions = feasibilityFiles.filter((filePath) => {
const content = fs.readFileSync(filePath, 'utf-8');
return extractAgencyMentions(content, KNOWN_AGENCIES).length > 0;
@@ -138,8 +137,7 @@ describe('Statskontoret inventory → implementation-feasibility coverage contra
).toBeGreaterThan(0);
});
- it('every implementation-feasibility.md mentioning a known agency resolves to ' +
- 'a Statskontoret dataset that covers it via myndighetsforteckning', () => {
+ it('every implementation-feasibility.md mentioning a known agency resolves to a Statskontoret dataset that covers it via myndighetsforteckning', () => {
// Since myndighetsforteckning covers ALL Swedish government bodies by
// definition, one dataset entry suffices for all named agencies. This
// test enforces the contract per-file: every file mentioning an agency