Conversation
🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata. Applied labels: size-xs.
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report
Budget Compliance: Performance budgets enforced via |
…pipeline Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/c07332ca-b490-4915-9dd9-22c94ee49aea Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…sedAutoFullTextTopN, add comment) Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/c07332ca-b490-4915-9dd9-22c94ee49aea Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Pull request overview
Adds an “auto full-text prefetch” step to the parliamentary data download pipeline so analysis runs can reliably access full text for the top-N documents (via --auto-full-text-top-n) and enforces this via an analysis-gate check.
Changes:
- Add `fetchFullTextForTopN()` to fetch/persist full text for the first N documents and return structured outcomes.
- Add `--auto-full-text-top-n` CLI parsing plus a manifest section `## Full-Text Fetch Outcomes`.
- Add Analysis Gate Check 10 enforcing "≥2 full texts available" (or a documented fallback), plus a new Vitest suite covering the feature.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/auto-full-text-top-n.test.ts | New test coverage for CLI parsing and top-N full-text fetch/persist behavior. |
| scripts/parliamentary-data/data-downloader.ts | Introduces fetchFullTextForTopN() and FullTextFetchOutcome type; writes full text artifacts under full-text/. |
| scripts/download-parliamentary-data.ts | Adds CLI flag, runs auto-fetch step, and writes outcomes into data-download-manifest.md. |
| analysis/methodologies/ai-driven-analysis-guide.md | Updates methodology guidance to recommend --auto-full-text-top-n 2 for L2/L3 runs. |
| .github/prompts/05-analysis-gate.md | Adds new blocking gate check for top-2 full-text availability when outcomes table is present. |
```ts
import fs from 'node:fs';
import path from 'node:path';
```

This module header says it is "side-effect-free with respect to the filesystem", but this file now imports fs/path and implements a function that writes to disk. Please update the module-level documentation (or move the filesystem-writing helper into a different module) so callers aren't misled about side effects.
```ts
 * "tried but only metadata returned".
 *
 * The function is intentionally *not* side-effect-free: it writes files to
 * `outputDir/full-text/`. The caller is responsible for creating `outputDir`.
```

The docstring says the caller is responsible for creating `outputDir`, but the implementation calls `fs.mkdirSync(fullTextDir, { recursive: true })`, which will also create missing parent directories. Either adjust the wording to match the actual behavior or change the implementation to require a pre-existing `outputDir`.

Suggested change:

```ts
 * `outputDir/full-text/` and creates that directory as needed, including any
 * missing parent directories.
```
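For context on the directory-creation behavior discussed above, a minimal self-contained sketch (the temp paths and variable names are illustrative, not from the repo):

```typescript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Hedged sketch (not repo code): fs.mkdirSync with { recursive: true }
// creates the target directory AND any missing parent directories, which is
// the behavior the docstring should describe.
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'full-text-demo-'));
const fullTextDir = path.join(outputDir, 'missing-parent', 'full-text');

// Succeeds even though 'missing-parent' does not exist yet.
fs.mkdirSync(fullTextDir, { recursive: true });
```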
```ts
].filter(Boolean).join('\n');
fs.writeFileSync(filePath, header + content, 'utf8');
```

`header` is built with `.filter(Boolean)`, which removes the intentional blank lines and can cause the persisted file to lose Markdown structure. It also means `header + content` can join without a separating newline (e.g., `---<p>...`), breaking the horizontal rule and readability. Preserve blank lines (don't filter them out) and ensure there's at least one `\n` between the header block and content when writing.

Suggested change:

```ts
].join('\n');
const headerBlock = header.endsWith('\n') ? header : `${header}\n`;
fs.writeFileSync(filePath, headerBlock + content, 'utf8');
```
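A minimal sketch of the pitfall described in this comment (the `parts` array is illustrative, not the PR's actual header lines):

```typescript
// Blank-line entries ('') are falsy, so filter(Boolean) silently removes
// the spacing that was meant to survive into the persisted Markdown file.
const parts = ['# HB02123', '', '---', ''];

// filter(Boolean) collapses the blank lines away:
const collapsed = parts.filter(Boolean).join('\n'); // '# HB02123\n---'

// Plain join preserves them; the trailing '' also yields the newline that
// separates the header block from the appended content:
const preserved = parts.join('\n'); // '# HB02123\n\n---\n'
```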
```ts
outcome = {
  dokId,
  success: true,
  chars: content.length,
  filePath: path.relative(process.cwd(), filePath),
```

`filePath` is derived via `path.relative(process.cwd(), filePath)`, which makes the manifest output dependent on the caller's current working directory (and can vary between local runs/CI). Prefer a stable base such as `outputDir` or `REPO_ROOT` when computing the relative path so the persisted manifest points to the right location consistently.

Suggested change:

```ts
const manifestBaseDir = path.dirname(fullTextDir);
outcome = {
  dokId,
  success: true,
  chars: content.length,
  filePath: path.relative(manifestBaseDir, filePath),
```
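To illustrate the cwd-dependence raised in this comment, a hedged sketch with hypothetical paths (none of these directories are from the repo):

```typescript
import path from 'node:path';

// The same artifact yields different manifest entries depending on where the
// process was launched, while a fixed base directory gives a stable result.
const outputDir = path.join(path.sep, 'repo', 'analysis');
const filePath = path.join(outputDir, 'full-text', 'HB02123.md');

// Varies with the caller's working directory:
const fromRepoRoot = path.relative(path.join(path.sep, 'repo'), filePath);
const fromNestedDir = path.relative(path.join(path.sep, 'repo', 'scripts'), filePath);

// Stable regardless of where the process was started:
const fromOutputDir = path.relative(outputDir, filePath);
```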
```ts
// --auto-full-text-top-n: Number of top documents to auto-fetch full text for.
// Defaults to 0 (disabled). Use 2 to guarantee the top-2 DIW-ranked documents
// have full text available for deep-analysis tiers (L2/L3).
```

The comment mentions "top-2 DIW-ranked documents", but this script doesn't compute or apply any DIW ranking before passing `allDocs` into `fetchFullTextForTopN` (it uses the current array order after filtering). Please reword to avoid implying a ranking that isn't present, or explicitly document what ordering determines the "top" documents here.

Suggested change:

```ts
// --auto-full-text-top-n: Number of documents to auto-fetch full text for.
// Defaults to 0 (disabled). When enabled, the script fetches full text for the
// first N documents in the current filtered array order for deep-analysis tiers (L2/L3).
```
|
@copilot resolve the merge conflicts in this pull request
…ble filePath, comment accuracy) Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/c2ed7611-af52-4af6-9563-cc59a848957f Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Merge conflicts are resolved (commit
@copilot resolve the merge conflicts in this pull request
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/cee684f4-8a98-433b-a17a-89916445bb20 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Merge conflicts resolved in |
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
|
@copilot analyse, improve, and address issues in TypeScript & JavaScript Testing / Unit Tests (Vitest) (pull_request)
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/323444a6-2c5b-4338-b133-d78b0f8281bf Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Fixed in |
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
scripts/download-parliamentary-data.ts:165
`--auto-full-text-top-n` currently accepts `1`, but the updated analysis gate Check 10 requires at least 2 successful full-text rows whenever the outcomes table is present. With `topN=1`, the script will emit the table and the gate becomes impossible to satisfy without manually editing the manifest. Consider rejecting `topN=1` (require 0 or ≥2), auto-bumping 1→2, or emitting an automatic `<!-- full-text-fallback: ... -->` annotation when `topN < 2`.
```ts
// --auto-full-text-top-n: Override the per-type full-text enrichment limit and
// persist full text outcomes for the first N documents in the current filtered
// array order. Defaults to null when omitted so downloadAllDocuments uses
// MAX_ENRICHMENT_PER_TYPE; explicit 0 disables per-type enrichment and
// persisted full-text fetching. No DIW significance ranking is applied here.
const autoFullTextTopNArg = get('--auto-full-text-top-n');
let autoFullTextTopN: number | null = null;
if (autoFullTextTopNArg !== null) {
  const parsed = Number(autoFullTextTopNArg);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`Invalid --auto-full-text-top-n value: ${autoFullTextTopNArg}. Expected a non-negative integer.`);
  }
  autoFullTextTopN = parsed;
}
```
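One way to implement the "require 0 or ≥2" option from the suppressed comment — a sketch only, under the assumption that rejecting `1` outright is acceptable UX; `parseAutoFullTextTopN` is an illustrative helper name, not part of the PR:

```typescript
// Hypothetical sketch: reject --auto-full-text-top-n=1 so that whenever the
// outcomes table is emitted, gate Check 10's ">=2 successful rows" requirement
// is at least satisfiable.
function parseAutoFullTextTopN(arg: string | null): number | null {
  if (arg === null) return null; // flag omitted: caller falls back to its default
  const parsed = Number(arg);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`Invalid --auto-full-text-top-n value: ${arg}. Expected a non-negative integer.`);
  }
  if (parsed === 1) {
    throw new Error('--auto-full-text-top-n=1 emits an outcomes table that Check 10 can never pass; use 0 or >= 2.');
  }
  return parsed;
}
```

The alternative options from the comment (auto-bumping 1→2, or emitting a fallback annotation) would replace the second `throw` with the corresponding behavior.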
```bash
  [ "${FT_SUCCESS:-0}" -ge 2 ] \
    || { echo "❌ data-download-manifest.md: Full-Text Fetch Outcomes table present but fewer than 2 top documents have full_text_available=true (found ${FT_SUCCESS:-0}). Add <!-- full-text-fallback: <reason> --> to the manifest to bypass."; FAIL=1; }
  fi
fi
```

The new Check 10 bash snippet opens an outer `if [ -s "$ANALYSIS_DIR/data-download-manifest.md" ]; then` but never closes it with a matching `fi`, so the example gate script is syntactically invalid if copied/run as-is. Add the missing `fi` before the final `[ "$FAIL" -eq 0 ] || exit 1` line.

Suggested change:

```bash
  fi
fi
fi
```
```ts
for (const { dokId } of candidates) {
  let outcome: FullTextFetchOutcome;
  try {
    const details = await client.fetchDocumentDetails(dokId, true) as Record<string, unknown>;
    const str = (v: unknown): string => (typeof v === 'string' ? v : '');
    const sanitize = (v: unknown): string => {
      const s = str(v).trim();
      return isPersonProfileText(s) ? '' : s;
    };

    const rawText = str(details['text']).trim();
    // fullText may contain MP profile/deceased-notice text — sanitize it.
    // text and html fields are structural content from the Riksdag dump and
    // do not contain person-profile text, so str().trim() is sufficient.
    const rawFullText = sanitize(details['fullText']);
    const rawHtml = str(details['html']).trim();

    // Prefer MCP 'text' field (embedded HTML dump), fall back to fullText then html.
    const content = rawText.length > FULL_TEXT_MIN_LENGTH
      ? rawText
      : rawFullText.length > FULL_TEXT_MIN_LENGTH
        ? rawFullText
        : rawHtml;

    if (content.length > FULL_TEXT_MIN_LENGTH) {
      const filenameSafeDokId = dokId.replace(/[^A-Za-z0-9_-]/g, '_');
      const filePath = path.join(fullTextDir, `${filenameSafeDokId}.md`);
      const snippet = sanitize(details['snippet']) || sanitize(details['summary']) || '';
```

`fetchFullTextForTopN` always calls `client.fetchDocumentDetails(dokId, true)` for each candidate, even though the caller (`downloadAllDocuments`) already enriches top documents via the same MCP call and stores `fullContent`/`fullText` on the existing `RawDocument` objects. Consider reusing already-enriched fields (and only calling MCP when missing/too short) to avoid duplicate network calls and reduce MCP load/rate-limit risk; note you already keep `{ dokId, doc }` but currently don't use `doc`.

Suggested change:

```ts
for (const { dokId, doc } of candidates) {
  let outcome: FullTextFetchOutcome;
  try {
    const str = (v: unknown): string => (typeof v === 'string' ? v : '');
    const sanitize = (v: unknown): string => {
      const s = str(v).trim();
      return isPersonProfileText(s) ? '' : s;
    };
    const selectContent = (source: Record<string, unknown>): string => {
      const rawText = str(source['text']).trim();
      const rawFullContent = sanitize(source['fullContent']);
      // fullText may contain MP profile/deceased-notice text — sanitize it.
      // text and html fields are structural content from the Riksdag dump and
      // do not contain person-profile text, so str().trim() is sufficient.
      const rawFullText = sanitize(source['fullText']);
      const rawHtml = str(source['html']).trim();
      // Prefer normalized/enriched content already present on the document,
      // then fall back to the raw HTML dump fields.
      return rawText.length > FULL_TEXT_MIN_LENGTH
        ? rawText
        : rawFullContent.length > FULL_TEXT_MIN_LENGTH
          ? rawFullContent
          : rawFullText.length > FULL_TEXT_MIN_LENGTH
            ? rawFullText
            : rawHtml;
    };
    const docRecord = doc as Record<string, unknown>;
    let details: Record<string, unknown> | null = null;
    let content = selectContent(docRecord);
    if (content.length <= FULL_TEXT_MIN_LENGTH) {
      details = await client.fetchDocumentDetails(dokId, true) as Record<string, unknown>;
      content = selectContent(details);
    }
    if (content.length > FULL_TEXT_MIN_LENGTH) {
      const filenameSafeDokId = dokId.replace(/[^A-Za-z0-9_-]/g, '_');
      const filePath = path.join(fullTextDir, `${filenameSafeDokId}.md`);
      const snippet = sanitize(docRecord['snippet'])
        || sanitize(docRecord['summary'])
        || sanitize(details?.['snippet'])
        || sanitize(details?.['summary'])
        || '';
```
```ts
outcome = {
  dokId,
  success: true,
  chars: content.length,
  filePath: path.relative(outputDir, filePath),
```

`filePath` is recorded via `path.relative(outputDir, filePath)`, which will emit backslashes on Windows and makes manifest output non-deterministic across OSes. Elsewhere in the repo, relative paths are normalized to POSIX separators (`.split(path.sep).join('/')`) for stable, byte-identical artifacts; consider doing the same here before storing `filePath`.

Suggested change:

```ts
// Normalize separators to POSIX form for byte-identical artifacts across OSes.
outcome = {
  dokId,
  success: true,
  chars: content.length,
  filePath: path.relative(outputDir, filePath).split(path.sep).join('/'),
```
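A minimal sketch of the normalization this comment suggests; `toPosix` is an illustrative helper name, not from the repo:

```typescript
import path from 'node:path';

// On Windows path.sep is '\\', so the split/join rewrites separators to '/';
// on POSIX systems it is a no-op, so the output is identical on every OS.
function toPosix(relPath: string): string {
  return relPath.split(path.sep).join('/');
}

const manifestPath = toPosix(path.join('full-text', 'HB02123.md'));
```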
|
@copilot apply changes based on the comments in this thread
…POSIX path) Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/7b65de24-f1f7-4553-8cbb-1cd800314bae Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied all three suggestions in
Validation: |
Changes:
- `--auto-full-text-top-n=N` CLI flag added to `scripts/download-parliamentary-data.ts`
- `fetchFullTextForTopN` exported from `scripts/parliamentary-data/data-downloader.ts`
- `serializeDataManifest` updated to record full-text fetch outcomes
- `runPreArticleAnalysis` (Step 2b) updated
- Analysis gate (`05-analysis-gate.md`) updated with Check 10 (full-text) and renumbered Check 11 (supplementary artifacts)
- `analysis/methodologies/ai-driven-analysis-guide.md` updated (v6.7)
- Rebased on `origin/main` with true merge commit ef34de7249920739464180780088
- Missing `fi` added in Check 10 bash snippet (`05-analysis-gate.md`)
- Reuses `fullContent`/`fullText`/`text`/`html` fields on `RawDocument` before issuing a duplicate MCP call
- `filePath` separators normalized to POSIX (`split(path.sep).join('/')`) for byte-identical artifacts across OSes

Validation:
- `npx vitest run tests/auto-full-text-top-n.test.ts tests/pir-status-contract.test.ts` — 108 passed
- `npm run build:lib` — passed
- 4326877830