Stage analysis → Commit → ONE create_pull_request (analysis-only)
44
48
```
45
49
46
-
No step may be skipped, reordered, or executed in parallel with its successor.
47
-
48
-
## Phase checkpoint — persist every phase to repo memory
49
-
50
-
Valuable analysis must never be lost. After each pipeline phase completes, snapshot its output to the gh-aw repo-memory mount at `$GH_AW_MEMORY_DIR` (runtime default `/tmp/gh-aw/repo-memory/default`). gh-aw pushes that directory to the `memory/news-generation` branch in a **separate post-job** — so checkpoints survive even if the content PR job fails, crashes, or times out.
51
-
52
-
### Mandatory checkpoint points
53
-
54
-
| After phase | Phase label | Source(s) |
55
-
|-------------|-------------|-----------|
56
-
| 03 Data download | `phase-03-download` | `$ANALYSIS_DIR` (manifest + fetched data summaries) |
| 07 Immediately before `create_pull_request` | `phase-07-final` | `$ANALYSIS_DIR` + articles from `news/${ARTICLE_DATE}-*.html` |
62
-
| `news-translate` per batch | `phase-translate-<lang>` | Translated `news/${ARTICLE_DATE}-*.html` |
63
-
64
-
Each checkpoint is mandatory. Skipping them forfeits the only cross-run safety net for analysis work.
65
-
66
-
### Reusable snippet
67
-
68
-
Run this bash block at the end of every phase (pass the phase label as `$1`). Article HTML is written directly under the flat `news/` directory, so checkpoint copies must use `news/${ARTICLE_DATE}-*.html` rather than `news/$YYYY/$MM/$DD/*.html`:
MCP pre-warm → Detect existing analysis → Read all artifacts into context →
53
+
Optionally check for new data → Article Pass 1 → Article Pass 2 →
54
+
Stage articles → Commit → ONE create_pull_request (articles)
92
55
```
93
56
94
-
### Checkpoint rules
57
+
No step may be skipped within a run. Runs must not overlap for the same `$ARTICLE_DATE` + `$SUBFOLDER`.
58
+
59
+
Same-day re-runs always use the same `$ANALYSIS_DIR` folder — never create a parallel folder for the same date + type combination unless `force_generation=true`.
60
+
61
+
## Session keepalive requirement
62
+
63
+
> ⚠️ **Critical**: The Copilot API creates a server-side session when the agent starts. That session is bound to the `github.token` baked in at step start — it is **never refreshed** mid-run. The session expires at approximately **60 minutes** (gh-aw issue #24920). After expiry, all tool calls and inference requests fail silently. The workflow appears to run but makes zero progress, and **the PR is never created**.
64
+
65
+
To mitigate MCP idle-connection drops, workflows set `sandbox.mcp.keepalive-interval: 300` (5-minute ping). This keeps MCP connections alive but does **not** refresh the Copilot API token.
66
+
67
+
**The reliable mitigation is to ensure `safeoutputs___create_pull_request` is called well before the session approaches expiry.** Plan the run so the PR is created before the agent passes ~45 minutes of work — that leaves ~10 minutes of safety margin on the 55-minute `timeout-minutes` cap and ~15 minutes on the ~60-minute token window for staging and safe-outputs publishing. See `07-commit-and-pr.md §Deadline enforcement` for the mandatory PR-timing procedure.
95
68
96
-
| Rule | Rationale |
97
-
|------|-----------|
98
-
|**Never block on checkpoint failure** — always `exit 0`. | Repo-memory is a safety net, not a gate. |
99
-
| Do **not** copy `$ANALYSIS_DIR/documents/` or `$ANALYSIS_DIR/pass1/`. |`documents/` exceeds the 50-file push cap; `pass1/` is local gate evidence only. |
100
-
| Do **not** stage or commit anything under `$GH_AW_MEMORY_DIR`. | gh-aw's `push_repo_memory` post-job publishes it; see `07-commit-and-pr.md`. |
101
-
| Prefer small summary `.md` / `.json` files (≤ 50 KB each, ≤ 50 per push). | gh-aw silently drops files exceeding the push caps. |
102
-
| Re-run the snippet at every phase, even if earlier phases already snapshotted — it overwrites with the latest content. | Ensures the final state is always preserved, and earlier snapshots remain on the branch from prior runs. |
103
-
| For `news-translate`, use `SUBFOLDER=batch/<lang-or-batch-id>` so memory paths don't collide with analysis runs. | Keeps the branch organised by article type. |
69
+
Do not add per-phase checkpoint PRs or repo-memory push steps.
| `github` | HTTP (Copilot MCP) | workflow `tools.github` | standard | full GitHub MCP toolset |
-| `repo-memory` | local helper | workflow `tools.repo-memory` | standard | persistent cross-run memory on `memory/news-generation` |
 | `bash` | local helper | workflow `tools.bash` | standard | shell execution |
 | `safeoutputs` | runner | always available | `snake_case` | `safeoutputs___create_pull_request`, `safeoutputs___noop`, `safeoutputs___dispatch_workflow` |
@@ -42,4 +41,4 @@ Run once at workflow start, then proceed — do not loop forever.
 ## Pre-warm step (CI job, not prompt)

-Every news workflow declares a **single** `curl`-based pre-warm step with ≤ 6 retries, ≤ 20 s apart. With `curl --max-time 30`, the worst-case runtime can exceed 4 minutes, so this is a best-effort pre-warm rather than a hard ≤ 2-minute guarantee. If a strict 2-minute cap is required, the workflow's `curl` timeout and/or retry policy must be reduced accordingly. No background pingers. The `safeoutputs` session is kept alive by completing work inside its ~30-minute idle window, not by opening interim PRs.
+Every news workflow declares a **single** `curl`-based pre-warm step with ≤ 6 retries, ≤ 20 s apart. With `curl --max-time 30`, the worst-case runtime can exceed 4 minutes, so this is a best-effort pre-warm rather than a hard ≤ 2-minute guarantee. If a strict 2-minute cap is required, the workflow's `curl` timeout and/or retry policy must be reduced accordingly. No background pingers. MCP session longevity is maintained via `sandbox.mcp.keepalive-interval: 300`.
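The worst-case figure quoted above follows from simple arithmetic. A minimal sketch, assuming "≤ 6 retries" means six attempts in total, that every attempt runs into `--max-time`, and that the full 20 s pause is taken between attempts; the function name `prewarm_worst_case` is hypothetical and not part of any workflow:

```shell
# Worst-case duration (seconds) of a retry loop in which every attempt times out.
# Arguments: <attempts> <per-attempt timeout in s> <pause between attempts in s>
prewarm_worst_case() {
  local attempts=$1 max_time=$2 pause=$3
  # Each attempt costs max_time; there is one pause fewer than there are attempts.
  echo $(( attempts * max_time + (attempts - 1) * pause ))
}

prewarm_worst_case 6 30 20   # policy described above: prints 280 (4 min 40 s)
prewarm_worst_case 4 20 10   # a tighter, hypothetical policy: prints 110 (under 2 min)
```

Under these assumptions the documented policy tops out at 280 s, which is why it is described as best-effort; the second call shows one way a shorter timeout and fewer attempts would fit a 2-minute budget.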
| `false` | **Analysis mode** | Continue with download pipeline below → `04-analysis-pipeline.md` → analysis-only PR (see `07-commit-and-pr.md`). Do **not** generate articles in this run. |
+| `true` | **Article mode** | Skip the entire download pipeline and `04-analysis-pipeline.md`. Proceed directly to `06-article-generation.md`. Optionally re-query the API and compare against `data-download-manifest.md`; add only genuinely new `dok_id` entries found since the analysis ran. |
+
+> **Folder reuse rule**: the same `$ANALYSIS_DIR` is always reused across runs for the same `$ARTICLE_DATE` + `$SUBFOLDER` when `force_generation=false`. The legacy auto-suffix behaviour (`propositions-2`, `propositions-3`, …) is retained **only** as an explicit escape hatch when `force_generation=true`, so that a forced rerun on a merged day can produce a fresh parallel analysis without trampling the existing one.
+
 ## Goal

 Populate `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/` with raw Riksdag/Regering data and a provenance manifest **before** any analysis starts.
@@ -20,7 +58,7 @@ Populate `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/` with raw Riksdag/Regering da
If the base subfolder already contains `synthesis-summary.md` from a prior merged run **and** `force_generation=false`, auto-suffix: `propositions-2`, `propositions-3`, …
+If `force_generation=true` is supplied on a day whose base subfolder already contains `synthesis-summary.md` from a prior merged run, auto-suffix the subfolder (`propositions-2`, `propositions-3`, …) so the forced rerun does not overwrite the merged analysis. Under the default `force_generation=false`, the same base subfolder is reused across runs; see §Pre-flight above.
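The folder-selection rule can be sketched as a small shell function. This is an illustration only: the name `resolve_subfolder` is hypothetical, the presence of `synthesis-summary.md` is used as a stand-in for "a prior merged run" (the prompt does not say how merge status is detected), and picking the first unused numeric suffix is an assumption.

```shell
# Prints the subfolder name a run should use.
# Arguments: <day directory> <base subfolder> <force_generation: true|false>
resolve_subfolder() {
  local day_dir=$1 subfolder=$2 force=${3:-false}

  # Default path: reuse the base subfolder unless a forced rerun would
  # overwrite an existing analysis (assumed marker: synthesis-summary.md).
  if [ "$force" != "true" ] || [ ! -f "$day_dir/$subfolder/synthesis-summary.md" ]; then
    printf '%s\n' "$subfolder"
    return 0
  fi

  # Forced rerun on an analysed day: take the first free suffix (-2, -3, ...).
  local n=2
  while [ -e "$day_dir/$subfolder-$n" ]; do
    n=$((n + 1))
  done
  printf '%s\n' "$subfolder-$n"
}
```

For example, `SUBFOLDER=$(resolve_subfolder "analysis/daily/$ARTICLE_DATE" propositions "$force_generation")` would keep `propositions` on ordinary reruns and yield `propositions-2` only for a forced rerun over an existing analysis.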
.github/prompts/04-analysis-pipeline.md
2 lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ Plus `documents/` subfolder with **one `{dok_id}-analysis.md` file per `dok_id`*
 ## Execution order

+> **Fast-path**: If `SKIP_ANALYSIS=true` (set by `03-data-download.md §Pre-flight`), skip all steps 1–5 below and proceed directly to `06-article-generation.md`. The full analysis already exists on disk from a prior run; do not re-run downloads, Pass 1, Pass 2, or the gate.
+
 1. **Read all 6 methodologies first** (one tool call per file, do not skip).
 2. **Read all 8 templates first.**
 3. **Pass 1: Create** all 9 artifacts + every per-document file. Minimum 15 minutes of real work.
|**Analysis mode** (`SKIP_ANALYSIS=false`) |`analysis/daily/$ARTICLE_DATE/$SUBFOLDER/*.md` + `*.json` (never `pass1/`) |`📊 Analysis — `|`analysis-only` + article-type |**Stop.** Do NOT generate articles. The next scheduled run will detect the analysis and enter Article mode automatically. |
In **Analysis mode**: commit analysis artifacts, create the `analysis-only` PR, then exit. Zero articles are generated in this run. The analysis stays in the `$ANALYSIS_DIR` folder; the next run of this workflow for the same `$ARTICLE_DATE` will find it and proceed directly to articles.
+
+In **Article mode**: generate articles from existing analysis, commit, and create the articles PR.
+
 ## Stage → commit → PR

 1. **Stage scoped files only.** Never stage the whole repo.
@@ -21,8 +32,6 @@ Workflows declare `safe-outputs.create-pull-request.max: 1`. Attempting a second
Repo-memory persistence is handled separately by `tools.repo-memory` and pushed to the `memory/news-generation` branch by the safe-outputs runner job. **Do not** create, stage, or commit any `memory/news-generation/*.json` files in the content PR — there is no `memory/` directory in the working tree of `main`.
-
 Never stage `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/documents/` wholesale; it often contains 100+ files. Stage only `documents/*.md` **if** your `documents/` stays under the safe-outputs 100-file cap; otherwise stage only summary files. Never stage `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/pass1/`; it is a local gate-evidence snapshot (see `04-analysis-pipeline.md`), not a deliverable.

 2. **100-file guard.** Before calling safeoutputs, count staged files. If the count > 99, unstage everything under `documents/` except `synthesis-summary.md` and re-check.
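The 100-file guard could be implemented roughly as follows. This is a sketch under assumptions, not the workflow's actual script: the function names are hypothetical, and it takes the analysis directory as its only argument. It relies only on standard Git commands (`git diff --cached --name-only`, `git reset`, `git add`).

```shell
# Succeeds (exit 0) when a staged-file count is above the 99-file threshold.
over_file_cap() {
  [ "$1" -gt 99 ]
}

# Number of files currently staged in the index.
staged_count() {
  git diff --cached --name-only | wc -l | tr -d '[:space:]'
}

# Hypothetical guard: if too many files are staged, unstage documents/
# (re-adding synthesis-summary.md when it exists there), then re-check.
# Argument: <analysis directory>. Returns 1 if the count is still too high.
enforce_file_cap() {
  local docs="$1/documents"
  if over_file_cap "$(staged_count)"; then
    git reset -q -- "$docs"
    [ -f "$docs/synthesis-summary.md" ] && git add -- "$docs/synthesis-summary.md"
    if over_file_cap "$(staged_count)"; then
      return 1
    fi
  fi
  return 0
}
```

Calling `enforce_file_cap "$ANALYSIS_DIR"` immediately before the commit keeps the check in one place; a non-zero return signals that files outside `documents/` also need trimming.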
In every other case, commit whatever exists and call `create_pull_request` once.
-## Final checkpoint: before the PR call
+## Deadline enforcement

-Immediately before calling `safeoutputs___create_pull_request`, run the **phase checkpoint** from `00-base-contract.md` with label `phase-07-final`. This snapshots the final authoritative analysis + article state to repo memory, so even if the PR call, the safe-outputs runner, or the post-job push fails, the last good state survives on the `memory/news-generation` branch.
+> **Root cause**: The Copilot API session is bound to the `github.token` baked in at step start. That token expires at approximately **60 minutes** and is never refreshed mid-run (gh-aw issue #24920). Every tool call and inference request fails silently after that point: the agent appears to run but makes no progress and the PR is never created. Setup steps consume ~5 minutes, so the agent has at most **~55 minutes** of usable session time, and safe-outputs publishing needs several minutes on top.

-For `news-translate`, run the checkpoint with label `phase-translate-<lang>` after each per-language batch succeeds (before the final PR call), so individual language translations are preserved even if later languages fail.
+The target PR-creation window depends on which mode the run is in (see `03-data-download.md §Pre-flight`):

-## Deadline enforcement
+| Mode | Target PR window | Hard deadline |
+|------|------------------|---------------|
+| Run 1: Analysis | 40–45 min after agent start | **48 min** |
+| Run 2: Articles | 20–25 min after agent start | **30 min** |

-If the run exceeds 40 minutes with no safe-output call yet:
+**If the run exceeds its hard deadline with no safe-output call yet:**

 1. Stop analysis / article work immediately.
-2. Stage whatever exists on disk.
-3. Commit.
+2. Stage whatever exists on disk (analysis artifacts and/or partial articles).
+3. Commit with a message including `[early-pr]` to signal partial content.
 4. Call `safeoutputs___create_pull_request` with label `analysis-only` if articles are incomplete.

-Do not attempt to "save" work via a second PR; there is no second PR.
+Do not attempt to "save" work via a second PR; there is no second PR. Creating the PR early is always better than losing all work to a token expiry. The 48-minute Analysis deadline leaves ~7 minutes of margin on the 55-minute `timeout-minutes` cap for staging and safe-outputs publishing before the ~60-minute Copilot API token expiry.
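The hard deadlines in the table can be checked with a simple elapsed-time test. A minimal sketch, assuming the agent records its own start time in epoch seconds; the function names and the `analysis` / `articles` mode strings are hypothetical and not part of gh-aw:

```shell
# Hard deadline in minutes for a run mode, taken from the table above.
hard_deadline_minutes() {
  case "$1" in
    analysis) echo 48 ;;
    articles) echo 30 ;;
    *) return 1 ;;   # unknown mode
  esac
}

# Succeeds (exit 0) once the run has exceeded its hard deadline.
# Arguments: <mode> <start epoch seconds> [now epoch seconds, defaults to current time]
past_hard_deadline() {
  local mode=$1 start=$2 now=${3:-$(date +%s)}
  local limit
  limit=$(hard_deadline_minutes "$mode") || return 2
  [ $(( now - start )) -gt $(( limit * 60 )) ]
}
```

A run could capture `START=$(date +%s)` at agent start and call `past_hard_deadline analysis "$START"` between phases; once it succeeds, the four early-PR steps apply.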