**`.github/prompts/00-base-contract.md`** (43 changes: 33 additions & 10 deletions)

Valuable analysis must never be lost. After each pipeline phase completes, snapshot its artifacts to repo-memory:

| Phase | Checkpoint label (`$1`) | What to snapshot |
|-------|-------------------------|------------------|
| 04 Analysis Pass 1 | `phase-04-pass1` | `$ANALYSIS_DIR` top-level artifacts |
| 04 Analysis Pass 2 | `phase-04-pass2` | `$ANALYSIS_DIR` top-level artifacts |
| 05 Gate pass | `phase-05-gate` | `$ANALYSIS_DIR` top-level artifacts |
| 06 Article generated | `phase-06-article` | `$ANALYSIS_DIR` + **this-run** article HTML candidates from `news/${ARTICLE_DATE}-*.html` (size/file-count constrained) |
| 07 Immediately before `create_pull_request` | `phase-07-final` | `$ANALYSIS_DIR` + **this-run** article HTML candidates from `news/${ARTICLE_DATE}-*.html` (size/file-count constrained) |
| `news-translate` per batch | `phase-translate-<lang>` | **This-run** translated `news/${ARTICLE_DATE}-*.html` candidates (size/file-count constrained) |

Each checkpoint is mandatory; skipping any of them forfeits the only cross-run safety net for analysis work.
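
For orientation, a sketch of the resulting repo-memory layout after the first two checkpoints; the date and `SUBFOLDER` value are illustrative, not prescribed:

```
$GH_AW_MEMORY_DIR/
└── 2026-02-14/               # $ARTICLE_DATE
    └── main/                 # $SUBFOLDER
        ├── phase-04-pass1/   # top-level .md/.json artifacts from pass 1
        └── phase-04-pass2/   # same files, overwritten with pass-2 state
```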

### Reusable snippet

Run this bash block at the end of every phase (pass the phase label as `$1`). Article HTML is written directly under the flat `news/` directory, so checkpoint discovery uses `news/${ARTICLE_DATE}-*.html` rather than `news/$YYYY/$MM/$DD/*.html`. To keep repo-memory pushes reliable, only files changed in the **current run** are eligible, and per-file size and file-count limits are enforced before each copy (`GH_AW_MEMORY_MAX_FILE_SIZE`, `GH_AW_MEMORY_MAX_FILE_COUNT`; defaults 51200 bytes / 50 files):

```bash
set -Eeuo pipefail
# ... (unchanged setup lines omitted by the diff; ARTICLE_DATE and SUBFOLDER are assumed to be set here)
PHASE="${1:?phase label required, e.g. phase-04-pass1}"
ANALYSIS_DIR="${ANALYSIS_DIR:-analysis/daily/$ARTICLE_DATE/$SUBFOLDER}"
DEST="$GH_AW_MEMORY_DIR/$ARTICLE_DATE/$SUBFOLDER/$PHASE"
MAX_MEMORY_FILE_SIZE="${GH_AW_MEMORY_MAX_FILE_SIZE:-51200}"
MAX_MEMORY_FILE_COUNT="${GH_AW_MEMORY_MAX_FILE_COUNT:-50}"
mkdir -p "$DEST" 2>/dev/null || { echo "[checkpoint] mkdir failed for $DEST — continuing"; exit 0; }

copy_into_checkpoint() {
  src="$1"
  [ -f "$src" ] || return 0
  current_count="$(find "$DEST" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' ')"
  [ "${current_count:-0}" -lt "$MAX_MEMORY_FILE_COUNT" ] || { echo "[checkpoint] skip (file-count cap reached): $src"; return 0; }
  size="$(wc -c < "$src" 2>/dev/null | tr -d ' ')"
  [ "${size:-0}" -le "$MAX_MEMORY_FILE_SIZE" ] || { echo "[checkpoint] skip (too large ${size}B>${MAX_MEMORY_FILE_SIZE}B): $src"; return 0; }
  cp -f "$src" "$DEST"/ 2>/dev/null || true
}

# Snapshot top-level analysis artifacts (never documents/ — often 100+ files — and never pass1/).
if [ -d "$ANALYSIS_DIR" ]; then
find "$ANALYSIS_DIR" -maxdepth 1 -type f \( -name '*.md' -o -name '*.json' \) \
-exec cp -f {} "$DEST"/ \; 2>/dev/null || true
while IFS= read -r -d '' f; do
copy_into_checkpoint "$f"
done < <(find "$ANALYSIS_DIR" -maxdepth 1 -type f \( -name '*.md' -o -name '*.json' \) -print0 2>/dev/null)
fi
# Snapshot only this-run article HTML candidates from the flat news/ directory (if any exist at this phase).
if [ -d "news" ]; then
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    copy_into_checkpoint "$path"
  done < <(
    {
      # AMCR = Added/Modified/Copied/Renamed in this run; prevents sweeping in untouched historical files.
      git --no-pager diff --name-only --diff-filter=AMCR -- 'news/*.html'
      # Also check the index: articles staged before this phase would be missed by a worktree-only diff.
      git --no-pager diff --cached --name-only --diff-filter=AMCR -- 'news/*.html'
      git --no-pager ls-files --others --exclude-standard -- 'news/*.html'
    } | sort -u | grep "^news/${ARTICLE_DATE}-.*\\.html$" || true
  )
fi
COUNT="$(find "$DEST" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' ')"
echo "[checkpoint] $PHASE → $DEST ($COUNT files)"
exit 0
```

| Rule | Rationale |
|------|-----------|
| **Never block on checkpoint failure** — always `exit 0`. | Repo-memory is a safety net, not a gate. |
| Do **not** copy `$ANALYSIS_DIR/documents/` or `$ANALYSIS_DIR/pass1/`. | `documents/` exceeds the 50-file push cap; `pass1/` is local gate evidence only. |
| Copy article HTML to repo-memory only when it was created/modified in the current run **and** within the configured file-size cap. | Prevents unrelated or oversized historical article files from breaking `push_repo_memory`. |
| Do **not** stage or commit anything under `$GH_AW_MEMORY_DIR`. | gh-aw's `push_repo_memory` post-job publishes it; see `07-commit-and-pr.md`. |
| Prefer small summary `.md` / `.json` files (defaults: ≤ 50 KB each, ≤ 50 per push; override via `GH_AW_MEMORY_MAX_FILE_SIZE` / `GH_AW_MEMORY_MAX_FILE_COUNT`). | gh-aw silently drops files exceeding the push caps. |
| Re-run the snippet at every phase, even if earlier phases already snapshotted — it overwrites with the latest content. | Ensures the final state is always preserved, and earlier snapshots remain on the branch from prior runs. |
| For `news-translate`, use `SUBFOLDER=batch/<lang-or-batch-id>` so memory paths don't collide with analysis runs. | Keeps the branch organised by article type. |
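
As a usage sketch, assuming the block above has been saved verbatim as `scripts/checkpoint.sh` (a hypothetical path; in practice the block runs inline), a phase invocation with tightened caps looks like:

```bash
# Hypothetical invocation: scripts/checkpoint.sh is assumed to contain the
# reusable snippet above. Env vars override the 51200-byte / 50-file defaults.
GH_AW_MEMORY_MAX_FILE_SIZE=32768 \
GH_AW_MEMORY_MAX_FILE_COUNT=25 \
bash scripts/checkpoint.sh phase-04-pass1
```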

**`.github/workflows/README.md`** (18 changes: 18 additions & 0 deletions)

---

## 🧯 Recovery for expired/long-running news sessions

When a news run ends early (for example, on session expiry or a runner interruption), the analysis produced so far still exists in two places:

1. **Working tree output during the run**: `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/`
2. **Repo-memory safety snapshots**: `$GH_AW_MEMORY_DIR/$ARTICLE_DATE/$SUBFOLDER/phase-*` pushed by the post-job to branch `memory/news-generation`

If a run fails after analysis but before a PR is created:

1. Download the `repo-memory-default` artifact from the failed run (GitHub UI, or `gh run download <run-id> -n repo-memory-default`).
2. Inspect `phase-05-gate` and `phase-07-final` under `$ARTICLE_DATE/$SUBFOLDER/` for the latest complete analysis files.
3. Restore those files into `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/` in a new branch.
4. Commit restored analysis and create a single PR (analysis-only if article HTML is incomplete), following [`.github/prompts/07-commit-and-pr.md`](../prompts/07-commit-and-pr.md).
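
A minimal sketch of steps 1–3, with placeholder values for the run id, date, and subfolder (all hypothetical; substitute real ones):

```bash
# Hypothetical recovery sketch: RUN_ID, ARTICLE_DATE, and SUBFOLDER are placeholders.
RUN_ID="<run-id>"; ARTICLE_DATE="2026-02-14"; SUBFOLDER="main"
SRC="/tmp/memory/$ARTICLE_DATE/$SUBFOLDER"
DEST="analysis/daily/$ARTICLE_DATE/$SUBFOLDER"

# Step 1: pull the repo-memory snapshot from the failed run.
# (The same snapshots also live on the memory/news-generation branch.)
gh run download "$RUN_ID" -n repo-memory-default -D /tmp/memory

# Steps 2-3: restore the newest complete phase into a fresh branch,
# preferring phase-07-final and falling back to phase-05-gate.
git switch -c "recover/news-$ARTICLE_DATE"
mkdir -p "$DEST"
if [ -d "$SRC/phase-07-final" ]; then
  cp "$SRC/phase-07-final/"* "$DEST"/
else
  cp "$SRC/phase-05-gate/"* "$DEST"/
fi
```

Step 4 then proceeds as described above.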

The phase checkpoint contract in [`.github/prompts/00-base-contract.md`](../prompts/00-base-contract.md) enforces size/file-count-safe snapshots so `push_repo_memory` remains reliable across long runs.

---

## 🧭 Where to go next

| I want to… | Read |