Skip to content

Commit 63b2d12

Browse files
authored
Merge pull request #1933 from Hack23/copilot/fix-agentic-workflows
Document sandbox-blocked shell expansion patterns in prompt shell-safety guide
2 parents 33f3aff + e72f9bd commit 63b2d12

7 files changed

Lines changed: 86 additions & 4 deletions

.github/prompts/00-base-contract.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,63 @@ Analysis Gate → Article (if applicable) → Stage → Commit → ONE create_pu
4545

4646
No step may be skipped, reordered, or executed in parallel with its successor.
4747

48+
## Phase checkpoint — persist every phase to repo memory
49+
50+
Valuable analysis must never be lost. After each pipeline phase completes, snapshot its output to the gh-aw repo-memory mount at `$GH_AW_MEMORY_DIR` (runtime default `/tmp/gh-aw/repo-memory/default`). gh-aw pushes that directory to the `memory/news-generation` branch in a **separate post-job** — so checkpoints survive even if the content PR job fails, crashes, or times out.
51+
52+
### Mandatory checkpoint points
53+
54+
| After phase | Phase label | Source(s) |
55+
|-------------|-------------|-----------|
56+
| 03 Data download | `phase-03-download` | `$ANALYSIS_DIR` (manifest + fetched data summaries) |
57+
| 04 Analysis Pass 1 | `phase-04-pass1` | `$ANALYSIS_DIR` top-level artifacts |
58+
| 04 Analysis Pass 2 | `phase-04-pass2` | `$ANALYSIS_DIR` top-level artifacts |
59+
| 05 Gate pass | `phase-05-gate` | `$ANALYSIS_DIR` top-level artifacts |
60+
| 06 Article generated | `phase-06-article` | `$ANALYSIS_DIR` + today's `news/${ARTICLE_DATE}-*.html` |
61+
| 07 Immediately before `create_pull_request` | `phase-07-final` | `$ANALYSIS_DIR` + articles from `news/${ARTICLE_DATE}-*.html` |
62+
| `news-translate` per batch | `phase-translate-<lang>` | Translated `news/${ARTICLE_DATE}-*.html` |
63+
64+
Each checkpoint is mandatory. Skipping them forfeits the only cross-run safety net for analysis work.
65+
66+
### Reusable snippet
67+
68+
Run this bash block at the end of every phase (pass the phase label as `$1`). Article HTML is written directly under the flat `news/` directory, so checkpoint copies must use `news/${ARTICLE_DATE}-*.html` rather than `news/$YYYY/$MM/$DD/*.html`:
69+
70+
```bash
71+
set -Eeuo pipefail
72+
: "${GH_AW_MEMORY_DIR:=/tmp/gh-aw/repo-memory/default}"
73+
: "${ARTICLE_DATE:?ARTICLE_DATE required for checkpoint}"
74+
: "${SUBFOLDER:?SUBFOLDER required for checkpoint (use batch/<lang> for news-translate)}"
75+
PHASE="${1:?phase label required, e.g. phase-04-pass1}"
76+
ANALYSIS_DIR="${ANALYSIS_DIR:-analysis/daily/$ARTICLE_DATE/$SUBFOLDER}"
77+
DEST="$GH_AW_MEMORY_DIR/$ARTICLE_DATE/$SUBFOLDER/$PHASE"
78+
mkdir -p "$DEST" 2>/dev/null || { echo "[checkpoint] mkdir failed for $DEST — continuing"; exit 0; }
79+
# Snapshot top-level analysis artifacts (never documents/ — often 100+ files — and never pass1/).
80+
if [ -d "$ANALYSIS_DIR" ]; then
81+
find "$ANALYSIS_DIR" -maxdepth 1 -type f \( -name '*.md' -o -name '*.json' \) \
82+
-exec cp -f {} "$DEST"/ \; 2>/dev/null || true
83+
fi
84+
# Snapshot today's produced article HTML from the flat news/ directory (if any exists at this phase).
85+
if [ -d "news" ]; then
86+
find "news" -maxdepth 1 -type f -name "${ARTICLE_DATE}-*.html" \
87+
-exec cp -f {} "$DEST"/ \; 2>/dev/null || true
88+
fi
89+
COUNT="$(find "$DEST" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' ')"
90+
echo "[checkpoint] $PHASE$DEST ($COUNT files)"
91+
exit 0
92+
```
93+
94+
### Checkpoint rules
95+
96+
| Rule | Rationale |
97+
|------|-----------|
98+
| **Never block on checkpoint failure** — always `exit 0`. | Repo-memory is a safety net, not a gate. |
99+
| Do **not** copy `$ANALYSIS_DIR/documents/` or `$ANALYSIS_DIR/pass1/`. | `documents/` exceeds the 50-file push cap; `pass1/` is local gate evidence only. |
100+
| Do **not** stage or commit anything under `$GH_AW_MEMORY_DIR`. | gh-aw's `push_repo_memory` post-job publishes it; see `07-commit-and-pr.md`. |
101+
| Prefer small summary `.md` / `.json` files (≤ 50 KB each, ≤ 50 per push). | gh-aw silently drops files exceeding the push caps. |
102+
| Re-run the snippet at every phase, even if earlier phases already snapshotted — it overwrites with the latest content. | Ensures the final state is always preserved, and earlier snapshots remain on the branch from prior runs. |
103+
| For `news-translate`, use `SUBFOLDER=batch/<lang-or-batch-id>` so memory paths don't collide with analysis runs. | Keeps the branch organised by article type. |
104+
48105
## Output contract
49106

50107
- Commit real files on disk under `analysis/daily/` and/or `news/`.

.github/prompts/01-bash-and-shell-safety.md

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,10 +32,25 @@ bash({
3232

3333
Parameter expansion (`${VAR}`, `${VAR:-x}`, `${VAR##*/}`, …) and command substitution (`$(cmd)`) are **safe** under the agentic-workflow firewall — the firewall inspects outbound network egress, not shell syntax. Process substitution `<(…)` is best avoided because some runners disable `/dev/fd`.
3434

35+
## Banned expansion patterns (sandbox blocklist)
36+
37+
The execution sandbox **rejects** commands containing any of the following patterns before they run. Rewrite using the safe equivalent instead of trying to work around the block.
38+
39+
| Banned pattern | Why it's blocked | Safe equivalent |
40+
|----------------|------------------|-----------------|
41+
| `${var@P}` / `${var@Q}` / `${var@E}` / `${var@A}` / `${var@a}` | These parameter transformations can produce shell-reparsable fragments or syntax-bearing representations, which can smuggle attacker-controlled content into later parsing steps — a known prompt-injection vector. | Expand the target explicitly: `printf '%s' "$var"`, or a plain `"$var"` substitution. |
42+
| `${!var}` | Indirect expansion uses `$var`'s *value* as another variable name — lets an attacker-controlled string pick which variable is read. | Use an associative array: `declare -A MAP; MAP[foo]=bar; echo "${MAP[$key]}"`. |
43+
| Nested `$(…$(…)…)` | Builds a command string dynamically from inner results — the classic staged injection shape. | Split into two lines with a temporary variable: `inner=$(cmd2); outer=$(cmd1 "$inner")`. |
44+
| Chained builder assignments that progressively construct a command substitution (`a=foo; b="$a"bar; c=$($b)`) | Same staged-injection shape, just spread across multiple statements. | Construct commands as arrays, invoke via `"${cmd[@]}"`; never re-parse a string as a command. |
45+
| `eval` on variable contents (or eval-like constructs such as `bash -c "$var"`, `source /dev/stdin <<<"$var"`) | Direct arbitrary-code execution from data. | Never required for our workflows — refuse and rewrite using arrays, `case`, or explicit branches. |
46+
47+
These rules apply equally to inline bash in prompts AND to bash commands the agent composes at runtime. The sandbox rejects matching commands before they run.
48+
3549
## Secret safety
3650

3751
- Never pass secrets through `$(…)` into a log-visible command — echoing `curl -H "Authorization: $(…)"` will leak if the step is rerun in debug.
38-
- Use env blocks (`env:` on the step) or `${{ secrets.FOO }}` directly; the runner masks secret values in output.
52+
- Expose secrets through the step's `env:` block (for example `env: { FOO: <GitHub-Actions secrets expression for FOO> }`) rather than inlining a raw secrets expression inside the prompt; the runner masks secret values in output.
53+
- Note: this prompt file is loaded via `runtime-import`, and the gh-aw validator rejects any GitHub Actions template expression (the double-curly-brace syntax used for `secrets`, `env`, `inputs`, etc. in workflow YAML) that is not on the safe allow-list — so never embed such an expression in prompt modules, even inside code spans. Keep secret references in the workflow YAML only.
3954

4055
## Temporary files
4156

.github/prompts/03-data-download.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -69,3 +69,5 @@ Always produce `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/data-download-manifest.m
6969
## Next step
7070

7171
On success, proceed to `04-analysis-pipeline.md`. Never start analysis while `data-download-manifest.md` is missing or empty.
72+
73+
After the manifest is written, run the **phase checkpoint** from `00-base-contract.md` with label `phase-03-download` so the download provenance is persisted to repo memory before any analysis begins.

.github/prompts/04-analysis-pipeline.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,9 @@ For each article with charts, produce accompanying JSON under `analysis/daily/$A
6666

6767
## Next step
6868

69-
On completion, proceed to `05-analysis-gate.md`. Do not start article generation until the gate passes.
69+
Proceed to `05-analysis-gate.md`. Do not start article generation until the gate passes.
70+
71+
After completing Pass 1 (before Pass 2), run the **phase checkpoint** from `00-base-contract.md` with label `phase-04-pass1`. After completing Pass 2 (before the gate), run it again with label `phase-04-pass2`. This guarantees both iterations survive even if the gate, article, or commit phase later fails.
7072

7173
## External references
7274

.github/prompts/05-analysis-gate.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -153,7 +153,7 @@ Exit code 0 = pass, non-zero = fail with per-check report. Precondition for chec
153153

154154
## Outcome
155155

156-
- **Pass** → proceed to `06-article-generation.md`.
156+
- **Pass**run the **phase checkpoint** from `00-base-contract.md` with label `phase-05-gate`, then proceed to `06-article-generation.md`.
157157
- **Fail** → fix flagged files (never delete them), re-run the gate, then proceed.
158158
- **Unrecoverable fail after fixes** → stage whatever analysis exists, commit with label `analysis-only`, call `safeoutputs___create_pull_request` once (see `07-commit-and-pr.md`). Do **not** generate articles.
159159

.github/prompts/06-article-generation.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -84,4 +84,4 @@ Each article ≥ 1000 words, minimum 3 of 5 mandatory analytical sections presen
8484

8585
## Next step
8686

87-
Stage all analysis + article + visualisation files, then call `07-commit-and-pr.md`.
87+
Run the **phase checkpoint** from `00-base-contract.md` with label `phase-06-article` to persist the generated articles to repo memory. Then stage all analysis + article + visualisation files, and call `07-commit-and-pr.md`.

.github/prompts/07-commit-and-pr.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -89,6 +89,12 @@ Call `safeoutputs___noop({"message": "<reason>"})` **only** if:
8989

9090
In every other case, commit whatever exists and call `create_pull_request` once.
9191

92+
## Final checkpoint — before the PR call
93+
94+
Immediately before calling `safeoutputs___create_pull_request`, run the **phase checkpoint** from `00-base-contract.md` with label `phase-07-final`. This snapshots the final authoritative analysis + article state to repo memory, so even if the PR call, the safe-outputs runner, or the post-job push fails, the last good state survives on the `memory/news-generation` branch.
95+
96+
For `news-translate`, run the checkpoint with label `phase-translate-<lang>` after each per-language batch succeeds (before the final PR call), so individual language translations are preserved even if later languages fail.
97+
9298
## Deadline enforcement
9399

94100
If the run exceeds 40 minutes with no safe-output call yet:

0 commit comments

Comments
 (0)