Improve agentic workflow resilience, tool surface, and gh-aw v0.69.3 alignment #2013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
5ccd889
6c18c96
e9d2eba
d38cd8f
1cec920
a8e1dc5
@@ -52,6 +52,29 @@ Translations for the remaining twelve languages are produced by the dedicated **

5. **Do not** `git push`, `git checkout`, or `git checkout -b` after the call. The safe-outputs runner job publishes the PR; subsequent agent commits are not added.

## Cache-memory recovery (resilience for failed PRs)

Every news workflow declares `tools.cache-memory:` keyed by `news-${{ github.workflow }}-${{ inputs.article_date || 'today' }}` with 14-day retention (see `02-mcp-access.md` §Servers & tool naming). gh-aw automatically restores the cache from the previous run on each invocation, so analysis artifacts under `/tmp/gh-aw/cache-memory/` survive across failed runs and can be reused on the next attempt.

**On every run, immediately after MCP pre-warm:**

1. Check whether `/tmp/gh-aw/cache-memory/$ARTICLE_DATE/$SUBFOLDER/` exists with prior analysis artifacts (Family A/B/C/D `.md` files). If so, this is a **retry of a failed run**. Copy them into `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/` *before* re-running the analysis pipeline so Pass 2 builds on Pass 1 work that previous runs already paid for.
2. After a successful Pass 1 (or after the analysis gate passes), copy the produced `.md` artifacts back to `/tmp/gh-aw/cache-memory/$ARTICLE_DATE/$SUBFOLDER/` so the next run can recover them if `safeoutputs___create_pull_request` fails or the run is killed by Timer A/B/C.
3. The cache is **automatically saved** by gh-aw at job end; the agent does **not** call any safe-output tool to persist it. Just write to `/tmp/gh-aw/cache-memory/`.
Suggested change:

- Every news workflow declares `tools.cache-memory:` keyed by `news-${{ github.workflow }}-${{ inputs.article_date || 'today' }}` with 14-day retention (see `02-mcp-access.md` §Servers & tool naming). gh-aw automatically restores the cache from the previous run on each invocation, so analysis artifacts under `/tmp/gh-aw/cache-memory/` survive across failed runs and can be reused on the next attempt.
- **On every run, immediately after MCP pre-warm:**
- 1. Check whether `/tmp/gh-aw/cache-memory/$ARTICLE_DATE/$SUBFOLDER/` exists with prior analysis artifacts (Family A/B/C/D `.md` files). If so, this is a **retry of a failed run**. Copy them into `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/` *before* re-running the analysis pipeline so Pass 2 builds on Pass 1 work that previous runs already paid for.
- 2. After a successful Pass 1 (or after the analysis gate passes), copy the produced `.md` artifacts back to `/tmp/gh-aw/cache-memory/$ARTICLE_DATE/$SUBFOLDER/` so the next run can recover them if `safeoutputs___create_pull_request` fails or the run is killed by Timer A/B/C.
- 3. The cache is **automatically saved** by gh-aw at job end; the agent does **not** call any safe-output tool to persist it. Just write to `/tmp/gh-aw/cache-memory/`.
+ Every news workflow declares `tools.cache-memory:` keyed by `news-${{ github.workflow }}-${{ inputs.article_date || 'today' }}` with 14-day retention (see `02-mcp-access.md` §Servers & tool naming). gh-aw automatically restores the cache from the last successfully persisted run on each invocation. Analysis artifacts under `/tmp/gh-aw/cache-memory/` can therefore be reused on the next attempt when a previous run reached the cache-update stage, but newly generated cache-memory content from an agent job that fails or times out is **not** guaranteed to persist for the next retry.
+ **On every run, immediately after MCP pre-warm:**
+ 1. Check whether `/tmp/gh-aw/cache-memory/$ARTICLE_DATE/$SUBFOLDER/` exists with prior analysis artifacts (Family A/B/C/D `.md` files). If so, treat this as a **retry with recoverable prior work**. Copy them into `analysis/daily/$ARTICLE_DATE/$SUBFOLDER/` *before* re-running the analysis pipeline so Pass 2 builds on Pass 1 work that a previous successful agent run already produced.
+ 2. After a successful Pass 1 (or after the analysis gate passes), copy the produced `.md` artifacts back to `/tmp/gh-aw/cache-memory/$ARTICLE_DATE/$SUBFOLDER/` so they are available for persistence if the workflow later fails during PR publication or another post-agent stage.
+ 3. The agent does **not** call any safe-output tool to persist cache-memory; it only writes to `/tmp/gh-aw/cache-memory/`. In compiled workflows, the updated cache is saved for the next run by a separate cache-update step/job that runs only after a **successful agent job**, so recovery is reliable for post-agent failures (for example PR-publication problems) but not for agent-job failures/timeouts.
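
The restore and stash steps in the list above can be sketched in a few lines. This is only an illustration under stated assumptions, not the workflow's actual implementation: the two root paths come from the prompt text, while the function names, the `example-subfolder` value, and the `family-a.md` file name are hypothetical. It copies only top-level `.md` files, which is an assumption about how the artifacts are laid out.

```python
import shutil
import tempfile
from pathlib import Path

# Paths quoted in the prompt text; everything else below is illustrative.
CACHE_ROOT = Path("/tmp/gh-aw/cache-memory")
ANALYSIS_ROOT = Path("analysis/daily")


def copy_markdown(src: Path, dst: Path) -> list[str]:
    """Copy top-level .md files from src to dst and return the copied names."""
    if not src.is_dir():
        return []  # nothing to recover: this is a first attempt, not a retry
    dst.mkdir(parents=True, exist_ok=True)
    names = []
    for md_file in sorted(src.glob("*.md")):
        shutil.copy2(md_file, dst / md_file.name)
        names.append(md_file.name)
    return names


def restore_prior_work(article_date, subfolder,
                       cache_root=CACHE_ROOT, analysis_root=ANALYSIS_ROOT):
    """Step 1: seed the analysis folder from cache-memory before re-running."""
    return copy_markdown(cache_root / article_date / subfolder,
                         analysis_root / article_date / subfolder)


def stash_for_next_run(article_date, subfolder,
                       cache_root=CACHE_ROOT, analysis_root=ANALYSIS_ROOT):
    """Step 2: copy produced artifacts back into cache-memory after Pass 1."""
    return copy_markdown(analysis_root / article_date / subfolder,
                         cache_root / article_date / subfolder)


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real paths.
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache-memory"
        analysis = Path(tmp) / "analysis" / "daily"
        out = analysis / "2026-04-26" / "example-subfolder"
        out.mkdir(parents=True)
        (out / "family-a.md").write_text("pass 1 output")
        print(stash_for_next_run("2026-04-26", "example-subfolder", cache, analysis))
        print(restore_prior_work("2026-04-26", "example-subfolder", cache, analysis))
```

Step 3 needs no code on the agent side: writing into the cache directory is all the agent does, and whether those writes persist depends on the cache-update stage described above.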
@@ -99,6 +99,24 @@ Each agentic workflow is a **pair**: an authored `.md` source + a compiled `.loc

8. `../prompts/07-commit-and-pr.md`: stage → commit → exactly one `create_pull_request`
9. *(Tier-C workflows only)* `../prompts/ext/tier-c-aggregation.md`: 14-artifact gate, period multipliers

### Common tool surface (every `news-*.md`)

Every news workflow declares the **same** tool & runtime surface for parity, resilience, and full gh-aw v0.69.3 capability coverage:

| Field | Value | Purpose |
|-------|-------|---------|
| `runtimes.node.version` | `"25"` | Pinned Node 25 for IMF CLI + render scripts |
| `tools.github.toolsets` | `[all]` | Full GitHub MCP surface (issues, PRs, repos, code-search, actions, releases, discussions, …); see [`github-tools.md`](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/github-tools.md) |
| `tools.bash` / `tools.edit` / `tools.web-fetch` / `tools.agentic-workflows` | enabled | Full local tool surface; `web-fetch` reaches non-MCP public sources (`statskontoret.se`, `riksdagsmonitor.com`) through the AWF firewall |
| `tools.cache-memory` | keyed by `news-${workflow}-${article_date}`, 14-day retention | **Resilience knob**: analysis artifacts persisted at `/tmp/gh-aw/cache-memory/`; restored on the next run if the previous PR failed (see [`07-commit-and-pr.md` §Cache-memory recovery](../prompts/07-commit-and-pr.md)) |
Suggested change:

- | `tools.cache-memory` | keyed by `news-${workflow}-${article_date}`, 14-day retention | **Resilience knob**: analysis artifacts persisted at `/tmp/gh-aw/cache-memory/`; restored on the next run if the previous PR failed (see [`07-commit-and-pr.md` §Cache-memory recovery](../prompts/07-commit-and-pr.md)) |
+ | `tools.cache-memory` | keyed by `news-${workflow}-${article_date}`; best-effort cache persistence aligned with a 14-day recovery window | **Resilience knob**: analysis artifacts persisted at `/tmp/gh-aw/cache-memory/`; may be restored on the next run if the previous PR failed and the cache entry is still available (see [`07-commit-and-pr.md` §Cache-memory recovery](../prompts/07-commit-and-pr.md)) |
Copilot AI, Apr 26, 2026:
The keepalive description mentions enabling “45–50 min sessions” / “full 45–50 min job”, but these workflows are configured with timeout-minutes: 45. Suggest adjusting this wording to avoid implying runs can exceed the configured job timeout (e.g., “full 45‑minute job budget”).
Suggested change:

- | `sandbox.mcp.keepalive-interval` | `300` (5 min) | Compiles to gateway `keepaliveInterval`; overrides upstream default `1500 s (25 min)` so HTTP MCPs (`riksdag-regering`) stay warm for the full 45–50 min job (see [`02-mcp-access.md` §MCP gateway keepalive](../prompts/02-mcp-access.md)) |
+ | `sandbox.mcp.keepalive-interval` | `300` (5 min) | Compiles to gateway `keepaliveInterval`; overrides upstream default `1500 s (25 min)` so HTTP MCPs (`riksdag-regering`) stay warm for the full 45-minute job budget (see [`02-mcp-access.md` §MCP gateway keepalive](../prompts/02-mcp-access.md)) |
Copilot AI, Apr 26, 2026:
Including the containers ecosystem identifier materially broadens outbound egress (compiled locks now allow multiple container registries like *.docker.io, ghcr.io, quay.io, etc.). Since this is a security-relevant expansion, it would help to explicitly document the expected/required registry set (and/or why broad wildcards are acceptable here) so reviewers can validate it against the project’s egress/allowlist policy.
Suggested change:

- | `network.allowed` | `node`, `containers`, `github`, `defaults` + IMF/SCB/Riksdag/Statskontoret/site domains | Ecosystem identifiers preferred per upstream `network.md`; `containers` covers `node:25-alpine` images for SCB + World Bank MCPs |
+ | `network.allowed` | `node`, `containers`, `github`, `defaults` + IMF/SCB/Riksdag/Statskontoret/site domains | Ecosystem identifiers preferred per upstream `network.md`; `containers` is required only for the MCP container images (`node:25-alpine`) used by the SCB and World Bank servers. Reviewers should expect Docker Hub resolution for these pulls (`docker.io`, `registry-1.docker.io`, `auth.docker.io`, and `production.cloudflare.docker.com`). Upstream ecosystem expansion can cause compiled locks to include broader container-registry patterns; in this repo that broader capability is accepted only because current workflows are intended to pull Docker Hub-hosted `node:25-alpine` images. Any switch to `ghcr.io`, `quay.io`, or other registries must be explicitly documented and reviewed against the egress allowlist policy before merge. |
In the `cache-memory` row, the phrase “restores from previous run on cache miss” is misleading: a cache miss means nothing is restored. If the intent is that `restore-keys` can fall back to older keys, reword to something like “restores from the most recent prior cache via restore-keys when the exact key isn’t found” to avoid confusing operators/agents.
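
To make the distinction in this comment concrete, here is a simplified model of a key lookup with `restore-keys` fallback. It is a sketch of the general GitHub Actions cache behaviour (exact key first, then prefix matches in order, newest entry wins), not a description of how the compiled gh-aw lock file configures its cache step; the `CacheEntry` type, the `resolve_cache` function, and the example keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # larger value means a newer entry


def resolve_cache(key: str, restore_keys: list[str],
                  entries: list[CacheEntry]) -> Optional[CacheEntry]:
    """Return the entry a restore would pick, or None on a true miss."""
    # 1. Exact hit on the primary key.
    exact = [e for e in entries if e.key == key]
    if exact:
        return max(exact, key=lambda e: e.created_at)
    # 2. Walk restore-keys in order; each one is a prefix match and the
    #    newest matching entry wins.
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at)
    # 3. No exact key and no prefix match: nothing is restored.
    return None


if __name__ == "__main__":
    entries = [
        CacheEntry("news-daily-2026-04-24", created_at=1),
        CacheEntry("news-daily-2026-04-25", created_at=2),
    ]
    # Exact key is absent, so the fallback prefix decides what is restored.
    print(resolve_cache("news-daily-2026-04-26", ["news-daily-"], entries))
    # Without restore-keys the same lookup is a plain miss.
    print(resolve_cache("news-daily-2026-04-26", [], entries))
```

In this model, "restored on a miss" only makes sense for the second step, which is why the suggested wording names `restore-keys` explicitly.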