Target Workflow: pelis-agent-factory-advisor
Source report: #1676
Estimated cost per run: $2.02
Total tokens per run: ~1,399K (759K input · 13K output · 627K cache_read)
Cache hit rate: 45.2% (cold on turn 1; warms to ~49% by turn 6)
LLM turns: 11
Model: claude-sonnet-4.6 via Copilot endpoint
Session time: ~4 minutes
## Current Configuration

| Setting | Value |
| --- | --- |
| Tools loaded | 37 total (8 agentic-workflows + 24 github + 1 bash + 1 web-fetch + 1 cache-memory + 4 safeoutputs) |
| Tools actually used | ~5 (web-fetch · cache-memory · agentic-workflows · bash · safeoutputs) |
| GitHub MCP tools used | 0 — confirmed by firewall: 0 calls to api.github.com |
| Imports | `shared/mcp-pagination.md` (~806 tokens) — only relevant for GitHub MCP tools |
| Network groups | github (api.github.com etc.), github.github.io |
| Pre-agent steps | No |
| Prompt size | 7,654 chars (~1,913 tokens); grows to ~23,000+ with injected system boilerplate |
| Input context growth | 40K tokens (turn 1) → 83K tokens (turn 11), monotonic accumulation |
### Firewall Evidence (run #23993514169)

26 requests · 26 allowed · 0 blocked · 2 unique domains

| Domain | Requests | Purpose |
| --- | --- | --- |
| api.githubcopilot.com | 17 | LLM API calls |
| github.github.io | 9 | web-fetch documentation crawl |

Zero calls to api.github.com → the 24 loaded GitHub MCP tools were never invoked.
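If the firewall log is exported as a plain list of requested domains (an assumption about the artifact's format; adapt to the real log), the dead-weight claim can be re-checked mechanically. The sample log below is a stand-in so the sketch runs on its own:

```shell
# Stand-in firewall log: one requested domain per line (assumed format)
printf 'api.githubcopilot.com\napi.githubcopilot.com\ngithub.github.io\n' > firewall.log

# Per-domain request counts, busiest first
sort firewall.log | uniq -c | sort -rn

# No match for api.github.com means the GitHub MCP toolset was never exercised
if ! grep -q 'api\.github\.com' firewall.log; then
  echo "0 calls to api.github.com: GitHub MCP toolset was never used"
fi
```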
## Recommendations

### 1. Remove `github:` toolset and `mcp-pagination.md` import

**Estimated savings:** ~193K tokens/run (~25%) · ~$0.48/run

The 24 GitHub MCP tools (`context`, `repos`, `issues`, `pull_requests`, `actions` toolsets) are loaded every turn but never called. Each tool schema costs ~700 tokens. The `mcp-pagination.md` import exists solely to guide GitHub MCP usage, so it too becomes dead weight.
Token math:

- 24 github tools × ~700 tokens = ~16,800 tokens/turn
- `mcp-pagination.md` = ~806 tokens/turn
- Per-turn savings: ~17,600 tokens × 11 turns = ~193,600 tokens/run
- Cost savings: 193K × $2.50/M ≈ $0.48 (-24%)
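A quick shell check reproduces the arithmetic (all figures are the report's own estimates):

```shell
TOOLS=$((24 * 700))          # 24 GitHub MCP tool schemas at ~700 tokens each
PER_TURN=$((TOOLS + 806))    # plus the mcp-pagination.md import
TOTAL=$((PER_TURN * 11))     # resent on every one of the 11 LLM turns
echo "per turn: $PER_TURN, per run: $TOTAL"   # per turn: 17606, per run: 193666
# cost at $2.50 per million input tokens
python3 -c "print(f'\${$TOTAL * 2.50 / 1e6:.2f}')"   # $0.48
```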
Implementation — edit `.github/workflows/pelis-agent-factory-advisor.md` frontmatter:

```diff
-imports:
-  - shared/mcp-pagination.md
 tools:
   agentic-workflows:
-  github:
-    toolsets: [default, actions]
   bash:
     - "*"
   web-fetch:
   cache-memory: true
 network:
   allowed:
-    - github
     - "github.github.io"
```
Note on the network group: the `github` network group enables api.github.com, which `agentic-workflows` uses internally (via its own GITHUB_TOKEN passthrough, not through agent tool calls). Before dropping the group, verify that `agentic-workflows` still works without it; keep it if needed.
Also remove the GitHub-centric analysis instruction from the prompt body (Phase 2.3: "Use GitHub tools to understand recent repository activity"). Replace it with:

```markdown
### Step 2.3: Assess Recent Activity via Workflow Runs

Use the `agentic-workflows` tool to check recent run history and status:

- `status` — current workflow health
- `audit` — any security or configuration issues
```
### 2. Pre-fetch Pelis Agent Factory documentation in `steps:`

**Estimated savings:** ~100–150K tokens/run (~10%) · ~$0.25–0.37/run

Phase 1 (Steps 1.1–1.2) requires the agent to crawl github.github.io using `web-fetch`. This produced 9 HTTP calls spread across multiple turns, each adding fetched HTML/markdown to the growing conversation history.

The documentation at github.github.io rarely changes day-to-day. Pre-fetching it in a `steps:` pre-step and injecting it via `\{\{steps.fetch-docs.output}}` removes 3–4 agent turns of documentation discovery and flattens the context growth curve.
Implementation — add a `steps:` block to the frontmatter (the docs base URL is redacted here; substitute the real one):

```yaml
steps:
  fetch-docs:
    name: Fetch Pelis Agent Factory Docs
    run: |
      set -e
      mkdir -p /tmp/gh-aw/agent   # ensure the output directory exists
      BASE="(github.github.io/redacted)"
      OUTPUT=""
      for PATH_SUFFIX in \
        "/blog/2026-01-12-welcome-to-pelis-agent-factory/" \
        "/introduction/overview/" \
        "/guides/workflow-patterns/" \
        "/guides/best-practices/"; do
        CONTENT=$(curl -sf "$BASE$PATH_SUFFIX" | \
          python3 -c "import sys,html,re; t=sys.stdin.read(); t=re.sub(r'<[^>]+>','',t); print(html.unescape(t)[:8000])" 2>/dev/null || echo "(not found)")
        # $'\n' expands to a real newline; a literal "\n" inside the string would not
        OUTPUT+="### $BASE$PATH_SUFFIX"$'\n'"$CONTENT"$'\n\n'
      done
      printf '%s' "$OUTPUT" > /tmp/gh-aw/agent/pelis-docs.md
      echo "Fetched $(wc -c < /tmp/gh-aw/agent/pelis-docs.md) bytes of documentation"
```
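The embedded HTML-stripping one-liner can be sanity-checked locally before wiring it into the pre-step (the sample input is made up):

```shell
# Strip tags with a regex, then unescape HTML entities and truncate to 8000 chars
printf '<h1>Docs</h1><p>Patterns &amp; practices</p>' | \
  python3 -c "import sys,html,re; t=sys.stdin.read(); t=re.sub(r'<[^>]+>','',t); print(html.unescape(t)[:8000])"
# prints: DocsPatterns & practices
```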
Then update Phase 1 in the prompt body to use the injected data instead of live `web-fetch`:

```markdown
## Phase 1: Pelis Agent Factory Patterns

The following documentation was pre-fetched from the Pelis Agent Factory site:

\{\{steps.fetch-docs.output}}

Use this as your primary reference. If you need additional pages, use `web-fetch`
to supplement (limit to 2–3 additional fetches).
```
Why this helps:
- Eliminates 3–4 early turns → prevents ~30–50K of accumulated conversation history
- Documentation is deterministic; pre-fetching doesn't add LLM uncertainty
- Pre-fetched content is the same every turn (stable prefix → better cache utilization)
### 3. Reduce turn count by restructuring the 4-phase prompt

**Estimated savings:** ~80–120K tokens/run (~8%) · ~$0.20–0.30/run
The current 4-phase prompt (Learn → Analyze → Identify → Report) requires the agent to maintain state across many turns. The context grows from 40K → 83K tokens because each exchange carries the full history.
The agentics repository exploration (Step 1.2) adds a full `gh search_repositories` + file-browsing loop that generates large tool responses. Consider replacing it with a static pointer:
```diff
-### Step 1.2: Explore the Agentics Repository
-
-Clone knowledge from the agentics repository to understand reference implementations:
-- Repository: https://github.com/githubnext/agentics
-- Use the GitHub tools to explore the repository structure
+### Step 1.2: Reference Agentics Patterns
+
+Reference patterns from https://github.com/githubnext/agentics. Key files to check
+via `web-fetch` if needed: `README.md` and `.github/workflows/*.md`. Limit to
+2 fetches maximum. Use cache-memory to persist any patterns found for future runs.
```
This alone can reduce turns from 11 → 7–8 by eliminating the open-ended exploration loop.
### 4. Improve cache hit rate with a stable prompt prefix

**Estimated savings:** ~50–80K tokens/run (~5%) · ~$0.12–0.20/run (for sequential runs within the 5-minute cache window)
The first turn always starts cold (39,950 tokens, 0% cache) because there's no warm cache from a previous run. Cache TTL is ~5 minutes, so daily scheduled runs never benefit from it.
However, the cache hits on turns 2–11 (36–49%) show the system prompt is being cached within the same session. To maximize this:
- **Move variable content to the end of the prompt.** The `\{\{steps.*}}` injections, GitHub context (run ID, actor), and any other dynamic data should appear after the stable system instructions, not interleaved in the middle. Stable prefix = larger cacheable chunk.
- **Use cache-memory to skip Phase 1 when the docs are unchanged.** On each run, store a content hash of the fetched docs. On the next run, check the hash first, e.g. with a prompt instruction like:

  ```markdown
  Use cache-memory to check `pelis_docs_hash`. If it matches today's fetched content,
  skip re-summarizing Phase 1 patterns and use your existing cached knowledge.
  ```

  This can save 2–3 turns per re-run when the docs haven't changed (common on weekdays).
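A minimal sketch of the hashing step, assuming the pre-fetched file path from Recommendation 2 and the `pelis_docs_hash` cache-memory key named above (a stand-in file is created so the sketch runs on its own):

```shell
# Stand-in for the file written by the fetch-docs pre-step
mkdir -p /tmp/gh-aw/agent
[ -f /tmp/gh-aw/agent/pelis-docs.md ] || echo "hello" > /tmp/gh-aw/agent/pelis-docs.md

# Stable content hash of the pre-fetched docs
HASH=$(sha256sum /tmp/gh-aw/agent/pelis-docs.md | cut -d' ' -f1)
echo "pelis_docs_hash=$HASH"   # store via cache-memory; compare on the next run
```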
## Expected Impact

| Metric | Current | Projected | Savings |
| --- | --- | --- | --- |
| Total tokens/run | ~1,399K | ~1,000–1,100K | ~21–28% |
| Input tokens/run | ~759K | ~540–600K | ~21–29% |
| Cost/run | $2.02 | ~$1.30–$1.54 | ~$0.48–0.72 (-24–36%) |
| LLM turns | 11 | ~7–9 | -2 to -4 |
| Session time | ~4 min | ~2.5–3 min (est.) | ~25% |
| Cache hit rate | 45.2% | 50–65% (est.) | ↑ stable prefix |
Rec 1 alone is the highest-confidence saving (~$0.48, confirmed by firewall data).
Recs 2–4 are estimates based on observed context growth patterns.
## Implementation Checklist

- [ ] Remove `github:` toolset (`toolsets: [default, actions]`) from frontmatter
- [ ] Remove `imports: [shared/mcp-pagination.md]` from frontmatter
- [ ] Remove `github` from `network.allowed:` (verify `agentic-workflows` still works first)
- [ ] Add the `steps: fetch-docs:` pre-step to frontmatter
- [ ] Update Phase 1 to use the `\{\{steps.fetch-docs.output}}` injection
- [ ] Recompile: `gh aw compile .github/workflows/pelis-agent-factory-advisor.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`

Generated by Daily Copilot Token Optimization Advisor