
⚡ Copilot Token Optimization 2026-04-05: Pelis Agent Factory Advisor #1677

@github-actions

Target Workflow: pelis-agent-factory-advisor

Source report: #1676
Estimated cost per run: $2.02
Total tokens per run: ~1,399K (759K input · 13K output · 627K cache_read)
Cache hit rate: 45.2% (cold on turn 1; warms to ~49% by turn 6)
LLM turns: 11
Model: claude-sonnet-4.6 via Copilot endpoint
Session time: ~4 minutes
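
The total and the cache hit rate above are consistent with a simple ratio of cached reads to all prompt tokens. A minimal Python sketch of that arithmetic follows; the formula is an assumption (the report does not state how the rate was computed), and the inputs are the rounded figures listed above.

```python
def cache_hit_rate(input_tokens: float, cache_read_tokens: float) -> float:
    """Share of prompt tokens served from cache (assumed definition)."""
    total_prompt = input_tokens + cache_read_tokens
    if total_prompt == 0:
        return 0.0
    return cache_read_tokens / total_prompt


# Rounded per-run figures from the summary, in thousands of tokens.
INPUT_K, OUTPUT_K, CACHE_READ_K = 759, 13, 627

total_k = INPUT_K + OUTPUT_K + CACHE_READ_K
rate = cache_hit_rate(INPUT_K, CACHE_READ_K)
print(f"total: ~{total_k}K tokens, cache hit rate: {rate:.1%}")
```

With these inputs the sketch reproduces the reported ~1,399K total and 45.2% hit rate.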


Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | 37 total reported (8 agentic-workflows + 24 github + 1 bash + 1 web-fetch + 1 cache-memory + 4 safeoutputs; these listed counts sum to 39) |
| Tools actually used | ~5 (web-fetch · cache-memory · agentic-workflows · bash · safeoutputs) |
| GitHub MCP tools used | 0 (confirmed by firewall: 0 calls to api.github.com) |
| imports: | shared/mcp-pagination.md (~806 tokens); only relevant for GitHub MCP tools |
| Network groups | github (api.github.com etc.), github.github.io |
| Pre-agent steps: | No |
| Prompt size | 7,654 chars (~1,913 tokens); grows to ~23,000+ with injected system boilerplate |
| Input context growth | 40K tokens (turn 1) → 83K tokens (turn 11), monotonic accumulation |

Firewall evidence (run §23993514169)

▼ 26 requests | 26 allowed | 0 blocked | 2 unique domains
  api.githubcopilot.com  17   (LLM API calls)
  github.github.io        9   (web-fetch — documentation crawl)

Zero calls to api.github.com → the 24 loaded GitHub MCP tools were never invoked.
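
The check behind this conclusion can be sketched in a few lines of Python. The snippet parses the per-domain lines of the firewall summary shown above and looks for an exact `api.github.com` entry. The function name and the assumed line shape (`domain`, `count`, optional note) are illustrative and not part of the firewall tooling.

```python
import re

# Per-domain lines copied from the firewall summary above.
FIREWALL_SUMMARY = """\
api.githubcopilot.com  17   (LLM API calls)
github.github.io        9   (web-fetch, documentation crawl)
"""

# Assumed line shape: "<domain> <count> (<optional note>)".
LINE_RE = re.compile(r"^\s*(?P<domain>[\w.-]+)\s+(?P<count>\d+)\b")


def parse_domain_counts(summary: str) -> dict[str, int]:
    """Map each domain in the summary to its request count."""
    counts: dict[str, int] = {}
    for line in summary.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group("domain")] = int(match.group("count"))
    return counts


counts = parse_domain_counts(FIREWALL_SUMMARY)
github_api_calls = counts.get("api.github.com", 0)
print(f"total requests: {sum(counts.values())}")
print(f"api.github.com calls: {github_api_calls}")
```

The lookup is an exact match on the domain, so `api.githubcopilot.com` is not mistaken for the GitHub API.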


Recommendations

1. Remove github: toolset and mcp-pagination.md import

Estimated savings: ~193K tokens/run (~25% of input tokens) · ~$0.48/run

The 24 GitHub MCP tools (context, repos, issues, pull_requests, actions toolsets) are loaded every turn but never called. Each tool schema costs ~700 tokens. The mcp-pagination.md import exists solely to guide GitHub MCP usage, so it is dead weight as well.

Token math:

  • 24 github tools × ~700 tokens = ~16,800 tokens/turn
  • mcp-pagination.md = ~806 tokens/turn
  • Total per-turn savings: ~17,600 tokens × 11 turns = ~193,600 tokens
  • Cost savings: 193K × $2.50/M = ~$0.48 (~24% of the $2.02 run cost)
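
The token math above can be reproduced with a short Python sketch. It uses only the figures quoted in this report and, like the report, assumes every removed token would otherwise be billed at the flat $2.50/M rate on each of the 11 turns (no cache discount is modeled). The constant names are illustrative.

```python
GITHUB_TOOL_COUNT = 24
TOKENS_PER_TOOL_SCHEMA = 700      # approximate figure from the report
PAGINATION_IMPORT_TOKENS = 806    # shared/mcp-pagination.md
LLM_TURNS = 11
USD_PER_MILLION_TOKENS = 2.50     # flat rate used in the report's math
BASELINE_COST_USD = 2.02

# Tokens that are re-sent on every turn but never used.
per_turn = GITHUB_TOOL_COUNT * TOKENS_PER_TOOL_SCHEMA + PAGINATION_IMPORT_TOKENS
per_run = per_turn * LLM_TURNS
cost_saved = per_run / 1_000_000 * USD_PER_MILLION_TOKENS
share_of_cost = cost_saved / BASELINE_COST_USD

print(f"per turn: {per_turn:,} tokens")
print(f"per run:  {per_run:,} tokens")
print(f"cost:     ${cost_saved:.2f} ({share_of_cost:.0%} of baseline)")
```

The unrounded per-turn figure is 17,606 tokens, which the report rounds to ~17,600 before multiplying by the turn count.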

Implementation — edit .github/workflows/pelis-agent-factory-advisor.md frontmatter:

-imports:
-  - shared/mcp-pagination.md
 tools:
   agentic-workflows:
-  github:
-    toolsets: [default, actions]
   bash:
     - "*"
   web-fetch:
   cache-memory: true
 network:
   allowed:
-    - github
     - "github.github.io"

Note on network group: The github network group enables api.github.com, which agentic-workflows uses internally (via its own GITHUB_TOKEN passthrough — not through agent tool calls). Verify whether removing the github group breaks agentic-workflows before removing it; keep it if needed.

Also remove the GitHub-centric analysis instruction from the prompt body (Phase 2.3: "Use GitHub tools to understand recent repository activity"). Replace with:

### Step 2.3: Assess Recent Activity via Workflow Runs

Use the `agentic-workflows` tool to check recent run history and status:
- `status` — current workflow health
- `audit` — any security or configuration issues

2. Pre-fetch Pelis Agent Factory documentation in steps:

Estimated savings: ~100–150K tokens/run (~10%) · ~$0.25–0.37/run

Phase 1 (Steps 1.1–1.2) requires the agent to crawl github.github.io using web-fetch. This produced 9 HTTP calls spread across multiple turns, adding fetched HTML/markdown to the growing conversation history.

The documentation at github.github.io rarely changes day-to-day. Pre-fetching it in a steps: pre-step and injecting it via {{steps.fetch-docs.output}} removes 3–4 agent turns of documentation discovery and flattens the context growth curve.

Implementation — add steps: block to frontmatter:

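steps:
  fetch-docs:
    name: Fetch Pelis Agent Factory Docs
    run: |
      set -eo pipefail
      # Base URL is redacted in the source report; substitute the real docs URL.
      BASE="(github.github.io/redacted)"
      OUT=/tmp/gh-aw/agent/pelis-docs.md
      mkdir -p "$(dirname "$OUT")"
      : > "$OUT"  # start from an empty file
      for PATH_SUFFIX in \
        "/blog/2026-01-12-welcome-to-pelis-agent-factory/" \
        "/introduction/overview/" \
        "/guides/workflow-patterns/" \
        "/guides/best-practices/"; do
        # Strip tags, unescape entities, and cap each page at 8,000 characters.
        # With pipefail, a failed curl triggers the "(not found)" fallback.
        CONTENT=$(curl -sf "$BASE$PATH_SUFFIX" | \
          python3 -c "import sys,html,re; t=sys.stdin.read(); t=re.sub(r'<[^>]+>','',t); sys.stdout.write(html.unescape(t)[:8000])" 2>/dev/null || echo "(not found)")
        # printf expands \n; a double-quoted "\n" in bash stays literal.
        printf '### %s\n%s\n\n' "$BASE$PATH_SUFFIX" "$CONTENT" >> "$OUT"
      done
      # This step only writes $OUT; it does not itself set a step output.
      echo "Fetched $(wc -c < "$OUT") bytes of documentation"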
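
The inline python3 one-liner in the step above is dense, so here is a standalone Python sketch of the same strip-and-truncate idea for local testing. It is not a drop-in copy: it additionally drops `<script>` and `<style>` contents and collapses whitespace, which the one-liner does not do. The function name `html_to_text` is hypothetical.

```python
import html
import re

# Remove <script>/<style> blocks together with their contents.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)
TAG_RE = re.compile(r"<[^>]+>")


def html_to_text(raw: str, limit: int = 8000) -> str:
    """Reduce an HTML page to plain text capped at `limit` characters."""
    without_code = SCRIPT_STYLE_RE.sub(" ", raw)
    without_tags = TAG_RE.sub(" ", without_code)
    text = html.unescape(without_tags)
    text = re.sub(r"\s+", " ", text).strip()
    return text[:limit]


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Patterns &amp; Practices</h1><p>Use   pre-steps.</p></body></html>"
)
print(html_to_text(sample))  # prints "Patterns & Practices Use pre-steps."
```

Collapsing whitespace before truncating means more of the 8,000-character budget goes to actual prose rather than indentation left over from the markup.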

Then update Phase 1 in the prompt body to use injected data instead of live web-fetch:

## Phase 1: Pelis Agent Factory Patterns

The following documentation was pre-fetched from the Pelis Agent Factory site:

{{steps.fetch-docs.output}}

Use this as your primary reference. If you need additional pages, use `web-fetch`
to supplement (limit to 2–3 additional fetches).

Why this helps:

  • Eliminates 3–4 early turns → prevents ~30–50K of accumulated conversation history
  • Documentation is deterministic; pre-fetching doesn't add LLM uncertainty
  • Pre-fetched content is the same every turn (stable prefix → better cache utilization)

3. Reduce turn count by restructuring the 4-phase prompt

Estimated savings: ~80–120K tokens/run (~8%) · ~$0.20–0.30/run

The current 4-phase prompt (Learn → Analyze → Identify → Report) requires the agent to maintain state across many turns. The context grows from 40K → 83K tokens because each exchange carries the full history.

Step 1.2 (agentics repository exploration) directs the agent to the GitHub tools, which invites an open-ended search_repositories plus file-browsing loop that generates large tool responses. Consider replacing it with a static pointer:

-### Step 1.2: Explore the Agentics Repository
-
-Clone knowledge from the agentics repository to understand reference implementations:
-- Repository: https://github.com/githubnext/agentics
-- Use the GitHub tools to explore the repository structure
+### Step 1.2: Reference Agentics Patterns
+
+Reference patterns from https://github.com/githubnext/agentics. Key files to check
+via `web-fetch` if needed: `README.md` and `.github/workflows/*.md`. Limit to
+2 fetches maximum. Use cache-memory to persist any patterns found for future runs.

This alone could reduce the turn count from 11 to 7–8 by eliminating the open-ended exploration loop.


4. Improve cache hit rate with a stable prompt prefix

Estimated savings: ~50–80K tokens/run (~5%) · ~$0.12–0.20/run (applies only to sequential runs within the ~5-minute cache window)

The first turn always starts cold (39,950 tokens, 0% cache) because there's no warm cache from a previous run. Cache TTL is ~5 minutes, so daily scheduled runs never benefit from it.

However, the cache hits on turns 2–11 (36–49%) show the system prompt is being cached within the same session. To maximize this:

  1. Move variable content to the end of the prompt. The {{steps.*}} injections, GitHub context (run ID, actor), and any dynamic data should appear after the stable system instructions, not interleaved in the middle. A stable prefix gives a larger cacheable chunk.

  2. Consider using cache-memory to skip Phase 1 when docs are unchanged. On each run, store a content hash of the fetched docs. On the next run, check the hash first:

    Use cache-memory to check `pelis_docs_hash`. If it matches today's fetched content,
    skip re-summarizing Phase 1 patterns and use your existing cached knowledge.
    

    This can save 2–3 turns per re-run when docs haven't changed (common on weekdays).
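
The hash check in item 2 can be sketched as follows. This is an illustration only: cache-memory is modeled here as a local JSON file, and the file name and the `docs_changed` helper are hypothetical. Only the `pelis_docs_hash` key comes from the prompt text above.

```python
import hashlib
import json
import tempfile
from pathlib import Path

HASH_KEY = "pelis_docs_hash"


def docs_changed(docs_text: str, store_path: Path) -> bool:
    """Return True if the docs differ from the stored hash, recording the new hash."""
    new_hash = hashlib.sha256(docs_text.encode("utf-8")).hexdigest()
    store = json.loads(store_path.read_text()) if store_path.exists() else {}
    changed = store.get(HASH_KEY) != new_hash
    if changed:
        store[HASH_KEY] = new_hash
        store_path.write_text(json.dumps(store))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "cache-memory.json"  # stand-in for cache-memory
    first = docs_changed("docs v1", store_file)   # nothing stored yet
    second = docs_changed("docs v1", store_file)  # same content as last run
    third = docs_changed("docs v2", store_file)   # content changed
    print(first, second, third)  # prints "True False True"
```

Hashing the fetched text in a pre-step rather than inside the agent loop would keep the comparison deterministic and out of the conversation history.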


Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | ~1,399K | ~1,000–1,100K | ~21–28% |
| Input tokens/run | ~759K | ~540–600K | ~21–29% |
| Cost/run | $2.02 | ~$1.30–$1.54 | ~$0.48–0.72 (-24–36%) |
| LLM turns | 11 | ~7–9 | -2 to -4 |
| Session time | ~4 min | ~2.5–3 min (est.) | ~25% |
| Cache hit rate | 45.2% | 50–65% (est.) | ↑ stable prefix |

Rec 1 alone is the highest-confidence saving (~$0.48, confirmed by firewall data).
Recs 2–4 are estimates based on observed context growth patterns.
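
For anyone re-deriving the savings column after the next run, the percentage ranges follow directly from the current and projected values. A minimal Python sketch, using only figures from the Expected Impact table; the helper name `pct_reduction` is illustrative.

```python
def pct_reduction(
    current: float, projected_low: float, projected_high: float
) -> tuple[float, float]:
    """Return the (smallest, largest) percentage reduction for a projected range."""
    smallest = (current - projected_high) / current * 100
    largest = (current - projected_low) / current * 100
    return smallest, largest


# (current, projected low, projected high) taken from the table.
rows = {
    "Total tokens/run (K)": (1399, 1000, 1100),
    "Input tokens/run (K)": (759, 540, 600),
    "Cost/run (USD)": (2.02, 1.30, 1.54),
}

for name, (current, low, high) in rows.items():
    smallest, largest = pct_reduction(current, low, high)
    print(f"{name}: {smallest:.1f}% to {largest:.1f}% lower")
```

Swapping in the measured values from the next scheduled run gives a like-for-like comparison against the $2.02 baseline.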


Implementation Checklist

  • Rec 1: Remove github: toolsets: [default, actions] from frontmatter
  • Rec 1: Remove imports: [shared/mcp-pagination.md] from frontmatter
  • Rec 1: Remove - github from network.allowed: (verify agentic-workflows still works first)
  • Rec 1: Remove "Use GitHub tools" instruction from Phase 2.3 in prompt body
  • Rec 2: Add steps: fetch-docs: pre-step to frontmatter
  • Rec 2: Update Phase 1 prompt to use {{steps.fetch-docs.output}} injection
  • Rec 3: Refactor Step 1.2 to limit agentics repo exploration to 2 web-fetch calls
  • Rec 4: Add content-hash cache check at the start of Phase 1
  • Recompile: gh aw compile .github/workflows/pelis-agent-factory-advisor.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Verify CI passes on PR
  • Compare token usage on next scheduled run vs this baseline ($2.02)

Generated by Daily Copilot Token Optimization Advisor
