feat(agent-docs-audit): policy, deterministic scanner, and weekly semantic audit workflow #3296

Merged
caio-pizzol merged 1 commit into main from caio/agent-docs-audit
May 14, 2026
@caio-pizzol
Contributor

Adds a layered audit for agent-context docs (CLAUDE.md, AGENTS.md, .claude/rules/*.md) against a new agent-docs-policy.md. Modeled on risk-assess.yml: L1 deterministic scan (sizes, symlinks, broken refs), L2 Haiku triage on flagged docs ($0.01/doc), and scheduled/manual-only L3 Sonnet deep analysis via @anthropic-ai/claude-agent-sdk with Read/Glob/Grep/Bash tools ($0.20/flagged doc).

  • Triggers: weekly Monday cron, doc-path PRs, manual dispatch.
  • PRs run L1 only. The audited input is prompt text that a PR author can modify; running a tool-using model with Bash + ANTHROPIC_API_KEY over PR-authored markdown would be a prompt-injection path. Scheduled and workflow_dispatch runs use the full L1+L2+L3 pipeline against main, where the input has passed code review.
  • Warning-only: uploads artifacts and writes a Step Summary. No PR comments, no failing CI. We can promote behavior after seeing a few real runs.
  • AI layers skip gracefully when ANTHROPIC_API_KEY is missing.
  • Policy codifies size budgets, placement rules (root vs nested vs .claude/rules/), the verifiable-claims standard, and five finding labels (KEEP / TRIM / MOVE / UPDATE / INVESTIGATE).
  • Manual prototype run against current main: 5 of 9 L1-flagged docs reached L3, 15 concrete findings for ~$1.19 total. Notable catch the deterministic layer cannot make: `blockIdToEntry` identifier in `packages/layout-engine/AGENTS.md` does not exist in `renderer.ts` (the real symbol is `pageIndexToState`).
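
The trigger gating described in the bullets above can be sketched roughly as follows. This is an illustration only, not the logic in the workflow file: `selectLayers` and its return shape are hypothetical, and the PR event name is assumed to be `pull_request`. Only the L1/L2/L3 split and the `schedule` / `workflow_dispatch` distinction come from this description.

```javascript
// Hypothetical sketch of the layer gating described above.
// PR runs get L1 only, because the audited markdown is PR-authored input.
// Trusted runs (schedule, workflow_dispatch) get the full pipeline, but the
// AI layers drop out when no API key is available.
function selectLayers(eventName, hasApiKey) {
  const trusted = eventName === "schedule" || eventName === "workflow_dispatch";
  if (!trusted || !hasApiKey) {
    return ["L1"];
  }
  return ["L1", "L2", "L3"];
}

console.log(selectLayers("pull_request", true));
console.log(selectLayers("schedule", true));
console.log(selectLayers("workflow_dispatch", false));
```

Keeping the decision in one pure function would make the "PRs never reach a tool-using model" rule easy to unit test, whichever way the real workflow expresses it.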

Required secret: `ANTHROPIC_API_KEY`. Already used by `risk-assess.yml`, no new secret needed.

Follow-ups (not in this PR): act on the 15 findings from the prototype run; revisit PR-time L2/L3 enablement once we have a tighter sandbox (e.g., disallow Bash entirely); consider a `--max-budget-usd` cap for safety.
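
To see how a budget cap might interact with the per-doc costs quoted above, here is a rough sketch. The per-doc figures are the approximate ones from this description; `estimateCostCents`, `withinBudget`, and `maxBudgetCents` are hypothetical names, and `--max-budget-usd` itself is only a proposed follow-up. The example assumes all nine L1-flagged docs went through L2 triage. Amounts are kept in integer cents to avoid floating-point drift.

```javascript
// Approximate per-doc costs taken from the PR description (assumptions, not billing data).
const L2_CENTS_PER_DOC = 1;  // ~$0.01 per L1-flagged doc (Haiku triage)
const L3_CENTS_PER_DOC = 20; // ~$0.20 per doc that reaches Sonnet review

// Estimate the cost of a run from the number of docs seen by each AI layer.
function estimateCostCents(l2Docs, l3Docs) {
  return l2Docs * L2_CENTS_PER_DOC + l3Docs * L3_CENTS_PER_DOC;
}

// Hypothetical pre-flight check a budget cap could perform before L3 starts.
function withinBudget(l2Docs, l3Docs, maxBudgetCents) {
  return estimateCostCents(l2Docs, l3Docs) <= maxBudgetCents;
}

// Prototype run: 9 docs triaged, 5 escalated to L3.
const estimate = estimateCostCents(9, 5);
console.log(`estimated: $${(estimate / 100).toFixed(2)}`);
console.log(`fits a $2.00 cap: ${withinBudget(9, 5, 200)}`);
```

With the prototype numbers this estimates $1.09, slightly under the ~$1.19 reported, which is plausible given that the per-doc figures are approximations.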

@caio-pizzol caio-pizzol requested a review from a team as a code owner May 14, 2026 13:37

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdadcc4ea0

Comment thread .github/scripts/agent-docs-l1.mjs Outdated
Comment thread .github/scripts/agent-docs-l1.mjs Outdated
@caio-pizzol caio-pizzol force-pushed the caio/agent-docs-audit branch from cdadcc4 to 2a236a2 Compare May 14, 2026 13:48
…workflow

Three-layer audit modeled on risk-assess.mjs:
- L1 (.github/scripts/agent-docs-l1.mjs): deterministic scan. Walks
  agent-context docs, counts lines, classifies AGENTS/CLAUDE pairs,
  detects broken @imports, broken path refs (with context-aware
  resolution), and unresolved pnpm commands. No model calls.
- L2 (.github/scripts/agent-docs-audit.mjs Haiku triage): given an
  L1-flagged doc + policy, decides via tool-use whether the doc
  needs deep review. Cheap (~$0.01).
- L3 (same file, Sonnet via claude-agent-sdk): reads the doc, uses
  Read/Glob/Grep/Bash to verify concrete claims (paths, identifiers,
  commands, architecture). Emits structured KEEP/TRIM/MOVE/UPDATE/
  INVESTIGATE findings. ~$0.20/doc.
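
As a rough illustration of what an L1-style deterministic check looks like, the sketch below counts lines and flags path references that do not resolve. It is not the code in agent-docs-l1.mjs: `scanDoc`, the flag shapes, and the backtick-plus-slash heuristic are assumptions, and `existingPaths` stands in for a real filesystem lookup. The actual scanner's context-aware resolution, @import checks, and pnpm command checks are not modeled here.

```javascript
// Hypothetical L1-style check: no model calls, only string inspection.
// `existingPaths` is a Set of known paths that stands in for the filesystem.
function scanDoc(text, existingPaths, maxLines) {
  const lines = text.split("\n");
  const flags = [];

  // Size budget: flag docs that exceed the allowed line count.
  if (lines.length > maxLines) {
    flags.push({ kind: "over-budget", detail: `${lines.length} > ${maxLines} lines` });
  }

  // Treat backticked tokens containing a slash as path references.
  const pathRef = /`([^`\s]+\/[^`\s]+)`/g;
  for (const match of text.matchAll(pathRef)) {
    if (!existingPaths.has(match[1])) {
      flags.push({ kind: "broken-path", detail: match[1] });
    }
  }
  return flags;
}

const doc = "See `src/renderer.ts` and `src/missing.ts`.\nDone.";
const flags = scanDoc(doc, new Set(["src/renderer.ts"]), 1);
console.log(flags);
```

A check like this can tell that a path is missing, but not that an identifier inside an existing file is stale, which is the gap the L3 layer is meant to cover.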

Workflow (.github/workflows/agent-docs-audit.yml): triggers on
PR doc-path changes, weekly Monday cron, and workflow_dispatch.
Skips AI layers gracefully if ANTHROPIC_API_KEY is missing (fork PRs).
Warning-only: uploads /tmp/agent-docs-audit.json and
/tmp/agent-docs-audit-summary.md as artifacts plus a Step Summary,
no PR comments and no failing CI yet.
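
A warning-only report of this kind could be rendered along these lines. This is a sketch under assumptions: `renderSummary`, the heading text, and the table layout are hypothetical and do not reflect the actual contents of agent-docs-audit-summary.md. The only external fact relied on is that GitHub Actions renders Markdown appended to the file named by `$GITHUB_STEP_SUMMARY`.

```javascript
// Hypothetical renderer: counts findings per label and emits Markdown that
// a workflow step could append to the file named by $GITHUB_STEP_SUMMARY.
function renderSummary(findings) {
  const counts = new Map();
  for (const f of findings) {
    counts.set(f.label, (counts.get(f.label) ?? 0) + 1);
  }
  // Sort labels alphabetically so the output is stable between runs.
  const rows = [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([label, n]) => `| ${label} | ${n} |`);
  return ["## Agent docs audit", "", "| Label | Findings |", "| --- | --- |", ...rows].join("\n");
}

const summary = renderSummary([{ label: "UPDATE" }, { label: "TRIM" }, { label: "UPDATE" }]);
console.log(summary);
```

Because the function only returns a string, the same output can go to the Step Summary and to an uploaded artifact without touching PR comments or the job's exit status.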

Policy (agent-docs-policy.md, 91 lines): codifies size budgets,
placement rules, write/do-not-write criteria, verifiable claims
standard, and the five finding labels.
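
Structured findings with a fixed label set lend themselves to a simple validation step. The sketch below is illustrative only: the five labels come from the policy, while `validateFinding` and the `doc` and `claim` field names are hypothetical, since the real output schema is not shown here. The `UPDATE` label on the sample finding is likewise an assumption made for the example.

```javascript
// The five finding labels named in the policy.
const LABELS = new Set(["KEEP", "TRIM", "MOVE", "UPDATE", "INVESTIGATE"]);

// Hypothetical validator: returns a list of problems, empty when the finding is well-formed.
function validateFinding(finding) {
  const errors = [];
  if (!LABELS.has(finding.label)) {
    errors.push(`unknown label: ${finding.label}`);
  }
  if (typeof finding.doc !== "string" || finding.doc.length === 0) {
    errors.push("missing doc path");
  }
  if (typeof finding.claim !== "string" || finding.claim.length === 0) {
    errors.push("missing claim");
  }
  return errors;
}

// Sample finding based on the stale-symbol catch (label assumed for illustration).
const sample = {
  label: "UPDATE",
  doc: "packages/layout-engine/AGENTS.md",
  claim: "`blockIdToEntry` does not exist in renderer.ts",
};
console.log(validateFinding(sample));
console.log(validateFinding({ label: "DELETE", doc: "", claim: "x" }));
```

Rejecting unknown labels before writing the audit JSON would keep model output from widening the label set unnoticed.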

Manual prototype run against current main: 5 of 9 L1-flagged docs
passed Haiku triage to Sonnet review, 15 concrete findings produced
for $1.19 total. Notable finds the deterministic scanner alone
cannot catch: `blockIdToEntry` identifier in
packages/layout-engine/AGENTS.md does not exist in renderer.ts
(stale symbol).
@caio-pizzol caio-pizzol force-pushed the caio/agent-docs-audit branch from 2a236a2 to 407b278 Compare May 14, 2026 13:53
@caio-pizzol caio-pizzol merged commit 051d208 into main May 14, 2026
12 checks passed
@caio-pizzol caio-pizzol deleted the caio/agent-docs-audit branch May 14, 2026 13:55