feat(agent-docs-audit): policy, deterministic scanner, and weekly semantic audit workflow #3296
Merged
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cdadcc4ea0
cdadcc4 to 2a236a2
…workflow

Three-layer audit modeled on risk-assess.mjs:

- L1 (.github/scripts/agent-docs-l1.mjs): deterministic scan. Walks agent-context docs, counts lines, classifies AGENTS/CLAUDE pairs, detects broken @imports, broken path refs (with context-aware resolution), and unresolved pnpm commands. No model calls.
- L2 (.github/scripts/agent-docs-audit.mjs Haiku triage): given an L1-flagged doc + policy, decides via tool-use whether the doc needs deep review. Cheap (~$0.01).
- L3 (same file, Sonnet via claude-agent-sdk): reads the doc, uses Read/Glob/Grep/Bash to verify concrete claims (paths, identifiers, commands, architecture). Emits structured KEEP/TRIM/MOVE/UPDATE/INVESTIGATE findings. ~$0.20/doc.

Workflow (.github/workflows/agent-docs-audit.yml): triggers on PR doc-path changes, weekly Monday cron, and workflow_dispatch. Skips AI layers gracefully if ANTHROPIC_API_KEY is missing (fork PRs). Warning-only: uploads /tmp/agent-docs-audit.json and /tmp/agent-docs-audit-summary.md as artifacts plus a Step Summary; no PR comments and no failing CI yet.

Policy (agent-docs-policy.md, 91 lines): codifies size budgets, placement rules, write/do-not-write criteria, verifiable claims standard, and the five finding labels.

Manual prototype run against current main: 5 of 9 L1-flagged docs passed Haiku triage to Sonnet review, 15 concrete findings produced for $1.19 total. Notable find the deterministic scanner alone cannot catch: `blockIdToEntry` identifier in packages/layout-engine/AGENTS.md does not exist in renderer.ts (stale symbol).
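To make the L1 layer concrete, here is a minimal sketch of a deterministic check that counts lines and flags broken `@` imports. It is not the code in `agent-docs-l1.mjs`: the function name `scanDoc`, the injected `fileExists` callback, the 200-line budget, and the assumption that an import is a line starting with `@` followed by a relative path are all hypothetical.

```javascript
// Hypothetical sketch of an L1-style deterministic check (no model calls).
// `fileExists` is injected so the logic stays pure and easy to test.
function scanDoc(docPath, text, fileExists, maxLines = 200) {
  const lines = text.split('\n');
  const issues = [];

  // Size budget: the 200-line default is an assumed value, not the policy's.
  if (lines.length > maxLines) {
    issues.push({ kind: 'over-budget', detail: `${lines.length} lines > ${maxLines}` });
  }

  // Resolve imports relative to the directory that contains the doc.
  const dir = docPath.includes('/') ? docPath.slice(0, docPath.lastIndexOf('/')) : '';

  lines.forEach((line, i) => {
    // Assumed import syntax: a line that starts with "@" followed by a path.
    const match = /^@(\S+)/.exec(line.trim());
    if (!match) return;
    const target = dir ? `${dir}/${match[1]}` : match[1];
    if (!fileExists(target)) {
      issues.push({ kind: 'broken-import', line: i + 1, detail: target });
    }
  });

  return { docPath, lineCount: lines.length, issues };
}
```

A real scanner would pass something like a filesystem existence check as `fileExists` and would also normalize paths; the sketch leaves both out to keep the core idea visible.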
2a236a2 to 407b278
Adds a layered audit for agent-context docs (`CLAUDE.md`, `AGENTS.md`, `.claude/rules/*.md`) against a new `agent-docs-policy.md`. Modeled on `risk-assess.yml`: L1 deterministic scan (sizes, symlinks, broken refs), L2 Haiku triage on flagged docs (~$0.01/doc), and scheduled/manual-only L3 Sonnet deep analysis via `@anthropic-ai/claude-agent-sdk` with `Read`/`Glob`/`Grep`/`Bash` tools (~$0.20/flagged doc).

- PR-triggered runs do not get the full pipeline: `Bash` + `ANTHROPIC_API_KEY` over PR-authored markdown would be a prompt-injection path. Scheduled and `workflow_dispatch` runs use the full L1+L2+L3 pipeline against `main`, where the input has passed code review.
- AI layers are skipped gracefully when `ANTHROPIC_API_KEY` is missing.
- The policy codifies size budgets, placement rules (… `.claude/rules/`), the verifiable-claims standard, and five finding labels (KEEP / TRIM / MOVE / UPDATE / INVESTIGATE).
- Manual prototype run against current `main`: 5 of 9 L1-flagged docs reached L3, 15 concrete findings for ~$1.19 total. Notable catch the deterministic layer cannot make: `blockIdToEntry` identifier in `packages/layout-engine/AGENTS.md` does not exist in `renderer.ts` (the real symbol is `pageIndexToState`).

Required secret: `ANTHROPIC_API_KEY`. Already used by `risk-assess.yml`, no new secret needed.
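The trigger-based gating described above could be sketched as a small pure function. This is an illustration only: `layersFor` is a hypothetical helper, and it assumes PR-triggered runs stay at L1, which is one reading of the description. The event names are the standard GitHub Actions values for `github.event_name`.

```javascript
// Hypothetical helper: decide which audit layers run for a given trigger.
function layersFor(eventName, hasApiKey) {
  const layers = ['L1']; // the deterministic scan always runs

  // Without the secret (e.g. fork PRs), skip the AI layers gracefully.
  if (!hasApiKey) return layers;

  // Only trusted triggers, which run against main, get the model-backed layers.
  if (eventName === 'schedule' || eventName === 'workflow_dispatch') {
    layers.push('L2', 'L3');
  }
  return layers;
}
```

Keeping the decision in one function makes the security boundary easy to review: any change that lets `pull_request` reach L2 or L3 shows up as a one-line diff.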
Follow-ups (not in this PR): act on the 15 findings from the prototype run; revisit PR-time L2/L3 enablement once we have a tighter sandbox (e.g., disallow Bash entirely); consider a `--max-budget-usd` cap for safety.
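As a rough illustration of the budget-cap follow-up, the sketch below limits how many flagged docs go to deep review under a dollar ceiling. Nothing here exists in this PR: `planWithinBudget` is a hypothetical name, the flat per-doc cost is just the approximate $0.20 figure quoted above, and how a `--max-budget-usd` flag would actually be wired in is left open.

```javascript
// Hypothetical budget guard; the per-doc cost is the rough estimate from this PR.
function planWithinBudget(docs, maxBudgetUsd, costPerDocUsd = 0.20) {
  const planned = [];
  let spent = 0;

  for (const doc of docs) {
    // Small epsilon so floating-point drift does not reject an exact fit.
    if (spent + costPerDocUsd > maxBudgetUsd + 1e-9) break;
    planned.push(doc);
    spent += costPerDocUsd;
  }

  return {
    planned,                               // docs that fit under the cap
    skipped: docs.slice(planned.length),   // docs deferred to a later run
    estimatedUsd: Number(spent.toFixed(2)),
  };
}
```

An estimate-based cap like this only bounds the planned spend; a stricter version would track actual usage reported by each model call and stop once the real total crosses the limit.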