Run through this before deploying any agent. Every NO is a failure mode you haven't addressed.
- Does the agent have a specific function described in one sentence?
- Could you distinguish this agent's output from a generic chatbot's?
- If client-facing, is there a brand name separate from your personal identity?
- Have you defined what the agent is NOT? (scope boundaries)
- Is every platform the agent interacts with listed?
- Are authentication requirements noted (the names of the keys or secrets, never the values themselves)?
- Are platform-specific constraints documented (rate limits, blocked endpoints, format requirements)?
- Could someone else deploy this agent on these platforms without asking you questions?
- Is the trigger type explicit (manual, cron, event)?
- If cron: have you calculated the daily/monthly token cost of the schedule?
- If event: is the trigger condition specific enough to avoid firing on noise?
- Can you predict exactly when this agent will run in the next 24 hours?
- Is every step numbered?
- Are there explicit decision points for ambiguous situations?
- Does the workflow include "flag for human review" at the right moments?
- Is the output location specified (file path, naming convention)?
- If the agent followed these steps literally and did nothing else, would you get a usable deliverable?
- Are format requirements specific (markdown, plain text, JSON, etc.)?
- Are length constraints given as ranges, not vibes?
- Could you use these standards to reject a bad deliverable with a specific reason?
- Have you described what bad output looks like?
- Is the tool list an explicit allowlist (not "use whatever you need")?
- For each tool: is the permitted use case stated?
- Is anything NOT on the list implicitly denied?
- Are all hard NOs written as prohibitions, not suggestions?
- Does the list of hard NOs cover publishing, spending, client interaction, and scope boundaries?
- If the agent violated any of these, would it cause a real problem?
- Does the agent know when to stop iterating?
- Does the agent know when to flag uncertainty instead of guessing?
- Is there a cost ceiling or iteration cap?
- Would the stop conditions prevent a 10x budget overrun on a single run?
- Read the entire spec as if you're the agent seeing it for the first time. Does it make sense without any context you haven't written down?
- Is the spec short enough to fit in a context window without crowding out the actual work? (Target: under 1000 words for most agents)
- Have you removed every sentence that says "be helpful" or "use your best judgment"? Those aren't instructions. They're wishes.
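
The cron-cost, cost-ceiling, and spec-length questions above come down to simple arithmetic, so it can help to write them out once. The following is a minimal sketch, not a prescribed implementation: the `Schedule` class, the function names, the 30-day month, and the price of $0.01 per 1,000 tokens are all hypothetical placeholders chosen for illustration. Substitute your own measured token counts and your provider's actual rate.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    """Hypothetical description of a cron-triggered agent."""
    runs_per_day: int         # e.g. an hourly cron fires 24 times a day
    tokens_per_run: int       # assumed average of input + output tokens
    usd_per_1k_tokens: float  # placeholder price, not a real provider rate


def daily_cost(s: Schedule) -> float:
    """Estimated spend per day in USD."""
    return s.runs_per_day * s.tokens_per_run / 1000 * s.usd_per_1k_tokens


def monthly_cost(s: Schedule, days: int = 30) -> float:
    """Estimated spend per month in USD (30-day month assumed by default)."""
    return daily_cost(s) * days


def should_stop(iteration: int, spent_usd: float,
                max_iterations: int, ceiling_usd: float) -> bool:
    """True once either the iteration cap or the cost ceiling is reached."""
    return iteration >= max_iterations or spent_usd >= ceiling_usd


def spec_word_count(spec: str) -> int:
    """Whitespace-delimited word count, to compare against a length target."""
    return len(spec.split())


if __name__ == "__main__":
    hourly = Schedule(runs_per_day=24, tokens_per_run=8000, usd_per_1k_tokens=0.01)
    print(f"daily: ${daily_cost(hourly):.2f}, monthly: ${monthly_cost(hourly):.2f}")
    # Under both limits, so the agent may continue.
    print(should_stop(iteration=3, spent_usd=0.40, max_iterations=5, ceiling_usd=0.50))
```

Checking `should_stop` before every iteration, rather than after the run, is what keeps a single runaway run from multiplying the budget: the loop exits as soon as either limit is hit, whichever comes first.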