Skip to content

Latest commit

 

History

History
117 lines (73 loc) · 5.58 KB

File metadata and controls

117 lines (73 loc) · 5.58 KB

Format Guide — Section by Section

Why each section exists

Every section in this spec format solves a specific failure mode we've seen in agent deployments. Cut a section and you reintroduce the problem it prevents.


IDENTITY

Prevents: Generic output. Tone inconsistency. Agents that sound like a customer service chatbot when they should sound like an engineer.

Your agent's identity is not a personality description. It's an operational constraint. When you write "you are a technical writer who produces API docs," every output the agent generates gets filtered through that lens. When you write "you are a helpful assistant," nothing gets filtered through anything.

Include:

  • What the agent IS (function, voice, register)
  • What the agent is NOT (boundaries on scope and tone)
  • Brand name if the agent is client-facing (keeps the agent from leaking your personal or parent operation identity)

Test: If you swapped this identity block into a completely different agent framework, would the output still sound right? If yes, it's specific enough.


PLATFORMS

Prevents: Agents trying to use tools they don't have access to. Agents ignoring available resources. Confusion about where work comes from and where it goes.

List every platform the agent touches. Include:

  • What role each platform plays (sourcing work, delivering work, research, storage)
  • Authentication details the agent needs (API key names, not the keys themselves)
  • Platform-specific constraints (rate limits, blocked endpoints, format requirements)

Test: Could someone else deploy this agent on these platforms using only what's written here plus the API keys? If they'd need to ask you questions, the section is incomplete.


TRIGGER

Prevents: Agents running when they shouldn't. Agents not running when they should. Runaway loops from misconfigured triggers.

Three types:

  • Manual: Someone fires the agent with a brief. Safest. Best for high-value work.
  • Cron: Scheduled. Best for scanning and monitoring. Keep the interval sane — an hourly scan that takes 5 minutes and costs 500 tokens means 12,000 tokens/day on one routine.
  • Event: Fires in response to a signal (webhook, file change, message). Powerful but dangerous — make sure the trigger condition is specific enough that it doesn't fire on noise.

Test: Can you predict exactly when this agent will run in the next 24 hours? If not, tighten the trigger.


WORKFLOW

Prevents: Agents making up their own process. Inconsistent output across runs. Agents skipping quality checks. Agents proceeding when they should stop and ask.

This is the most important section. Number every step. Include:

  • The exact sequence from trigger to deliverable
  • Decision points with explicit branches (if ambiguous → flag, don't guess)
  • Where the output goes (workspace path, file naming convention)
  • Who reviews it and how it gets handed off

Common mistake: Writing the workflow as what the agent SHOULD do rather than what it MUST do. "Consider checking the brief" vs "Read the full brief before writing. If the brief is ambiguous, write your interpretation and flag for confirmation before proceeding."

Test: If the agent followed these steps literally and did nothing else, would you get a usable deliverable? If you're relying on the agent's "judgment" to fill gaps, the workflow has gaps.


OUTPUT STANDARDS

Prevents: Agents producing technically correct but practically useless output. Word count inflation. Format mismatches. The dreaded "in this comprehensive guide we will explore."

Be specific about:

  • Format (markdown, plain text, JSON, whatever the deliverable requires)
  • Length (ranges, not vibes — "750-850 words" not "keep it concise")
  • Tone (match examples if possible)
  • What bad output looks like (this is more useful than describing good output)

Test: Could you use these standards to reject a bad deliverable with a specific reason? "This doesn't meet standards" isn't a rejection. "This is 1,800 words when the brief asked for 800" is.


TOOL PERMISSIONS

Prevents: Agents using expensive tools when cheap ones would do. Agents accessing systems they shouldn't. Unexpected API calls and costs.

Explicit allowlist. Anything not listed is denied. For each tool:

  • What it is
  • What the agent uses it for
  • Any constraints on usage (rate limits, cost caps)

Test: If the agent tried to use a tool not on this list, would you want to know about it? If yes, the list is correct.


STANDING LIMITS

Prevents: The catastrophic failures. Publishing unreviewed work. Spending money. Interacting with clients unsupervised. Scope creep into areas the agent wasn't designed for.

These are hard NOs, not guidelines. Write them as prohibitions:

  • Never publish anything to any platform
  • Never interact with customers
  • Never spend credits or tokens beyond allocated budget
  • Nothing leaves the operation without human review

Test: If the agent violated one of these limits, would it cause a real problem — lost money, reputation damage, client confusion? If yes, it belongs here.


TOKEN DISCIPLINE

Prevents: Agents burning budget on self-improvement loops. Perfectionism spirals. The agent that rewrites the same paragraph eight times and produces something worse than draft two.

Tell the agent:

  • When to stop iterating and deliver what it has
  • When to flag uncertainty rather than guessing
  • What "done enough" looks like
  • The rough cost ceiling per run (if you track this)

Test: Would this section prevent the agent from consuming 10x its expected budget on a single run? If not, add a harder ceiling.