feat: add orb-webhooks skill by leggetter · Pull Request #63 · hookdeck/webhook-skills

leggetter · 2026-05-13T18:41:52Z

Summary

Adds a complete orb-webhooks provider skill for Orb (usage-based billing). Verification is manual HMAC-SHA256 over the unusual signed-content format v1:{X-Orb-Timestamp}:{rawBody} (literal v1 prefix, ISO-8601 timestamp, colon separators).

What's included

skills/orb-webhooks/SKILL.md — entry point with frontmatter and the verification core
skills/orb-webhooks/references/ — overview (event taxonomy + summary-webhooks variant), setup (dashboard config + per-endpoint secret), verification (signature algorithm, gotchas, idempotency recommendation)
skills/orb-webhooks/examples/ — Express, Next.js App Router, FastAPI handlers with tests
Integration: providers.yaml, README.md, .claude-plugin/marketplace.json (both as a standalone plugin and added to the webhook-skills bundle)

Notes

Header pair: X-Orb-Signature: v1=<hex> carries the HMAC; X-Orb-Timestamp: <ISO-8601> carries the timestamp separately.
Signed content: v1:{X-Orb-Timestamp}:{raw-body} — literal v1, colon, ISO timestamp (as a string, not a Unix epoch), colon, raw body bytes. Pass the raw request body; don't JSON.parse and re-serialize.
Signing key: per-endpoint signing secret from the Orb dashboard. Each webhook endpoint gets its own secret (NOT the account API key).
Replay protection: the docs don't mandate a tolerance window — Orb just delivers X-Orb-Timestamp and recommends consumers pick a threshold. The skill recommends a 5-minute window in handlers plus event-id idempotency for at-least-once delivery safety.
Common events: customer (customer.created, customer.credit_balance_dropped), subscriptions (subscription.created / .started / .ended / .plan_changed / .edited / .usage_exceeded), invoices (invoice.issued / .payment_succeeded / .payment_failed / .edited), data exports (data_exports.transfer_success).
Summary webhooks: opt-in variant covering the same events with smaller payloads (line_items omitted from invoices; customer/plan minified to identification fields). Same signature scheme. Skill recommends fetching full resources via API when detail is needed.
SDKs: orb-billing on both npm and PyPI (same package name on both). Neither SDK exposes a Stripe-style unwrap()/constructEvent() helper at the time of writing — manual HMAC verification is the canonical path. The SDK is declared in providers.yaml's sdks field so future review runs will catch stale pins.
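
The notes above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in the skill's example handlers: it assumes the signing secret is used as a UTF-8 HMAC key and that X-Orb-Timestamp parses as ISO-8601 (a trailing `Z` is normalized for older Python versions). The function names, the in-memory dedup set, and the idea of keying dedup on an event id string are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# 5 minutes is the skill's recommended window; Orb's docs do not mandate one.
TOLERANCE = timedelta(minutes=5)


def expected_signature(secret: str, timestamp: str, raw_body: bytes) -> str:
    """Return "v1=<hex>" for the signed content v1:{timestamp}:{raw_body}."""
    signed_content = b"v1:" + timestamp.encode("utf-8") + b":" + raw_body
    digest = hmac.new(secret.encode("utf-8"), signed_content, hashlib.sha256).hexdigest()
    return "v1=" + digest


def verify_orb_webhook(
    secret: str,
    signature_header: Optional[str],
    timestamp_header: Optional[str],
    raw_body: bytes,
    now: Optional[datetime] = None,
) -> bool:
    """Check the HMAC first, then reject timestamps outside TOLERANCE."""
    if not signature_header or not timestamp_header:
        return False
    expected = expected_signature(secret, timestamp_header, raw_body)
    # Constant-time comparison on bytes avoids leaking match position via timing.
    if not hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8")):
        return False
    try:
        # The header string is signed as-is; it is parsed only for the replay check.
        sent_at = datetime.fromisoformat(timestamp_header.replace("Z", "+00:00"))
    except ValueError:
        return False
    if sent_at.tzinfo is None:
        sent_at = sent_at.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    current = now or datetime.now(timezone.utc)
    return abs(current - sent_at) <= TOLERANCE


_seen_event_ids = set()  # stand-in for a durable store such as a database table


def process_once(event_id: str, handler: Callable[[], None]) -> bool:
    """Run handler unless event_id was already processed; return True if it ran."""
    if event_id in _seen_event_ids:
        return False
    handler()
    _seen_event_ids.add(event_id)
    return True
```

The raw request bytes go straight into `verify_orb_webhook`; parsing the JSON happens only after it returns True. A duplicate delivery should still get a success response, which is why `process_once` returns False rather than raising.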

Test plan

cd skills/orb-webhooks/examples/express && npm install && npm test
cd skills/orb-webhooks/examples/nextjs && npm install && npm test
cd skills/orb-webhooks/examples/fastapi && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt && pytest test_webhook.py -v
Verify the signature helpers reproduce the exact format from https://docs.withorb.com/integrations-and-exports/webhooks ("v1=" + HMAC-SHA256(secret, "v1:" + iso_ts + ":" + body).hexdigest())
Confirm event names match the live docs across both regular and summary webhook variants
Confirm the webhook-skills marketplace bundle now lists 38 skill paths (37 → 38)
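
For the signature-format check in the test plan, a fixture signer along these lines may help. It is a sketch under assumptions: the secret, timestamp, and payload are made-up fixture values, and `sign_test_payload` is a hypothetical helper name, not something shipped in the examples. It also demonstrates why the raw body must be passed through untouched.

```python
import hashlib
import hmac
import json


def sign_test_payload(secret: str, timestamp: str, raw_body: bytes) -> dict:
    """Return the header pair a test request would carry for raw_body."""
    signed_content = b"v1:" + timestamp.encode("utf-8") + b":" + raw_body
    digest = hmac.new(secret.encode("utf-8"), signed_content, hashlib.sha256).hexdigest()
    return {"X-Orb-Signature": "v1=" + digest, "X-Orb-Timestamp": timestamp}


# Hypothetical fixture values, not taken from Orb's docs.
secret = "test_signing_secret"
timestamp = "2026-05-13T18:41:52Z"
raw_body = b'{"id": "evt_123",  "type": "invoice.issued"}'  # note the double space

headers = sign_test_payload(secret, timestamp, raw_body)

# Parsing and re-serializing normalizes whitespace, so the bytes (and the HMAC) change.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")
resigned = sign_test_payload(secret, timestamp, reserialized)
```

Here `headers` and `resigned` carry different signatures even though both bodies decode to the same JSON object, which is the failure mode a handler hits if it verifies against a re-serialized body.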

Generation details

Generated via ./scripts/generate-skills.sh generate orb --config providers.yaml --model claude-opus-4-7
1 iteration (initial generation passed review on first pass)
Locally: npx hookdeck-cli listen 3000 orb --path /webhooks/orb

https://claude.ai/code/session_01NNTgQRJss1V7gyzzJ9rjnB

Generated by Claude Code

Adds a webhook skill for Orb (usage-based billing) with HMAC-SHA256 manual verification over `v1:{X-Orb-Timestamp}:{rawBody}`, plus runnable Express, Next.js, and FastAPI examples with tests. https://claude.ai/code/session_01NNTgQRJss1V7gyzzJ9rjnB

…tplace.json

- README.md: add Orb row (alphabetically between OpenClaw and Paddle), linkified to official docs
- providers.yaml: add orb entry with HMAC-SHA256/`v1:{ts}:{body}` scheme notes, common events, summary-webhooks variant, and `orb-billing` SDK declared for both npm and pip so the version-tracker covers it
- .claude-plugin/marketplace.json: add `orb-webhooks` plugin entry (matching the per-skill pattern from PR #62) and append `./skills/orb-webhooks` to the `webhook-skills` bundle

Skill content (skills/orb-webhooks/) landed in the previous commit via the generator.

https://claude.ai/code/session_01NNTgQRJss1V7gyzzJ9rjnB

… merged since count was last set)

The README claimed "38 skills" in two places (bundle install copy) but the bundle in .claude-plugin/marketplace.json now lists 40. The two extra skills (knock-webhooks #64, orb-webhooks #63) were merged to main after the bundle count was last updated, and the webhook-dx-audit PR branch picked them up via merge from main without re-syncing the count.

Test plan item from PR #67 calls for "the bundle still totals 38 skills" - that's now 40. README updated in both occurrences (lines 111, 120) to match the actual bundle contents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add webhook-dx-audit skill

Adds a meta/audit skill that reviews the developer experience of any platform sending outbound webhooks and produces a scored review with prioritized recommendations across signing, retries, event catalog, observability, local dev, and agent readiness.

This is a different category from the existing repo skills (which receive, send, or verify webhooks): the audit evaluates how other platforms expose their webhook DX. README adds a new "Webhook DX & Audit Skills" section to make the distinction clear, and the bundle includes the new skill.

* docs: broaden audit skill scope to event destinations

The skill audits webhooks AND event destinations (SQS, RabbitMQ, Pub/Sub, EventBridge, Kafka), not just webhooks. Surface the distinction in the README section title and marketplace metadata, and add an event-destinations keyword for discovery.

* feat(webhook-dx-audit): align with Event Destinations initiative

The scope is webhooks AND event destinations, not just webhooks. Industry terminology is shifting (Stripe "event destinations" with direct EventBridge/Event Grid delivery, Shopify "Event Subscriptions"). Update the rubric and SKILL.md to assess the broader concept and benchmark against the Event Destinations initiative (https://eventdestinations.org).

Changes:

- SKILL.md: add scope paragraph referencing the spec and the terminology shift; clarify that the audit applies regardless of what the platform calls it.
- rubric.md intro: add the spec reference and a summary of its required/recommended capabilities.
- rubric category 5 (Security & authentication): split into webhook-specific signing criteria and a new "Destination-native auth" criterion covering IAM, service accounts, managed identities, and SASL/mTLS for non-HTTP destinations. Webhook criteria become Not assessed for queue-only platforms.
- rubric category 6 (Delivery semantics & reliability): add a "Destination type breadth" criterion - the spec's central required capability.
- rubric category 8 (SDKs & verification libraries): clarify that this stays webhook-focused because webhooks remain the most common destination and hand-rolled HMAC is where integrators get burned; non-HTTP destinations use native SDK auth and are scored under category 5.
- methodology.md: add a step 0 to identify destination types up front, since that determines which criteria apply.

* fix(webhook-dx-audit): tighten rubric and methodology from Stripe test run

Test-drove the skill against Stripe (Pass 1, public surface only; 84/B). Stripe's three-destination story (webhooks + EventBridge + Event Grid) surfaced anchor ambiguities and methodology gaps. This commit applies all 22 ranked fixes from that test, batched because they cohere and each is small.

rubric.md (10 fixes):

- Cat 4: machine-readable spec anchors name OpenAPI 3.1 webhooks block, AsyncAPI, and per-event JSON Schema explicitly; a single polymorphic event envelope scores 1, not 2.
- Cat 5: add explicit "if both webhooks AND native destinations are offered, score all six criteria" rule at the top.
- Cat 5 destination-auth-options: clarify it scores configurable bearer/headers/OAuth2/mTLS independently of the signature scheme.
- Cat 6 failure handling & auto-disable: split anchors into the two distinct gaps (post-retry behavior docs, auto-disable feature with reactivation).
- Cat 6 failure alerting: limit scoring to push channels; dashboard widgets count under cat 10 (observability) instead.
- Cat 6 manual replay: anchors recognize partial coverage (sandbox-only, UI-only, CLI-only) at level 1.
- Cat 7 IaC: split community-vs-official cliff; community provider with current coverage can score 1, vendor-maintained scores 2.
- Cat 10 latency: spell out the three signals (attempt count, next-retry time, per-attempt response latency).
- Cat 12 push-to-agent: add a 1 anchor for partial coverage.
- Cat 12 MCP: 1 anchor names "agent SDK or function-calling toolkit"; 2 requires MCP or a deliberate scoped surface.

methodology.md (5 fixes):

- Read-what-a-human-reads: distinguish evidence collection (any source) from scoring (HTML page).
- Step 0 destination types: expand search-term list (endpoint, partner event source, stream, etc.).
- Step 2 specs: name OpenAPI 3.1 webhooks block, AsyncAPI, per-event JSON Schema as the three things to look for.
- Step 9 agent readiness: define "scoped sensibly" for llms.txt.
- What good looks like: handle calibration circularity when the audit subject is itself a reference platform (calibrate against the broader Event Destinations bar instead).

SKILL.md (2 fixes):

- Add evidence-vs-scoring distinction for .md exports.
- Document the Pass-1-only exit path: skip the human checklist, mark gated criteria Not assessed, proceed to scoring.

scoring.md (2 fixes):

- Add a second worked example with a Not-Assessed exclusion.
- Add the renormalization formula and a worked example for when a category is fully dropped.

report-template.md (3 fixes):

- Access field examples signal Pass 1 vs Pass 1+2.
- Caption under the scorecard clarifies Overall is weight-adjusted.
- Findings section: always list every criterion, mark unreached ones Not assessed inline.

program-mapping.md (3 fixes):

- New row: endpoint health (auto-disable, alerting, reactivation) -> Hookdeck Event Gateway in front of the consumer endpoint. Addresses the highest-impact cat 6 gap most platforms have.
- New row: OpenAPI lacks webhooks block -> webhook skill in hookdeck/webhook-skills as an agent-shaped substitute.
- Broaden the existing webhook-skill row to acknowledge it can also surface cat 3/6 docs gaps to agent consumers, not just cat 12.

Test artifacts (Stripe audit + findings doc) saved in /tmp; not committed.

* feat(webhook-dx-audit): add workflow/scenario simulation criterion

Investigation of Stripe, Shopify, and Paddle revealed a real maturity differentiator the rubric was not capturing:

- Paddle ships named "Scenarios" (subscription_creation = 12 events, renewal = 7, etc.)
  that fire curated lifecycle sequences in one trigger.
- Stripe has implicit prerequisite chaining (firing payment_intent.succeeded also fires payment_intent.created) plus CLI fixtures for scripted multi-step composition.
- Shopify's webhook trigger is explicitly single-event only with fixed payload, recommending real Shopify actions for end-to-end tests.

Add a "Workflow / scenario simulation" criterion under category 11 (local dev / testing) as a sibling to test/sandbox parity. Framed as a maturity differentiator, not a baseline: 0 is acceptable for most platforms; 1 covers Stripe-style fixtures or implicit chains; 2 covers Paddle-style named lifecycle scenarios.

Update methodology step 8 with search terms (scenario, fixture, lifecycle, workflow, trigger sequence) and the three platform patterns as calibration anchors.

* fix(webhook-dx-audit): three-state taxonomy + dual-score aggregation

The rubric was collapsing three different states into one "Not assessed" label, which caused Pass-1-only grades to inflate (the Stripe re-audit went from 84/B to 85/A largely because of this). Split the states and adjust the math so the labels mean what they say.

Three states (rubric.md):

- Not Supported: capability should exist but doesn't. Score 0; numerator 0, full weight in denominator. (Existing 0 behavior; the label clarifies intent in evidence.)
- Not Applicable: a logical rule excludes the criterion *as a concept* (e.g. Cat 5 destination-native auth on a webhook-only platform: there are no non-HTTP destinations to score auth for). Drop from both numerator and denominator. Critically NOT for "the platform should have this but doesn't" cases; those are Not Supported = 0.
- Not Assessed: should assess but cannot reach right now (HITL gap, gated dashboard). Treated differently across the two roll-ups below; signals HITL would lift the score.

Two roll-ups from the same per-criterion data (scoring.md):

- Public-scope grade. "How good are the parts we could see?" Drops both Not Applicable and Not Assessed from numerator and denominator. Honest score over what was reachable.
- Provisional minimum. "What's the floor if HITL never runs?" Drops Not Applicable only. Treats Not Assessed as 0 in numerator with full weight in denominator. HITL Pass 2 can only raise this number.

When HITL completes (no Not Assessed criteria remain), the two scores converge on a single final grade.

Report template (assets/report-template.md):

- Scorecard now shows both columns per category and overall.
- Header shows the headline number twice: Public-scope leads when no HITL is planned; Provisional minimum leads when HITL is planned (conservative bound the customer can rely on).
- Coverage line under the scorecard counts how many criteria landed in each state.
- Optional "Context" line in the frontmatter for audits that are existing-customer deliverables.
- Recommendations template encourages "Concrete change (platform side) / Hookdeck offering (already available or in path)" framing for existing customers.

Cat 5 (rubric.md):

- Header now spells out three branches (webhook-only, non-HTTP-only, multi-destination). Per-criterion N/A clauses encode the logical rules. Stripe and Shopify both cited as multi-destination examples (Shopify ships HTTP + EventBridge + Pub/Sub destinations).

Cat 12 CLI for agents:

- Was incorrectly allowed an N/A escape hatch in the prior draft. Reverted: a CLI is a recommended capability for any developer platform; absence is a gap (Not Supported = 0), not a logical exclusion. The 0 anchor language now makes this explicit.

Other updates:

- SKILL.md scope paragraph reflects Shopify is multi-destination, not webhook-only. Adds a one-paragraph summary of the three states + two roll-ups.
- methodology.md adds a "Pick the right label" note explaining when to use each state and why the arithmetic differs.

No customer-specific content; this work is generic to the skill.
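
The two roll-ups can be illustrated for a single category with a short Python sketch. It assumes each criterion scores 0 to 2 and ignores the category weights that the real scoring.md applies to the overall grade; the constant names and `roll_ups` function are hypothetical.

```python
from typing import Optional, Sequence, Tuple, Union

NOT_APPLICABLE = "not_applicable"  # excluded as a concept
NOT_ASSESSED = "not_assessed"      # could not be reached without HITL
MAX_PER_CRITERION = 2              # assumption: every criterion is scored 0, 1, or 2

State = Union[int, str]


def roll_ups(states: Sequence[State]) -> Tuple[Optional[float], Optional[float]]:
    """Return (public_scope, provisional_minimum) as fractions of the maximum."""
    scored = [s for s in states if isinstance(s, int)]  # includes Not Supported = 0
    earned = sum(scored)
    not_assessed = sum(1 for s in states if s == NOT_ASSESSED)

    # Public-scope: Not Applicable and Not Assessed leave numerator and denominator.
    public_max = len(scored) * MAX_PER_CRITERION
    # Provisional minimum: Not Assessed counts as 0 but keeps its denominator weight.
    provisional_max = (len(scored) + not_assessed) * MAX_PER_CRITERION

    public = earned / public_max if public_max else None
    provisional = earned / provisional_max if provisional_max else None
    return public, provisional


print(roll_ups([2, 1, 1, 0, 2, 2]))
print(roll_ups([2, 1, 0, NOT_ASSESSED, NOT_APPLICABLE, 2]))
```

With every criterion scored, both roll-ups are 8/12 (about 67%), so the two grades converge. In the second call the public-scope figure is 5/8 (62.5%) and the provisional minimum is 5/10 (50%), and scoring the Not Assessed criterion later can only raise the latter.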

* fix(webhook-dx-audit): correct N/A definition examples (Cat 12 CLI is 0, not N/A)

* feat(webhook-dx-audit): N/A logic table as single source of truth

Add an explicit N/A logic table after the Categories list. Apply mechanically based on the destination types identified at methodology step 0; do not re-derive N/A from per-criterion text.

The table lists the four possible step-0 facts and which criteria become N/A for each. Currently seven criteria (Cat 5 x 6 + Cat 8 x 1) can be N/A, all driven by two boolean facts (offers webhooks? offers non-HTTP destinations?).

To make the table the single source of truth:

- Stripped the redundant per-criterion "(Not Applicable if X)" clauses from Cat 5 (5 criteria) and Cat 8 (1 criterion).
- Trimmed the Cat 5 header: removed the three-branch list (webhook-only / non-HTTP-only / multi-destination) since that's now encoded in the table. Kept the security-philosophy paragraph because it explains why the criteria differ by destination type.
- Cat 8 verification helper 0 anchor reframed to acknowledge the upstream Cat 5 dependency (no signature scheme to verify -> the helper question is downstream of that gap).

New criteria with N/A conditions should add a row to the table rather than introducing a new inline clause. Comments in the commit message of any future change should reference the table row affected.

* feat(webhook-dx-audit): access-level table as source of truth for Not Assessed

Add a deterministic table tagging which criteria require account-level (L1) or active-usage (L2) access to score, alongside the existing N/A logic table. Pass-1 audits at L0 (public only) now have a mechanical rule for which criteria become Not Assessed; the agent does not have to derive it per-criterion.

Three access levels (rubric.md):

- L0: public docs, SDK source, machine specs, llms.txt
- L1: logged-in session; can read dashboard, settings, account-gated docs
- L2: L1 plus at least one delivered event observed; delivery logs, retries, alerting visible in practice

How L1 or L2 was obtained does not matter to the rubric. The auditor may have signed up themselves, used agent-driven signup (e.g. Stripe Projects, https://projects.dev), or been given access by the platform's operator. Future-proof: as agent-signup capabilities mature, more audits can declare L1/L2 without changing the rubric.

~12 criteria are tagged with required access levels (mostly Cat 1, 2, 7, 9, 10, 11). A few are "L1 or L0 if docs are thorough enough" - those remain agent judgment within a tighter frame.

Other changes:

- Report template's Access line dropped the "customer-provided access" wording (that was an Outpost-audit context leak) and now uses the L0/L1/L2 levels directly. A note clarifies that the means of obtaining access does not matter, only the level reached.
- methodology.md "What good looks like" adds Stripe Projects (projects.dev) as an agent-driven provisioning calibration anchor for Cat 12 Action-layer scoring.

This makes Not Assessed deterministic at the level the framework can reasonably enforce. The remaining agent judgment is limited to the few criteria explicitly tagged "L1 or L0 if ..." in the table.

* fix(webhook-dx-audit): remove Hookdeck Outpost / Svix from calibration anchors

Hookdeck Outpost and Svix are webhook delivery products platforms use to send events. Naming them as calibration anchors for sender-DX scoring was the wrong reference frame: integrators typically experience the *platform* (its docs, signing scheme, dashboard), not the delivery infrastructure embedded behind it. And this skill lives in hookdeck/webhook-skills, so naming Hookdeck specifically as a benchmark would be a conflict of interest.
Use platforms integrators directly experience and benchmark against: Stripe as the primary anchor; SendGrid (ECDSA signing), GitHub (event taxonomy), Twilio (per-attempt status callbacks) for specific features. The Event Destinations initiative (eventdestinations.org) sets the broader floor.

Hookdeck Outpost stays in program-mapping.md as a gap-closing recommendation for the platform side. Hookdeck Event Gateway tools stay in the "Hookdeck tooling" section of methodology.md as evidence-gathering aids during the audit (Console test URLs for inspecting payloads, CLI for receiving on localhost) - those are ingestion tools for the auditor, not benchmarks.

* fix(webhook-dx-audit): rule for L0 scoring from absence; HITL headroom; modern docs platforms

Three small refinements surfaced by the customer audit re-run.

A1: rubric.md access-level table now explicitly authorizes scoring from L0 absence-of-documentation. If public docs are completely silent on a capability tagged L1 or L2, score 0 (Not Supported) from L0 rather than Not Assessed. The access-level requirement is for VERIFICATION of a documented capability; confirming non-existence is an L0 finding. Removes the only judgment call I had to make by interpretation during the re-run.

A2: report-template.md scorecard now surfaces "HITL headroom: NN points" prominently between the table and the renormalization caption. Small headroom means HITL won't materially change the grade; large headroom means HITL is load-bearing. Easier to see than the gap in the dual-score columns.

A3: Cat 12 push-to-agent criterion now defaults to Not Assessed (not 0) for docs hosted on modern platforms (Mintlify 2025+, Docusaurus 3+, GitBook, ReadMe) where Copy-as-Markdown and Open-in-X are typically JS-rendered. A non-browser fetch may not see the buttons; the right call is to defer to HITL rather than score 0 from rendering blindness.

The first customer audit had to interpret all three of these rules; the framework now encodes them.
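
Rule A1 combined with the access-level table reduces to a small decision function. The following Python sketch is illustrative only: the `Access` enum, `resolve_state`, and its return labels are hypothetical names, and Not Applicable is assumed to have been decided beforehand by the N/A logic table.

```python
from enum import IntEnum


class Access(IntEnum):
    L0 = 0  # public docs, SDK source, machine specs, llms.txt
    L1 = 1  # logged-in session
    L2 = 2  # L1 plus at least one delivered event observed


def resolve_state(required: Access, reached: Access, publicly_documented: bool) -> str:
    """Decide how a criterion is handled given the access level the audit reached."""
    if reached >= required:
        # The auditor can verify the capability, so the normal 0/1/2 anchors apply.
        return "score_normally"
    if not publicly_documented:
        # A1: public docs are completely silent, so non-existence is an L0 finding.
        return "not_supported"
    # Documented but not verifiable at this access level: defer to HITL.
    return "not_assessed"
```

For a Pass-1 audit at L0, a criterion tagged L2 resolves to `not_supported` (score 0) when the docs never mention the capability, and to `not_assessed` when they do.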

* fix(webhook-dx-audit): six refinements from HITL audit feedback

All surfaced by the HITL Pass 2 of a real customer audit. Each addresses a real ambiguity or editorial leak in the rubric.

scoring.md grade bands:

- Dropped the editorial "Reading" column entirely. Grade letters alone; the "band is a headline, not the point" note already carried the framing. Per audit feedback that "painful or risky"-style language doesn't belong in audit output.
- Added explicit "do not write qualitative judgments of the grade into the audit report" line.
- Added boundary-zone note for 28-32 (F/D) and 83-87 (B/A); these are sanity-check zones where rounding shifts the band.

Cat 4 payload shape guidance:

- Relaxed the 2 anchor. Was "explicit thin-vs-fat rationale, OR standard envelope like CloudEvents". Now "envelope is consistent across all event types and documented". CloudEvents alignment and thin-vs-fat rationale moved to bonus signals worth citing in evidence but not required for 2. Most platforms with strong event catalogs don't formally address the meta-framing; the prior anchor over-penalized them.

Cat 1 free/test access:

- Reworked anchors to handle two underlying questions (does free tier reach config? are test deliveries free?) as a sliding scale. 1 covers the partial case (e.g. paid plan required for config but test deliveries free once configured, the audited customer's shape) which the prior binary 0/2 anchor missed.

Cat 5 destination auth options:

- Requires auth framing in docs for any score above 0. A platform shipping an arbitrary header passthrough field without documenting it as an auth mechanism now correctly scores 0; previously a strict reading allowed 1 for mere field existence.

Audience scoping (new N/A logic Table 2):

- Two audiences: developer-platform (default, where integrators are software engineers) and no-code-saas (where integrators are power users in a UI).
  For no-code-saas, Cat 7 IaC and Cat 11 workflow simulation and local-to-production transition become N/A. The third option "mixed" defaults to scoring all criteria unless the platform clearly serves one exclusively.
- Audience declared at methodology step 0 and in report frontmatter alongside Access level.
- Existing destination-type N/A logic becomes Table 1; audience becomes Table 2.

Methodology audit voice guidance:

- Explicit "stay factual, no editorial" rule. Per-category prose describes observation; reactions and synthesis go in the summary and recommendations. Examples cited: don't use "surprising", "impressive", "disappointing", "painful" in per-criterion or per-category text.

These changes are derived from real auditor experience; the customer audit itself stays at /tmp (not committed).

* fix(webhook-dx-audit): Cat 1 reframe - discoverability only, no business model

The previous Cat 1 "Free/test access" criterion conflated two distinct concerns: (a) business model (is the platform/feature free to access), and (b) DX (can you test webhooks without producing real production activity). Per repeated audit feedback, business model is not a DX question and shouldn't penalize platforms that offer webhooks behind a paid plan. The testability question is already covered by Cat 2 Test event / trigger and Cat 11 Test / sandbox parity.

Cat 1 changes:

- Removed "Free/test access" criterion entirely.
- Reframed "Signup friction to webhook config" as "In-product discoverability of webhook configuration". Explicitly handles plan-gating: plan-gated features are fine as long as the configuration surface is visible in product navigation.
- Findability of webhook docs criterion now distinguishes deep-nav (1) from top-level (2) explicitly.

Cat 1 now has 2 criteria, both focused on discoverability:

1. Can a developer find the webhook docs from the top-level docs or product nav? (the "docs side" of discovery)
2. From a signed-in account on any tier, can a user discover that the platform offers webhooks and find where they would be configured? (the "in-product side" of discovery)

Also added a note clarifying what is NOT scored in Cat 1: pre-purchase evaluation (business model) and production-data isolation (covered in Cat 2 and Cat 11).

Updated the access-level table to reflect the criterion name change and removal. Total rubric criterion count drops by one.

Derived from real auditor experience on a paid-plan-gated platform where the prior rubric incorrectly penalized the business model.

* feat(webhook-dx-audit): add idempotency criteria (Cat 3 + Cat 4)

Cat 3 Documentation quality:

- Added "Idempotency guidance" criterion. Scores whether the docs (a) identify the unique delivery ID developers should dedupe on (a top-level event ID in the payload, a webhook-id-style header, or equivalent), and (b) explain the high-level dedup pattern (check ID -> process -> store ID -> return success for duplicates). 0/1/2.
- Removed "idempotency" from the Best-practices coverage anchor list since it now has its own criterion. Best-practices now covers: out-of-order delivery, consumer-side retries, timeouts.

Cat 4 Event catalog & schema:

- Added "Per-event unique ID" criterion. Scores whether the platform delivers a documented per-event unique delivery ID: in the payload, in headers (e.g. webhook-id, X-GitHub-Delivery, x-outpost-event-id), or equivalent. Distinct from any domain ID inside the payload (e.g. post.id is not a delivery ID). 0: none. 1: ID delivered but docs don't identify it as the dedup key. 2: clearly documented as the dedup key.

The two criteria are explicitly linked: Cat 4 scores whether the ID exists in the schema; Cat 3 scores whether the docs teach how to use it.
A platform can ship the ID (Cat 4 = 1) without documenting it (Cat 3 = 0), exactly the pattern surfaced by the customer audit, where Outpost ships x-outpost-event-id on every delivery but the customer didn't surface this to their integrators.

Net effect: rubric grows by two criteria. Platforms that document idempotency at a high level (signal mention) but don't identify the dedup ID will now score 1 instead of 2 on the new Cat 3 criterion, surfacing a specific actionable finding.

* docs: expand HITL acronym on first use across audit skill

HITL is used 16+ times across SKILL.md, rubric.md, methodology.md, scoring.md, and report-template.md without ever being expanded or defined. Readers outside AI/ML circles can struggle to parse it. Expand to "human-in-the-loop (HITL)" on the first occurrence in each file so the abbreviation has a definition before subsequent uses. Subsequent uses stay as HITL once defined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rename Cat 3 "Documentation quality" to "Implementation guidance"

"Documentation quality" reads as a sweeping judgment on the platform's docs, but the 5 criteria the category scores (verification walkthrough, processing & handler guidance, idempotency guidance, best-practices coverage, accuracy & freshness) all measure implementation-guidance content for integrators consuming webhooks. The event catalog and API reference are scored separately under Cat 4 "Event catalog & schema".

A platform with a comprehensive event catalog but no handler patterns or signing walkthroughs scores 0% on Cat 3, which reads confusingly because their webhook docs do exist. The new name makes it clear that Cat 3 scores integration-implementation content specifically.

Sweep applied via replace_all to rubric.md (category list and section heading), scoring.md (weight table), and report-template.md (scorecard row). 4 files, 4 lines net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: scope Cat 3 intro to webhooks explicitly

After renaming Cat 3 from "Documentation quality" to "Implementation guidance", the previous intro ("The webhook section as a developer reads it, not the marketing page") no longer fit. The contrast with marketing was meaningful when the name was generic; under the new name, implementation guidance is obviously not marketing.

The new intro is explicit about scope. Cat 3's 5 criteria are webhook-specific in practice (HMAC verification, 2xx HTTP handler patterns, dedup ID delivered with HTTP webhooks). Non-HTTP destinations (SQS, Pub/Sub, RabbitMQ, etc.) rely on destination-native SDKs; their integration-guidance equivalents are scored under Cat 5 (destination-native auth) and Cat 6 (delivery semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: clean up Cat 2, 5, 7, 11 intro lines

Cat 2 Onboarding & first event: drop "verified" from "I received a verified event" since verification (HMAC) is a webhook-only concept. For non-HTTP destinations, the event is just received, not verified in the same sense.

Cat 5 Security & authentication: replace the editorial intro ("The capability most often weak and most consequential") with a scope description that mirrors the rest of the rubric: HTTP webhooks (signing, replay protection, secret rotation) and non-HTTP destinations (destination-native auth). Weight note kept.

Cat 7 Setup surfaces: "webhooks" -> "webhooks and event destinations" to match the audit's full scope (the category's criteria already cover both).

Cat 11 Local dev: drop the vague "The program calls this out explicitly" trailing sentence (unclear what "the program" referenced). Replace with an explicit scope note: criteria focus on HTTP webhooks (localhost tunnels and replay); non-HTTP destinations rely on cloud-provider emulators (LocalStack, GCP Pub/Sub emulator, Azure Service Bus emulator) as equivalents.

The other 8 categories' intros were already clean or were updated previously (Cat 3 just landed in 933b724 and 37761ff).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: broaden methodology scope and fix stale Cat 5 example in scoring

Methodology step 3: "Read the webhook docs properly" was webhook-only in framing but the step's scope covers Categories 3, 4, 5, 6. Categories 4 (event catalog), 5 (security including non-HTTP destination auth), and 6 (delivery semantics across all destination types) cover event destinations beyond webhooks. Broaden the step title and add destination-type-breadth and per-destination-native-auth as evidence to capture.

Methodology step 5: "API endpoints for webhook CRUD" and "Terraform provider and whether it covers webhooks" narrowed Category 7 to webhooks only. Cat 7's intro now reads "webhooks and event destinations"; the step now matches: webhook and destination CRUD, and Terraform coverage of webhooks and destinations.

Scoring Example 1: "Security has 5 criteria" was stale; Cat 5 has 6 criteria (the destination-native-auth criterion was added but Example 1 was never updated). Examples 2-4 already use 6 criteria. Example 1 now matches: 6 criteria, score 2/1/1/0/2/2, sum 8, max 12, both roll-ups 67%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: report-template summary scope and program-mapping link format

Line 10 summary instruction said "the platform's webhook DX" but the audit's scope is webhooks AND event destinations (SQS, Pub/Sub, RabbitMQ, EventBridge, Kafka, Azure Event Grid). A literal reader might omit non-HTTP destination coverage. Broaden to "webhook and event-destination DX".

Line 60 referenced "(see program-mapping)" without a file extension or backticks, reading as a placeholder. Line 43 already references `rubric.md` with backticks; match that pattern: "(see `program-mapping.md`)".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: catch remaining Cat 3 references after rename Two spots survived the original "Documentation quality" sweep because they used lowercase or paraphrased forms. SKILL.md line 45: agent-responsibilities list said "documentation quality" (lowercase) which the title-case sweep missed. Rename to "implementation guidance" to match the new category name. program-mapping.md line 16: "category 3/6 documentation gaps" was ambiguous after the rename. Replace with "category 3 implementation-guidance and category 6 delivery-semantics gaps" so both category references are explicit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: scope Summary list to webhook-surface features only The Summary instruction told the writer to summarize the platform's webhook DX but did not say what counts as a webhook-surface feature. Audit agents reading the instruction listed positive platform signals they noticed (OpenAPI specs, MCP servers, CLIs) without distinguishing which ones actually apply to the webhook and event-destination surface. This produced misleading Summaries where, for example, an OpenAPI 3.1 spec without a `webhooks` block was listed as evidence of a working webhook surface even though the spec does not carry webhook payload contracts (which scores 1 under Cat 4 for that exact reason). The customer reads the listed item as a strength, then later finds the caveat that excludes it. The instruction now scopes the list: include only items that contribute to the webhook and event-destination surface. An OpenAPI spec without a `webhooks` block, an MCP server without webhook tools, or a CLI that does not manage webhook configuration are platform features that do not apply in the Summary; they belong in their respective category findings, with the scores that reflect their limitations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: restructure Cat 12 Action layer (combine CLI+MCP, add API access) Cat 12's Action-layer scoring had two issues: 1. CLI and MCP were scored as separate criteria, requiring each surface to exist for full credit. In practice an agent only needs one agent-shaped interface beyond the raw API; either suffices. 2. The MCP criterion accepted "an MCP server exists" without requiring webhook scope, so a platform-wide MCP that excludes webhook management could score 2 even though Cat 12 measures webhook agent-readiness. The Ordinal audit hit this tension (hosted MCP for the core API but no webhook tools, scored 2). 3. The foundational layer (whether the webhook configuration API is publicly callable by an agent) was implicit, scattered across Cat 7 API configuration and Cat 4 machine-readable spec. The agent-readiness view of the API was not captured as its own signal in Cat 12. The Action layer now has two criteria: - API access for agents: foundational. Documented public HTTP API for webhook configuration. Overlaps with Cat 7 / Cat 4 but captures the agent's-eye view distinctly. 0 if dashboard-only or undocumented, 1 if SDK-only, 2 if documented HTTP API. - CLI or MCP for the webhook surface: higher-leverage. CLI or MCP (either suffices) covering webhook management with structured output / agent-friendly tools. 0 if neither covers webhooks (explicitly including platform-wide MCPs without webhook tools), 1 if partial coverage, 2 if full. Methodology step 9 Action sentence updated to walk the new criteria. Cat 12 still has 6 criteria total; weight unchanged. Existing audits that scored MCP at 2 because a platform-wide MCP exists should re-evaluate under the new combined criterion. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: instruct HITL to capture and share a real delivery payload Audits that only have docs evidence produce conditional recommendations ("in default mode the header is X; in Standard Webhooks mode it's Y"). A single actual delivery payload (request headers + body) lets the auditor recommend directly: name the specific signature header, dedup ID, timestamp format, and custom headers that are actually in use. The Ordinal audit hit this exact case: the audit framed signing conditionally because HITL had not shared an example delivery. Once Phil shared a screenshot of a real delivery (Standard Webhooks mode active; webhook-signature, webhook-timestamp, webhook-id headers present; x-api-key set via the custom-headers feature), the recommendation became concrete: "document the webhook-signature you're already sending" rather than "add a signature scheme". Two updates to the audit skill: Roles section: add a "Critical HITL capture: an example delivery payload" paragraph explaining what to capture and why. Whenever the human fires a test event or observes a real delivery, they capture and paste back the full delivery payload (all request headers and the body) so the auditor can score signing, idempotency, event schema, and destination-auth criteria against the actual delivery shape rather than docs alone. How an audit runs step 3 (the HITL checklist examples): add a third example to the checklist, phrased as "capture and paste back the full request payload of one real delivery, including all headers and the body, so I can name the actual signature header, dedup ID, and any custom headers in the recommendations". Future audits should now produce concrete signature/dedupe/auth recommendations whenever HITL is available, since the checklist specifically requests the payload capture. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: require verified audience designation with cited signals The audience declaration drives the audit's N/A logic (Table 2 in the rubric) but the methodology had it as a brief read-and-judge step with `developer-platform` as a silent default. In practice agents either took the default without verification or relied on HITL Pass 2 to set the designation, leading to misframed audits when the default didn't match reality. The Ordinal audit hit this: HITL Pass 2 declared no-code-saas without site verification; the no-code designation triggered the Cat 11 audience-N/A logic; later correction to mixed required re-scoring two criteria. A site-verified audience designation at audit start would have produced the correct framing from Pass 1. Three updates: Methodology step 0: explicit checklist of signals to verify the designation against (hero copy, nav structure, testimonials, pricing tiers, API prominence, onboarding CTA framing). Requires citing at least three signals with quoted marketing copy. `mixed` listed as a first-class option, with guidance to prefer it when the platform clearly serves more than one audience. The `developer-platform` default is allowed only as a Pass-1 fallback when the homepage cannot be reached; Pass 2 must verify. SKILL.md "Audience matters" paragraph: `mixed` named as one of three options (not just a fallback). Adds the verification requirement and points at the methodology checklist. Notes that mixed audiences score by judgment per criterion. report-template Audience header: now requires inline citations of the signals that informed the designation (e.g. "mixed (primary marketing teams per hero copy 'X'; secondary agencies via 'Y' nav; tertiary developers via mid-page API mention)"). The bare designation alone is no longer sufficient. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: Cat 3 ingest-at-scale guidance becomes first-class Cat 3 "Processing & handler guidance" was scored against a generic anchor ("covers the handler lifecycle"). Platforms that mention "respond fast" and "process async" in passing could score 2 without teaching the actual ingest-verify-queue pattern, naming their response timeout, or pointing integrators at concrete architectures. This is the criterion most directly tied to Hookdeck Event Gateway's value prop (and to cloud-native alternatives like AWS EventBridge + API Gateway, GCP Pub/Sub + serverless function), but the rubric didn't surface that connection. Three updates: rubric.md Cat 3 Processing & handler guidance: criterion text now spells out the ingest-verify-queue pattern as the production-traffic contract integrators need: acknowledge quickly with 2xx, verify the signature, queue work to a background processor so burst traffic and slow downstream work do not exceed the timeout. 2-anchor now requires the platform to (a) name the timeout window, (b) explain the pattern, and (c) point at concrete reference architectures (Hookdeck Event Gateway, cloud-native EventBridge+API Gateway or Pub/Sub+serverless function, or queue+worker on the integrator's own infrastructure). 1-anchor covers partial coverage. methodology.md step 3 (Read the webhook docs): explicit prompt to look for the response timeout window, the ingest-verify-queue pattern, and architecture references. Tactics search-term list adds "timeout", "respond", "async", "queue", "EventBridge", "Pub/Sub", "Event Gateway", "ingest". program-mapping.md: new row mapping the ingest-at-scale gap to Hookdeck Event Gateway as the integrator's ingest layer (or cloud-native alternatives EventBridge+API Gateway, Pub/Sub+serverless function).
Distinguished from the existing endpoint-health row: that one is about reliability for an existing handler, this one is about teaching integrators the pattern itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: phrasing - 'ingest-at-scale' -> 'ingest reliably' The pattern matters at any volume, not just at scale. A 5-second timeout kills a delivery whether the integrator is handling 1 req/sec or 1000. 'Reliably ingest' captures the goal (don't time out, don't lose deliveries) better than 'at scale', which implies high volume specifically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add PLAN-v2.md for review Captures the v2 pass: migrate audit format to structured YAML (primary driver: cloud agent + public website for URL-submitted audits), consolidate v1's rubric and methodology learnings, and preserve Ordinal's HITL Pass 2 evidence so it does not need to be re-collected. Seven phases sketched: schema design, consolidate v1 learnings, migrate audit template to YAML, update SKILL.md and methodology, preserve and port Ordinal HITL evidence, decide downstream backwards-compat path, re-run Ordinal under v2, cascade to downstream skill. Includes a complete inventory of HITL-derived facts to carry forward (active usage observations, signing and delivery shape from the captured payload, audience verification, scoring decisions). Cross-check this list at every phase boundary. Plan is for review before execution; commit per phase once execution starts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: PLAN-v2 - all-in on YAML upstream, lockstep downstream cascade Per direction: upstream skill emits YAML only. No Markdown audit output, no renderer, no Markdown template, no transitional backwards-compat phase. The downstream outpost-customer-audit-report skill is updated in lockstep to consume YAML. 
Changes from the previous PLAN-v2 draft: - Target layout drops renderers/ and assets/report-template.md - Phase 2 simplifies: produce assets/report-template.yaml and delete the Markdown template; no renderer - Phase 3 SKILL.md update: explicit YAML-only output - Phase 5 (was "decide on backwards-compatibility") removed; no decision to make - downstream cascades in lockstep - Phase 5 (new, was Phase 6) re-runs Ordinal; produces audit.yaml only; v1 audit.md gets archived to customers/ordinal/archive/ - Phase 6 (new, was Phase 7) cascades to downstream skill in hookdeck-skills-internal: input becomes YAML, customer report stays Markdown (still the customer-facing deliverable) Customer report format kept as Markdown for now since it is the customer-facing artifact. Open question for review: if the cloud agent's website ends up rendering customer reports as well, that decision can flip to YAML in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: PLAN-v2 - resolve customer report format decision Customer report stays Markdown. The cloud agent has no current plan to render customer reports; the customer-facing artifact is sent or shared as a file. Decision settled, not an open question. Open Question 3 ("Customer report format") removed from the open list and added to a new "Resolved decisions" section at the top of the resolved choices that v2 execution should not relitigate (upstream YAML-only, customer Markdown, downstream lockstep). Remaining open questions renumbered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: PLAN-v2 - add commit refs, schema sketch, open-question recommendations Three additions to make PLAN-v2 self-sufficient for a fresh agent picking up the v2 work cold: Phase 1 consolidation list: each item now has the v1 commit hash inline so the rationale is one git show away. Editorial qualifier rules also annotated as downstream-only with a pointer at the internal repo's methodology. 
Schema sketch (illustrative): inline YAML showing the rough shape of audit.yaml and hitl-evidence.yaml. Field names, nesting, status enums, and scoring decision records all present. Marked as a starting point that Phase 0 refines against the schema linter; not authoritative. Open question recommendations: each open question now has a "Recommendation:" line so a fresh agent has a default to push back against rather than picking from scratch (schema tooling, YAML lib and lint config, cloud-agent field reservation, archive location; re-audit timing already had one). Open questions remain genuine questions; the recommendations are starting points. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(webhook-dx-audit): schema and lint tooling for v2 audit YAML PLAN-v2 Phase 0. Adds the JSON Schema (Draft 2020-12, authored in YAML) that defines the v2 audit format and a Node-based linter that validates audit or hitl-evidence YAML against it. - schema/audit.schema.yaml: full audit shape with locked CategoryId and CriterionId enums, status taxonomy, dual-score support, embedded HITL evidence, and reserved cloud-agent fields. - schema/hitl-evidence.schema.yaml: companion shape for the standalone hitl-evidence pre-load file. - schema/*.example.yaml: illustrative Stripe-shaped examples that validate against the schemas. - schema/README.md: file layout, status taxonomy, dual-score handling, and how to run the linter. - scripts/lint-audit.mjs + package.json: ajv + js-yaml CLI that auto-detects which schema to use and reports JSON-pointer paths on failure. Examples use Stripe (the methodology calibration anchor) rather than any customer; the public repo carries no customer-identifying content. Refs: skills/webhook-dx-audit/PLAN-v2.md Phase 0. * docs: PLAN-v2 - drop added_by_report from upstream schema spec The flag belongs in the downstream report skill's own schema; upstream audit stays free of downstream concepts. 
Aligns the plan with the v2 schema that landed in 7f23b1c. * docs(webhook-dx-audit): consolidate v1 rubric and methodology learnings PLAN-v2 Phase 1. Walked rubric, methodology, scoring, and program-mapping end to end against the eleven enumerated v1 commits; every rule reads cleanly in isolation. Three categories of edit landed: - YAML field-name translation. Prose references to v1 Markdown structure ("the report's Access frontmatter line", "the report's Access limits", the audit's `Audience:` header line) now point at v2 YAML fields (`audience.designation`, `audience.signals`, `access_limits`, `summary`, `recommendations`). - Editorial rules tightened. The "Stay factual; no editorial" tactic now carries the two specific sub-rules from the downstream methodology Section 3: no company-stage commentary and no unanchored qualifiers. These apply to upstream audit prose too; lifting them keeps audit-side voice consistent with how the customer-facing report reads them. - Summary scoping rule added as a methodology tactic. The Summary should list only platform features that contribute to the webhook and event-destination surface; OpenAPI specs without `webhooks` blocks, platform-wide MCPs without webhook tools, and CLIs that do not manage webhook config belong in their respective category findings, not in the summary. The other Phase 1 items (Cat 3 rename and intro, Cat 12 restructure, Cat 2/5/7/11 intro cleanups, audience verification at Pass 1, HITL acronym expansion on first use, methodology steps 3 and 5 broadened to webhook AND event destinations, Cat 5 six-criteria example correction, program-mapping reliable-ingestion row) were already in place; verified without further edits. HITL payload capture lives in SKILL.md and is Phase 3 territory. Refs: skills/webhook-dx-audit/PLAN-v2.md Phase 1. * feat(webhook-dx-audit): migrate audit template to YAML PLAN-v2 Phase 2. 
Replaces assets/report-template.md (deleted) with assets/report-template.yaml: a structural skeleton enumerating all 12 categories and all 54 criterion IDs with placeholder values that lint clean against schema/audit.schema.yaml. Inline comments explain each field's purpose, valid values, and the rubric/methodology section that anchors it. The template carries the v2-specific guidance directly: - summary scoping rule (only items contributing to the webhook surface). - editorial rules (no company-stage commentary, no unanchored qualifiers). - status taxonomy quick reference (scored / not_supported / not_applicable / not_assessed). - Cat 5 Table 1 reminder and Cat 7/11 Table 2 reminder for N/A logic. - Cat 12 reminder not to re-score Cat 4 / Cat 8 surfaces. The lint script now covers the template alongside both example files so schema drift catches it. Refs: skills/webhook-dx-audit/PLAN-v2.md Phase 2. * docs(webhook-dx-audit): SKILL.md to YAML-only flow PLAN-v2 Phase 3. Frontmatter: description now states "produces a structured YAML audit file" and points at schema/audit.schema.yaml so callers understand the output shape from the trigger. Version bumped to 0.2.0 to match the schema, template, and example files. Roles section: HITL captures fill structured fields - delivery payload to hitl_evidence.delivery_payload_capture, in-product observations as findings[].criteria[].evidence strings keyed by criterion id, scoring decisions as hitl_evidence.scoring_decisions records. Explicitly bars free-form Pass-2 narratives in the summary; the dual-score data lives in grade.public_scope / grade.provisional_minimum and the closed Pass-2 criteria live in passes.pass_2.closed_criteria. How an audit runs: step 0 scaffolds the audit YAML from the template and leads with the default flow (Pass 1 unattended, Pass 2 HITL prompted by the agent at step 4). Pre-loaded HITL evidence is called out as the exception. 
Step 4 collects HITL evidence and writes a sibling hitl-evidence.yaml so the next re-audit can pre-load it; when step 0 pre-loaded a companion file, step 4 updates it in place. Steps 2-6 reference the YAML fields they populate (audit.findings, audit.scorecard, audit.grade, audit.hitl_evidence, audit.summary, audit.recommendations) and include a lint step before handoff. Output and Reference files: structured YAML output described by field group; new entries for schema/audit.schema.yaml and schema/README.md; template reference updated to report-template.yaml. Path conventions left to the caller: no customers/<name>/... prescribed in upstream prose; the companion file is described as a sibling of the audit file. Acceptance: no references to "fill in the Markdown template" remain; no "written review (Markdown)" framing; SKILL.md reads coherently against the YAML-only flow. Refs: skills/webhook-dx-audit/PLAN-v2.md Phase 3. * docs(webhook-dx-audit): methodology tightenings from v2 Ordinal run Two refinements surfaced during the v2 Ordinal dress rehearsal (downstream commit 7621577 on feat/v2-ordinal-audit): 1. Generalize the JS-rendered-nav carve-out. The rubric currently notes this only for Cat 12 push-to-agent doc actions, but Cat 1 findability hits the same issue when the docs site renders its top nav in JavaScript (Mintlify, Docusaurus 3+, GitBook, ReadMe). Methodology step 1 now instructs the agent to default Not Assessed from a plain fetch and verify in a browser during HITL rather than scoring 0 from an empty fetch. 2. Tactic: fetch the OpenAPI spec directly when scoring Cat 4 machine-readable-spec. An LLM-summarized doc read can confuse the top-level `webhooks` key (per-event payload contracts, the 3.1 feature this criterion scores) with a Tag named "Webhooks" that groups CRUD endpoints under `/webhooks`. Curl the spec and check `len(spec.webhooks)` programmatically; the same applies to AsyncAPI presence and per-event JSON Schema files. 
Both edits are methodology-only; rubric anchors, schema, and template unchanged. * feat(webhook-dx-audit): require `why` on recommendations + reviewer-artifact rule Two methodology refinements surfaced during the downstream Outpost-customer review of the v2 Ordinal report: recommendations consistently lacked an articulated user-facing benefit, and HITL-captured deliveries muddied operator-side practice with reviewer-introduced artifacts. Schema: - Recommendation gains a required `why` field, separate from `body`. `body` describes what to change; `why` names the integrator-side benefit and the user-facing pain the gap creates. The split is load-bearing so downstream renderers can use each independently and so recommendations read as arguments rather than orders. - schema/audit.schema.example.yaml updated to show `why` on both illustrative Stripe recommendations. - assets/report-template.yaml carries a `why: TBD` placeholder. methodology.md: - New "Writing recommendations" section codifies the benefit-not-rule framing, anchoring to specific user-facing pain, and the rule against duplicating benefit framing in `body`. Calls out the Cat 6 destination-type-breadth pitfall: phrase the recommendation around adding non-HTTP destination types (SQS, Pub/Sub, EventBridge, Kafka, Event Grid) with the integrator benefit named, not as "rename your HTTP endpoint" - renaming an endpoint does not change what is delivered. Stripe's evolution is the cited example: existing webhook product stayed in place; new destination types were added alongside. - New "Distinguishing reviewer artifacts from operator-side practice" section: HITL captures often involve reviewer-configured headers, test webhooks, and synthetic deliveries. Anything the reviewer set up to enable the capture is not evidence of operator behavior. 
Findings and recommendations citing observed deliveries must anchor on operator-controlled docs, API surface, or in-product copy; reviewer-set custom headers (the borderline case that surfaced this) must not be cited as evidence the operator surfaces a feature in practice. Annotate borderline HITL records in `audit.hitl_evidence.other_observations` keyed by criterion id so reviewer artifacts cannot be mistaken for operator state. SKILL.md step 6 now points at "Writing recommendations" so the `why` requirement is discoverable from the workflow walk-through. Verification: `npm run lint:file` clean on schema/audit.schema.example.yaml and assets/report-template.yaml against the updated schema. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(webhook-dx-audit): target_scores + depends_on + effort on Recommendation; projected grade Add three optional Recommendation fields and a projected grade roll-up so downstream renderers can show current-vs-potential impact, sequence recommendations by hard dependencies, and surface coarse implementation effort on each one. Schema (schema/audit.schema.yaml): - Recommendation gains three optional fields: - `target_scores`: list of {criterion_id, target_score, note?} records naming which criteria the recommendation lifts to which 0/1/2 score. The 2-anchor is honest: documenting a single existing auth option reaches 1 (one documented option), not 2 (multiple). When multiple recommendations target the same criterion, downstream renderers take the max. - `depends_on`: list of recommendation IDs that must land before this one delivers its full value. Hard dependencies only (e.g. Rec 2's verification step references Rec 1's signing documentation). Soft sequencing preferences belong in body or summary. - `effort`: coarse implementation effort, enum docs|s|m|l.
`docs` is one page or section with little or no engineering work; `s` is a small product change (a button, a new endpoint, a config knob) on the order of days; `m` is a new feature surface on the order of weeks; `l` is an architectural change on the order of months. - Two new top-level $defs back the fields: - TargetScore: required {criterion_id, target_score}, optional note - EffortLevel: enum docs|s|m|l with calibration description - Grade gains an optional `projected` GradeRollup. Present when at least one recommendation has `target_scores`. Computed by taking the max target_score across recommendations per criterion (current score carries forward for criteria no recommendation touches), rolled up via the standard category weighting. Lets downstream renderers display the audit as current vs potential. - ScorecardEntry gains an optional `projected_pct` per category, present when `grade.projected` is present. Powers a side-by-side scorecard in downstream renderers. Methodology (references/methodology.md): - "Writing recommendations" gains a "Populating `target_scores`, `depends_on`, `effort`" sub-section covering how to choose each recommendation's targets honestly (don't over-promise 2 when the rubric anchor isn't reachable), how to scope dependencies (hard only), and how to calibrate effort (against the EffortLevel enum, judged on what the platform team would do, not on the operator's team size). - "Computing `grade.projected` and `scorecard[].projected_pct`" sub-section codifies the projection rule: per criterion, take the max target_score across all recommendations targeting it; criteria no recommendation touches carry their current score forward; roll up via scoring.md; carry N/A criteria the same way. SKILL.md step 6 updated to instruct populating the three new fields on every recommendation that closes a scored gap, and to compute the projected grade when at least one recommendation has target_scores. 
assets/report-template.yaml gains commented-out placeholders for the new fields with calibration notes. schema/audit.schema.example.yaml updated: Stripe illustrative audit now shows `grade.projected` at 95% (A), per-category `projected_pct` on the two categories the recommendations affect (Cat 5 Security 83 -> 100; Cat 7 Setup 75 -> 100), and the two recommendations carry `target_scores` and `effort`. The Rec 1 (Terraform provider) example also demonstrates the `effort: m` calibration for product work; Rec 2 (auth docs framing) demonstrates `effort: docs`. Backwards compatibility: all four new fields (target_scores, depends_on, effort, grade.projected, scorecard[].projected_pct) are optional. Audits that pre-date this change still lint clean. The projected roll-up only appears in downstream renderers when target_scores are populated. Verification: lint clean on schema/audit.schema.example.yaml and assets/report-template.yaml against the updated schema. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(webhook-dx-audit): clarify effort reflects platform-context cost Effort ratings on recommendations were calibrated implicitly against a from-scratch baseline, but the audit reviews a specific operator's surface and the operator's actual cost depends on what their delivery backend ships. A recommendation that would be `l` for a platform building delivery primitives themselves can drop to `m` or `s` when the underlying capability is shipped by a backend like Outpost, Svix, or Convoy. Updated the "Populating `target_scores`, `depends_on`, `effort`" section in references/methodology.md to make this explicit. The rater consults `audit.context` when calibrating effort: when the context names a delivery backend that ships the capability being recommended, rate the remaining surfacing work (dashboard / API / docs), not the from-scratch implementation cost. 
When the audit has no platform context, rate from-scratch as the safer default and flag the assumption in the audit `summary` so a downstream skill with platform knowledge can override. No schema changes. The Stripe example in schema/audit.schema.example.yaml keeps its `m` rating on the Terraform-provider recommendation because Stripe builds the provider themselves; no delivery-backend translation applies there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(webhook-dx-audit): API-endpoint disambiguation; soften destination-type rename rule Two related methodology refinements surfaced during HITL review of the Ordinal report. API-endpoint disambiguation (Tactics section): The word "endpoint" is overloaded in webhook contexts. It can mean the integrator's HTTP receiver (the URL they expose to receive webhook deliveries) or the platform's management API endpoint (the route integrators call to create / list / delete webhook destinations). A bare `POST /webhooks` could read as either. Reviewers in this project hit the ambiguity twice on Rec 3 of the Ordinal v2 report: once on "rename POST /webhooks to POST /event-destinations" (read as renaming an integrator-receiver URL, which is nonsensical) and again on "your current POST /webhooks stays in place" (same ambiguity). New Tactics rule: whenever the audit or recommendations name an HTTP route, qualify it with the role it plays. "the destination-creation API endpoint POST /webhooks"; "the webhook-management API at /api-reference/webhooks/"; "the integrator's webhook-receiving URL". A reader who cannot tell at a glance which side of the wire an endpoint sits on will misread the recommendation.
Destination-type-breadth rename rule (Writing recommendations): The previous version of this rule said "do not phrase the recommendation as 'rename POST /webhooks to POST /event-destinations'; renaming an HTTP endpoint does not change what is delivered, and the framing confuses an API design decision with the underlying capability." That was overcorrecting. Once the destination-creation API endpoint is extended to create non-HTTP destinations (SQS, Pub/Sub, EventBridge, Kafka, Event Grid), the endpoint name `POST /webhooks` arguably misrepresents what it does, and renaming to `POST /event-destinations` or `POST /destinations` is a valid API-design refinement that signals the broader scope to integrators reading the docs. The rule now reads: lead with the capability addition (not the rename) as the primary recommendation; surface the API rename as an optional secondary refinement; recommend keeping the original endpoint as an alias for backwards compatibility. Together these two rules sharpen the audit and report voice when recommending category-6 destination-type-breadth changes specifically, and any other recommendation that names an API endpoint generally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(webhook-dx-audit): "ingest-verify-queue" is a practice, not a pattern; type the implementations "Ingest-verify-queue" is not a canonical industry-named pattern with formal references. It is best-practice shorthand for the goal of acknowledging fast, verifying, and handing off to async processing. Calling it a "pattern" overclaims; the rubric, methodology, and program-mapping now call it a "practice". The four items previously grouped as "concrete implementations of the pattern" are different categories of thing, and the umbrella implied each was itself a pattern. 
Each now carries its own type: - Hookdeck Event Gateway: a managed solution that ships the practice out of the box - AWS EventBridge + API Gateway: a cloud-native composition - GCP Pub/Sub + a serverless function: a cloud-native composition - Queue + worker on the integrator's own infrastructure: a self-hosted setup Only the queue + worker option is genuinely a pattern in the generic sense; the others are products and cloud compositions. Rubric Cat 3 criterion updated end to end with the new terminology. Methodology step 3 and program-mapping Cat 3 row updated to match. No schema changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update bundle count to 40 skills (knock-webhooks + orb-webhooks merged since count was last set) The README claimed "38 skills" in two places (bundle install copy) but the bundle in .claude-plugin/marketplace.json now lists 40. The two extra skills (knock-webhooks #64, orb-webhooks #63) were merged to main after the bundle count was last updated, and the webhook-dx-audit PR branch picked them up via merge from main without re-syncing the count. Test plan item from PR #67 calls for "the bundle still totals 38 skills" - that's now 40. README updated in both occurrences (lines 111, 120) to match the actual bundle contents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude added 2 commits May 13, 2026 18:34

feat: add orb-webhooks skill

1547281

Adds a webhook skill for Orb (usage-based billing) with HMAC-SHA256 manual verification over `v1:{X-Orb-Timestamp}:{rawBody}`, plus runnable Express, Next.js, and FastAPI examples with tests. https://claude.ai/code/session_01NNTgQRJss1V7gyzzJ9rjnB
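The verification described in this commit could be sketched roughly as follows. This is an illustration only, not the code shipped in `skills/orb-webhooks/examples/`: the function names, the UTF-8 encoding of the secret, and the treatment of a timezone-less timestamp as UTC are assumptions made for the example. The signed-content format, the `v1=<hex>` header shape, and the 5-minute window come from the PR notes.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def compute_orb_signature(secret: str, timestamp: str, raw_body: bytes) -> str:
    """Build the expected X-Orb-Signature value (hypothetical helper name)."""
    # Signed content is the literal "v1", the timestamp header exactly as
    # received (an ISO-8601 string, not a Unix epoch), and the raw body bytes.
    signed = b"v1:" + timestamp.encode("utf-8") + b":" + raw_body
    # Assumption: the per-endpoint signing secret is used as UTF-8 bytes.
    digest = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    return "v1=" + digest


def verify_orb_webhook(
    secret: str,
    signature_header: str,
    timestamp_header: str,
    raw_body: bytes,
    tolerance: timedelta = timedelta(minutes=5),
    now=None,
) -> bool:
    """Return True when the signature matches and the timestamp is fresh."""
    expected = compute_orb_signature(secret, timestamp_header, raw_body)
    # Constant-time comparison; compare bytes so any header value is accepted.
    if not hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8")):
        return False
    try:
        # Older Python versions do not parse a trailing "Z", so normalize it.
        sent_at = datetime.fromisoformat(timestamp_header.replace("Z", "+00:00"))
    except ValueError:
        return False
    if sent_at.tzinfo is None:
        # Assumption: a timestamp without an offset is interpreted as UTC.
        sent_at = sent_at.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # The 5-minute default is the skill's recommendation, not an Orb mandate.
    return abs(current - sent_at) <= tolerance
```

A handler would pass the unparsed request body to `verify_orb_webhook`; parsing and re-serializing the JSON first can change the bytes and break the comparison. Event-id idempotency, which the PR notes also recommend, is left out of this sketch.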

leggetter merged commit 22697e2 into main May 14, 2026
6 checks passed

leggetter deleted the feat/orb-webhooks branch May 14, 2026 09:47

leggetter merged 2 commits into main from feat/orb-webhooks

2 participants
