|
| 1 | +--- |
| 2 | +name: new-sep |
| 3 | +description: >- |
| 4 | + Scaffold a sep-NNNN.yaml requirement-traceability file for the MCP |
| 5 | + conformance repo from a SEP PR's spec diff. Runs the new-sep CLI, then |
| 6 | + parses the modelcontextprotocol/modelcontextprotocol spec diff to populate |
| 7 | + `requirements[]` with the RFC 2119 sentences and proposed check IDs. |
| 8 | +argument-hint: '<sep-number>' |
| 9 | +--- |
| 10 | + |
| 11 | +# new-sep: SEP traceability YAML scaffolding |
| 12 | + |
| 13 | +You are bootstrapping a `sep-NNNN.yaml` file for a new SEP in the MCP conformance repo. The output is the requirement-traceability file specified by SEP-2484: a YAML that maps each normative sentence from the SEP's spec diff to a `check:` ID (testable) or an `excluded:` reason (not testable). The CLI gets the skeleton; you fill in the rows by reading the spec diff. |
| 14 | + |
| 15 | +## Step 0: Pre-flight checks |
| 16 | + |
| 17 | +Before doing anything else, verify GitHub CLI authentication: |
| 18 | + |
| 19 | +```bash |
| 20 | +gh auth status 2>&1 |
| 21 | +``` |
| 22 | + |
| 23 | +If this fails, stop immediately and tell the user: |
| 24 | + |
| 25 | +> GitHub authentication is required for this skill. Please run `gh auth login` first, then re-run. |
| 26 | +
|
| 27 | +Verify you're running inside the conformance repo: |
| 28 | + |
| 29 | +```bash |
| 30 | +test -f package.json && jq -r '.name' package.json |
| 31 | +``` |
| 32 | + |
| 33 | +The name should be `@modelcontextprotocol/conformance`. If not, stop and ask the user to `cd` into the conformance repo first. |
| 34 | + |
| 35 | +## Step 1: Parse arguments |
| 36 | + |
| 37 | +Extract from the user's input: |
| 38 | + |
| 39 | +- **sep-number** (required): the SEP number, e.g. `2164`. This is also the PR number in `modelcontextprotocol/modelcontextprotocol` by convention. |
| 40 | + |
| 41 | +## Step 2: Generate the skeleton |
| 42 | + |
| 43 | +Run the CLI: |
| 44 | + |
| 45 | +```bash |
| 46 | +npm run --silent build |
| 47 | +node dist/index.js new-sep <NNNN> |
| 48 | +``` |
| 49 | + |
| 50 | +(For development against a non-built source tree: `npx tsx src/index.ts new-sep ...`.) |
| 51 | + |
| 52 | +The CLI writes `src/seps/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`. |
| 53 | + |
| 54 | +If the CLI errors with "does not change any docs/specification/draft/\*.mdx", the SEP's spec changes landed in a separate PR — ask the user for the spec file path and rerun with `--spec-path docs/specification/draft/<path>`. Do not guess. |
| 55 | + |
| 56 | +## Step 3: Fetch the spec diff |
| 57 | + |
| 58 | +`AGENTS.md` (lines 64–72) is explicit that severity must come from the spec text itself, not the SEP markdown or the conformance PR description: |
| 59 | + |
| 60 | +```bash |
| 61 | +gh api "repos/modelcontextprotocol/modelcontextprotocol/pulls/<NNNN>/files" \ |
| 62 | + --jq '.[] | select(.filename | test("^docs/specification/draft/.*\\.mdx$")) | {filename, patch}' |
| 63 | +``` |
| 64 | + |
| 65 | +For each file, pull the added (`+`-prefixed) lines from `patch`. If `patch` is truncated for a large file, fall back to fetching the whole file at the PR's head ref: |
| 66 | + |
| 67 | +```bash |
| 68 | +gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/<path>?ref=<sep-branch>" \ |
| 69 | + --jq '.content' | base64 -d |
| 70 | +``` |
| 71 | + |
| 72 | +## Step 4: Extract RFC 2119 requirements |
| 73 | + |
| 74 | +Walk the added lines and identify sentences containing the keywords: **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **MAY**, **OPTIONAL**. |
| 75 | + |
| 76 | +**Quote the whole sentence**, not just the matched line. The matched word may sit inside a bullet point whose lead-in sentence supplies the keyword by inheritance — e.g.: |
| 77 | + |
| 78 | +> Servers SHOULD return standard JSON-RPC errors for common failure cases: |
| 79 | +> |
| 80 | +> - Resource not found: -32602 (Invalid Params) |
| 81 | +
|
| 82 | +The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/seps/sep-2164.yaml` for the canonical example. |
| 83 | + |
| 84 | +**Regex alone is insufficient** (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in. |
| 85 | + |
| 86 | +## Step 5: Map severity → check vs. excluded |
| 87 | + |
| 88 | +From `AGENTS.md:50-56`: |
| 89 | + |
| 90 | +| Keyword | Severity | YAML field | |
| 91 | +| ---------------------------------------------- | -------- | -------------------------- | |
| 92 | +| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE | `check: sep-<NNNN>-<slug>` | |
| 93 | +| SHOULD / SHOULD NOT | WARNING | `check: sep-<NNNN>-<slug>` | |
| 94 | +| MAY / OPTIONAL | — | _no row — skip entirely_ | |
| 95 | + |
| 96 | +MAY / OPTIONAL sentences are noted in Step 4 only so you consciously skip them — they never produce a yaml row. |
| 97 | + |
| 98 | +A row is `excluded:` when a MUST/SHOULD requirement can't be protocol-observed by the harness. Do **not** write any `excluded:` row on your own authority — every exclusion goes through Step 6. |
| 99 | + |
| 100 | +While classifying, sort each MUST/SHOULD row into one of three buckets: |
| 101 | + |
| 102 | +- **`check:`** — observably testable on the wire. |
| 103 | +- **clearly-excluded** — you're confident it can't be observed (e.g. "clients SHOULD also accept -32002" when the harness only drives servers). |
| 104 | +- **borderline** — you'd default to `check:` but observability is questionable. Markers: |
| 105 | + - _Internal state_ — verbs like _record_, _store_, _associate_, _track_, _cache_. The harness sees wire traffic, not memory; usually only observable via a downstream row already in your list. |
| 106 | + - _UI / human-facing_ — _display_, _show_, _render_, _prompt the user_. |
| 107 | + - _Precondition phrasing_ — "Before doing X, the implementation MUST Y" where X is itself another row. |
| 108 | + |
| 109 | +Slug convention: lowercase-kebab, derived from the verb phrase. Examples from `sep-2164.yaml`: `no-empty-contents`, `error-code`. Same `id` is used for SUCCESS and FAILURE (`AGENTS.md:52`). |
| 110 | + |
| 111 | +## Step 6: Confirm exclusions with the user |
| 112 | + |
| 113 | +Nothing becomes `excluded:` without sign-off. Two rounds: |
| 114 | + |
| 115 | +**Round 1 — clearly-excluded, single batch question.** One `AskUserQuestion` listing all clearly-excluded rows in the question body (slug + one-line reason each). Options: |
| 116 | + |
| 117 | +- `Exclude all as listed (Recommended)` |
| 118 | +- `Flip all to check:` |
| 119 | +- `Let me adjust per-row` — if chosen, append these rows to round 2. |
| 120 | + |
| 121 | +Skip this round if the bucket is empty. |
| 122 | + |
| 123 | +**Round 2 — borderline, one question per row.** One `AskUserQuestion` call with a question per borderline row (loop in batches of 4 if needed). For each: |
| 124 | + |
| 125 | +- header: the proposed slug |
| 126 | +- question: quote the requirement sentence + your one-line observability concern |
| 127 | +- options (list `check:` first — it's the default for borderline): |
| 128 | + - `check:` — keep as a testable check |
| 129 | + - `excluded: <reason>` — drop to excluded with your stated reason |
| 130 | + - `merge into <other-slug>` — offer when the row is a precondition for another row already in the list |
| 131 | + |
| 132 | +Apply the answers before writing. For any `excluded:` outcome, write the reason verbatim into the yaml and add an `issue:` URL if the user supplies one. A `merge` outcome means: drop this row, and append its `text:` to the surviving row's `text:` separated by `/` so the traceability isn't lost. |
| 133 | + |
| 134 | +## Step 7: Rewrite the YAML |
| 135 | + |
| 136 | +Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/seps/sep-2164.yaml`). |
| 137 | + |
| 138 | +**Key order within each row** — for `check:` rows the **`check:` key comes first**, then `text:`, then any optional `url:`. Scanning the left margin should reveal every check ID without reading the quoted sentences. For `excluded:` rows the order is **`text:` first**, then `excluded:`, then optional `issue:` — there's no ID to scan for, so lead with the requirement. |
| 139 | + |
| 140 | +**Row order in the file** — all `check:` rows first (in spec-diff order), then **all `excluded:` rows grouped at the bottom**, separated from the checks by **one blank line**. Do not interleave. |
| 141 | + |
| 142 | +```yaml |
| 143 | +requirements: |
| 144 | + - check: sep-NNNN-first-slug |
| 145 | + text: '...' |
| 146 | + - check: sep-NNNN-second-slug |
| 147 | + text: '...' |
| 148 | + |
| 149 | + - text: '...' |
| 150 | + excluded: 'reason' |
| 151 | + issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN> |
| 152 | +``` |
| 153 | +
|
| 154 | +If a requirement is ambiguous or you're not confident, leave it as a `TODO:` row rather than guessing — humans review this yaml before scenarios get written. |
| 155 | + |
| 156 | +Also fix the `spec_url`: the CLI emits the page URL with no anchor. If the requirements you extracted live under a specific spec subsection (e.g. `#error-handling`), append it. |
| 157 | + |
| 158 | +If a requirement comes from a **different spec page** than `spec_url` (the SEP touched multiple `.mdx` files — the CLI prints these as "PR also changes N other spec file(s)"), give that row a full `url:` override: |
| 159 | + |
| 160 | +```yaml |
| 161 | +- check: sep-NNNN-slug |
| 162 | + text: '...' |
| 163 | + url: https://modelcontextprotocol.io/specification/draft/other/page#anchor |
| 164 | +``` |
| 165 | + |
| 166 | +A row's effective spec reference is `row.url ?? file.spec_url`. |
| 167 | + |
| 168 | +Write the result back to `$YAML`. |
| 169 | + |
| 170 | +## Step 8: Suggest a host scenario |
| 171 | + |
| 172 | +`AGENTS.md` prefers **fewer scenarios with more checks** over one-scenario-per-check. Before telling the user to write a new scenario, look for an existing one the new checks could be folded into. |
| 173 | + |
| 174 | +Determine the suite directory from the requirement subjects ("MCP clients MUST…" → `client/`, "Servers MUST…" → `server/`, "authorization servers MUST…" → `authorization-server/`; a SEP may map to more than one). Then search that directory for scenarios touching the same spec area: |
| 175 | + |
| 176 | +```bash |
| 177 | +rg -l -i '<domain-term>|<domain-term-2>' src/scenarios/<suite>/ --type ts |
| 178 | +``` |
| 179 | + |
| 180 | +Pick 2–3 domain terms from the SEP's subject matter (for a discovery SEP: `metadata`, `well-known`; for an auth-response SEP: `redirect`, `callback`, `pkce`). For each hit, pull the scenario's `name`/`description` to confirm relevance: |
| 181 | + |
| 182 | +```bash |
| 183 | +rg -A1 'name:|description:' <hit.ts> |
| 184 | +``` |
| 185 | + |
| 186 | +If you find a plausible host, recommend it by path. If nothing fits, say so explicitly — a new scenario file is then the right call. |
| 187 | + |
| 188 | +## Step 9: Hand-off |
| 189 | + |
| 190 | +Report to the user, in this order: |
| 191 | + |
| 192 | +1. Path to the generated yaml. |
| 193 | +2. Row counts: "`N check:` rows, `M excluded:` rows" — and note which exclusions the user signed off in Step 6. |
| 194 | +3. Any requirements you left as `TODO:` and why. |
| 195 | +4. **Host-scenario recommendation** from Step 8 — either "consider adding these checks to `src/scenarios/<suite>/<file>.ts` (it already exercises _X_)" or "no existing scenario covers this area; a new file is appropriate". |
| 196 | +5. Remaining next steps the user owns: |
| 197 | + - add the checks to the host scenario (or create one) under `src/scenarios/{client,server,authorization-server}/`, |
| 198 | + - register any new scenario in `src/scenarios/index.ts` (`AGENTS.md:48`), |
| 199 | + - add a passing example to the everything-client/server and a negative test, per `AGENTS.md:74-81`. |
| 200 | + |
| 201 | +Do **not** generate or edit scenario `.ts` files or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml plus the recommendation. |
0 commit comments