modelcontextprotocol
diff --git a/‎.claude/skills/new-sep/SKILL.md‎
Lines changed: 89 additions & 23 deletions b/‎.claude/skills/new-sep/SKILL.md‎
Lines changed: 89 additions & 23 deletions
diff --git a/‎AGENTS.md‎
Lines changed: 1 addition & 1 deletion b/‎AGENTS.md‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎src/new-sep/index.test.ts‎
Lines changed: 13 additions & 44 deletions b/‎src/new-sep/index.test.ts‎
Lines changed: 13 additions & 44 deletions
@@ -5,7 +5,7 @@ description: >-
   conformance repo from a SEP PR's spec diff. Runs the new-sep CLI, then
   parses the modelcontextprotocol/modelcontextprotocol spec diff to populate
   `requirements[]` with the RFC 2119 sentences and proposed check IDs.
-argument-hint: '<sep-number> [--target client|server|authorization-server]'
+argument-hint: '<sep-number>'
 ---
 
 # new-sep: SEP traceability YAML scaffolding
@@ -37,20 +37,19 @@ The name should be `@modelcontextprotocol/conformance`. If not, stop and ask the
 Extract from the user's input:
 
 - **sep-number** (required): the SEP number, e.g. `2164`. This is also the PR number in `modelcontextprotocol/modelcontextprotocol` by convention.
-- **--target client|server|authorization-server** (optional): which scenarios subdirectory to write to. Inferred from the spec path if omitted.
 
 ## Step 2: Generate the skeleton
 
 Run the CLI:
 
 ```bash
 npm run --silent build
-node dist/index.js new-sep <NNNN> [--target <target>]
+node dist/index.js new-sep <NNNN>
 ```
 
 (For development against a non-built source tree: `npx tsx src/index.ts new-sep ...`.)
 
-The CLI writes `src/scenarios/<target>/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.
+The CLI writes `src/seps/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.
 
 If the CLI errors with "does not change any docs/specification/draft/\*.mdx", the SEP's spec changes landed in a separate PR — ask the user for the spec file path and rerun with `--spec-path docs/specification/draft/<path>`. Do not guess.
 
@@ -80,29 +79,77 @@ Walk the added lines and identify sentences containing the keywords: **MUST**, *
 >
 > - Resource not found: -32602 (Invalid Params)
 
-The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/scenarios/server/sep-2164.yaml` for the canonical example.
+The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/seps/sep-2164.yaml` for the canonical example.
 
 **Regex alone is insufficient** (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in.
 
 ## Step 5: Map severity → check vs. excluded
 
 From `AGENTS.md:50-56`:
 
-| Keyword                                        | Severity                  | YAML field                 |
-| ---------------------------------------------- | ------------------------- | -------------------------- |
-| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE                   | `check: sep-<NNNN>-<slug>` |
-| SHOULD / SHOULD NOT                            | WARNING                   | `check: sep-<NNNN>-<slug>` |
-| MAY / OPTIONAL                                 | (not enforced as a check) | `excluded: '<reason>'`     |
+| Keyword                                        | Severity | YAML field                 |
+| ---------------------------------------------- | -------- | -------------------------- |
+| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE  | `check: sep-<NNNN>-<slug>` |
+| SHOULD / SHOULD NOT                            | WARNING  | `check: sep-<NNNN>-<slug>` |
+| MAY / OPTIONAL                                 | —        | _no row — skip entirely_   |
 
-If a requirement is testable in principle but you can't see how to drive it from the harness, write a `check:` row anyway and leave it for the human to wire up — do **not** silently demote to `excluded:`.
+MAY / OPTIONAL sentences are noted in Step 4 only so you consciously skip them — they never produce a yaml row.
 
-Use `excluded:` only when the requirement genuinely can't be protocol-observed (e.g. "clients SHOULD also accept -32002" — the conformance harness tests servers, so client-side acceptance is not observable here). When you use `excluded:`, write the reason verbatim and add an `issue:` URL if there's a tracking issue.
+A row is `excluded:` when a MUST/SHOULD requirement can't be protocol-observed by the harness. Do **not** write any `excluded:` row on your own authority — every exclusion goes through Step 6.
+
+While classifying, sort each MUST/SHOULD row into one of three buckets:
+
+- **`check:`** — observably testable on the wire.
+- **clearly-excluded** — you're confident it can't be observed (e.g. "clients SHOULD also accept -32002" when the harness only drives servers).
+- **borderline** — you'd default to `check:` but observability is questionable. Markers:
+  - _Internal state_ — verbs like _record_, _store_, _associate_, _track_, _cache_. The harness sees wire traffic, not memory; usually only observable via a downstream row already in your list.
+  - _UI / human-facing_ — _display_, _show_, _render_, _prompt the user_.
+  - _Precondition phrasing_ — "Before doing X, the implementation MUST Y" where X is itself another row.
 
 Slug convention: lowercase-kebab, derived from the verb phrase. Examples from `sep-2164.yaml`: `no-empty-contents`, `error-code`. Same `id` is used for SUCCESS and FAILURE (`AGENTS.md:52`).
 
-## Step 6: Rewrite the YAML
+## Step 6: Confirm exclusions with the user
+
+Nothing becomes `excluded:` without sign-off. Two rounds:
+
+**Round 1 — clearly-excluded, single batch question.** One `AskUserQuestion` listing all clearly-excluded rows in the question body (slug + one-line reason each). Options:
+
+- `Exclude all as listed (Recommended)`
+- `Flip all to check:`
+- `Let me adjust per-row` — if chosen, append these rows to round 2.
+
+Skip this round if the bucket is empty.
+
+**Round 2 — borderline, one question per row.** One `AskUserQuestion` call with a question per borderline row (loop in batches of 4 if needed). For each:
+
+- header: the proposed slug
+- question: quote the requirement sentence + your one-line observability concern
+- options (list `check:` first — it's the default for borderline):
+  - `check:` — keep as a testable check
+  - `excluded: <reason>` — drop to excluded with your stated reason
+  - `merge into <other-slug>` — offer when the row is a precondition for another row already in the list
+
+Apply the answers before writing. For any `excluded:` outcome, write the reason verbatim into the yaml and add an `issue:` URL if the user supplies one. A `merge` outcome means: drop this row, and append its `text:` to the surviving row's `text:` separated by `/` so the traceability isn't lost.
+
+## Step 7: Rewrite the YAML
+
+Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/seps/sep-2164.yaml`).
+
+**Key order within each row** — for `check:` rows the **`check:` key comes first**, then `text:`, then any optional `url:`. Scanning the left margin should reveal every check ID without reading the quoted sentences. For `excluded:` rows the order is **`text:` first**, then `excluded:`, then optional `issue:` — there's no ID to scan for, so lead with the requirement.
+
+**Row order in the file** — all `check:` rows first (in spec-diff order), then **all `excluded:` rows grouped at the bottom**, separated from the checks by **one blank line**. Do not interleave.
 
-Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/scenarios/server/sep-2164.yaml`).
+```yaml
+requirements:
+  - check: sep-NNNN-first-slug
+    text: '...'
+  - check: sep-NNNN-second-slug
+    text: '...'
+
+  - text: '...'
+    excluded: 'reason'
+    issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
+```
 
 If a requirement is ambiguous or you're not confident, leave it as a `TODO:` row rather than guessing — humans review this yaml before scenarios get written.
 
@@ -111,25 +158,44 @@ Also fix the `spec_url`: the CLI emits the page URL with no anchor. If the requi
 If a requirement comes from a **different spec page** than `spec_url` (the SEP touched multiple `.mdx` files — the CLI prints these as "PR also changes N other spec file(s)"), give that row a full `url:` override:
 
 ```yaml
-- text: '...'
-  check: sep-NNNN-slug
+- check: sep-NNNN-slug
+  text: '...'
   url: https://modelcontextprotocol.io/specification/draft/other/page#anchor
 ```
 
 A row's effective spec reference is `row.url ?? file.spec_url`.
 
 Write the result back to `$YAML`.
 
-## Step 7: Hand-off
+## Step 8: Suggest a host scenario
+
+`AGENTS.md` prefers **fewer scenarios with more checks** over one-scenario-per-check. Before telling the user to write a new scenario, look for an existing one the new checks could be folded into.
+
+Determine the suite directory from the requirement subjects ("MCP clients MUST…" → `client/`, "Servers MUST…" → `server/`, "authorization servers MUST…" → `authorization-server/`; a SEP may map to more than one). Then search that directory for scenarios touching the same spec area:
+
+```bash
+rg -l -i '<domain-term>|<domain-term-2>' src/scenarios/<suite>/ --type ts
+```
+
+Pick 2–3 domain terms from the SEP's subject matter (for a discovery SEP: `metadata`, `well-known`; for an auth-response SEP: `redirect`, `callback`, `pkce`). For each hit, pull the scenario's `name`/`description` to confirm relevance:
+
+```bash
+rg -A1 'name:|description:' <hit.ts>
+```
+
+If you find a plausible host, recommend it by path. If nothing fits, say so explicitly — a new scenario file is then the right call.
+
+## Step 9: Hand-off
 
 Report to the user, in this order:
 
 1. Path to the generated yaml.
-2. Number of rows extracted (e.g. "3 `check:` rows, 1 `excluded:` row").
-3. Any requirements you marked TODO and why.
-4. Reminder of the next steps the user still owns:
-   - implement the TypeScript scenario under `src/scenarios/<target>/`,
-   - register it in the appropriate suite list in `src/scenarios/index.ts` (`AGENTS.md:48`),
+2. Row counts: "`N check:` rows, `M excluded:` rows" — and note which exclusions the user signed off in Step 6.
+3. Any requirements you left as `TODO:` and why.
+4. **Host-scenario recommendation** from Step 8 — either "consider adding these checks to `src/scenarios/<suite>/<file>.ts` (it already exercises _X_)" or "no existing scenario covers this area; a new file is appropriate".
+5. Remaining next steps the user owns:
+   - add the checks to the host scenario (or create one) under `src/scenarios/{client,server,authorization-server}/`,
+   - register any new scenario in `src/scenarios/index.ts` (`AGENTS.md:48`),
    - add a passing example to the everything-client/server and a negative test, per `AGENTS.md:74-81`.
 
-Do **not** generate the scenario `.ts` file or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml.
+Do **not** generate or edit scenario `.ts` files or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml plus the recommendation.
@@ -79,7 +79,7 @@ Scaffold the requirement-traceability YAML with:
 npx @modelcontextprotocol/conformance new-sep <NNNN>
 ```
 
-The command looks up PR #`<NNNN>` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, picks `src/scenarios/{client,server,authorization-server}/` from the spec path, and writes `sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.
+The command looks up PR #`<NNNN>` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, and writes `src/seps/sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.
 
 ## Examples: prove it passes and fails
 
 
@@ -1,5 +1,5 @@
 import { describe, it, expect } from 'vitest';
-import { specPathToUrl, inferTarget, renderYaml } from './index';
+import { specPathToUrl, renderYaml } from './index';
 
 describe('specPathToUrl', () => {
   it('strips the docs/specification/draft/ prefix and .mdx suffix', () => {
@@ -21,39 +21,6 @@ describe('specPathToUrl', () => {
   });
 });
 
-describe('inferTarget', () => {
-  it('returns server for server/ paths', () => {
-    expect(
-      inferTarget('docs/specification/draft/server/resources.mdx')
-    ).toEqual({ target: 'server', inferred: false });
-  });
-
-  it('returns client for client/ paths', () => {
-    expect(inferTarget('docs/specification/draft/client/sampling.mdx')).toEqual(
-      { target: 'client', inferred: false }
-    );
-  });
-
-  it('returns authorization-server for basic/authorization* paths', () => {
-    expect(
-      inferTarget('docs/specification/draft/basic/authorization.mdx')
-    ).toEqual({ target: 'authorization-server', inferred: false });
-  });
-
-  it('falls back to server (inferred) for unrecognized paths', () => {
-    expect(inferTarget('docs/specification/draft/basic/lifecycle.mdx')).toEqual(
-      { target: 'server', inferred: true }
-    );
-  });
-
-  it('accepts paths already stripped of the prefix', () => {
-    expect(inferTarget('client/elicitation.mdx')).toEqual({
-      target: 'client',
-      inferred: false
-    });
-  });
-});
-
 describe('renderYaml', () => {
   it('emits placeholder yaml in the sep-2164.yaml style', () => {
     const out = renderYaml({
@@ -65,8 +32,9 @@ describe('renderYaml', () => {
       `sep: 9999
 spec_url: https://modelcontextprotocol.io/specification/draft/server/resources
 requirements:
-  - text: 'TODO: quote the normative sentence from the spec diff'
-    check: sep-9999-todo
+  - check: sep-9999-todo
+    text: 'TODO: quote the normative sentence from the spec diff'
+
   - text: 'TODO: requirement that cannot be tested'
     excluded: 'TODO: reason'
     issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
@@ -99,10 +67,11 @@ requirements:
       `sep: 2164
 spec_url: https://modelcontextprotocol.io/specification/draft/server/resources#error-handling
 requirements:
-  - text: 'Servers MUST NOT return an empty contents array for a non-existent resource'
-    check: sep-2164-no-empty-contents
-  - text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'
-    check: sep-2164-error-code
+  - check: sep-2164-no-empty-contents
+    text: 'Servers MUST NOT return an empty contents array for a non-existent resource'
+  - check: sep-2164-error-code
+    text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'
+
   - text: 'clients SHOULD also accept -32002 as a resource not found error'
     excluded: 'Client-side error handling is implementation-defined; not protocol-observable'
 `
@@ -126,10 +95,10 @@ requirements:
       `sep: 1234
 spec_url: https://modelcontextprotocol.io/specification/draft/server/a
 requirements:
-  - text: 'from primary file'
-    check: sep-1234-a
-  - text: 'from secondary file'
-    check: sep-1234-b
+  - check: sep-1234-a
+    text: 'from primary file'
+  - check: sep-1234-b
+    text: 'from secondary file'
     url: https://modelcontextprotocol.io/specification/draft/server/b#x
 `
     );