feat: add new-sep scaffolding command

pugsatoshi · pugsatoshi · commit 99f951bdeb73 · 2026-05-12T07:57:08.000Z
Related #243.

Add the `new-sep <NNNN>` subcommand, which writes `src/scenarios/<target>/sep-<NNNN>.yaml`. The file contains the `sep`, `spec_url`, and placeholder `requirements` rows in the same style as sep-2164.yaml (single quotes, two-space indent).

- The spec source resolves in order: --spec-url → --spec-path → GitHub API against modelcontextprotocol/modelcontextprotocol (explicit --pr, or a title search for "SEP-NNNN").
- The target directory is inferred from the spec path (`server/`, `client/`, or `basic/authorization*`); the --target flag overrides the inference.
- The command refuses to overwrite an existing file unless --force is given.
- Token resolution reuses the `gh auth token` fallback pattern from tier-check.

Also ship the .claude/skills/new-sep skill, which drives the CLI:

- It fetches the spec diff via `gh api` and extracts RFC 2119 sentences. The skill warns against regex-only matching; bullets inherit keywords from their lead-in sentence.
- It rewrites the placeholder rows per the severity rules in AGENTS.md: MUST/SHALL/REQUIRED become `check:` rows with FAILURE severity, SHOULD becomes a `check:` row with WARNING severity, and MAY/OPTIONAL are excluded.

Signed-off-by: Satoshi Ito <satoshi.ito.tf@hitachi.com>
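
The severity rules described in this message can be sketched as a first-pass classifier. This is a minimal illustration and not code from this commit: `classifySentence` and the `Classification` type are hypothetical names, and the keyword set is the one the skill enumerates. As the skill itself warns, a regex pass like this only flags candidates; a bullet that inherits its keyword has to be joined with its lead-in sentence before it is classified.

```typescript
// Hypothetical helper (not part of this commit): classify a sentence by the
// strongest RFC 2119 keyword it contains, following the severity rules above.
type Classification =
  | { kind: 'check'; severity: 'FAILURE' | 'WARNING' }
  | { kind: 'excluded' }
  | { kind: 'none' };

// Matching is case-sensitive on purpose: only the capitalized forms are
// treated as normative keywords here.
const FAILURE_RE = /\b(?:MUST(?: NOT)?|SHALL(?: NOT)?|REQUIRED)\b/;
const WARNING_RE = /\bSHOULD(?: NOT)?\b/;
const EXCLUDED_RE = /\b(?:MAY|OPTIONAL)\b/;

function classifySentence(sentence: string): Classification {
  // Check the strongest keyword first so "MUST ... MAY ..." maps to FAILURE.
  if (FAILURE_RE.test(sentence)) return { kind: 'check', severity: 'FAILURE' };
  if (WARNING_RE.test(sentence)) return { kind: 'check', severity: 'WARNING' };
  if (EXCLUDED_RE.test(sentence)) return { kind: 'excluded' };
  return { kind: 'none' };
}
```

A sentence with no capitalized keyword returns `{ kind: 'none' }`, which is the signal to read the surrounding context rather than drop the sentence.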
diff --git a/.claude/skills/new-sep/SKILL.md b/.claude/skills/new-sep/SKILL.md
@@ -0,0 +1,127 @@
+---
+name: new-sep
+description: >-
+  Scaffold a sep-NNNN.yaml requirement-traceability file for the MCP
+  conformance repo from a SEP PR's spec diff. Runs the new-sep CLI, then
+  parses the modelcontextprotocol/modelcontextprotocol spec diff to populate
+  `requirements[]` with the RFC 2119 sentences and proposed check IDs.
+argument-hint: '<sep-number> [--pr <num>] [--target client|server|authorization-server]'
+---
+
+# new-sep: SEP traceability YAML scaffolding
+
+You are bootstrapping a `sep-NNNN.yaml` file for a new SEP in the MCP conformance repo. The output is the requirement-traceability file specified by SEP-2484: a YAML file that maps each normative sentence from the SEP's spec diff to a `check:` ID (testable) or an `excluded:` reason (not testable). The CLI generates the skeleton; you fill in the rows by reading the spec diff.
+
+## Step 0: Pre-flight checks
+
+Before doing anything else, verify GitHub CLI authentication:
+
+```bash
+gh auth status 2>&1
+```
+
+If this fails, stop immediately and tell the user:
+
+> GitHub authentication is required for this skill. Please run `gh auth login` first, then re-run.
+
+Verify you're running inside the conformance repo:
+
+```bash
+test -f package.json && jq -r '.name' package.json
+```
+
+The name should be `@modelcontextprotocol/conformance`. If not, stop and ask the user to `cd` into the conformance repo first.
+
+## Step 1: Parse arguments
+
+Extract from the user's input:
+
+- **sep-number** (required): the SEP number, e.g. `2164`.
+- **--pr <num>** (optional): the PR number in `modelcontextprotocol/modelcontextprotocol`. If omitted, the CLI searches for a PR titled `SEP-<NNNN>` and fails loudly on 0 or >1 hits.
+- **--target client|server|authorization-server** (optional): which scenarios subdirectory to write to. Inferred from the spec path if omitted.
+
+## Step 2: Generate the skeleton
+
+Run the CLI:
+
+```bash
+npm run --silent build
+node dist/index.js new-sep <NNNN> [--pr <num>] [--target <target>]
+```
+
+(For development against a non-built source tree: `npx tsx src/index.ts new-sep ...`.)
+
+The CLI writes `src/scenarios/<target>/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.
+
+If the CLI errors with "No PRs match" or "Multiple PRs match", read the message, ask the user for the right `--pr <num>`, and rerun. Do not guess.
+
+## Step 3: Fetch the spec diff
+
+`AGENTS.md` (lines 64–72) is explicit that severity must come from the spec text itself, not the SEP markdown or the conformance PR description. List the draft spec files the PR changes, together with their patches:
+
+```bash
+PR=<pr-from-step-2>  # the PR number passed to, or resolved in, Step 2
+gh api "repos/modelcontextprotocol/modelcontextprotocol/pulls/$PR/files" \
+  --jq '.[] | select(.filename | test("^docs/specification/draft/.*\\.mdx$")) | {filename, patch}'
+```
+
+For each file, pull the added (`+`-prefixed) lines from `patch`. If `patch` is truncated for a large file, fall back to fetching the whole file at the PR's head ref:
+
+```bash
+gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/<path>?ref=<sep-branch>" \
+  --jq '.content' | base64 -d
+```
+
+## Step 4: Extract RFC 2119 requirements
+
+Walk the added lines and identify sentences containing the keywords: **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **MAY**, **OPTIONAL**.
+
+**Quote the whole sentence**, not just the matched line. The matched word may sit inside a bullet point whose lead-in sentence supplies the keyword by inheritance — e.g.:
+
+> Servers SHOULD return standard JSON-RPC errors for common failure cases:
+>
+> - Resource not found: -32602 (Invalid Params)
+
+The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/scenarios/server/sep-2164.yaml` for the canonical example.
+
+**Regex alone is insufficient** (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in.
+
+## Step 5: Map severity → check vs. excluded
+
+From `AGENTS.md:50-56`:
+
+| Keyword                                        | Severity                  | YAML field                 |
+| ---------------------------------------------- | ------------------------- | -------------------------- |
+| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE                   | `check: sep-<NNNN>-<slug>` |
+| SHOULD / SHOULD NOT                            | WARNING                   | `check: sep-<NNNN>-<slug>` |
+| MAY / OPTIONAL                                 | (not enforced as a check) | `excluded: '<reason>'`     |
+
+If a requirement is testable in principle but you can't see how to drive it from the harness, write a `check:` row anyway and leave it for the human to wire up — do **not** silently demote to `excluded:`.
+
+Use `excluded:` only when the requirement genuinely can't be protocol-observed (e.g. "clients SHOULD also accept -32002" — the conformance harness tests servers, so client-side acceptance is not observable here). When you use `excluded:`, write the reason verbatim and add an `issue:` URL if there's a tracking issue.
+
+Slug convention: lowercase kebab-case, derived from the verb phrase. Examples from `sep-2164.yaml`: `no-empty-contents`, `error-code`. The same `id` is used for SUCCESS and FAILURE (`AGENTS.md:52`).
+
+## Step 6: Rewrite the YAML
+
+Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/scenarios/server/sep-2164.yaml`).
+
+If a requirement is ambiguous or you're not confident, leave it as a `TODO:` row rather than guessing — humans review this yaml before scenarios get written.
+
+Also fix the `spec_url`: the CLI emits the page URL with no anchor. If the requirements you extracted live under a specific spec subsection (e.g. `#error-handling`), append it.
+
+Write the result back to `$YAML`.
+
+## Step 7: Hand-off
+
+Report to the user, in this order:
+
+1. Path to the generated yaml.
+2. Number of rows extracted (e.g. "3 `check:` rows, 1 `excluded:` row").
+3. Any requirements you marked TODO and why.
+4. Reminder of the next steps the user still owns:
+   - implement the TypeScript scenario under `src/scenarios/<target>/`,
+   - register it in the appropriate suite list in `src/scenarios/index.ts` (`AGENTS.md:48`),
+   - add a passing example to the everything-client/server and a negative test, per `AGENTS.md:74-81`.
+
+Do **not** generate the scenario `.ts` file or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml.
diff --git a/AGENTS.md b/AGENTS.md
@@ -71,6 +71,16 @@ Verify requirement levels against the SEP's **spec diff** — the change to `doc
 gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/docs/specification/draft/<path>?ref=<sep-branch>" --jq '.content' | base64 -d
 ```
 
+### Adding a new SEP
+
+Scaffold the requirement-traceability YAML with:
+
+```sh
+npx @modelcontextprotocol/conformance new-sep <NNNN>
+```
+
+The command searches `modelcontextprotocol/modelcontextprotocol` for a PR titled `SEP-<NNNN>`, derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, picks `src/scenarios/{client,server,authorization-server}/` from the spec path, and writes `sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-url`, `--spec-path`, or `--pr` to override the lookup when title search is ambiguous. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.
+
 ## Examples: prove it passes and fails
 
 A new scenario should come with:
diff --git a/src/index.ts b/src/index.ts
@@ -45,6 +45,7 @@ import {
   printBaselineResults
 } from './expected-failures';
 import { createTierCheckCommand } from './tier-check';
+import { createNewSepCommand } from './new-sep';
 import packageJson from '../package.json';
 
 // Note on naming: `command` refers to which CLI command is calling this.
@@ -540,6 +541,9 @@ program
 // Tier check command
 program.addCommand(createTierCheckCommand());
 
+// New SEP scaffolding command
+program.addCommand(createNewSepCommand());
+
 // List scenarios command
 program
   .command('list')
diff --git a/src/new-sep/index.test.ts b/src/new-sep/index.test.ts
@@ -0,0 +1,120 @@
+import { describe, it, expect } from 'vitest';
+import { specPathToUrl, inferTarget, renderYaml } from './index';
+
+describe('specPathToUrl', () => {
+  it('strips the docs/specification/draft/ prefix and .mdx suffix', () => {
+    expect(specPathToUrl('docs/specification/draft/server/resources.mdx')).toBe(
+      'https://modelcontextprotocol.io/specification/draft/server/resources'
+    );
+  });
+
+  it('handles nested paths', () => {
+    expect(specPathToUrl('docs/specification/draft/basic/lifecycle.mdx')).toBe(
+      'https://modelcontextprotocol.io/specification/draft/basic/lifecycle'
+    );
+  });
+
+  it('rejects paths outside docs/specification/draft/', () => {
+    expect(() =>
+      specPathToUrl('docs/specification/2025-11-25/server/x.mdx')
+    ).toThrow(/must start with/);
+  });
+});
+
+describe('inferTarget', () => {
+  it('returns server for server/ paths', () => {
+    expect(
+      inferTarget('docs/specification/draft/server/resources.mdx')
+    ).toEqual({ target: 'server', inferred: false });
+  });
+
+  it('returns client for client/ paths', () => {
+    expect(inferTarget('docs/specification/draft/client/sampling.mdx')).toEqual(
+      { target: 'client', inferred: false }
+    );
+  });
+
+  it('returns authorization-server for basic/authorization* paths', () => {
+    expect(
+      inferTarget('docs/specification/draft/basic/authorization.mdx')
+    ).toEqual({ target: 'authorization-server', inferred: false });
+  });
+
+  it('falls back to server (inferred) for unrecognized paths', () => {
+    expect(inferTarget('docs/specification/draft/basic/lifecycle.mdx')).toEqual(
+      { target: 'server', inferred: true }
+    );
+  });
+
+  it('accepts paths already stripped of the prefix', () => {
+    expect(inferTarget('client/elicitation.mdx')).toEqual({
+      target: 'client',
+      inferred: false
+    });
+  });
+});
+
+describe('renderYaml', () => {
+  it('emits placeholder yaml in the sep-2164.yaml style', () => {
+    const out = renderYaml({
+      sep: 9999,
+      specUrl:
+        'https://modelcontextprotocol.io/specification/draft/server/resources'
+    });
+    expect(out).toBe(
+      `sep: 9999
+spec_url: https://modelcontextprotocol.io/specification/draft/server/resources
+requirements:
+  - text: 'TODO: quote the normative sentence from the spec diff'
+    check: sep-9999-todo
+  - text: 'TODO: requirement that cannot be tested'
+    excluded: 'TODO: reason'
+    issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
+`
+    );
+  });
+
+  it('matches the byte-shape of the real sep-2164.yaml when given its rows', () => {
+    const out = renderYaml({
+      sep: 2164,
+      specUrl:
+        'https://modelcontextprotocol.io/specification/draft/server/resources#error-handling',
+      requirements: [
+        {
+          text: 'Servers MUST NOT return an empty contents array for a non-existent resource',
+          check: 'sep-2164-no-empty-contents'
+        },
+        {
+          text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)',
+          check: 'sep-2164-error-code'
+        },
+        {
+          text: 'clients SHOULD also accept -32002 as a resource not found error',
+          excluded:
+            'Client-side error handling is implementation-defined; not protocol-observable'
+        }
+      ]
+    });
+    expect(out).toBe(
+      `sep: 2164
+spec_url: https://modelcontextprotocol.io/specification/draft/server/resources#error-handling
+requirements:
+  - text: 'Servers MUST NOT return an empty contents array for a non-existent resource'
+    check: sep-2164-no-empty-contents
+  - text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'
+    check: sep-2164-error-code
+  - text: 'clients SHOULD also accept -32002 as a resource not found error'
+    excluded: 'Client-side error handling is implementation-defined; not protocol-observable'
+`
+    );
+  });
+
+  it('escapes single quotes by doubling them', () => {
+    const out = renderYaml({
+      sep: 1,
+      specUrl: 'https://example.com/x',
+      requirements: [{ text: "can't happen", check: 'sep-1-x' }]
+    });
+    expect(out).toContain("text: 'can''t happen'");
+  });
+});
diff --git a/src/new-sep/index.ts b/src/new-sep/index.ts