Skip to content

Commit d4aa04c

Browse files
committed
Consolidate SEP yamls into src/seps/ and tighten yaml ordering
- All sep-NNNN.yaml files now live in src/seps/ regardless of which scenario suite their checks run in - Drop --target / inferTarget; output dir is fixed - renderYaml: check rows emit `check:` first, excluded rows grouped at bottom with a blank-line separator - SKILL.md: document key/row ordering, expand exclusion-confirmation step - Move src/scenarios/server/sep-2164.yaml -> src/seps/
1 parent db6052e commit d4aa04c

5 files changed

Lines changed: 127 additions & 134 deletions

File tree

.claude/skills/new-sep/SKILL.md

Lines changed: 89 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ description: >-
55
conformance repo from a SEP PR's spec diff. Runs the new-sep CLI, then
66
parses the modelcontextprotocol/modelcontextprotocol spec diff to populate
77
`requirements[]` with the RFC 2119 sentences and proposed check IDs.
8-
argument-hint: '<sep-number> [--target client|server|authorization-server]'
8+
argument-hint: '<sep-number>'
99
---
1010

1111
# new-sep: SEP traceability YAML scaffolding
@@ -37,20 +37,19 @@ The name should be `@modelcontextprotocol/conformance`. If not, stop and ask the
3737
Extract from the user's input:
3838

3939
- **sep-number** (required): the SEP number, e.g. `2164`. This is also the PR number in `modelcontextprotocol/modelcontextprotocol` by convention.
40-
- **--target client|server|authorization-server** (optional): which scenarios subdirectory to write to. Inferred from the spec path if omitted.
4140

4241
## Step 2: Generate the skeleton
4342

4443
Run the CLI:
4544

4645
```bash
4746
npm run --silent build
48-
node dist/index.js new-sep <NNNN> [--target <target>]
47+
node dist/index.js new-sep <NNNN>
4948
```
5049

5150
(For development against a non-built source tree: `npx tsx src/index.ts new-sep ...`.)
5251

53-
The CLI writes `src/scenarios/<target>/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.
52+
The CLI writes `src/seps/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.
5453

5554
If the CLI errors with "does not change any docs/specification/draft/\*.mdx", the SEP's spec changes landed in a separate PR — ask the user for the spec file path and rerun with `--spec-path docs/specification/draft/<path>`. Do not guess.
5655

@@ -80,29 +79,77 @@ Walk the added lines and identify sentences containing the keywords: **MUST**, *
8079
>
8180
> - Resource not found: -32602 (Invalid Params)
8281
83-
The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/scenarios/server/sep-2164.yaml` for the canonical example.
82+
The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/seps/sep-2164.yaml` for the canonical example.
8483

8584
**Regex alone is insufficient** (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in.
8685

8786
## Step 5: Map severity → check vs. excluded
8887

8988
From `AGENTS.md:50-56`:
9089

91-
| Keyword | Severity | YAML field |
92-
| ---------------------------------------------- | ------------------------- | -------------------------- |
93-
| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE | `check: sep-<NNNN>-<slug>` |
94-
| SHOULD / SHOULD NOT | WARNING | `check: sep-<NNNN>-<slug>` |
95-
| MAY / OPTIONAL | (not enforced as a check) | `excluded: '<reason>'` |
90+
| Keyword | Severity | YAML field |
91+
| ---------------------------------------------- | -------- | -------------------------- |
92+
| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE | `check: sep-<NNNN>-<slug>` |
93+
| SHOULD / SHOULD NOT | WARNING | `check: sep-<NNNN>-<slug>` |
94+
| MAY / OPTIONAL | | _no row — skip entirely_ |
9695

97-
If a requirement is testable in principle but you can't see how to drive it from the harness, write a `check:` row anyway and leave it for the human to wire up — do **not** silently demote to `excluded:`.
96+
MAY / OPTIONAL sentences are noted in Step 4 only so you consciously skip them — they never produce a yaml row.
9897

99-
Use `excluded:` only when the requirement genuinely can't be protocol-observed (e.g. "clients SHOULD also accept -32002" — the conformance harness tests servers, so client-side acceptance is not observable here). When you use `excluded:`, write the reason verbatim and add an `issue:` URL if there's a tracking issue.
98+
A row is `excluded:` when a MUST/SHOULD requirement can't be protocol-observed by the harness. Do **not** write any `excluded:` row on your own authority — every exclusion goes through Step 6.
99+
100+
While classifying, sort each MUST/SHOULD row into one of three buckets:
101+
102+
- **`check:`** — observably testable on the wire.
103+
- **clearly-excluded** — you're confident it can't be observed (e.g. "clients SHOULD also accept -32002" when the harness only drives servers).
104+
- **borderline** — you'd default to `check:` but observability is questionable. Markers:
105+
- _Internal state_ — verbs like _record_, _store_, _associate_, _track_, _cache_. The harness sees wire traffic, not memory; usually only observable via a downstream row already in your list.
106+
- _UI / human-facing__display_, _show_, _render_, _prompt the user_.
107+
- _Precondition phrasing_ — "Before doing X, the implementation MUST Y" where X is itself another row.
100108

101109
Slug convention: lowercase-kebab, derived from the verb phrase. Examples from `sep-2164.yaml`: `no-empty-contents`, `error-code`. Same `id` is used for SUCCESS and FAILURE (`AGENTS.md:52`).
102110

103-
## Step 6: Rewrite the YAML
111+
## Step 6: Confirm exclusions with the user
112+
113+
Nothing becomes `excluded:` without sign-off. Two rounds:
114+
115+
**Round 1 — clearly-excluded, single batch question.** One `AskUserQuestion` listing all clearly-excluded rows in the question body (slug + one-line reason each). Options:
116+
117+
- `Exclude all as listed (Recommended)`
118+
- `Flip all to check:`
119+
- `Let me adjust per-row` — if chosen, append these rows to round 2.
120+
121+
Skip this round if the bucket is empty.
122+
123+
**Round 2 — borderline, one question per row.** One `AskUserQuestion` call with a question per borderline row (loop in batches of 4 if needed). For each:
124+
125+
- header: the proposed slug
126+
- question: quote the requirement sentence + your one-line observability concern
127+
- options (list `check:` first — it's the default for borderline):
128+
- `check:` — keep as a testable check
129+
- `excluded: <reason>` — drop to excluded with your stated reason
130+
- `merge into <other-slug>` — offer when the row is a precondition for another row already in the list
131+
132+
Apply the answers before writing. For any `excluded:` outcome, write the reason verbatim into the yaml and add an `issue:` URL if the user supplies one. A `merge` outcome means: drop this row, and append its `text:` to the surviving row's `text:` separated by `/` so the traceability isn't lost.
133+
134+
## Step 7: Rewrite the YAML
135+
136+
Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/seps/sep-2164.yaml`).
137+
138+
**Key order within each row** — for `check:` rows the **`check:` key comes first**, then `text:`, then any optional `url:`. Scanning the left margin should reveal every check ID without reading the quoted sentences. For `excluded:` rows the order is **`text:` first**, then `excluded:`, then optional `issue:` — there's no ID to scan for, so lead with the requirement.
139+
140+
**Row order in the file** — all `check:` rows first (in spec-diff order), then **all `excluded:` rows grouped at the bottom**, separated from the checks by **one blank line**. Do not interleave.
104141

105-
Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/scenarios/server/sep-2164.yaml`).
142+
```yaml
143+
requirements:
144+
- check: sep-NNNN-first-slug
145+
text: '...'
146+
- check: sep-NNNN-second-slug
147+
text: '...'
148+
149+
- text: '...'
150+
excluded: 'reason'
151+
issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
152+
```
106153
107154
If a requirement is ambiguous or you're not confident, leave it as a `TODO:` row rather than guessing — humans review this yaml before scenarios get written.
108155

@@ -111,25 +158,44 @@ Also fix the `spec_url`: the CLI emits the page URL with no anchor. If the requi
111158
If a requirement comes from a **different spec page** than `spec_url` (the SEP touched multiple `.mdx` files — the CLI prints these as "PR also changes N other spec file(s)"), give that row a full `url:` override:
112159

113160
```yaml
114-
- text: '...'
115-
check: sep-NNNN-slug
161+
- check: sep-NNNN-slug
162+
text: '...'
116163
url: https://modelcontextprotocol.io/specification/draft/other/page#anchor
117164
```
118165

119166
A row's effective spec reference is `row.url ?? file.spec_url`.
120167

121168
Write the result back to `$YAML`.
122169

123-
## Step 7: Hand-off
170+
## Step 8: Suggest a host scenario
171+
172+
`AGENTS.md` prefers **fewer scenarios with more checks** over one-scenario-per-check. Before telling the user to write a new scenario, look for an existing one the new checks could be folded into.
173+
174+
Determine the suite directory from the requirement subjects ("MCP clients MUST…" → `client/`, "Servers MUST…" → `server/`, "authorization servers MUST…" → `authorization-server/`; a SEP may map to more than one). Then search that directory for scenarios touching the same spec area:
175+
176+
```bash
177+
rg -l -i '<domain-term>|<domain-term-2>' src/scenarios/<suite>/ --type ts
178+
```
179+
180+
Pick 2–3 domain terms from the SEP's subject matter (for a discovery SEP: `metadata`, `well-known`; for an auth-response SEP: `redirect`, `callback`, `pkce`). For each hit, pull the scenario's `name`/`description` to confirm relevance:
181+
182+
```bash
183+
rg -A1 'name:|description:' <hit.ts>
184+
```
185+
186+
If you find a plausible host, recommend it by path. If nothing fits, say so explicitly — a new scenario file is then the right call.
187+
188+
## Step 9: Hand-off
124189

125190
Report to the user, in this order:
126191

127192
1. Path to the generated yaml.
128-
2. Number of rows extracted (e.g. "3 `check:` rows, 1 `excluded:` row").
129-
3. Any requirements you marked TODO and why.
130-
4. Reminder of the next steps the user still owns:
131-
- implement the TypeScript scenario under `src/scenarios/<target>/`,
132-
- register it in the appropriate suite list in `src/scenarios/index.ts` (`AGENTS.md:48`),
193+
2. Row counts: "`N check:` rows, `M excluded:` rows" — and note which exclusions the user signed off in Step 6.
194+
3. Any requirements you left as `TODO:` and why.
195+
4. **Host-scenario recommendation** from Step 8 — either "consider adding these checks to `src/scenarios/<suite>/<file>.ts` (it already exercises _X_)" or "no existing scenario covers this area; a new file is appropriate".
196+
5. Remaining next steps the user owns:
197+
- add the checks to the host scenario (or create one) under `src/scenarios/{client,server,authorization-server}/`,
198+
- register any new scenario in `src/scenarios/index.ts` (`AGENTS.md:48`),
133199
- add a passing example to the everything-client/server and a negative test, per `AGENTS.md:74-81`.
134200

135-
Do **not** generate the scenario `.ts` file or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml.
201+
Do **not** generate or edit scenario `.ts` files or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml plus the recommendation.

AGENTS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,7 @@ Scaffold the requirement-traceability YAML with:
7979
npx @modelcontextprotocol/conformance new-sep <NNNN>
8080
```
8181

82-
The command looks up PR #`<NNNN>` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, picks `src/scenarios/{client,server,authorization-server}/` from the spec path, and writes `sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.
82+
The command looks up PR #`<NNNN>` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, and writes `src/seps/sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.
8383

8484
## Examples: prove it passes and fails
8585

src/new-sep/index.test.ts

Lines changed: 13 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
import { describe, it, expect } from 'vitest';
2-
import { specPathToUrl, inferTarget, renderYaml } from './index';
2+
import { specPathToUrl, renderYaml } from './index';
33

44
describe('specPathToUrl', () => {
55
it('strips the docs/specification/draft/ prefix and .mdx suffix', () => {
@@ -21,39 +21,6 @@ describe('specPathToUrl', () => {
2121
});
2222
});
2323

24-
describe('inferTarget', () => {
25-
it('returns server for server/ paths', () => {
26-
expect(
27-
inferTarget('docs/specification/draft/server/resources.mdx')
28-
).toEqual({ target: 'server', inferred: false });
29-
});
30-
31-
it('returns client for client/ paths', () => {
32-
expect(inferTarget('docs/specification/draft/client/sampling.mdx')).toEqual(
33-
{ target: 'client', inferred: false }
34-
);
35-
});
36-
37-
it('returns authorization-server for basic/authorization* paths', () => {
38-
expect(
39-
inferTarget('docs/specification/draft/basic/authorization.mdx')
40-
).toEqual({ target: 'authorization-server', inferred: false });
41-
});
42-
43-
it('falls back to server (inferred) for unrecognized paths', () => {
44-
expect(inferTarget('docs/specification/draft/basic/lifecycle.mdx')).toEqual(
45-
{ target: 'server', inferred: true }
46-
);
47-
});
48-
49-
it('accepts paths already stripped of the prefix', () => {
50-
expect(inferTarget('client/elicitation.mdx')).toEqual({
51-
target: 'client',
52-
inferred: false
53-
});
54-
});
55-
});
56-
5724
describe('renderYaml', () => {
5825
it('emits placeholder yaml in the sep-2164.yaml style', () => {
5926
const out = renderYaml({
@@ -65,8 +32,9 @@ describe('renderYaml', () => {
6532
`sep: 9999
6633
spec_url: https://modelcontextprotocol.io/specification/draft/server/resources
6734
requirements:
68-
- text: 'TODO: quote the normative sentence from the spec diff'
69-
check: sep-9999-todo
35+
- check: sep-9999-todo
36+
text: 'TODO: quote the normative sentence from the spec diff'
37+
7038
- text: 'TODO: requirement that cannot be tested'
7139
excluded: 'TODO: reason'
7240
issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
@@ -99,10 +67,11 @@ requirements:
9967
`sep: 2164
10068
spec_url: https://modelcontextprotocol.io/specification/draft/server/resources#error-handling
10169
requirements:
102-
- text: 'Servers MUST NOT return an empty contents array for a non-existent resource'
103-
check: sep-2164-no-empty-contents
104-
- text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'
105-
check: sep-2164-error-code
70+
- check: sep-2164-no-empty-contents
71+
text: 'Servers MUST NOT return an empty contents array for a non-existent resource'
72+
- check: sep-2164-error-code
73+
text: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'
74+
10675
- text: 'clients SHOULD also accept -32002 as a resource not found error'
10776
excluded: 'Client-side error handling is implementation-defined; not protocol-observable'
10877
`
@@ -126,10 +95,10 @@ requirements:
12695
`sep: 1234
12796
spec_url: https://modelcontextprotocol.io/specification/draft/server/a
12897
requirements:
129-
- text: 'from primary file'
130-
check: sep-1234-a
131-
- text: 'from secondary file'
132-
check: sep-1234-b
98+
- check: sep-1234-a
99+
text: 'from primary file'
100+
- check: sep-1234-b
101+
text: 'from secondary file'
133102
url: https://modelcontextprotocol.io/specification/draft/server/b#x
134103
`
135104
);

0 commit comments

Comments
 (0)