Consolidate SEP yamls into src/seps/ and tighten yaml ordering
- All sep-NNNN.yaml files now live in src/seps/ regardless of which
scenario suite their checks run in
- Drop --target / inferTarget; output dir is fixed
- renderYaml: check rows emit `check:` first, excluded rows grouped at
bottom with a blank-line separator
- SKILL.md: document key/row ordering, expand exclusion-confirmation step
- Move src/scenarios/server/sep-2164.yaml -> src/seps/
(For development against a non-built source tree: `npx tsx src/index.ts new-sep ...`.)
-The CLI writes `src/scenarios/<target>/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.
+The CLI writes `src/seps/sep-<NNNN>.yaml` with `sep`, `spec_url`, and two TODO `requirements[]` rows. Capture the output path from the CLI's `Wrote …` line and remember it as `$YAML`.

If the CLI errors with "does not change any docs/specification/draft/\*.mdx", the SEP's spec changes landed in a separate PR — ask the user for the spec file path and rerun with `--spec-path docs/specification/draft/<path>`. Do not guess.
@@ -80,29 +79,77 @@ Walk the added lines and identify sentences containing the keywords: **MUST**, *
>
> - Resource not found: -32602 (Invalid Params)

-The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/scenarios/server/sep-2164.yaml` for the canonical example.
+The bullet inherits `SHOULD`. The yaml row should quote the _combined_ obligation: `'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)'` — see `src/seps/sep-2164.yaml` for the canonical example.

**Regex alone is insufficient** (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in.
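
To illustrate the point about lead-ins, here is a minimal sketch of a scanner that lets a keyword-less bullet inherit the keyword of the sentence introducing it. `findObligations` and its `Obligation` type are hypothetical and not part of the CLI or the skill. It assumes the added spec lines arrive as an array of strings, and it handles only the simple "lead-in ending in a colon, then bullets" shape. Pronouns and phrases like "such cases" still need a human read.

```typescript
// Hypothetical helper: not part of the conformance repo.
type Obligation = { keyword: string; text: string };

// Longer alternatives come first so "MUST NOT" is not matched as plain "MUST".
const KEYWORD = /\b(MUST NOT|SHALL NOT|SHOULD NOT|MUST|SHALL|SHOULD|REQUIRED)\b/;

function findObligations(lines: string[]): Obligation[] {
  const found: Obligation[] = [];
  // Last keyword-bearing sentence that ended in ":" and is waiting for bullets.
  let leadIn: Obligation | null = null;

  for (const raw of lines) {
    // Strip a blockquote marker ("> ") before looking at the content.
    const line = raw.replace(/^\s*>\s?/, '').trim();
    if (line === '') continue;

    const bullet = /^[-*]\s+(.*)$/.exec(line);
    const own = KEYWORD.exec(line);

    if (bullet && !own && leadIn) {
      // The bullet has no keyword of its own, so it inherits the lead-in's.
      found.push({ keyword: leadIn.keyword, text: `${leadIn.text} ${bullet[1]}` });
      continue;
    }
    if (own) {
      const entry: Obligation = { keyword: own[1], text: bullet ? bullet[1] : line };
      if (line.endsWith(':')) {
        leadIn = entry; // emit only in combination with the bullets that follow
      } else {
        leadIn = null;
        found.push(entry);
      }
      continue;
    }
    if (!bullet) leadIn = null; // unrelated prose ends the lead-in's scope
  }
  return found;
}

const obligations = findObligations([
  '> Servers SHOULD return standard JSON-RPC errors for common failure cases:',
  '>',
  '> - Resource not found: -32602 (Invalid Params)',
]);
console.log(obligations);
```

A bare regex over the same three lines would report one `SHOULD` on the lead-in and nothing for the bullet, whereas the sketch yields the single combined sentence that the yaml row is expected to quote.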
| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE | `check: sep-<NNNN>-<slug>` |
+| SHOULD / SHOULD NOT | WARNING | `check: sep-<NNNN>-<slug>` |
+| MAY / OPTIONAL | — | _no row — skip entirely_ |
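
The table rows above can be read as a small lookup. The sketch below assumes the middle column is the status reported when a check does not pass; the `Severity` type and `severityFor` function are hypothetical names, not harness APIs.

```typescript
// Hypothetical names; the real harness types may differ.
type Severity = 'FAILURE' | 'WARNING';

// Returns the severity for a keyword, or null when the sentence
// produces no yaml row at all (MAY / OPTIONAL).
function severityFor(keyword: string): Severity | null {
  switch (keyword.trim().toUpperCase()) {
    case 'MUST':
    case 'MUST NOT':
    case 'SHALL':
    case 'SHALL NOT':
    case 'REQUIRED':
      return 'FAILURE';
    case 'SHOULD':
    case 'SHOULD NOT':
      return 'WARNING';
    case 'MAY':
    case 'OPTIONAL':
      return null;
    default:
      // Keywords the table does not list are rejected rather than guessed.
      throw new Error(`Keyword not covered by the table: ${keyword}`);
  }
}

console.log(severityFor('SHALL NOT'));
```

Throwing on unlisted keywords is a design choice of this sketch: it keeps a keyword the table does not cover from silently turning into a row.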

-If a requirement is testable in principle but you can't see how to drive it from the harness, write a `check:` row anyway and leave it for the human to wire up — do **not** silently demote to `excluded:`.
+MAY / OPTIONAL sentences are noted in Step 4 only so you consciously skip them — they never produce a yaml row.

-Use `excluded:` only when the requirement genuinely can't be protocol-observed (e.g. "clients SHOULD also accept -32002" — the conformance harness tests servers, so client-side acceptance is not observable here). When you use `excluded:`, write the reason verbatim and add an `issue:` URL if there's a tracking issue.
+A row is `excluded:` when a MUST/SHOULD requirement can't be protocol-observed by the harness. Do **not** write any `excluded:` row on your own authority — every exclusion goes through Step 6.
+
+While classifying, sort each MUST/SHOULD row into one of three buckets:
+
+- **`check:`** — observably testable on the wire.
+- **clearly-excluded** — you're confident it can't be observed (e.g. "clients SHOULD also accept -32002" when the harness only drives servers).
+- **borderline** — you'd default to `check:` but observability is questionable. Markers:
+  - _Internal state_ — verbs like _record_, _store_, _associate_, _track_, _cache_. The harness sees wire traffic, not memory; usually only observable via a downstream row already in your list.
+  - _UI / human-facing_ — _display_, _show_, _render_, _prompt the user_.
+  - _Precondition phrasing_ — "Before doing X, the implementation MUST Y" where X is itself another row.
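
The borderline markers listed above lend themselves to a rough first-pass filter. The following is only a sketch: `borderlineMarkers` is a hypothetical helper, the word lists are copied from the bullets, and the sample sentences in the usage lines are invented for illustration. A hit means "look closer", not "exclude".

```typescript
// Hypothetical heuristic: it only flags rows for a closer look; the
// final bucket is still a judgment call confirmed with the user.
type Marker = 'internal-state' | 'ui' | 'precondition';

const INTERNAL_STATE = /\b(record|store|associate|track|cache)s?\b/i;
const UI = /\b(display|show|render)s?\b|\bprompt the user\b/i;
const STARTS_WITH_BEFORE = /^\s*before\b/i;
const HAS_OBLIGATION = /\b(MUST|SHOULD|SHALL)\b/; // case-sensitive on purpose

function borderlineMarkers(text: string): Marker[] {
  const markers: Marker[] = [];
  if (INTERNAL_STATE.test(text)) markers.push('internal-state');
  if (UI.test(text)) markers.push('ui');
  if (STARTS_WITH_BEFORE.test(text) && HAS_OBLIGATION.test(text)) markers.push('precondition');
  return markers;
}

// Invented sample sentences, not quotes from any SEP.
console.log(borderlineMarkers('Before redirecting, the client MUST store the state value'));
console.log(borderlineMarkers('Servers MUST return -32602'));
```

An empty result does not prove a row is testable on the wire; it only means none of the three listed markers appeared.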
Slug convention: lowercase-kebab, derived from the verb phrase. Examples from `sep-2164.yaml`: `no-empty-contents`, `error-code`. Same `id` is used for SUCCESS and FAILURE (`AGENTS.md:52`).
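
As a sketch of the slug convention, the helper below turns a short verb phrase into a `sep-<NNNN>-<slug>` identifier. `checkId` is hypothetical; the skill derives slugs by hand, and this assumes the SEP number is used as-is with no zero padding.

```typescript
// Hypothetical helper; choosing the verb phrase itself is still manual.
function checkId(sep: number, phrase: string): string {
  const slug = phrase
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // any run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, ''); // no leading or trailing hyphen
  if (slug === '') throw new Error('phrase has no usable characters');
  return `sep-${sep}-${slug}`;
}

console.log(checkId(2164, 'No empty contents'));
console.log(checkId(2164, 'Error code'));
```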

-## Step 6: Rewrite the YAML
+## Step 6: Confirm exclusions with the user
+
+Nothing becomes `excluded:` without sign-off. Two rounds:
+
+**Round 1 — clearly-excluded, single batch question.** One `AskUserQuestion` listing all clearly-excluded rows in the question body (slug + one-line reason each). Options:
+
+- `Exclude all as listed (Recommended)`
+- `Flip all to check:`
+- `Let me adjust per-row` — if chosen, append these rows to round 2.
+
+Skip this round if the bucket is empty.
+
+**Round 2 — borderline, one question per row.** One `AskUserQuestion` call with a question per borderline row (loop in batches of 4 if needed). For each:
+
+- header: the proposed slug
+- question: quote the requirement sentence + your one-line observability concern
+- options (list `check:` first — it's the default for borderline):
+  - `check:` — keep as a testable check
+  - `excluded: <reason>` — drop to excluded with your stated reason
+  - `merge into <other-slug>` — offer when the row is a precondition for another row already in the list
+
+Apply the answers before writing. For any `excluded:` outcome, write the reason verbatim into the yaml and add an `issue:` URL if the user supplies one. A `merge` outcome means: drop this row, and append its `text:` to the surviving row's `text:` separated by `/` so the traceability isn't lost.
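
The `merge` outcome described above can be sketched as a pure function over the row list. `CheckRow` and `mergeInto` are hypothetical; the row shape is inferred from the keys this document names, and the spaces around `/` are an assumption of this sketch.

```typescript
// Hypothetical row shape based on the keys this document names.
type CheckRow = { check: string; text: string; url?: string };

// Drops the row identified by `mergedId` and appends its text to the
// surviving row, separated by " / ", so the requirement stays traceable.
function mergeInto(rows: CheckRow[], mergedId: string, survivorId: string): CheckRow[] {
  const merged = rows.find((r) => r.check === mergedId);
  const survivor = rows.find((r) => r.check === survivorId);
  if (!merged || !survivor || merged === survivor) {
    throw new Error(`cannot merge ${mergedId} into ${survivorId}`);
  }
  return rows
    .filter((r) => r !== merged)
    .map((r) => (r === survivor ? { ...r, text: `${r.text} / ${merged.text}` } : r));
}

// Invented rows for illustration only.
const mergedRows = mergeInto(
  [
    { check: 'sep-0000-validate-first', text: 'Before responding, the server MUST validate the request' },
    { check: 'sep-0000-error-code', text: 'Servers MUST return -32602' },
  ],
  'sep-0000-validate-first',
  'sep-0000-error-code',
);
console.log(mergedRows);
```

The input array is left untouched, which makes it easy to show the user a before/after view before writing the yaml.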
+
+## Step 7: Rewrite the YAML
+
+Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/seps/sep-2164.yaml`).
+
+**Key order within each row** — for `check:` rows the **`check:` key comes first**, then `text:`, then any optional `url:`. Scanning the left margin should reveal every check ID without reading the quoted sentences. For `excluded:` rows the order is **`text:` first**, then `excluded:`, then optional `issue:` — there's no ID to scan for, so lead with the requirement.
+
+**Row order in the file** — all `check:` rows first (in spec-diff order), then **all `excluded:` rows grouped at the bottom**, separated from the checks by **one blank line**. Do not interleave.
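
The two ordering rules can be sketched as a small renderer. This is not the repo's `renderYaml`, whose code is not shown here. `renderRequirements`, the row types, and the exact indentation under `requirements:` are assumptions; the key order, the single quotes, and the blank-line separator follow the rules stated above.

```typescript
// Hypothetical row types; key names follow this document, the rest is assumed.
type CheckRow = { check: string; text: string; url?: string };
type ExcludedRow = { text: string; excluded: string; issue?: string };
type Row = CheckRow | ExcludedRow;

// Single-quoted YAML scalar: a literal ' is escaped by doubling it.
const quote = (s: string): string => `'${s.replace(/'/g, "''")}'`;

// Renders one list item; keys whose value is undefined are omitted.
function renderItem(pairs: Array<[string, string | undefined]>): string {
  return pairs
    .filter((p): p is [string, string] => p[1] !== undefined)
    .map(([key, value], i) => `${i === 0 ? '  - ' : '    '}${key}: ${value}`)
    .join('\n');
}

function renderRequirements(rows: Row[]): string {
  // filter() keeps input order, so spec-diff order survives within each group.
  const checks = rows.filter((r): r is CheckRow => 'check' in r);
  const excluded = rows.filter((r): r is ExcludedRow => 'excluded' in r);

  const checkBlock = checks.map((r) =>
    renderItem([['check', r.check], ['text', quote(r.text)], ['url', r.url]]),
  );
  const excludedBlock = excluded.map((r) =>
    renderItem([['text', quote(r.text)], ['excluded', quote(r.excluded)], ['issue', r.issue]]),
  );

  // One blank line between the two groups; none if either group is empty.
  const groups = [checkBlock, excludedBlock]
    .filter((g) => g.length > 0)
    .map((g) => g.join('\n'));
  return `requirements:\n${groups.join('\n\n')}\n`;
}

// Input is deliberately interleaved; the reasons and texts are illustrative.
const rendered = renderRequirements([
  { text: 'Clients SHOULD also accept -32002', excluded: 'harness only drives servers' },
  { check: 'sep-2164-error-code', text: 'Servers SHOULD return -32602' },
]);
console.log(rendered);
```

Partitioning instead of sorting is a deliberate choice in this sketch: it cannot reorder rows within a group, so the spec-diff order of the input is preserved.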

-Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see `src/scenarios/server/sep-2164.yaml`).
If a requirement is ambiguous or you're not confident, leave it as a `TODO:` row rather than guessing — humans review this yaml before scenarios get written.
@@ -111,25 +158,44 @@ Also fix the `spec_url`: the CLI emits the page URL with no anchor. If the requi
If a requirement comes from a **different spec page** than `spec_url` (the SEP touched multiple `.mdx` files — the CLI prints these as "PR also changes N other spec file(s)"), give that row a full `url:` override:
A row's effective spec reference is `row.url ?? file.spec_url`.
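
A minimal sketch of that fallback, assuming a file shape with only the fields named in this document. `SepFile` and `effectiveUrl` are hypothetical, and the URLs are placeholders.

```typescript
// Hypothetical shapes: only `url` and `spec_url` are named in the source.
type SepFile = {
  spec_url: string;
  requirements: Array<{ text: string; url?: string }>;
};

function effectiveUrl(file: SepFile, index: number): string {
  const row = file.requirements[index];
  if (!row) throw new RangeError(`no requirement at index ${index}`);
  // `??` falls back only when url is undefined or null, not when it is ''.
  return row.url ?? file.spec_url;
}

// Placeholder URLs for illustration only.
const sepFile: SepFile = {
  spec_url: 'https://example.com/spec/page-a#section',
  requirements: [
    { text: 'row without an override' },
    { text: 'row from another page', url: 'https://example.com/spec/page-b#other' },
  ],
};
console.log(effectiveUrl(sepFile, 0));
console.log(effectiveUrl(sepFile, 1));
```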

Write the result back to `$YAML`.

-## Step 7: Hand-off
+## Step 8: Suggest a host scenario
+
+`AGENTS.md` prefers **fewer scenarios with more checks** over one-scenario-per-check. Before telling the user to write a new scenario, look for an existing one the new checks could be folded into.
+
+Determine the suite directory from the requirement subjects ("MCP clients MUST…" → `client/`, "Servers MUST…" → `server/`, "authorization servers MUST…" → `authorization-server/`; a SEP may map to more than one). Then search that directory for scenarios touching the same spec area:
Pick 2–3 domain terms from the SEP's subject matter (for a discovery SEP: `metadata`, `well-known`; for an auth-response SEP: `redirect`, `callback`, `pkce`). For each hit, pull the scenario's `name`/`description` to confirm relevance:
+
+```bash
+rg -A1 'name:|description:' <hit.ts>
+```
+
+If you find a plausible host, recommend it by path. If nothing fits, say so explicitly — a new scenario file is then the right call.
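
The subject-to-suite mapping from the start of Step 8 can be sketched as follows. `suitesFor` and the subject patterns are hypothetical; the three suite names are the directories named in this document, and real requirement sentences may need a human to resolve the subject.

```typescript
type Suite = 'client' | 'server' | 'authorization-server';

// Hypothetical subject patterns. Order matters because "authorization servers"
// also contains the word "servers".
const SUBJECTS: Array<[RegExp, Suite]> = [
  [/\bauthorization servers?\b/i, 'authorization-server'],
  [/\bclients?\b/i, 'client'],
  [/\bservers?\b/i, 'server'],
];

function suitesFor(requirementTexts: string[]): Suite[] {
  const suites = new Set<Suite>();
  for (const text of requirementTexts) {
    // Only inspect the subject: the words before the first obligation keyword.
    const [subject = text] = text.split(/\b(?:MUST|SHOULD|SHALL|REQUIRED)\b/);
    const hit = SUBJECTS.find(([pattern]) => pattern.test(subject));
    if (hit) suites.add(hit[1]);
  }
  // A SEP may map to more than one suite, so return every suite seen.
  return Array.from(suites);
}

// Invented sample sentences, not quotes from any SEP.
const suites = suitesFor([
  'MCP clients MUST send the header',
  'Servers MUST reply to clients within the session',
  'Authorization servers MUST publish metadata',
]);
console.log(suites);
```

Cutting the sentence at the first keyword keeps an object such as "to clients" in the second sample from being mistaken for the subject.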
+
+## Step 9: Hand-off

Report to the user, in this order:

1. Path to the generated yaml.
-2. Number of rows extracted (e.g. "3 `check:` rows, 1 `excluded:` row").
-3. Any requirements you marked TODO and why.
-4. Reminder of the next steps the user still owns:
-   - implement the TypeScript scenario under `src/scenarios/<target>/`,
-   - register it in the appropriate suite list in `src/scenarios/index.ts` (`AGENTS.md:48`),
+2. Row counts: "`N check:` rows, `M excluded:` rows" — and note which exclusions the user signed off in Step 6.
+3. Any requirements you left as `TODO:` and why.
+4. **Host-scenario recommendation** from Step 8 — either "consider adding these checks to `src/scenarios/<suite>/<file>.ts` (it already exercises _X_)" or "no existing scenario covers this area; a new file is appropriate".
+5. Remaining next steps the user owns:
+   - add the checks to the host scenario (or create one) under `src/scenarios/{client,server,authorization-server}/`,
+   - register any new scenario in `src/scenarios/index.ts` (`AGENTS.md:48`),
   - add a passing example to the everything-client/server and a negative test, per `AGENTS.md:74-81`.

-Do **not** generate the scenario `.ts` file or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml.
+Do **not** generate or edit scenario `.ts` files or touch `src/scenarios/index.ts`. The skill's scope ends at the yaml plus the recommendation.
-The command looks up PR #`<NNNN>` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, picks `src/scenarios/{client,server,authorization-server}/` from the spec path, and writes `sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.
+The command looks up PR #`<NNNN>` in `modelcontextprotocol/modelcontextprotocol` (SEP numbers are PR numbers), derives `spec_url` from the `docs/specification/draft/*.mdx` file it changes, and writes `src/seps/sep-<NNNN>.yaml` with TODO `requirements[]` rows. Use `--spec-path` or `--spec-url` to skip the lookup. The `new-sep` Claude Code skill drives the same flow end-to-end, parses the spec diff, and fills in the requirement rows.