Commit 3f9048a

feat: add specVersion classification to conformance scenarios (#147)
* feat: add specVersions classification to conformance scenarios

  Each scenario declares which spec versions it applies to as a list. Scenarios that carry forward (e.g. initialize) list all applicable versions ['2025-06-18', '2025-11-25']. Scenarios removed from newer specs (e.g. backcompat auth) only list their original version ['2025-03-26'].

  - specVersions list on Scenario and ClientScenario interfaces
  - --spec-version CLI filter uses simple .includes()
  - Tier-check conformance matrix (Server / Client: Core / Client: Auth) with per-version columns and unique All* count
  - 7 unit tests for specVersions helpers
  - Updated tier-audit skill docs with matrix format

* fix: fix console output template for tier audit skill

  The template had an orphan table header (Check | Value | T2 | T1) with no rows above the conformance matrix, causing an empty table to render. The scorecard rows below the matrix also lacked their own header.

  Fix: two self-contained tables with clear labels — 'Conformance:' for the per-version matrix, 'Scorecard:' for the check rows.

* fix: align skill console template with tier-check script output

  Add asterisk footnote ('unique scenarios — a scenario may apply to multiple spec versions') and rename 'Scorecard' to 'Repository Health' to match the labels used by the tier-check CLI output.

* fix: add specVersions to CrossAppAccessCompleteFlowScenario

  Added in 83c446d on main after this branch diverged.
1 parent dd14862 commit 3f9048a

33 files changed

Lines changed: 521 additions & 59 deletions
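The commit message describes each scenario carrying a `specVersions` list, with the CLI filter checking membership via `.includes()`. The TypeScript sketch below illustrates that model. The `SpecVersion` union, the `ScenarioLike` interface, and `scenariosFor` are hypothetical stand-ins assembled from values visible in this diff; they are not the definitions in `src/types.ts`.

```typescript
// Hypothetical sketch of the specVersions data model; the real types live in
// src/types.ts and may differ.
type SpecVersion =
  | '2025-03-26'
  | '2025-06-18'
  | '2025-11-25'
  | 'draft'
  | 'extension';

interface ScenarioLike {
  name: string;
  specVersions: SpecVersion[];
}

// Example declarations: initialize carries forward to newer specs, while the
// 2025-03-26 backcompat auth scenario lists only its original version.
const scenarios: ScenarioLike[] = [
  { name: 'initialize', specVersions: ['2025-06-18', '2025-11-25'] },
  { name: 'auth/2025-03-26-oauth-metadata-backcompat', specVersions: ['2025-03-26'] },
  { name: 'auth/basic-cimd', specVersions: ['2025-11-25'] },
  { name: 'auth/client-credentials-jwt', specVersions: ['extension'] }
];

// Membership filter: a scenario applies to `version` if its list contains
// that exact value.
function scenariosFor(version: SpecVersion, all: ScenarioLike[]): string[] {
  return all.filter((s) => s.specVersions.includes(version)).map((s) => s.name);
}

console.log(scenariosFor('2025-11-25', scenarios)); // [ 'initialize', 'auth/basic-cimd' ]
```

Because a carried-forward scenario simply lists every version it applies to, a single `.includes()` check is enough; no version-ordering logic is needed for exact lookups.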

.claude/skills/mcp-sdk-tier-audit/SKILL.md

Lines changed: 30 additions & 15 deletions
````diff
@@ -66,7 +66,9 @@ npm run --silent tier-check -- \
 
 If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped).
 
-The CLI output includes server conformance pass rate, client conformance pass rate, issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
+The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
+
+The conformance results now include a `specVersions` field on each detail entry, enabling per-version pass rate analysis. The `list` command also shows spec version tags: `node dist/index.js list` shows `[2025-06-18]`, `[2025-11-25]`, `[draft]`, or `[extension]` next to each scenario.
 
 ### Conformance Baseline Check
 
@@ -143,17 +145,21 @@ If any Tier 2 requirement is not met, the SDK is Tier 3.
 - If GitHub issue labels are not set up per SEP-1730, triage metrics cannot be computed. Note this as a gap. However, repos may use GitHub's native issue types instead of type labels — the CLI checks for both.
 - If client conformance was skipped (no client command found), note this as a gap but do not block tier advancement based on it alone.
 
-**Client Conformance Splits:**
+**Conformance Breakdown:**
+
+The **full suite** pass rates (server total, client total) are used for tier threshold checks. To interpret them, present a single conformance matrix combining server and client results. Each detail entry in the tier-check JSON has a `specVersions` field; client category is derived from the scenario name (`auth/` prefix = Auth, everything else = Core). Server scenarios are all Core.
 
-When reporting client conformance, always break results into three categories:
+Example:
 
-1. **Core suite** — Non-auth scenarios (e.g. initialize, tools_call, elicitation, sse-retry)
-2. **Auth suite** — OAuth/authorization scenarios (any scenario starting with `auth/`)
-3. **Full suite** — All scenarios combined
+| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All\* |
+| ------------ | ---------- | ---------- | ---------- | ----- | --------- | ------------ |
+| Server || 26/26 | 4/4 ||| 30/30 (100%) |
+| Client: Core || 2/2 | 2/2 ||| 4/4 (100%) |
+| Client: Auth | 0/2 | 3/3 | 6/11 | 0/1 | 0/2 | 9/19 (47%) |
 
-The **full suite** number is used for tier threshold checks. However, the core vs auth split provides essential context. Always present both numbers in the report.
+This immediately shows where failures concentrate. Failures clustered in Client: Auth / `2025-11-25` means "new auth features not yet implemented" — a scope gap, not a quality problem. Failures in Server or Client: Core are more concerning.
 
-If the SDK has a `baseline.yml` or expected-failures file, note which failures are known/tracked vs. unexpected regressions. A low full-suite score where all failures are auth scenarios documented in the baseline is a scope gap (OAuth not yet implemented), not a quality problem — flag it accordingly in the assessment.
+If the SDK has a `baseline.yml` or expected-failures file, cross-reference with the matrix to identify whether baselined failures cluster in a specific cell (e.g. all in `2025-11-25` / Client: Auth = scope gap).
 
 **P0 Label Audit Guidance:**
 
@@ -197,12 +203,24 @@ After the subagents finish, output a short executive summary directly to the use
 ```
 ## <sdk-name> — Tier <X>
 
+Conformance:
+
+| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All* | T2 | T1 |
+|--------------|------------|------------|------------|-------|-----------|-------|----|----|
+| Server | — | pass/total | pass/total | — | — | pass/total (rate%) | ✓/✗ | ✓/✗ |
+| Client: Core | — | pass/total | pass/total | — | — | pass/total (rate%) | — | — |
+| Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — |
+| **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** |
+
+\* unique scenarios — a scenario may apply to multiple spec versions
+
+If a baseline file was found, add a note below the conformance table:
+> **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}).
+
+Repository Health:
+
 | Check | Value | T2 | T1 |
 |-------|-------|----|----|
-| Server Conformance | <passed>/<total> (<rate>%) | ✓/✗ | ✓/✗ |
-| Client Conformance (full) | <passed>/<total> (<rate>%) | ✓/✗ | ✓/✗ |
-| — Core scenarios | <core_pass>/<core_total> (<rate>%) | — | — |
-| — Auth scenarios | <auth_pass>/<auth_total> (<rate>%) | — | — |
 | Issue Triage | <rate>% (<triaged>/<total>) | ✓/✗ | ✓/✗ |
 | Labels | <present>/<required> | ✓/✗ | ✓/✗ |
 | P0 Resolution | <count> open | ✓/✗ | ✓/✗ |
@@ -213,9 +231,6 @@ After the subagents finish, output a short executive summary directly to the use
 | Versioning Policy | <summary> | N/A | ✓/✗ |
 | Stable Release | <version> | ✓/✗ | ✓/✗ |
 
-If a baseline file was found, add a note below the table:
-> **Baseline**: {N} failures in `baseline.yml` ({list of categories, e.g. "18 auth scenarios"}). Core suite: {core_rate}%.
-
 ---
 
 **High-Priority Fixes:**
````
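The Conformance Breakdown guidance in this SKILL.md diff assigns each scenario to a matrix row by name, fills one cell per listed spec version, and counts each scenario once in the All* column. A minimal TypeScript sketch of that aggregation follows. It assumes a hypothetical `DetailEntry` shape (`side`, `scenario`, `passed`, `specVersions`) with one entry per unique scenario; the real tier-check JSON fields may be named differently.

```typescript
// Hypothetical shape of one tier-check detail entry (field names assumed).
interface DetailEntry {
  side: 'server' | 'client';
  scenario: string;
  passed: boolean;
  specVersions: string[];
}

type Row = 'Server' | 'Client: Core' | 'Client: Auth';

interface Cell {
  pass: number;
  total: number;
}

// Row rule from the skill doc: server scenarios form one row; client
// scenarios whose name starts with `auth/` are Auth, the rest are Core.
function rowFor(d: DetailEntry): Row {
  if (d.side === 'server') return 'Server';
  return d.scenario.startsWith('auth/') ? 'Client: Auth' : 'Client: Core';
}

// Increment the pass/total counters stored under `key`.
function bump(map: Map<string, Cell>, key: string, passed: boolean): void {
  const cell = map.get(key) ?? { pass: 0, total: 0 };
  cell.total += 1;
  if (passed) cell.pass += 1;
  map.set(key, cell);
}

function buildMatrix(details: DetailEntry[]) {
  // Keyed by `${row}|${version}`: a scenario is counted in every version it lists.
  const cells = new Map<string, Cell>();
  // Keyed by row: each entry is counted exactly once (the All* column).
  const all = new Map<string, Cell>();
  for (const d of details) {
    const row = rowFor(d);
    for (const v of d.specVersions) bump(cells, `${row}|${v}`, d.passed);
    bump(all, row, d.passed);
  }
  return { cells, all };
}

const { cells, all } = buildMatrix([
  { side: 'client', scenario: 'initialize', passed: true, specVersions: ['2025-06-18', '2025-11-25'] },
  { side: 'client', scenario: 'auth/basic-cimd', passed: false, specVersions: ['2025-11-25'] }
]);

console.log(cells.get('Client: Core|2025-11-25')); // { pass: 1, total: 1 }
console.log(all.get('Client: Auth')); // { pass: 0, total: 1 }
```

Because a carried-forward scenario lands in several version cells, the per-version totals in a row can add up to more than that row's All* total. This is why the template footnote describes All* as a count of unique scenarios.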

.claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md

Lines changed: 6 additions & 5 deletions
```diff
@@ -32,12 +32,13 @@ Source: `modelcontextprotocol/docs/community/sdk-tiers.mdx` in the spec reposito
 
 ## Conformance Score Calculation
 
-Conformance scores are calculated against **applicable required tests** only:
+Every scenario in the conformance suite has a `specVersions` field indicating which spec version it targets. The valid values are defined as the `SpecVersion` type (as a list) in `src/types.ts` — run `node dist/index.js list` to see the current mapping of scenarios to spec versions.
 
-- Tests for the specification version the SDK targets
-- Excluding tests marked as pending or skipped
-- Excluding tests for experimental features
-- Excluding legacy backward-compatibility tests (unless the SDK claims legacy support)
+Date-versioned scenarios (e.g. `2025-06-18`, `2025-11-25`) count toward tier scoring. `draft` and `extension` scenarios are listed separately as informational.
+
+The `--spec-version` CLI flag filters scenarios cumulatively for date versions (e.g. `--spec-version 2025-06-18` includes `2025-03-26` + `2025-06-18`). For `draft`/`extension`, it returns exact matches only.
+
+The tier-check output includes a per-version pass rate breakdown alongside the aggregate.
 
 ## Tier Relegation Rules
 
```
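This `tier-requirements.md` diff describes `--spec-version` as cumulative for date versions and exact-match for `draft`/`extension`, with only date-versioned scenarios counting toward tier scoring. The TypeScript sketch below is one way to express those documented rules. `DATE_VERSIONS`, `versionsSelectedBy`, `matches`, and `isScored` are illustrative names, and the chronological ordering is assumed from the dates themselves; the actual `listScenariosForSpec` implementation is not part of this diff.

```typescript
type SpecVersion = '2025-03-26' | '2025-06-18' | '2025-11-25' | 'draft' | 'extension';

// Assumed chronological order of the date versions seen in this commit.
const DATE_VERSIONS: SpecVersion[] = ['2025-03-26', '2025-06-18', '2025-11-25'];

// Versions selected by `--spec-version <target>` under the documented rules.
function versionsSelectedBy(target: SpecVersion): SpecVersion[] {
  const idx = DATE_VERSIONS.indexOf(target);
  if (idx === -1) return [target]; // draft / extension: exact match only
  return DATE_VERSIONS.slice(0, idx + 1); // date version: itself plus all earlier ones
}

// A scenario is included if any version it lists falls in the selected set.
function matches(scenarioVersions: SpecVersion[], target: SpecVersion): boolean {
  const selected = versionsSelectedBy(target);
  return scenarioVersions.some((v) => selected.includes(v));
}

// Only date-versioned results count toward tier scoring.
function isScored(version: SpecVersion): boolean {
  return DATE_VERSIONS.includes(version);
}

console.log(versionsSelectedBy('2025-06-18')); // [ '2025-03-26', '2025-06-18' ]
console.log(matches(['2025-03-26'], '2025-06-18')); // true
console.log(matches(['2025-11-25'], '2025-06-18')); // false
```

Under this reading, a `2025-03-26` backcompat scenario is still selected by `--spec-version 2025-11-25`, while an `extension` scenario is selected only by `--spec-version extension`.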

src/index.ts

Lines changed: 94 additions & 6 deletions
```diff
@@ -19,8 +19,13 @@ import {
   listMetadataScenarios,
   listCoreScenarios,
   listExtensionScenarios,
-  listBackcompatScenarios
+  listBackcompatScenarios,
+  listScenariosForSpec,
+  listClientScenariosForSpec,
+  getScenarioSpecVersions,
+  ALL_SPEC_VERSIONS
 } from './scenarios';
+import type { SpecVersion } from './scenarios';
 import { ConformanceCheck } from './types';
 import { ClientOptionsSchema, ServerOptionsSchema } from './schemas';
 import {
@@ -31,6 +36,32 @@ import {
 import { createTierCheckCommand } from './tier-check';
 import packageJson from '../package.json';
 
+function resolveSpecVersion(value: string): SpecVersion {
+  if (ALL_SPEC_VERSIONS.includes(value as SpecVersion)) {
+    return value as SpecVersion;
+  }
+  console.error(`Unknown spec version: ${value}`);
+  console.error(`Valid versions: ${ALL_SPEC_VERSIONS.join(', ')}`);
+  process.exit(1);
+}
+
+// Note on naming: `command` refers to which CLI command is calling this.
+// The `client` command tests Scenario objects (which test clients),
+// and the `server` command tests ClientScenario objects (which test servers).
+// This matches the inverted naming in scenarios/index.ts.
+function filterScenariosBySpecVersion(
+  allScenarios: string[],
+  version: SpecVersion,
+  command: 'client' | 'server'
+): string[] {
+  const versionScenarios =
+    command === 'client'
+      ? listScenariosForSpec(version)
+      : listClientScenariosForSpec(version);
+  const allowed = new Set(versionScenarios);
+  return allScenarios.filter((s) => allowed.has(s));
+}
+
 const program = new Command();
 
 program
@@ -53,12 +84,19 @@ program
     'Path to YAML file listing expected failures (baseline)'
   )
   .option('-o, --output-dir <path>', 'Save results to this directory')
+  .option(
+    '--spec-version <version>',
+    'Filter scenarios by spec version (cumulative for date versions)'
+  )
   .option('--verbose', 'Show verbose output')
   .action(async (options) => {
     try {
       const timeout = parseInt(options.timeout, 10);
       const verbose = options.verbose ?? false;
       const outputDir = options.outputDir;
+      const specVersionFilter = options.specVersion
+        ? resolveSpecVersion(options.specVersion)
+        : undefined;
 
       // Handle suite mode
       if (options.suite) {
@@ -85,7 +123,14 @@ program
           process.exit(1);
         }
 
-        const scenarios = suites[suiteName]();
+        let scenarios = suites[suiteName]();
+        if (specVersionFilter) {
+          scenarios = filterScenariosBySpecVersion(
+            scenarios,
+            specVersionFilter,
+            'client'
+          );
+        }
         console.log(
           `Running ${suiteName} suite (${scenarios.length} scenarios) in parallel...\n`
         );
@@ -262,6 +307,10 @@ program
     'Path to YAML file listing expected failures (baseline)'
   )
   .option('-o, --output-dir <path>', 'Save results to this directory')
+  .option(
+    '--spec-version <version>',
+    'Filter scenarios by spec version (cumulative for date versions)'
+  )
   .option('--verbose', 'Show verbose output (JSON instead of pretty print)')
   .action(async (options) => {
     try {
@@ -270,6 +319,9 @@ program
 
       const verbose = options.verbose ?? false;
       const outputDir = options.outputDir;
+      const specVersionFilter = options.specVersion
+        ? resolveSpecVersion(options.specVersion)
+        : undefined;
 
       // If a single scenario is specified, run just that one
       if (validated.scenario) {
@@ -317,6 +369,14 @@ program
         process.exit(1);
       }
 
+      if (specVersionFilter) {
+        scenarios = filterScenariosBySpecVersion(
+          scenarios,
+          specVersionFilter,
+          'server'
+        );
+      }
+
       console.log(
         `Running ${suite} suite (${scenarios.length} scenarios) against ${validated.url}\n`
       );
@@ -393,20 +453,48 @@ program
   .description('List available test scenarios')
   .option('--client', 'List client scenarios')
   .option('--server', 'List server scenarios')
+  .option(
+    '--spec-version <version>',
+    'Filter scenarios by spec version (cumulative for date versions)'
+  )
   .action((options) => {
+    const specVersionFilter = options.specVersion
+      ? resolveSpecVersion(options.specVersion)
+      : undefined;
+
     if (options.server || (!options.client && !options.server)) {
       console.log('Server scenarios (test against a server):');
-      const serverScenarios = listClientScenarios();
-      serverScenarios.forEach((s) => console.log(` - ${s}`));
+      let serverScenarios = listClientScenarios();
+      if (specVersionFilter) {
+        serverScenarios = filterScenariosBySpecVersion(
+          serverScenarios,
+          specVersionFilter,
+          'server'
+        );
+      }
+      serverScenarios.forEach((s) => {
+        const v = getScenarioSpecVersions(s);
+        console.log(` - ${s}${v ? ` [${v}]` : ''}`);
+      });
     }
 
     if (options.client || (!options.client && !options.server)) {
       if (options.server || (!options.client && !options.server)) {
         console.log('');
       }
       console.log('Client scenarios (test against a client):');
-      const clientScenarios = listScenarios();
-      clientScenarios.forEach((s) => console.log(` - ${s}`));
+      let clientScenarioNames = listScenarios();
+      if (specVersionFilter) {
+        clientScenarioNames = filterScenariosBySpecVersion(
+          clientScenarioNames,
+          specVersionFilter,
+          'client'
+        );
+      }
+      clientScenarioNames.forEach((s) => {
+        const v = getScenarioSpecVersions(s);
+        console.log(` - ${s}${v ? ` [${v}]` : ''}`);
+      });
     }
   });
 
```
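`resolveSpecVersion` in this diff validates the flag with `ALL_SPEC_VERSIONS.includes(value as SpecVersion)` and then casts the result. If `ALL_SPEC_VERSIONS` is declared as a readonly tuple (an assumption; its declaration is not shown here), a user-defined type guard can do the narrowing without the cast. `isSpecVersion` and `parseSpecVersion` are hypothetical names, and this variant throws instead of calling `process.exit(1)` so it can be unit-tested.

```typescript
// Assumed declaration: a readonly tuple, so the union type can be derived from it.
const ALL_SPEC_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25', 'draft', 'extension'] as const;
type SpecVersion = (typeof ALL_SPEC_VERSIONS)[number];

// User-defined type guard: a `true` result narrows `value` to SpecVersion.
function isSpecVersion(value: string): value is SpecVersion {
  // Widening the tuple to readonly string[] lets includes() accept any string.
  return (ALL_SPEC_VERSIONS as readonly string[]).includes(value);
}

// Throws instead of exiting so the validation can be exercised in tests.
function parseSpecVersion(value: string): SpecVersion {
  if (isSpecVersion(value)) return value;
  throw new Error(
    `Unknown spec version: ${value}. Valid versions: ${ALL_SPEC_VERSIONS.join(', ')}`
  );
}

console.log(parseSpecVersion('2025-11-25')); // 2025-11-25
console.log(isSpecVersion('2024-11-05')); // false
```

A CLI caller could catch the error, print it, and exit with status 1 to keep the current behavior. Commander's `Option` class also offers `.choices()` for restricting an option to fixed values, which may be worth considering if the project's Commander version supports it.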

src/scenarios/client/auth/basic-cimd.ts

Lines changed: 2 additions & 1 deletion
```diff
@@ -1,5 +1,5 @@
 import type { Scenario, ConformanceCheck } from '../../../types';
-import { ScenarioUrls } from '../../../types';
+import { ScenarioUrls, SpecVersion } from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -22,6 +22,7 @@ export const CIMD_CLIENT_METADATA_URL =
  */
 export class AuthBasicCIMDScenario implements Scenario {
   name = 'auth/basic-cimd';
+  specVersions: SpecVersion[] = ['2025-11-25'];
   description =
     'Tests OAuth flow with Client ID Metadata Documents (SEP-991/URL-based client IDs). Server advertises client_id_metadata_document_supported=true and client should use URL as client_id instead of DCR.';
   private authServer = new ServerLifecycle();
```

src/scenarios/client/auth/client-credentials.ts

Lines changed: 8 additions & 1 deletion
```diff
@@ -1,6 +1,11 @@
 import * as jose from 'jose';
 import type { CryptoKey } from 'jose';
-import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
+import type {
+  Scenario,
+  ConformanceCheck,
+  ScenarioUrls,
+  SpecVersion
+} from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -32,6 +37,7 @@ async function generateTestKeypair(): Promise<{
  */
 export class ClientCredentialsJwtScenario implements Scenario {
   name = 'auth/client-credentials-jwt';
+  specVersions: SpecVersion[] = ['extension'];
   description =
     'Tests OAuth client_credentials flow with private_key_jwt authentication (SEP-1046)';
 
@@ -250,6 +256,7 @@ export class ClientCredentialsJwtScenario implements Scenario {
  */
 export class ClientCredentialsBasicScenario implements Scenario {
   name = 'auth/client-credentials-basic';
+  specVersions: SpecVersion[] = ['extension'];
   description =
     'Tests OAuth client_credentials flow with client_secret_basic authentication';
 
```

src/scenarios/client/auth/cross-app-access.ts

Lines changed: 7 additions & 1 deletion
```diff
@@ -1,7 +1,12 @@
 import * as jose from 'jose';
 import type { CryptoKey } from 'jose';
 import express, { type Request, type Response } from 'express';
-import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
+import type {
+  Scenario,
+  ConformanceCheck,
+  ScenarioUrls,
+  SpecVersion
+} from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { MockTokenVerifier } from './helpers/mockTokenVerifier';
@@ -55,6 +60,7 @@ async function createIdpIdToken(
  */
 export class CrossAppAccessCompleteFlowScenario implements Scenario {
   name = 'auth/cross-app-access-complete-flow';
+  specVersions: SpecVersion[] = ['extension'];
   description =
     'Tests complete SEP-990 flow: token exchange + JWT bearer grant (Enterprise Managed OAuth)';
 
```

src/scenarios/client/auth/discovery-metadata.ts

Lines changed: 1 addition & 0 deletions
```diff
@@ -87,6 +87,7 @@ function createMetadataScenario(config: MetadataScenarioConfig): Scenario {
 
   return {
     name: `auth/${config.name}`,
+    specVersions: ['2025-11-25'],
     description: `Tests Basic OAuth metadata discovery flow.
 
 **PRM:** ${config.prmLocation}${config.inWwwAuth ? '' : ' (not in WWW-Authenticate)'}
```

src/scenarios/client/auth/march-spec-backcompat.ts

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,5 +1,5 @@
 import type { Scenario, ConformanceCheck } from '../../../types';
-import { ScenarioUrls } from '../../../types';
+import { ScenarioUrls, SpecVersion } from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -8,6 +8,7 @@ import { SpecReferences } from './spec-references';
 
 export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario {
   name = 'auth/2025-03-26-oauth-metadata-backcompat';
+  specVersions: SpecVersion[] = ['2025-03-26'];
   description =
     'Tests 2025-03-26 spec OAuth flow: no PRM (Protected Resource Metadata), OAuth metadata at root location';
   private server = new ServerLifecycle();
@@ -68,6 +69,7 @@ export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario {
 
 export class Auth20250326OEndpointFallbackScenario implements Scenario {
   name = 'auth/2025-03-26-oauth-endpoint-fallback';
+  specVersions: SpecVersion[] = ['2025-03-26'];
   description =
     'Tests OAuth flow with no metadata endpoints, relying on fallback to standard OAuth endpoints at server root (2025-03-26 spec behavior)';
   private server = new ServerLifecycle();
```

src/scenarios/client/auth/pre-registration.ts

Lines changed: 7 additions & 1 deletion
```diff
@@ -1,4 +1,9 @@
-import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
+import type {
+  Scenario,
+  ConformanceCheck,
+  ScenarioUrls,
+  SpecVersion
+} from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -19,6 +24,7 @@ const PRE_REGISTERED_CLIENT_SECRET = 'pre-registered-secret';
  */
 export class PreRegistrationScenario implements Scenario {
   name = 'auth/pre-registration';
+  specVersions: SpecVersion[] = ['2025-11-25'];
   description =
     'Tests OAuth flow with pre-registered client credentials. Server does not support DCR.';
 
```
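Each scenario file in this commit adds a hand-written `specVersions` list. A small consistency check over such declarations might look like the TypeScript sketch below. `validateSpecVersions`, `Declared`, and `KNOWN_VERSIONS` are hypothetical (the commit's own unit tests are not shown), and the scenario names and versions are copied from the diffs above.

```typescript
interface Declared {
  name: string;
  specVersions: readonly string[];
}

// Valid values as they appear in this commit (assumed to be the complete set).
const KNOWN_VERSIONS: readonly string[] = ['2025-03-26', '2025-06-18', '2025-11-25', 'draft', 'extension'];

// Returns human-readable problems; an empty array means every declaration passed.
function validateSpecVersions(scenarios: Declared[]): string[] {
  const problems: string[] = [];
  for (const s of scenarios) {
    if (s.specVersions.length === 0) problems.push(`${s.name}: empty specVersions`);
    for (const v of s.specVersions) {
      if (!KNOWN_VERSIONS.includes(v)) problems.push(`${s.name}: unknown version ${v}`);
    }
    if (new Set(s.specVersions).size !== s.specVersions.length) {
      problems.push(`${s.name}: duplicate versions`);
    }
  }
  return problems;
}

// Declarations copied from the scenario diffs in this commit.
const declared: Declared[] = [
  { name: 'auth/basic-cimd', specVersions: ['2025-11-25'] },
  { name: 'auth/client-credentials-jwt', specVersions: ['extension'] },
  { name: 'auth/client-credentials-basic', specVersions: ['extension'] },
  { name: 'auth/cross-app-access-complete-flow', specVersions: ['extension'] },
  { name: 'auth/2025-03-26-oauth-metadata-backcompat', specVersions: ['2025-03-26'] },
  { name: 'auth/2025-03-26-oauth-endpoint-fallback', specVersions: ['2025-03-26'] },
  { name: 'auth/pre-registration', specVersions: ['2025-11-25'] }
];

console.log(validateSpecVersions(declared)); // []
```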
