You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat: add specVersion classification to conformance scenarios (#147)
* feat: add specVersions classification to conformance scenarios
Each scenario declares which spec versions it applies to as a list.
Scenarios that carry forward (e.g. initialize) list all applicable versions
['2025-06-18', '2025-11-25']. Scenarios removed from newer specs (e.g.
backcompat auth) only list their original version ['2025-03-26'].
- specVersions list on Scenario and ClientScenario interfaces
- --spec-version CLI filter uses simple .includes()
- Tier-check conformance matrix (Server / Client: Core / Client: Auth)
with per-version columns and unique All* count
- 7 unit tests for specVersions helpers
- Updated tier-audit skill docs with matrix format
* fix: fix console output template for tier audit skill
The template had an orphan table header (Check | Value | T2 | T1)
with no rows above the conformance matrix, causing an empty table
to render. The scorecard rows below the matrix also lacked their
own header.
Fix: two self-contained tables with clear labels — 'Conformance:'
for the per-version matrix, 'Scorecard:' for the check rows.
* fix: align skill console template with tier-check script output
Add asterisk footnote ('unique scenarios — a scenario may apply to
multiple spec versions') and rename 'Scorecard' to 'Repository Health'
to match the labels used by the tier-check CLI output.
* fix: add specVersions to CrossAppAccessCompleteFlowScenario
Added in 83c446d on main after this branch diverged.
Copy file name to clipboardExpand all lines: .claude/skills/mcp-sdk-tier-audit/SKILL.md
+30-15Lines changed: 30 additions & 15 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -66,7 +66,9 @@ npm run --silent tier-check -- \
66
66
67
67
If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped).
68
68
69
-
The CLI output includes server conformance pass rate, client conformance pass rate, issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
69
+
The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
70
+
71
+
The conformance results now include a `specVersions` field on each detail entry, enabling per-version pass rate analysis. The `list` command also shows spec version tags: `node dist/index.js list` shows `[2025-06-18]`, `[2025-11-25]`, `[draft]`, or `[extension]` next to each scenario.
70
72
71
73
### Conformance Baseline Check
72
74
@@ -143,17 +145,21 @@ If any Tier 2 requirement is not met, the SDK is Tier 3.
143
145
- If GitHub issue labels are not set up per SEP-1730, triage metrics cannot be computed. Note this as a gap. However, repos may use GitHub's native issue types instead of type labels — the CLI checks for both.
144
146
- If client conformance was skipped (no client command found), note this as a gap but do not block tier advancement based on it alone.
145
147
146
-
**Client Conformance Splits:**
148
+
**Conformance Breakdown:**
149
+
150
+
The **full suite** pass rates (server total, client total) are used for tier threshold checks. To interpret them, present a single conformance matrix combining server and client results. Each detail entry in the tier-check JSON has a `specVersions` field; client category is derived from the scenario name (`auth/` prefix = Auth, everything else = Core). Server scenarios are all Core.
147
151
148
-
When reporting client conformance, always break results into three categories:
The **full suite** number is used for tier threshold checks. However, the core vs auth split provides essential context. Always present both numbers in the report.
160
+
This immediately shows where failures concentrate. Failures clustered in Client: Auth / `2025-11-25` means "new auth features not yet implemented" — a scope gap, not a quality problem. Failures in Server or Client: Core are more concerning.
155
161
156
-
If the SDK has a `baseline.yml` or expected-failures file, note which failures are known/tracked vs. unexpected regressions. A low full-suite score where all failures are auth scenarios documented in the baseline is a scope gap (OAuth not yet implemented), not a quality problem — flag it accordingly in the assessment.
162
+
If the SDK has a `baseline.yml` or expected-failures file, cross-reference with the matrix to identify whether baselined failures cluster in a specific cell (e.g. all in `2025-11-25` / Client: Auth = scope gap).
157
163
158
164
**P0 Label Audit Guidance:**
159
165
@@ -197,12 +203,24 @@ After the subagents finish, output a short executive summary directly to the use
Copy file name to clipboardExpand all lines: .claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md
+6-5Lines changed: 6 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -32,12 +32,13 @@ Source: `modelcontextprotocol/docs/community/sdk-tiers.mdx` in the spec reposito
32
32
33
33
## Conformance Score Calculation
34
34
35
-
Conformance scores are calculated against **applicable required tests** only:
35
+
Every scenario in the conformance suite has a `specVersions` field indicating which spec version it targets. The valid values are defined as the `SpecVersion` type (as a list) in `src/types.ts` — run `node dist/index.js list` to see the current mapping of scenarios to spec versions.
36
36
37
-
- Tests for the specification version the SDK targets
Date-versioned scenarios (e.g. `2025-06-18`, `2025-11-25`) count toward tier scoring. `draft` and `extension` scenarios are listed separately as informational.
38
+
39
+
The `--spec-version` CLI flag filters scenarios cumulatively for date versions (e.g. `--spec-version 2025-06-18` includes `2025-03-26` + `2025-06-18`). For `draft`/`extension`, it returns exact matches only.
40
+
41
+
The tier-check output includes a per-version pass rate breakdown alongside the aggregate.
'Tests OAuth flow with Client ID Metadata Documents (SEP-991/URL-based client IDs). Server advertises client_id_metadata_document_supported=true and client should use URL as client_id instead of DCR.';
0 commit comments