
Commit 3b4a92b

feat: SDK tier assessment CLI and skill (#142)
* feat: add tier-check CLI for SDK tier assessment Adds a 'tier-check' subcommand to the conformance tool that automates SDK tier assessment against SEP-1730 criteria. Checks performed: - Conformance test pass rate (via everything-server) - GitHub label taxonomy (priority/status/area labels) - Issue triage SLA compliance - P0 bug resolution tracking - Stable release detection - Required file existence (CHANGELOG, SECURITY, etc.) - Spec tracking (SDK release within 30d of spec release) Also includes a Claude Code skill (skills/mcp-sdk-tier-audit/) for judgment-based checks that require codebase analysis (feature coverage, docs quality, policy evaluation). Usage: npx tsx src/index.ts tier-check --repo modelcontextprotocol/typescript-sdk npx tsx src/index.ts tier-check --repo ... --conformance-server-cmd '...' \ --conformance-server-cwd ... --conformance-server-url ... --output json * refactor: revise tier-check CLI and skill based on review feedback - Move skill to .claude/skills/ so it's auto-available in Claude Code - Remove feature-coverage subagent (redundant with conformance tests) - Remove hardcoded ~/src/mcp paths from all skill files - Trim conformance server table to TS + Python only - Rename file_existence check to policy_signals (informational, not blocking) - Add GitHub native issue types detection to labels check - Add missing features to docs-coverage checklist (tasks, elicitation URL mode, JSON Schema 2020-12) - Add README with CLI quick start and escape hatch for non-Claude-Code users - Use --limit 500 instead of --limit 100 for gh issue list Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review feedback on skill and README - Use npx @modelcontextprotocol/conformance instead of node dist/index.js - Add full GitHub auth instructions (gh auth login, GITHUB_TOKEN, --token) - Point TS SDK conformance server to typescript-sdk/test/conformance/ - Fix Python SDK URL to localhost:3001/mcp (not TBD) - Remove manual gh issue list / gh 
release list from SKILL.md (CLI handles it) - Remove Claude Code-specific subagent_type references - Assume user is already in conformance repo - Clean up policy-evaluation-prompt.md: remove redundant grep commands, focus on content evaluation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve lint errors in conformance.ts Remove unused variable assignments flagged by eslint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: apply prettier formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add npm run tier-check script, update docs with examples - Add "tier-check" npm script so users can run `npm run tier-check --` instead of `node dist/index.js tier-check` - Update SKILL.md, README, and skill README to use npm run tier-check - Add full conformance examples with --conformance-server-cmd/cwd/url flags and realistic paths (~/src/mcp/typescript-sdk) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix prettier formatting in SKILL.md * fix: add per-scenario timeout, support url-only conformance, fix stdout pollution - Add 30s per-scenario timeout to prevent tier-check from hanging - Allow --conformance-server-url without --conformance-server-cmd (server already running) - Move runner status logs to stderr so --output json produces clean JSON - Update SKILL.md and README with --silent flag and pre-start server workflow * refactor: skill takes local path + server URL instead of repo name The skill now requires two arguments: 1. Local path to the SDK checkout (for direct file inspection) 2. URL where the everything server is already running The GitHub owner/repo is derived from git remote. This eliminates cloning, server startup complexity, and branch confusion (v1.x vs main). * refactor: write detailed reports to files, show concise summary Reports go to results/tier-audits/<sdk>-<date>/ (already gitignored). 
Claude's console output is now just the tier classification, pass/fail summary line, top 3 actions, and file paths. * docs: update READMEs for new skill interface and pre-start workflow * simplify: flat file output instead of nested directory * fix: remediation always shows path to Tier 2 and Tier 1 * feat: add client conformance testing to tier-check CLI and skill - Add checkClientConformance() that runs core client scenarios (initialize, tools_call, elicitation-defaults, sse-retry, auth) by spawning the SDK's conformance client via --client-cmd - Add client_conformance to TierScorecard type - Wire --client-cmd option into the CLI - Update tier logic: both server + client conformance feed into Tier 1 (100%) and Tier 2 (>=80%) requirements - Update terminal and markdown output to show both conformance types - Update skill to auto-detect conformance client or accept explicit client-cmd argument - Update README with new option and examples * improve: table summary output, write reports via subagents - Change executive summary from pipe-delimited line to a readable table with T2/T1 columns - Move assessment and remediation file writing into parallel subagents to keep the main conversation thread clean * improve: list tier gaps as numbered items instead of one-line blob * improve: finalize summary format with separator, high-priority fixes, numbered gaps * improve: add pre-flight checks for gh auth and server reachability Fail fast with clear error messages if GitHub CLI is not authenticated or if the conformance server URL is not reachable, rather than failing deep into the scorecard run. 
* docs: improve README and fix skill auto-detection paths - Claude Code section: explain client-cmd auto-detection for TS/Python, show explicit 3-arg form for other SDKs, add examples for all three - Fix TypeScript build command (npm run build, not pnpm build:all) - Fix Python server command (add --port, use uv sync --package) - Fix Python client path (.github/actions/conformance/client.py) - Expand 'Other SDKs' section with guidance on everything server - Add gh auth login prerequisite to Claude Code steps * simplify: remove client-cmd auto-detection, require explicit argument Client command is now always passed as the third argument. If omitted, client conformance is skipped and noted as a gap. No more magic path detection — clearer and more predictable. * fix: align docs table with canonical list (48 features), simplify policy eval Docs coverage: - Table now has numbered rows matching all 48 non-experimental features from the canonical list (was missing 7: tools text/image/audio/embedded/ error/notifications, protocol version negotiation) - Hardcode total as 48 in summary so agents don't miscount Policy evaluation: - Simplified from deep content analysis to file-existence checks - Dependency policy: DEPENDENCY_POLICY.md, dependabot.yml, or CONTRIBUTING.md section - Roadmap: ROADMAP.md must exist (GitHub milestones alone not sufficient) - Versioning: VERSIONING.md or CONTRIBUTING.md section - Removed GitHub API calls for milestones and releases from policy eval * refactor: extract canonical feature list into single source of truth Create references/feature-list.md with all 48 non-experimental + 5 experimental features. The docs-coverage prompt now references this file instead of duplicating the list. One place to update when features change. 
* fix: separate deterministic file checks from AI content evaluation CLI (files.ts): now checks all policy files deterministically — DEPENDENCY_POLICY.md, docs/dependency-policy.md, dependabot.yml, renovate.json, ROADMAP.md, docs/roadmap.md, VERSIONING.md, docs/versioning.md, BREAKING_CHANGES.md (in addition to existing CHANGELOG.md, SECURITY.md, CONTRIBUTING.md). AI policy eval: receives CLI output showing which files exist, then reads ONLY those files to judge content quality. No longer searches the repo for files — clean separation of concerns. * style: apply prettier formatting * revert: undo unrelated console.log change in runner/server.ts * refactor: shell out to conformance CLI instead of reimplementing runner Address PR feedback: conformance.ts was duplicating the normal conformance running code. Now shells out to 'node dist/index.js server/client' with -o to save results to a temp dir, then parses the checks.json files. Also removes --conformance-server-cmd and --conformance-server-cwd options since the server must be pre-started. * docs: add Go and C# SDK examples to README and SKILL.md * fix: add --framework net9.0 to C# server command * rename: conformance.ts -> test-conformance-results.ts Avoids confusion with src/runner/ (the actual conformance runner). This file just invokes the CLI and parses output. * style: prettier formatting * fix: reconcile conformance results against full scenario list The tier-check CLI was only counting scenarios that produced a checks.json file. Scenarios that crashed or failed to run (e.g., auth scenarios when OAuth is not implemented) were invisible, making the denominator artificially small (e.g., 4/4 instead of 4/23). Now both checkConformance and checkClientConformance reconcile their parsed results against the known scenario lists, adding failure entries for any expected scenario that didn't produce results. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: tighten documentation evaluation criteria Clarify what counts as documented vs just having code: - Conformance test servers don't count as docs or examples - Examples without prose = PARTIAL, not PASS - Go Example* test functions explicitly allowed - Clear PASS/PARTIAL/FAIL verdict definitions * docs: add Labels and Spec Tracking rows to audit report templates The executive summary and assessment report were missing two SEP-1730 requirements: label taxonomy compliance and spec tracking (new protocol features timeline). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: reuse ConformanceCheck type from src/types.ts instead of redefining --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 37225ce commit 3b4a92b

22 files changed, +3147 -1139 lines changed
Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
# MCP SDK Tier Audit

Assess any MCP SDK repository against [SEP-1730](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1730) (the SDK Tiering System). Produces a tier classification (1/2/3) with an evidence-backed scorecard.

Two components work together:

- **`tier-check` CLI**: runs deterministic checks (server + client conformance pass rate, issue triage speed, P0 resolution, labels, releases, policy signals). Works standalone, no AI needed.
- **AI-assisted assessment**: an agent uses the CLI scorecard plus judgment-based evaluation (documentation coverage, dependency policy, roadmap) to produce a full tier report with remediation guide.

## Quick Start: CLI

The CLI is a subcommand of the [MCP Conformance](https://github.com/modelcontextprotocol/conformance) tool.

```bash
# Clone and build
git clone https://github.com/modelcontextprotocol/conformance.git
cd conformance
npm install
npm run build

# Authenticate with GitHub (needed for API access)
gh auth login

# Run against any MCP SDK repo (without conformance tests)
npm run --silent tier-check -- --repo modelcontextprotocol/typescript-sdk --skip-conformance
```

The CLI uses the GitHub API (read-only) for issue metrics, labels, and release checks. Authenticate via one of:

- **GitHub CLI** (recommended): `gh auth login` (the CLI picks up your token automatically)
- **Environment variable**: `export GITHUB_TOKEN=ghp_...`
- **Flag**: `--token ghp_...`

For public repos, any authenticated token works (no special scopes needed; authentication just avoids rate limits). For a [fine-grained personal access token](https://github.com/settings/personal-access-tokens/new), select **Public Repositories (read-only)** with no additional permissions.
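
The three authentication options above imply a lookup order. The sketch below illustrates one plausible order (explicit flag, then `GITHUB_TOKEN`, then the GitHub CLI). The function name `resolveGitHubToken`, the `ResolvedToken` type, and the exact precedence between the environment variable and `gh` are assumptions for illustration, not the CLI's actual code; the `gh` lookup is injected as a callback so the sketch stays free of process spawning.

```typescript
// Hypothetical helper: shows one possible precedence for the three token sources.
type TokenSource = "flag" | "env" | "gh-cli";

interface ResolvedToken {
  token: string;
  source: TokenSource;
}

function resolveGitHubToken(
  flagToken: string | undefined,
  env: Record<string, string | undefined>,
  readGhCliToken: () => string | undefined // e.g. a wrapper around `gh auth token`
): ResolvedToken | undefined {
  // 1. An explicit --token flag wins.
  const fromFlag = flagToken?.trim();
  if (fromFlag) return { token: fromFlag, source: "flag" };

  // 2. Fall back to the GITHUB_TOKEN environment variable.
  const fromEnv = env["GITHUB_TOKEN"]?.trim();
  if (fromEnv) return { token: fromEnv, source: "env" };

  // 3. Finally, ask the GitHub CLI (only called when the first two are missing).
  const fromGh = readGhCliToken()?.trim();
  if (fromGh) return { token: fromGh, source: "gh-cli" };

  return undefined; // unauthenticated: the caller should fail fast
}

const tokenExample = resolveGitHubToken(undefined, { GITHUB_TOKEN: "ghp_example" }, () => undefined);
console.log(tokenExample?.source); // prints "env"
```

Returning the source alongside the token makes it easy to report which credential was used when a pre-flight check fails.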

### CLI Options

```
--repo <owner/repo>              GitHub repository (required)
--branch <branch>                Branch to check
--skip-conformance               Skip conformance tests
--conformance-server-url <url>   URL of the already-running conformance server
--client-cmd <cmd>               Command to run the SDK conformance client (for client conformance tests)
--days <n>                       Limit triage analysis to last N days
--output <format>                json | markdown | terminal (default: terminal)
--token <token>                  GitHub token (defaults to GITHUB_TOKEN or gh auth token)
```

### What the CLI Checks

| Check              | What it measures                                                               |
| ------------------ | ------------------------------------------------------------------------------ |
| Server Conformance | Pass rate of server implementation against the conformance test suite          |
| Client Conformance | Pass rate of client implementation against the conformance test suite          |
| Labels             | Whether SEP-1730 label taxonomy is set up (supports GitHub native issue types) |
| Triage             | How quickly issues get labeled after creation                                  |
| P0 Resolution      | Whether critical bugs are resolved within SLA                                  |
| Stable Release     | Whether a stable release >= 1.0.0 exists                                       |
| Policy Signals     | Presence of CHANGELOG, SECURITY, CONTRIBUTING, dependabot, ROADMAP             |
| Spec Tracking      | Gap between latest spec release and SDK release                                |
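
To make the "Stable Release" row concrete, here is a minimal sketch of how a stable release >= 1.0.0 could be detected from a list of release tags. The helper names (`parseStableVersion`, `compareVersions`, `latestStableRelease`) are hypothetical, and the rules used (optional `v` prefix, any prerelease or build suffix treated as not stable) are assumptions rather than the CLI's documented behavior.

```typescript
// Hypothetical helpers: pick the highest stable (non-prerelease) tag that is >= 1.0.0.
type Version = readonly number[]; // [major, minor, patch]

function parseStableVersion(tag: string): Version | undefined {
  // Accepts "1.2.3" or "v1.2.3"; rejects suffixed tags such as "2.0.0-beta.1".
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(tag.trim());
  if (!match) return undefined;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

function compareVersions(a: Version, b: Version): number {
  // Compare numerically, component by component (so 1.10.0 > 1.9.0).
  for (let i = 0; i < 3; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

function latestStableRelease(tags: string[]): string | undefined {
  let best: { tag: string; version: Version } | undefined;
  for (const tag of tags) {
    const version = parseStableVersion(tag);
    if (!version || (version[0] ?? 0) < 1) continue; // skip prereleases and 0.x
    if (!best || compareVersions(version, best.version) > 0) {
      best = { tag, version };
    }
  }
  return best?.tag;
}

console.log(latestStableRelease(["v0.9.0", "2.0.0-beta.1", "v1.4.0", "v2.3.1"])); // prints "v2.3.1"
```

An SDK whose tags are all `0.x` or prereleases yields `undefined`, which corresponds to a failing Stable Release check.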

### Example Output

```
Tier Assessment: Tier 2

Repo:      modelcontextprotocol/typescript-sdk
Timestamp: 2026-02-10T12:00:00Z

Check Results:

  ✓ Server Conformance  45/45 (100%)
  ✓ Client Conformance  4/4 (100%)
  ✗ Labels              9/12 required labels
      Missing: needs confirmation, needs repro, ready for work
  ✓ Triage              92% within 2BD (150 issues, median 8h)
  ✓ P0 Resolution       0 open, 3/3 closed within 7d
  ✓ Stable Release      2.3.1
  ~ Policy Signals      ✓ CHANGELOG.md, ✗ SECURITY.md, ✓ CONTRIBUTING.md, ✓ .github/dependabot.yml, ✗ ROADMAP.md
  ✓ Spec Tracking       2d gap
```

Use `--output json` to get machine-readable results, or `--output markdown` for a report you can paste into an issue.
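
The Triage line in the example reports the share of issues labeled "within 2BD" (two business days). The sketch below shows one way such a figure could be computed. It assumes a business day is 24 hours of Monday-to-Friday time in UTC, that unlabeled issues count as late, and that an empty issue list is fully compliant; the names `businessMsBetween`, `triageCompliance`, and `TriagedIssue` are hypothetical, and the CLI's real definition may differ.

```typescript
// Hypothetical sketch: triage compliance measured in weekday (Mon-Fri, UTC) time.
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Milliseconds between two instants, skipping Saturday and Sunday (UTC).
function businessMsBetween(start: Date, end: Date): number {
  let total = 0;
  let cursor = start.getTime();
  const endMs = end.getTime();
  while (cursor < endMs) {
    // Walk one UTC day at a time, stopping early at the end instant.
    const nextMidnight = Math.floor(cursor / DAY_MS) * DAY_MS + DAY_MS;
    const sliceEnd = Math.min(nextMidnight, endMs);
    const weekday = new Date(cursor).getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (weekday !== 0 && weekday !== 6) total += sliceEnd - cursor;
    cursor = sliceEnd;
  }
  return total;
}

interface TriagedIssue {
  createdAt: string; // ISO 8601 timestamp
  firstLabeledAt?: string; // absent when the issue was never labeled
}

// Fraction (0..1) of issues labeled within the SLA.
function triageCompliance(issues: TriagedIssue[], slaBusinessDays = 2): number {
  if (issues.length === 0) return 1;
  const slaMs = slaBusinessDays * DAY_MS;
  const onTime = issues.filter((issue) => {
    if (!issue.firstLabeledAt) return false;
    const elapsed = businessMsBetween(new Date(issue.createdAt), new Date(issue.firstLabeledAt));
    return elapsed <= slaMs;
  });
  return onTime.length / issues.length;
}

const triageSample: TriagedIssue[] = [
  // Friday noon to Monday 10:00 is 22 weekday hours: on time.
  { createdAt: "2026-02-06T12:00:00Z", firstLabeledAt: "2026-02-09T10:00:00Z" },
  // Monday to Thursday is 72 weekday hours: late.
  { createdAt: "2026-02-02T00:00:00Z", firstLabeledAt: "2026-02-05T00:00:00Z" },
  // Never labeled: counted as late.
  { createdAt: "2026-02-03T09:00:00Z" },
];
console.log(`${Math.round(triageCompliance(triageSample) * 100)}% within 2BD`); // prints "33% within 2BD"
```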

## Full AI-Assisted Assessment

The CLI produces a deterministic scorecard, but some SEP-1730 requirements need judgment: documentation quality, dependency policy, roadmap substance. An AI agent can evaluate these by reading the repo.

### Claude Code

The skill lives in `.claude/skills/` in this repo, so if you open [Claude Code](https://docs.anthropic.com/en/docs/claude-code) in the conformance repo it's already available.

1. Make sure `gh auth login` is done (the skill checks this upfront)
2. Start the SDK's everything server in a separate terminal
3. Run the skill:

```
/mcp-sdk-tier-audit <local-sdk-path> <conformance-server-url> [client-cmd]
```

Pass the client command as the third argument to include client conformance testing. If omitted, client conformance is skipped and noted as a gap in the report.

**TypeScript SDK example:**

```bash
# Terminal 1: start the everything server (build first: npm run build)
cd ~/src/mcp/typescript-sdk && npm run test:conformance:server:run

# Claude Code (opened in the conformance repo): run the audit skill
/mcp-sdk-tier-audit ~/src/mcp/typescript-sdk http://localhost:3000/mcp "npx tsx ~/src/mcp/typescript-sdk/test/conformance/src/everythingClient.ts"
```

**Python SDK example:**

```bash
# Terminal 1: install and start the everything server
cd ~/src/mcp/python-sdk && uv sync --frozen --all-extras --package mcp-everything-server
uv run mcp-everything-server --port 3001

# Claude Code (opened in the conformance repo): run the audit skill
/mcp-sdk-tier-audit ~/src/mcp/python-sdk http://localhost:3001/mcp "uv run python ~/src/mcp/python-sdk/.github/actions/conformance/client.py"
```

**Go SDK example:**

```bash
# Terminal 1: build the everything server and client, then start the server
cd ~/src/mcp/go-sdk && go build -o /tmp/go-conformance-server ./conformance/everything-server
go build -o /tmp/go-conformance-client ./conformance/everything-client
/tmp/go-conformance-server -http="localhost:3002"

# Claude Code (opened in the conformance repo): run the audit skill
/mcp-sdk-tier-audit ~/src/mcp/go-sdk http://localhost:3002 "/tmp/go-conformance-client"
```

**C# SDK example:**

```bash
# Terminal 1: start the everything server (requires .NET SDK)
cd ~/src/mcp/csharp-sdk
dotnet run --project tests/ModelContextProtocol.ConformanceServer --framework net9.0 -- --urls http://localhost:3003

# Claude Code (opened in the conformance repo): run the audit skill
/mcp-sdk-tier-audit ~/src/mcp/csharp-sdk http://localhost:3003 "dotnet run --project ~/src/mcp/csharp-sdk/tests/ModelContextProtocol.ConformanceClient"
```

The skill derives `owner/repo` from git remote, runs the CLI, launches parallel evaluations for docs and policy, and writes detailed reports to `results/`.
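
Deriving `owner/repo` from a git remote amounts to parsing the remote URL. The sketch below is an illustration only: `parseGitHubRemote` is a hypothetical name, and it assumes the remote points at github.com in one of the common SSH or HTTPS forms (the output of something like `git remote get-url origin`).

```typescript
// Hypothetical helper: extract "owner/repo" from a GitHub remote URL.
function parseGitHubRemote(remoteUrl: string): string | undefined {
  // Handles git@github.com:owner/repo.git, https://github.com/owner/repo(.git)
  // and ssh://git@github.com/owner/repo.git, with an optional trailing slash.
  const match = /github\.com[:/]([^/\s]+)\/([^/\s]+?)(?:\.git)?\/?$/.exec(remoteUrl.trim());
  if (!match) return undefined;
  return `${match[1]}/${match[2]}`;
}

console.log(parseGitHubRemote("git@github.com:modelcontextprotocol/typescript-sdk.git"));
// prints "modelcontextprotocol/typescript-sdk"
console.log(parseGitHubRemote("https://github.com/modelcontextprotocol/python-sdk"));
// prints "modelcontextprotocol/python-sdk"
```

The resulting string is the value the CLI expects for `--repo`.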

### Any Other AI Coding Agent

If you use a different agent (Codex, Cursor, Aider, OpenCode, etc.), give it these instructions:

1. **Run the CLI** to get the deterministic scorecard:

   ```bash
   node dist/index.js tier-check --repo <repo> --conformance-server-url <url> --output json
   ```

2. **Evaluate documentation coverage**: check whether MCP features (tools, resources, prompts, sampling, transports, etc.) are documented with examples. See [`references/docs-coverage-prompt.md`](references/docs-coverage-prompt.md) for the full checklist.

3. **Evaluate policies**: check for dependency update policy, roadmap, and versioning/breaking-change policy. See [`references/policy-evaluation-prompt.md`](references/policy-evaluation-prompt.md) for criteria.

4. **Apply tier logic**: combine scorecard + evaluations against the thresholds in [`references/tier-requirements.md`](references/tier-requirements.md).

5. **Generate report**: use [`references/report-template.md`](references/report-template.md) for the output format.

### Manual Review

Run the CLI for the scorecard, then review docs and policies yourself using the tier requirements as a checklist:

| Requirement        | Tier 1                         | Tier 2                   |
| ------------------ | ------------------------------ | ------------------------ |
| Server Conformance | 100% pass                      | >= 80% pass              |
| Client Conformance | 100% pass                      | >= 80% pass              |
| Issue triage       | Within 2 business days         | Within 1 month           |
| P0 resolution      | Within 7 days                  | Within 2 weeks           |
| Stable release     | >= 1.0.0 with clear versioning | At least one >= 1.0.0    |
| Documentation      | All features with examples     | Core features documented |
| Dependency policy  | Published                      | Published                |
| Roadmap            | Published with spec tracking   | Plan toward Tier 1       |
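
The checklist above can be read as two sets of requirements, with Tier 3 as the fallback when neither set is fully met. The following sketch encodes only the rows of this table; the `TierInputs` shape and the `classifyTier` function are hypothetical, and requirements not listed in the table (such as labels and spec tracking) are deliberately left out. See `references/tier-requirements.md` for the authoritative thresholds.

```typescript
// Hypothetical sketch of the tier decision, covering only the table rows above.
type Tier = 1 | 2 | 3;

interface PassCount {
  passed: number;
  total: number;
}

interface TierInputs {
  serverConformance: PassCount;
  clientConformance: PassCount;
  triagedWithin2BusinessDays: boolean;
  triagedWithin1Month: boolean;
  p0ResolvedWithin7Days: boolean;
  p0ResolvedWithin2Weeks: boolean;
  stableRelease: boolean; // at least one release >= 1.0.0
  clearVersioning: boolean;
  docsAllFeaturesWithExamples: boolean;
  docsCoreFeatures: boolean;
  dependencyPolicyPublished: boolean;
  roadmapWithSpecTracking: boolean;
  planTowardTier1: boolean;
}

// Integer arithmetic avoids floating-point surprises at the 80% boundary.
const meetsPercent = (c: PassCount, percent: number): boolean =>
  c.total > 0 && c.passed * 100 >= c.total * percent;

const unmet = (checks: Array<[string, boolean]>): string[] =>
  checks.filter(([, ok]) => !ok).map(([label]) => label);

function classifyTier(s: TierInputs): { tier: Tier; tier1Gaps: string[]; tier2Gaps: string[] } {
  const tier1Gaps = unmet([
    ["Server Conformance 100%", meetsPercent(s.serverConformance, 100)],
    ["Client Conformance 100%", meetsPercent(s.clientConformance, 100)],
    ["Issue triage within 2 business days", s.triagedWithin2BusinessDays],
    ["P0 resolution within 7 days", s.p0ResolvedWithin7Days],
    ["Stable release with clear versioning", s.stableRelease && s.clearVersioning],
    ["All features documented with examples", s.docsAllFeaturesWithExamples],
    ["Dependency policy published", s.dependencyPolicyPublished],
    ["Roadmap published with spec tracking", s.roadmapWithSpecTracking],
  ]);
  const tier2Gaps = unmet([
    ["Server Conformance >= 80%", meetsPercent(s.serverConformance, 80)],
    ["Client Conformance >= 80%", meetsPercent(s.clientConformance, 80)],
    ["Issue triage within 1 month", s.triagedWithin1Month],
    ["P0 resolution within 2 weeks", s.p0ResolvedWithin2Weeks],
    ["At least one stable release", s.stableRelease],
    ["Core features documented", s.docsCoreFeatures],
    ["Dependency policy published", s.dependencyPolicyPublished],
    ["Plan toward Tier 1", s.planTowardTier1 || s.roadmapWithSpecTracking],
  ]);
  const tier: Tier = tier1Gaps.length === 0 ? 1 : tier2Gaps.length === 0 ? 2 : 3;
  return { tier, tier1Gaps, tier2Gaps };
}

const tierExampleInputs: TierInputs = {
  serverConformance: { passed: 45, total: 45 },
  clientConformance: { passed: 20, total: 23 }, // about 87%: enough for Tier 2 only
  triagedWithin2BusinessDays: true,
  triagedWithin1Month: true,
  p0ResolvedWithin7Days: true,
  p0ResolvedWithin2Weeks: true,
  stableRelease: true,
  clearVersioning: true,
  docsAllFeaturesWithExamples: true,
  docsCoreFeatures: true,
  dependencyPolicyPublished: true,
  roadmapWithSpecTracking: true,
  planTowardTier1: true,
};
console.log(classifyTier(tierExampleInputs).tier); // prints 2
```

Returning the gap lists for both tiers mirrors the remediation output described in the commit message, which always shows the path to Tier 2 and Tier 1.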

## Running Conformance Tests

To include conformance test results, start the SDK's everything server first, then pass the URL to the CLI. To also run client conformance tests, pass `--client-cmd` with the command to launch the SDK's conformance client.

**TypeScript SDK**:

```bash
# Terminal 1: start the server (SDK must be built first)
cd ~/src/mcp/typescript-sdk && npm run build
npm run test:conformance:server:run  # starts on port 3000

# Terminal 2: run tier-check (server + client conformance)
npm run --silent tier-check -- \
  --repo modelcontextprotocol/typescript-sdk \
  --conformance-server-url http://localhost:3000/mcp \
  --client-cmd 'npx tsx ~/src/mcp/typescript-sdk/test/conformance/src/everythingClient.ts'
```

**Python SDK**:

```bash
# Terminal 1: install and start the server
cd ~/src/mcp/python-sdk
uv sync --frozen --all-extras --package mcp-everything-server
uv run mcp-everything-server --port 3001  # specify port to avoid conflicts

# Terminal 2: run tier-check (server + client conformance)
npm run --silent tier-check -- \
  --repo modelcontextprotocol/python-sdk \
  --conformance-server-url http://localhost:3001/mcp \
  --client-cmd 'uv run python ~/src/mcp/python-sdk/.github/actions/conformance/client.py'
```

**Go SDK**:

```bash
# Terminal 1: build the server and client, then start the server
cd ~/src/mcp/go-sdk
go build -o /tmp/go-conformance-server ./conformance/everything-server
go build -o /tmp/go-conformance-client ./conformance/everything-client
/tmp/go-conformance-server -http="localhost:3002"

# Terminal 2: run tier-check (server + client conformance)
npm run --silent tier-check -- \
  --repo modelcontextprotocol/go-sdk \
  --conformance-server-url http://localhost:3002 \
  --client-cmd '/tmp/go-conformance-client'
```

**C# SDK**:

```bash
# Terminal 1: start the server (requires .NET SDK)
cd ~/src/mcp/csharp-sdk
dotnet run --project tests/ModelContextProtocol.ConformanceServer --framework net9.0 -- --urls http://localhost:3003

# Terminal 2: run tier-check (server + client conformance)
npm run --silent tier-check -- \
  --repo modelcontextprotocol/csharp-sdk \
  --conformance-server-url http://localhost:3003 \
  --client-cmd 'dotnet run --project ~/src/mcp/csharp-sdk/tests/ModelContextProtocol.ConformanceClient'
```

**Other SDKs:** Your SDK needs an "everything server": an HTTP server implementing the [Streamable HTTP transport](https://modelcontextprotocol.io/specification/draft/basic/transports.md) with all MCP features (tools, resources, prompts, etc.). See the implementations above for reference.

Start your everything server, then pass `--conformance-server-url`. Pass `--client-cmd` if your SDK has a conformance client. If neither exists yet, use `--skip-conformance`; the scorecard will note this as a gap.
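
The commit message for this change notes that scenarios which crash or never produce results are counted as failures, so the pass-rate denominator reflects the full scenario list (for example 4/23 rather than 4/4). A minimal sketch of that reconciliation step follows. The `ScenarioResult` type and the `reconcile` and `passCount` helpers are hypothetical, and the scenario names are illustrative values borrowed from the commit message rather than the tool's actual list.

```typescript
// Hypothetical sketch: fill in failures for expected scenarios that produced no result.
interface ScenarioResult {
  scenario: string;
  passed: boolean;
}

function reconcile(expected: string[], results: ScenarioResult[]): ScenarioResult[] {
  const byName = new Map(results.map((r) => [r.scenario, r] as const));
  // Every expected scenario appears exactly once; missing ones count as failed.
  // Results for scenarios outside the expected list are dropped in this sketch.
  return expected.map((scenario) => byName.get(scenario) ?? { scenario, passed: false });
}

function passCount(results: ScenarioResult[]): { passed: number; total: number } {
  return { passed: results.filter((r) => r.passed).length, total: results.length };
}

// Illustrative scenario names only.
const expectedScenarios = ["initialize", "tools_call", "elicitation-defaults", "sse-retry", "auth"];
const parsedResults: ScenarioResult[] = [
  { scenario: "initialize", passed: true },
  { scenario: "tools_call", passed: true },
];

const reconciled = passCount(reconcile(expectedScenarios, parsedResults));
console.log(`${reconciled.passed}/${reconciled.total}`); // prints "2/5"
```

Without the reconciliation step the same input would report 2/2, which overstates conformance.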

## Reference Files

These files in [`references/`](references/) contain the detailed criteria and prompts:

| File                          | Purpose                                                 |
| ----------------------------- | ------------------------------------------------------- |
| `tier-requirements.md`        | Full SEP-1730 requirements with exact thresholds        |
| `docs-coverage-prompt.md`     | Feature checklist for documentation evaluation          |
| `policy-evaluation-prompt.md` | Criteria for dependency, roadmap, and versioning policy |
| `report-template.md`          | Output format for the full audit report                 |
