Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 24 additions & 12 deletions .claude/skills/mcp-sdk-tier-audit/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,26 +38,38 @@ For public repos, any authenticated token works (no special scopes needed — au
```
--repo <owner/repo> GitHub repository (required)
--branch <branch> Branch to check
--skip-conformance Skip conformance tests
--skip-conformance Skip conformance tests (server and client)
--skip-server-conformance Skip only the server conformance test suite
--skip-client-conformance Skip only the client conformance test suite
--skip-repo-health Skip all GitHub-backed repo-health checks
--conformance-server-url <url> URL of the already-running conformance server
--client-cmd <cmd> Command to run the SDK conformance client (for client conformance tests)
--days <n> Limit triage analysis to last N days
--output <format> json | markdown | terminal (default: terminal)
--token <token> GitHub token (defaults to GITHUB_TOKEN or gh auth token)
```

When any check is skipped — either via `--skip-*` flags or because `--conformance-server-url` / `--client-cmd` were omitted — the run is **partial**: the scorecard's `partial_run` field is `true` and the tier classification is suppressed (shown as `N/A (partial run)`). Skipped checks appear as `status: "skipped"` in JSON output and as `○ skipped` in the human-readable formats.

### What the CLI Checks

| Check | What it measures |
| ------------------ | ------------------------------------------------------------------------------ |
| Server Conformance | Pass rate of server implementation against the conformance test suite |
| Client Conformance | Pass rate of client implementation against the conformance test suite |
| Labels | Whether SEP-1730 label taxonomy is set up (supports GitHub native issue types) |
| Triage | How quickly issues get labeled after creation |
| P0 Resolution | Whether critical bugs are resolved within SLA |
| Stable Release | Whether a stable release >= 1.0.0 exists |
| Policy Signals | Presence of CHANGELOG, SECURITY, CONTRIBUTING, dependabot, ROADMAP |
| Spec Tracking | Gap between latest spec release and SDK release |
#### Conformance Tests

| Check | What it measures |
| ------------------ | --------------------------------------------------------------------- |
| Server Conformance | Pass rate of server implementation against the conformance test suite |
| Client Conformance | Pass rate of client implementation against the conformance test suite |

#### Repository Health

| Check | What it measures |
| -------------- | ------------------------------------------------------------------------------ |
| Labels | Whether SEP-1730 label taxonomy is set up (supports GitHub native issue types) |
| Triage | How quickly issues get labeled after creation |
| P0 Resolution | Whether critical bugs are resolved within SLA |
| Stable Release | Whether a stable release >= 1.0.0 exists |
| Policy Signals | Presence of CHANGELOG, SECURITY, CONTRIBUTING, dependabot, ROADMAP |
| Spec Tracking | Gap between latest spec release and SDK release |

### Example Output

Expand Down Expand Up @@ -144,7 +156,7 @@ dotnet run --project tests/ModelContextProtocol.ConformanceServer --framework ne
/mcp-sdk-tier-audit ~/src/mcp/csharp-sdk http://localhost:3003 "dotnet run --project ~/src/mcp/csharp-sdk/tests/ModelContextProtocol.ConformanceClient"
```

The skill derives `owner/repo` from git remote, runs the CLI, launches parallel evaluations for docs and policy, and writes detailed reports to `results/`.
The skill derives `owner/repo` from git remote, runs the CLI, launches parallel evaluations for docs and policy, and writes detailed reports to `results/`. For scoped runs, `--skip-repo-health` also suppresses the AI policy review so repo-health behavior stays aligned end-to-end; `--skip-docs-eval` only suppresses the documentation evaluation.

### Any Other AI Coding Agent

Expand Down
47 changes: 38 additions & 9 deletions .claude/skills/mcp-sdk-tier-audit/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ description: >-
Produces tier classification (1/2/3) with evidence table, gap list, and
remediation guide. Works for any official MCP SDK (TypeScript, Python, Go,
C#, Java, Kotlin, PHP, Swift, Rust, Ruby).
argument-hint: '<local-path> <conformance-server-url> [client-cmd] [--branch <branch>]'
argument-hint: '<local-path> <conformance-server-url> [client-cmd] [--branch <branch>] [--scope <scope>] [--skip-*]'
---

# MCP SDK Tier Audit
Expand Down Expand Up @@ -44,6 +44,18 @@ Extract from the user's input:
- **conformance-server-url**: URL where the SDK's everything server is already running (e.g. `http://localhost:3000/mcp`)
- **client-cmd** (optional): command to run the SDK's conformance client (e.g. `npx tsx test/conformance/src/everythingClient.ts`). If not provided, client conformance tests are skipped and noted as a gap in the report.
- **branch** (optional): Git branch to check on GitHub (e.g. `--branch fweinberger/v1x-governance-docs`). If not provided, derive from the local checkout's current branch: `cd <local-path> && git rev-parse --abbrev-ref HEAD`. This is passed to the tier-check CLI so that policy signal file checks use the correct branch instead of the repo's default branch.
- **scope** (optional): one of `server`, `client`, `conformance`, `repo-health`, or `full` (default). A partial scope runs only a subset of the audit and **suppresses tier classification** — the report will show "Tier: N/A (partial run)". Scope presets expand to skip flags:
- `--scope server` → `--skip-client-conformance --skip-repo-health --skip-docs-eval`
- `--scope client` → `--skip-server-conformance --skip-repo-health --skip-docs-eval`
- `--scope conformance` → `--skip-repo-health --skip-docs-eval`
- `--scope repo-health` → `--skip-conformance --skip-docs-eval`
- `--scope full` → default, no skips
- **individual skip flags** (optional; override/augment `--scope` if both supplied):
- `--skip-server-conformance`, `--skip-client-conformance`, `--skip-conformance`
- `--skip-repo-health` (skips all repo-health work: the CLI's GitHub-backed checks plus the AI policy evaluation in Step 3)
- `--skip-docs-eval` (skips Evaluation 1 in Step 3)

Note: repo health is treated as a single concern across the deterministic CLI checks and the AI-assisted policy review. `--skip-repo-health` skips both of those surfaces. `--skip-docs-eval` remains the only separate evaluation skip.

The first two arguments are required. If either is missing, ask the user to provide it.

Expand All @@ -63,10 +75,15 @@ npm run --silent tier-check -- \
--branch <branch> \
--conformance-server-url <conformance-server-url> \
--client-cmd '<client-cmd>' \
--output json
--output json \
[--skip-server-conformance] \
[--skip-client-conformance] \
[--skip-repo-health]
```

If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped). The `--branch` flag should always be included (derived from the local checkout if not explicitly provided).
Forward whichever `--skip-*` flags the user passed (or that the `--scope` preset resolved to). If no scope/skip flags were supplied, run the CLI without them for a full assessment.

If no client-cmd was detected, omit the `--client-cmd` flag. Client conformance will be skipped, and because the run is no longer complete, the final result should be treated as a partial run (`Tier: N/A (partial run)`). The `--branch` flag should always be included (derived from the local checkout if not explicitly provided).

The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.

Expand All @@ -84,12 +101,14 @@ If found, read the file. It lists known/expected conformance failures. This cont

## Step 3: Launch Parallel Evaluations

Launch 2 evaluations in parallel. Each reads the SDK from the local checkout path.
Launch up to 2 evaluations in parallel. Each reads the SDK from the local checkout path.

**IMPORTANT**: Launch both evaluations at the same time (in the same response) so they run in parallel.
**IMPORTANT**: Launch both evaluations at the same time (in the same response) so they run in parallel. If documentation coverage is skipped via `--skip-docs-eval`, or repo-health work is skipped via `--skip-repo-health` (or via a `--scope` preset), omit the relevant evaluation and treat its section in the report as "Not run — excluded by scope".

### Evaluation 1: Documentation Coverage

**Skip if** `--skip-docs-eval` is set.

Use the prompt from `references/docs-coverage-prompt.md`. Pass the local path.

This evaluation checks:
Expand All @@ -100,6 +119,8 @@ This evaluation checks:

### Evaluation 2: Policy Evaluation

**Skip if** `--skip-repo-health` is set. Repo health is intentionally aligned here: if the repo-health checks are excluded from the CLI run, the AI policy evaluation is excluded too.

Use the prompt from `references/policy-evaluation-prompt.md`. Pass the local path, the derived `owner/repo`, and the `policy_signals` section from the CLI JSON output.

The CLI has already checked which policy files exist (ROADMAP.md, DEPENDENCY_POLICY.md, dependabot.yml, VERSIONING.md, etc.). The AI evaluation reads only the files the CLI found to judge whether the content is substantive — it does NOT search for files in other locations.
Expand All @@ -113,7 +134,9 @@ This evaluation checks:

## Step 4: Compute Final Tier

Combine the deterministic scorecard (from the CLI) with the evaluation results (docs, policies). Apply the tier logic:
**If the run is partial** (any `--scope` other than `full`, or any individual `--skip-*` flag is set, or the CLI JSON has `partial_run: true`): **do not classify a tier**. Report the result as "Tier: N/A (partial run)" and list which checks ran vs. which were skipped. Do not output Tier 1/2 blockers for skipped checks — the skipped list already conveys that information. Still present the full table for the sections that did run so the user gets signal on those.

Otherwise (full run): combine the deterministic scorecard (from the CLI) with the evaluation results (docs, policies). Apply the tier logic:

### Tier 1 requires ALL of:

Expand Down Expand Up @@ -145,7 +168,7 @@ If any Tier 2 requirement is not met, the SDK is Tier 3.
**Important edge cases:**

- If GitHub issue labels are not set up per SEP-1730, triage metrics cannot be computed. Note this as a gap. However, repos may use GitHub's native issue types instead of type labels — the CLI checks for both.
- If client conformance was skipped (no client command found), note this as a gap but do not block tier advancement based on it alone.
- If client conformance was skipped (no client command found), note this as a gap and keep the overall result as a partial run rather than assigning a definitive tier.

**Conformance Breakdown:**

Expand Down Expand Up @@ -182,7 +205,7 @@ When evaluating P0 metrics, flag potentially mislabeled P0 issues:

## Step 5: Generate Output

Write detailed reports to files using subagents, then show a concise summary to the user.
Write detailed reports to files using subagents, then show a concise summary to the user. **Always write both files**, even for partial runs — skipped sections should render as `○ skipped` / "Not run — excluded by scope (`--skip-…` / `--scope <name>`)" so the file shape stays stable. Include a `**Scope**` line in the header of each report (e.g., `Scope: partial run (server conformance only)` or `Scope: full`).

### Output files (write via subagents)

Expand All @@ -208,7 +231,7 @@ Pass all the gathered data to a subagent and instruct it to write the remediatio

### Console output (shown to the user)

After the subagents finish, output a short executive summary directly to the user:
After the subagents finish, output a short executive summary directly to the user. For **partial runs**, replace the `## <sdk-name> — Tier <X>` header with `## <sdk-name> — Tier N/A (partial run)`, list the skipped sections directly below, and render rows for skipped checks as `○ skipped` in the tables (no ✓/✗ for T2/T1 columns on those rows). If you include a legend, state that `○` means "skipped". Omit the "For Tier 2" / "For Tier 1" sections entirely on partial runs — a partial run cannot definitively identify tier gaps.

```
## <sdk-name> — Tier <X>
Expand Down Expand Up @@ -304,4 +327,10 @@ Read these reference files when you need the detailed content for evaluation pro

# Any SDK — server conformance only (no client)
/mcp-sdk-tier-audit ~/src/mcp/some-sdk http://localhost:3004

# Scope presets — run only a subset of the audit
/mcp-sdk-tier-audit ~/src/mcp/typescript-sdk http://localhost:3000/mcp --scope server
/mcp-sdk-tier-audit ~/src/mcp/typescript-sdk http://localhost:3000/mcp "<client-cmd>" --scope client
/mcp-sdk-tier-audit ~/src/mcp/typescript-sdk http://localhost:3000/mcp "<client-cmd>" --scope conformance
/mcp-sdk-tier-audit ~/src/mcp/typescript-sdk --scope repo-health
```
12 changes: 9 additions & 3 deletions .claude/skills/mcp-sdk-tier-audit/references/report-template.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,17 @@ Write two files to `results/` in the conformance repo:

**Date**: {date}
**Branch**: {branch}
**Scope**: {full | partial run — skipped: <list>}
**Auditor**: mcp-sdk-tier-audit skill (automated + subagent evaluation)

## Tier Assessment: Tier {X}
## Tier Assessment: {Tier X | N/A (partial run)}

{Brief 1-2 sentence summary of the overall assessment and key factors.}
{Brief 1-2 sentence summary of the overall assessment and key factors. For partial runs: state which scopes ran and note that no definitive tier is assigned.}

### Requirements Summary

For partial runs, rows corresponding to skipped checks should render with `Current Value` = `○ skipped (excluded by scope)` and both `T1?` / `T2?` = `N/A`. Do not mark skipped rows as PASS or FAIL.

| # | Requirement | Tier 1 Standard | Tier 2 Standard | Current Value | T1? | T2? | Gap |
| --- | ----------------------- | --------------------------------- | ---------------------------- | --------------------------------- | ----------- | ----------- | ------------------ |
| 1a | Server Conformance | 100% pass rate | >= 80% pass rate | {X}% ({passed}/{total}) | {PASS/FAIL} | {PASS/FAIL} | {detail or "None"} |
Expand Down Expand Up @@ -108,7 +111,10 @@ Labels: {present/missing list}
# Remediation Guide: {repo}

**Date**: {date}
**Current Tier**: {X}
**Scope**: {full | partial run — skipped: <list>}
**Current Tier**: {X | N/A (partial run)}

> For **partial runs**, omit the "Path to Tier 2" and "Path to Tier 1" sections (a partial run cannot identify all tier gaps). Instead, include a single section titled **Findings from scoped run** that lists remediation items only for the sections that were actually run, and a note that a full audit is required to enumerate remaining tier gaps.

## Path to Tier 2

Expand Down
11 changes: 11 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,17 @@ npm run --silent tier-check -- --repo modelcontextprotocol/typescript-sdk --skip
npm run --silent tier-check -- \
--repo modelcontextprotocol/typescript-sdk \
--conformance-server-url http://localhost:3000/mcp

# Partial runs (tier classification is suppressed — reported as "N/A (partial run)"):
# --skip-server-conformance / --skip-client-conformance target one conformance suite
# --skip-conformance skip both conformance suites
# --skip-repo-health skip all GitHub-backed checks
# Example: run only client conformance
npm run --silent tier-check -- \
--repo modelcontextprotocol/typescript-sdk \
--conformance-server-url http://localhost:3000/mcp \
--client-cmd '<client-cmd>' \
--skip-server-conformance --skip-repo-health
```

For a full AI-assisted assessment with remediation guide, use Claude Code:
Expand Down
69 changes: 69 additions & 0 deletions src/tier-check/checks/skipped.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
import {
ConformanceResult,
LabelsResult,
TriageResult,
P0Result,
ReleaseResult,
PolicySignalsResult,
SpecTrackingResult
} from '../types';

export const skippedConformance = (): ConformanceResult => ({
status: 'skipped',
pass_rate: 0,
passed: 0,
failed: 0,
total: 0,
details: []
});

export const skippedLabels = (): LabelsResult => ({
status: 'skipped',
present: 0,
required: 0,
missing: [],
found: [],
uses_issue_types: false
});

export const skippedTriage = (): TriageResult => ({
status: 'skipped',
compliance_rate: 0,
total_issues: 0,
triaged_within_sla: 0,
exceeding_sla: 0,
median_hours: 0,
p95_hours: 0,
days_analyzed: undefined
});

export const skippedP0 = (): P0Result => ({
status: 'skipped',
open_p0s: 0,
open_p0_details: [],
closed_within_7d: 0,
closed_within_14d: 0,
closed_total: 0,
all_p0s_resolved_within_7d: false,
all_p0s_resolved_within_14d: false
});

export const skippedRelease = (): ReleaseResult => ({
status: 'skipped',
version: null,
is_stable: false,
is_prerelease: false
});

export const skippedPolicySignals = (): PolicySignalsResult => ({
status: 'skipped',
files: {}
});

export const skippedSpecTracking = (): SpecTrackingResult => ({
status: 'skipped',
latest_spec_release: null,
latest_sdk_release: null,
sdk_release_within_30d: null,
days_gap: null
});
35 changes: 35 additions & 0 deletions src/tier-check/index.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
import { describe, expect, it } from 'vitest';
import { resolveTierCheckPlan } from './index';

describe('resolveTierCheckPlan', () => {
it('treats omitted conformance inputs as skipped', () => {
const plan = resolveTierCheckPlan({});

expect(plan.runServer).toBe(false);
expect(plan.runClient).toBe(false);
expect(plan.serverSkipReason).toBe('no --conformance-server-url');
expect(plan.clientSkipReason).toBe('no --client-cmd');
expect(plan.nothingToRun).toBe(false);
});

it('reports nothing to run when repo health is also skipped', () => {
const plan = resolveTierCheckPlan({ skipRepoHealth: true });

expect(plan.runServer).toBe(false);
expect(plan.runClient).toBe(false);
expect(plan.nothingToRun).toBe(true);
});

it('respects explicit scope exclusions even when inputs are present', () => {
const plan = resolveTierCheckPlan({
conformanceServerUrl: 'http://localhost:3000/mcp',
clientCmd: 'npm run conformance:client',
skipClientConformance: true
});

expect(plan.runServer).toBe(true);
expect(plan.runClient).toBe(false);
expect(plan.clientSkipReason).toBe('excluded by scope');
expect(plan.nothingToRun).toBe(false);
});
});
Loading
Loading