Commit 39f34b2

Authored by mikekistler, tarekgh, Tarek Mahmoud Sayed, pcarleton
Add conformance tests for SEP-2243 HTTP Standardization (#259)
* Add prepare script for git dependency installs

* Add conformance tests for SEP-2243 HTTP Standardization

* Add traceability file for SEP-2243

* Fix prettier formatting in http-standard-headers.ts

* Improve SEP-2243 conformance test coverage

  Close gaps in HTTP header conformance scenarios:

  Client standard headers (http-standard-headers.ts):
  - Enforce all expected methods in checks, failing any that were not observed
  - Track Mcp-Name header per-method separately from Mcp-Method
  - Advertise resources and prompts capabilities so clients exercise those endpoints
  - Add a prompt entry for prompts/get testing

  Client custom headers (http-custom-headers.ts):
  - Add Base64 encoding checks for non-ASCII, whitespace, and control-char values
  - Add null/omitted parameter test (second tool call with null value)
  - Add client-keeps-valid-tool check verifying clients still call valid tools after filtering out invalid ones

  Server header validation (server/http-standard-headers.ts):
  - Replace fetch-based sendRawRequest with http.request to preserve exact header casing on the wire (fetch/Headers lowercases names)
  - Compute defaultArgs and defaultHeaders from tool schema so test requests satisfy all required parameters while varying only the param under test

  Traceability (sep-2243.yaml):
  - Add 7 new spec-to-check mappings (nonascii, whitespace, controlchar, keeps-valid-tool, literal-missing-base64-prefix/suffix, no-mirror-unannotated)
  - Fix 2 incorrect existing mappings

* Fill SEP-2243 conformance test coverage gaps

  Add missing test cases identified by comparing the branch against the SEP-2243 spec's Conformance Test Cases section:
  - Mcp-Method on notifications: add notifications/initialized to expectedMethods so clients are checked for header on notifications
  - Boolean true: add debug=true param to complement verbose=false
  - Leading-space-only and trailing-space-only: separate checks for Base64 encoding when only one edge has whitespace
  - Internal spaces only: verify plain ASCII (no Base64) for values like 'us west 1' that have spaces only in the middle
  - CRLF string: add line1\r\nline2 test (complements existing \n test)
  - Leading tab: add \tindented test for tab-triggered Base64 encoding
  - Server rejects invalid Mcp-Param chars: mark as excluded in YAML since HTTP itself prevents these characters in headers

  Register all new checks in sep-2243.yaml.

* Remove case-insensitive =?base64? prefix test

  The spec states header values are case-sensitive (RFC 9110). The =?base64? prefix is part of the header value, not the header name, so it should be matched case-sensitively. This was identified as a bug in the conformance tests per feedback on SEP-2243.

  Changes:
  - Remove server-accepts-case-insensitive-base64 check from server validation scenario
  - Change base64 prefix regex from case-insensitive (/i) to case-sensitive in client header validation
  - Remove corresponding entry from sep-2243.yaml

* Remove nested x-mcp-header rejection test

  The spec constrains x-mcp-header to primitive types but does not restrict it to top-level properties only. A nested string property with x-mcp-header is valid per the spec. This was confirmed by mikekistler in SEP-2243 PR discussion.

  Removes invalid_nested_header from the invalid tool definitions list and its corresponding sep-2243.yaml entry.

* Use numeric comparison for number header values

  Compare number header values numerically instead of as strings to accommodate cross-SDK floating point representation differences (e.g., '42' vs '42.0'). For integers, exact numeric match is required. For decimals, a tolerance of 1e-9 is allowed. See SEP-2243 discussion on number precision.

* client/http-standard-headers: SKIPPED (not FAILURE) for unexercised methods; make getChecks() idempotent

  A client that never calls prompts/list isn't violating SEP-2243: the spec sentence is 'The client MUST include the standard MCP request headers on each POST request', and a request that was never sent can't fail it.
  The Mcp-Method requirement is already proven by initialize/tools/list etc., so there's nothing unique to prompts/list. SKIPPED keeps the gap visible without a false red.

  Separately: getChecks() was pushing into this.checks on every call. The runner may call it more than once (e.g. progress + final report); that produced duplicate rows. Build a fresh array per call instead.

* client/http-standard-headers: de-dup guard in checkMcpNameHeader

  checkMcpMethodHeader already has 'if (this.methodHeaderChecks.has(method)) return'. checkMcpNameHeader was missing the equivalent. The test server advertises two tools and two resources; a client that calls both produces two 'client-mcp-name-header-tools-call' / '...-resources-read' rows.

* client/http-standard-headers: replace -> /\//g for slug generation

  String.replace with a string arg only replaces the first occurrence. 'notifications/initialized' is fine (one slash) but a method like 'notifications/resources/updated' would yield 'notifications-resources/updated' in the check id. (replaceAll isn't in this project's TS lib target.)

* client/http-custom-headers: base64 regex (.+) -> (.*) so empty string round-trips

  The spec doesn't forbid base64-encoding values that don't strictly need it. A client that always wraps would send '=?base64??=' for empty_val: ''. The '.+' wouldn't match, so the harness would compare '=?base64??=' === '' literally and FAIL a compliant client. '(.*)' lets the empty payload decode to '' and match.

* client/http-custom-headers: emit SUCCESS for no-mirror-unannotated

  This check only pushed on FAILURE; a correct client produced no row at all, so the pass was invisible in reports and couldn't be counted toward coverage. Same-slug SUCCESS/FAILURE pair is the repo convention.

* client/http-custom-headers: drop redundant optional-present check

  checkParamHeader('Verbose', ...) already pushes a FAILURE when the header is missing.
  The explicit 'optional-present' block re-tests the same condition under a second check id, doubling the row.

* server/http-standard-headers: defaultArgs must use real number/boolean, not strings

  defaultArgs feeds the JSON-RPC body. With Record<string, string> + '0'/'false', a server that validates inputSchema rejects on type mismatch, which is also a 400. Every header-rejection check then false-PASSes (server rejected for the wrong reason), and server-accepts-valid-base64 false-FAILs a compliant server because it never gets past schema validation. Also handle 'integer' (JSON Schema distinguishes it from 'number').

* server/http-standard-headers: createAcceptanceCheck must also assert no JSON-RPC error in body

  A server can return HTTP 200 with {error: {code: -32001, ...}} in the body. Status-only acceptance lets that through as a pass.

* use DRAFT_PROTOCOL_VERSION constant instead of 'DRAFT-2026-v1' literal

  The literal will rot when the draft cycle rolls over. types.ts already exports DRAFT_PROTOCOL_VERSION for this.

* sep-2243.yaml: spec-first rewrite + move to src/seps/

  Regenerated from the SEP-2243 spec diff (transports.mdx + tools.mdx) rather than from the scenario implementations: 21 check rows + 4 excluded vs the previous 46 + 2.

  Differences:
  - one check id per spec sentence (test variants go in 'details', not new ids)
  - 'text:' quotes the spec sentence verbatim instead of paraphrasing
  - '-reject-status' (MUST 400) split from '-reject-error-code' (SHOULD -32001 for standard headers, MUST for custom)
  - rows with no spec backing dropped (whitespace, base64-padding/chars, prefix/suffix-literal, control-char-name)
  - two more SHOULD excludes (log-warning, no-sensitive-params)
  - check ids use sep-2243-<slug> convention
  - check: key first, excludes grouped at bottom (matches sep-2164)

  Moved to src/seps/ to match #272 layout (this branch predates it; will reconcile cleanly on rebase).

* add negative test for HttpStandardHeadersScenario

  Self-contained vitest that POSTs initialize with and without Mcp-Method, asserts FAILURE/SUCCESS on the pinned check id, and proves getChecks() is idempotent. No example client/server file needed for this one because the scenario *is* the server: the test crafts the bad request directly.

  This is the negative-test half of AGENTS.md §Testing your scenario; the everything-client passing-example half is still TODO.

* server/http-standard-headers: severity fixes for whitespace + malformed-base64 tests

  These five tests assert behavior the SEP-2243 text doesn't pin down. Kept because they're useful consistency signals across SDKs, but adjusted so they don't FAIL spec-compliant servers:
  - server-accepts-whitespace-header-value: this IS a MUST, but per RFC 9110 §5.5 (field parsing MUST exclude OWS), not SEP-2243. Kept as FAILURE, specReference now cites RFC 9110.
  - server-rejects-invalid-base64-padding / -chars: downgraded to INFO. SEP-2243 says only 'MUST decode them accordingly' without picking RFC 4648 strict vs lenient. Node's Buffer.from() accepts unpadded/dirty input and decodes to a matching value (server accepts); .NET's Convert.FromBase64String throws (server rejects). Either is currently compliant. WARNING would force Tier-1 SDKs into non-default strict decoding (#245); INFO records the behavior without prescribing it, so cross-SDK divergence is visible without tier impact.
  - server-literal-missing-base64-prefix / -suffix: unchanged. The wrapper syntax is '=?base64?{x}?=' complete; treating partial wrappers as literal is the natural reading.

  Plumbing: createRejectionCheck gains an optional failSeverity param; testBase64Case gains a 'reject-info' mode. (This is also the hook for the larger 400-MUST / -32001-SHOULD split that still needs doing.)

* server/http-standard-headers: split rejection check into status (MUST) + error-code (SHOULD/MUST)

  SEP-2243 §Server Validation: 'servers MUST return HTTP status 400 ... and SHOULD include a JSON-RPC error response using ... -32001'. So a server that returns 400 with -32600 (or no error body) is compliant for standard headers. The single conflated check FAILed it. For custom headers (§Server Behavior for Custom Headers) -32001 IS a MUST, so both halves stay FAILURE there.

  createRejectionCheck → createRejectionChecks returning two rows: '<id>' for the 400 status and '<id>-error-code' for -32001. testCase (standard) sets errorCodeSeverity=WARNING; testBase64Case/testMissingCustomHeader (custom) keep FAILURE; the two malformed-base64 INFO probes set both halves to INFO.

* rename check ids to sep-2243-* prefix; kebab-case throughout

  Aligns with the sep-2243.yaml traceability ids and the repo convention (sep-2164-*, sep-2207-*). Variant suffixes are kept (per-method, per-tool) so per-case visibility isn't lost; the YAML's row id is a prefix of the emitted ids, which is fine because extra ids not in the YAML are allowed. Also fixes the mixed hyphen/underscore in reject-invalid-tool ids (toolName has underscores; replace to hyphens).

* client: extract BaseHttpScenario to client/http-base.ts; both client scenarios extend it

  http-standard-headers.ts had its own copy of the start/stop/handleRequest boilerplate (~100 lines) that http-custom-headers.ts already abstracted as BaseHttpScenario. Moved the abstract class to a shared file; both scenario files now import it. HttpStandardHeadersScenario implements handlePost() instead of handleRequest(); its handleInitialize() uses the base sendInitialize() with the resources/prompts capability flags it needs.

  Net: 465 → 361 lines for http-standard-headers.ts; the third inline copy of the same HTTP-server scaffold is gone.

* use req.setEncoding('utf8') instead of per-chunk Buffer.toString()

  Per-chunk .toString() corrupts a multi-byte UTF-8 char that straddles a TCP chunk boundary (e.g. 0xC3 | 0xBC decodes to two replacement chars instead of 'ü').
  The custom-headers scenario sends '日本語'/'naïve' in the body, so this is reachable in principle. setEncoding('utf8') makes 'data' emit strings with boundary handling done by Node's StringDecoder. Fixed in both places: BaseHttpScenario.handleRequest (request body) and sendRawRequest (response body).

* fix: migrate SEP-2243 scenarios from specVersions to source (introducedIn/removedIn)

  The introducedIn/removedIn refactor (#265) replaced the specVersions array with a source property on all scenario interfaces. The SEP-2243 scenarios still used the old specVersions field, causing matchesSpecVersion() to receive undefined and crash in spec-version.test.ts.

  Replace specVersions with source = { introducedIn: DRAFT_PROTOCOL_VERSION } in BaseHttpScenario and the two server ClientScenario classes. Remove the now-unused SpecVersion imports.

* server/http-standard-headers: malformed-base64 padding/chars back to FAILURE

  Per discussion on #259: the SEP-2243 conformance-test-case table is the approved source of truth and lists these as reject cases, so they're FAILURE-level even though the spec body itself only says 'MUST decode them accordingly'. SDKs whose stdlib base64 is lenient (Node, browsers) will need to validate before decoding; if that's burdensome we'll revisit.

  Removes the now-unused 'reject-info' mode and statusSeverity/INFO plumbing introduced for these two probes.

---------

Co-authored-by: Tarek Mahmoud Sayed <tarekms@microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <tarekms@ntdev.microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
Co-authored-by: Paul Carleton <paulc@anthropic.com>
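
The commit message describes three related fixes in the client scenario: SKIPPED rows for methods the client never called, an idempotent getChecks(), and a de-dup guard. The sketch below shows one way those ideas could fit together. It is not the repository's code: the `Check` shape, the `MethodHeaderTracker` class, and the check id prefix are hypothetical simplifications chosen for illustration.

```typescript
// Hypothetical, simplified check row; the real ConformanceCheck type has more fields.
type Status = 'SUCCESS' | 'FAILURE' | 'SKIPPED';
interface Check {
  id: string;
  status: Status;
}

class MethodHeaderTracker {
  // Keyed by method name, so the map doubles as the de-dup guard.
  private observed = new Map<string, Check>();
  private expectedMethods: string[];

  constructor(expectedMethods: string[]) {
    this.expectedMethods = expectedMethods;
  }

  // Illustrative id scheme (assumption): prefix plus the method with slashes replaced.
  private idFor(method: string): string {
    return `sep-2243-mcp-method-header-${method.replace(/\//g, '-')}`;
  }

  record(method: string, headerValue: string | undefined): void {
    if (this.observed.has(method)) return; // de-dup guard: one row per method
    this.observed.set(method, {
      id: this.idFor(method),
      status: headerValue === method ? 'SUCCESS' : 'FAILURE'
    });
  }

  getChecks(): Check[] {
    // Build a fresh array on every call instead of pushing into shared state,
    // so calling this twice cannot produce duplicate rows.
    return this.expectedMethods.map(
      (method): Check =>
        this.observed.get(method) ?? { id: this.idFor(method), status: 'SKIPPED' }
    );
  }
}
```

With this shape, a method that was never requested shows up as SKIPPED rather than FAILURE, and repeated getChecks() calls return the same number of rows.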
1 parent acc70de commit 39f34b2
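
The slug-generation fix in the commit message rests on a JavaScript detail: `String.prototype.replace` with a string pattern replaces only the first match, while a regex with the `g` flag replaces all of them. A minimal sketch, with hypothetical function names, makes the difference concrete:

```typescript
// Replaces only the first '/', which is the behavior the commit message calls out as a bug.
function slugFirstOnly(method: string): string {
  return method.replace('/', '-');
}

// Replaces every '/', using a global regex as the commit message describes.
function slugAll(method: string): string {
  return method.replace(/\//g, '-');
}

console.log(slugFirstOnly('notifications/resources/updated'));
console.log(slugAll('notifications/resources/updated'));
```

For a single-slash method such as 'notifications/initialized' both functions agree, which is why the bug was easy to miss.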

8 files changed
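
The base64 items in the commit message (case-sensitive prefix, `(.*)` instead of `(.+)`, partial wrappers treated as literals) can be illustrated with a small decoder. This is a sketch under assumptions, not the harness's implementation: `decodeHeaderValue` and the regex constant names are hypothetical, and decoding uses the standard `atob` and `TextDecoder` globals.

```typescript
// Case-sensitive wrapper match; (.*) lets an empty payload round-trip.
const BASE64_WRAPPER = /^=\?base64\?(.*)\?=$/;
// The earlier form with (.+), kept only to show why it rejects '=?base64??='.
const BASE64_WRAPPER_STRICT = /^=\?base64\?(.+)\?=$/;

function decodeHeaderValue(raw: string): string {
  const match = BASE64_WRAPPER.exec(raw);
  // Anything without the complete wrapper is treated as a literal value.
  if (!match) return raw;
  const binary = atob(match[1] ?? '');
  // atob yields one char per byte; reinterpret those bytes as UTF-8.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(JSON.stringify(decodeHeaderValue('=?base64??=')));
console.log(decodeHeaderValue('=?base64?5pel5pys6Kqe?='));
console.log(BASE64_WRAPPER_STRICT.test('=?base64??='));
```

Note that `atob` is lenient about some malformed input, so a harness that must reject bad padding or characters would need validation before decoding, as the last commit-message item points out.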

Lines changed: 2490 additions & 4 deletions
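
The numeric comparison rule from the commit message (exact match for integers, a tolerance of 1e-9 for decimals) could look roughly like the following. The function name and the handling of empty or non-numeric input are assumptions made for this sketch; the source only states the two comparison rules.

```typescript
const DECIMAL_TOLERANCE = 1e-9;

// Compare a header value against the expected number numerically, so that
// representations such as '42' and '42.0' are treated as equal.
function numberHeaderMatches(expected: number, headerValue: string): boolean {
  const trimmed = headerValue.trim();
  if (trimmed === '') return false; // Number('') is 0, so guard explicitly (assumption)
  const actual = Number(trimmed);
  if (!Number.isFinite(actual)) return false; // rejects NaN and Infinity (assumption)
  if (Number.isInteger(expected)) return actual === expected; // integers: exact match
  return Math.abs(actual - expected) <= DECIMAL_TOLERANCE; // decimals: within 1e-9
}

console.log(numberHeaderMatches(42, '42.0'));
console.log(numberHeaderMatches(3.14, '3.15'));
```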

package.json

Lines changed: 2 additions & 1 deletion
@@ -21,7 +21,8 @@
     "tier-check": "node dist/index.js tier-check",
     "check": "npm run typecheck && npm run lint",
     "typecheck": "tsgo --noEmit",
-    "prepack": "npm run build"
+    "prepack": "npm run build",
+    "prepare": "npm run build"
   },
   "files": [
     "dist"

src/scenarios/client/http-base.ts

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
+/**
+ * Shared HTTP test-server scaffold for client-under-test SEP-2243 scenarios.
+ *
+ * A scenario that needs to act as a Streamable-HTTP MCP server, inspect
+ * incoming client requests, and emit ConformanceChecks should extend this
+ * class and implement handlePost() + getChecks(). start()/stop() and the
+ * GET/DELETE/body-parse boilerplate are handled here.
+ */
+
+import http from 'http';
+import {
+  Scenario,
+  ScenarioUrls,
+  ConformanceCheck,
+  ScenarioSource,
+  DRAFT_PROTOCOL_VERSION
+} from '../../types.js';
+
+export abstract class BaseHttpScenario implements Scenario {
+  abstract name: string;
+  abstract description: string;
+  readonly source: ScenarioSource = { introducedIn: DRAFT_PROTOCOL_VERSION };
+  allowClientError?: boolean;
+
+  protected server: http.Server | null = null;
+  protected checks: ConformanceCheck[] = [];
+  protected port: number = 0;
+  protected sessionId: string = `session-${Date.now()}`;
+
+  async start(): Promise<ScenarioUrls> {
+    return new Promise((resolve, reject) => {
+      this.server = http.createServer((req, res) => {
+        this.handleRequest(req, res);
+      });
+      this.server.on('error', reject);
+      this.server.listen(0, () => {
+        const address = this.server!.address();
+        if (address && typeof address === 'object') {
+          this.port = address.port;
+          resolve({ serverUrl: `http://localhost:${this.port}` });
+        } else {
+          reject(new Error('Failed to get server address'));
+        }
+      });
+    });
+  }
+
+  async stop(): Promise<void> {
+    return new Promise((resolve, reject) => {
+      if (this.server) {
+        this.server.close((err) => {
+          if (err) reject(err);
+          else {
+            this.server = null;
+            resolve();
+          }
+        });
+      } else {
+        resolve();
+      }
+    });
+  }
+
+  abstract getChecks(): ConformanceCheck[];
+
+  protected handleRequest(
+    req: http.IncomingMessage,
+    res: http.ServerResponse
+  ): void {
+    if (req.method === 'GET') {
+      res.writeHead(200, {
+        'Content-Type': 'text/event-stream',
+        'Cache-Control': 'no-cache',
+        Connection: 'keep-alive',
+        'mcp-session-id': this.sessionId
+      });
+      res.write('data: \n\n');
+      return;
+    }
+    if (req.method === 'DELETE') {
+      res.writeHead(200);
+      res.end();
+      return;
+    }
+    if (req.method !== 'POST') {
+      res.writeHead(405);
+      res.end('Method Not Allowed');
+      return;
+    }
+
+    // Decode the stream as UTF-8 so multi-byte characters that straddle a
+    // chunk boundary aren't corrupted by per-chunk Buffer.toString().
+    req.setEncoding('utf8');
+    let body = '';
+    req.on('data', (chunk) => {
+      body += chunk;
+    });
+    req.on('end', () => {
+      try {
+        const request = JSON.parse(body);
+        this.handlePost(req, res, request);
+      } catch (error) {
+        res.writeHead(400, { 'Content-Type': 'application/json' });
+        res.end(
+          JSON.stringify({
+            jsonrpc: '2.0',
+            error: { code: -32700, message: `Parse error: ${error}` }
+          })
+        );
+      }
+    });
+  }
+
+  protected abstract handlePost(
+    req: http.IncomingMessage,
+    res: http.ServerResponse,
+    request: any
+  ): void;
+
+  protected sendJson(res: http.ServerResponse, body: object): void {
+    res.writeHead(200, {
+      'Content-Type': 'application/json',
+      'mcp-session-id': this.sessionId
+    });
+    res.end(JSON.stringify(body));
+  }
+
+  protected sendInitialize(
+    res: http.ServerResponse,
+    request: any,
+    capabilities: object = { tools: {} }
+  ): void {
+    this.sendJson(res, {
+      jsonrpc: '2.0',
+      id: request.id,
+      result: {
+        protocolVersion: DRAFT_PROTOCOL_VERSION,
+        serverInfo: { name: this.name + '-server', version: '1.0.0' },
+        capabilities
+      }
+    });
+  }
+
+  protected sendNotificationAck(res: http.ServerResponse): void {
+    res.writeHead(202);
+    res.end();
+  }
+
+  protected sendGenericResult(res: http.ServerResponse, request: any): void {
+    this.sendJson(res, {
+      jsonrpc: '2.0',
+      id: request.id,
+      result: {}
+    });
+  }
+}
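
The comment in handleRequest() about chunk boundaries can be demonstrated without a server. The sketch below is a stand-in rather than the scenario's code: it uses the standard `TextDecoder` in streaming mode in place of Node's `StringDecoder` (which `req.setEncoding('utf8')` uses internally), and the helper names are hypothetical. It splits 'naïve' between the two bytes of 'ï' (0xC3 0xAF) and decodes the pieces both ways.

```typescript
// Decode each chunk independently: a split multi-byte character is lost.
function decodePerChunk(chunks: Uint8Array[]): string {
  return chunks.map((chunk) => new TextDecoder('utf-8').decode(chunk)).join('');
}

// Decode with one streaming decoder that carries partial bytes across chunks.
function decodeStreaming(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder('utf-8');
  let out = '';
  for (const chunk of chunks) {
    out += decoder.decode(chunk, { stream: true });
  }
  return out + decoder.decode(); // flush any buffered trailing bytes
}

const encoded = new TextEncoder().encode('naïve');
// Bytes: 'n' 'a' 0xC3 | 0xAF 'v' 'e'; the split falls inside 'ï'.
const chunks = [encoded.slice(0, 3), encoded.slice(3)];

console.log(JSON.stringify(decodePerChunk(chunks)));
console.log(JSON.stringify(decodeStreaming(chunks)));
```

The per-chunk variant yields two U+FFFD replacement characters where 'ï' should be, while the streaming variant reconstructs the original string.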
