Commit 39f34b2
Add conformance tests for SEP-2243 HTTP Standardization (#259)
* Add prepare script for git dependency installs
* Add conformance tests for SEP-2243 HTTP Standardization
* Add traceability file for SEP-2243
* Fix prettier formatting in http-standard-headers.ts
* Improve SEP-2243 conformance test coverage
Close gaps in HTTP header conformance scenarios:
Client standard headers (http-standard-headers.ts):
- Enforce all expected methods in checks, failing any that were not observed
- Track Mcp-Name header per-method separately from Mcp-Method
- Advertise resources and prompts capabilities so clients exercise those endpoints
- Add a prompt entry for prompts/get testing
Client custom headers (http-custom-headers.ts):
- Add Base64 encoding checks for non-ASCII, whitespace, and control-char values
- Add null/omitted parameter test (second tool call with null value)
- Add client-keeps-valid-tool check verifying clients still call valid tools
after filtering out invalid ones
Server header validation (server/http-standard-headers.ts):
- Replace fetch-based sendRawRequest with http.request to preserve exact
header casing on the wire (fetch/Headers lowercases names)
- Compute defaultArgs and defaultHeaders from tool schema so test requests
satisfy all required parameters while varying only the param under test
Traceability (sep-2243.yaml):
- Add 7 new spec-to-check mappings (nonascii, whitespace, controlchar,
keeps-valid-tool, literal-missing-base64-prefix/suffix, no-mirror-unannotated)
- Fix 2 incorrect existing mappings
* Fill SEP-2243 conformance test coverage gaps
Add missing test cases identified by comparing the branch against the
SEP-2243 spec's Conformance Test Cases section:
- Mcp-Method on notifications: add notifications/initialized to
expectedMethods so clients are checked for header on notifications
- Boolean true: add debug=true param to complement verbose=false
- Leading-space-only and trailing-space-only: separate checks for
Base64 encoding when only one edge has whitespace
- Internal spaces only: verify plain ASCII (no Base64) for values
like 'us west 1' that have spaces only in the middle
- CRLF string: add line1\r\nline2 test (complements existing \n test)
- Leading tab: add \tindented test for tab-triggered Base64 encoding
- Server rejects invalid Mcp-Param chars: mark as excluded in YAML
since HTTP itself prevents these characters in headers
Register all new checks in sep-2243.yaml.
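
The whitespace and control-character cases listed above suggest an encoding rule along the following lines. This is a minimal sketch inferred from those test cases only, not from the spec text; `encodeParamHeaderValue` is a hypothetical name, and the '=?base64?{x}?=' wrapper is the one quoted elsewhere in this commit message.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper (not part of the harness): decide whether a header
// value can be sent as plain ASCII or needs the =?base64?...?= wrapper.
function encodeParamHeaderValue(value: string): string {
  // A leading or trailing space triggers encoding; internal spaces do not.
  const hasEdgeSpace = /^ | $/.test(value);
  // Anything outside printable ASCII: tabs, CR, LF, other control
  // characters, and all non-ASCII characters.
  const hasNonPrintableAscii = /[^\x20-\x7E]/.test(value);
  if (!hasEdgeSpace && !hasNonPrintableAscii) {
    return value;
  }
  const payload = Buffer.from(value, "utf8").toString("base64");
  return `=?base64?${payload}?=`;
}

// Inverse used only to show the round trip in this sketch.
function decodeParamHeaderValue(raw: string): string {
  const match = /^=\?base64\?(.*)\?=$/.exec(raw);
  return match ? Buffer.from(match[1], "base64").toString("utf8") : raw;
}
```

Under these assumptions 'us west 1' is sent unchanged, while ' lead', 'trail ', 'line1\r\nline2', '\tindented', and '日本語' are all wrapped.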
* Remove case-insensitive =?base64? prefix test
The spec states header values are case-sensitive (RFC 9110). The
=?base64? prefix is part of the header value, not the header name,
so it should be matched case-sensitively. This was identified as a
bug in the conformance tests per feedback on SEP-2243.
Changes:
- Remove server-accepts-case-insensitive-base64 check from server
validation scenario
- Change base64 prefix regex from case-insensitive (/i) to
case-sensitive in client header validation
- Remove corresponding entry from sep-2243.yaml
* Remove nested x-mcp-header rejection test
The spec constrains x-mcp-header to primitive types but does not
restrict it to top-level properties only. A nested string property
with x-mcp-header is valid per the spec. This was confirmed by
mikekistler in SEP-2243 PR discussion.
Removes invalid_nested_header from the invalid tool definitions
list and its corresponding sep-2243.yaml entry.
* Use numeric comparison for number header values
Compare number header values numerically instead of as strings to
accommodate cross-SDK floating point representation differences
(e.g., '42' vs '42.0'). For integers, exact numeric match is
required. For decimals, a tolerance of 1e-9 is allowed.
See SEP-2243 discussion on number precision.
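
The comparison described above can be sketched as follows. The function name and constant are hypothetical; only the integer/decimal split and the 1e-9 tolerance come from this commit message.

```typescript
// Hypothetical helper illustrating the numeric comparison; not the
// harness's actual code.
const DECIMAL_TOLERANCE = 1e-9;

function numberHeaderMatches(headerValue: string, expected: number): boolean {
  const trimmed = headerValue.trim();
  // Number("") is 0, so reject the empty string explicitly.
  if (trimmed === "") return false;
  const actual = Number(trimmed);
  if (!Number.isFinite(actual)) return false;
  if (Number.isInteger(expected)) {
    // Integers: '42' and '42.0' both parse to 42 and must match exactly.
    return actual === expected;
  }
  // Decimals: allow small representation differences between SDKs.
  return Math.abs(actual - expected) <= DECIMAL_TOLERANCE;
}
```

With this, '42' and '42.0' both match an expected value of 42, while '43' does not.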
* client/http-standard-headers: SKIPPED (not FAILURE) for unexercised methods; make getChecks() idempotent
A client that never calls prompts/list isn't violating SEP-2243 — the spec
sentence is 'The client MUST include the standard MCP request headers on each
POST request', and a request that was never sent can't fail it. The Mcp-Method
requirement is already proven by initialize/tools/list etc., so there's nothing
unique to prompts/list. SKIPPED keeps the gap visible without a false red.
Separately: getChecks() was pushing into this.checks on every call. The runner
may call it more than once (e.g. progress + final report); that produced
duplicate rows. Build a fresh array per call instead.
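
A minimal sketch of both changes, assuming a simplified check shape. The class name, the `record` method, and the check id format are hypothetical and only illustrate the idea: results are stored per method, and getChecks() derives a fresh array on every call.

```typescript
type CheckStatus = "SUCCESS" | "FAILURE" | "SKIPPED";

interface Check {
  id: string;
  status: CheckStatus;
}

// Hypothetical tracker, not the scenario class itself.
class MethodHeaderTracker {
  private readonly expectedMethods: string[];
  private readonly observed = new Map<string, CheckStatus>();

  constructor(expectedMethods: string[]) {
    this.expectedMethods = expectedMethods;
  }

  // Record the first request seen for a method; later ones are ignored.
  record(method: string, headerValue: string | undefined): void {
    if (this.observed.has(method)) return;
    this.observed.set(method, headerValue === method ? "SUCCESS" : "FAILURE");
  }

  // Builds a new array each call, so repeated calls never add rows.
  getChecks(): Check[] {
    return this.expectedMethods.map((method) => ({
      id: `mcp-method-header-${method.replace(/\//g, "-")}`,
      // A method the client never called is SKIPPED, not FAILURE.
      status: this.observed.get(method) ?? "SKIPPED",
    }));
  }
}
```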
* client/http-standard-headers: de-dup guard in checkMcpNameHeader
checkMcpMethodHeader already has 'if (this.methodHeaderChecks.has(method)) return'.
checkMcpNameHeader was missing the equivalent. The test server advertises two
tools and two resources; a client that calls both produces two
'client-mcp-name-header-tools-call' / '...-resources-read' rows.
* client/http-standard-headers: use replace(/\//g, '-') instead of a string pattern for slug generation
String.replace with a string arg only replaces the first occurrence.
'notifications/initialized' is fine (one slash) but a method like
'notifications/resources/updated' would yield 'notifications-resources/updated'
in the check id. (replaceAll isn't in this project's TS lib target.)
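
The difference is easy to reproduce in isolation. `methodToSlug` below is a hypothetical name used only for this illustration.

```typescript
// A string pattern replaces only the first '/'.
const firstOnly = "notifications/resources/updated".replace("/", "-");

// A global regex replaces every '/'.
function methodToSlug(method: string): string {
  return method.replace(/\//g, "-");
}

const allSlashes = methodToSlug("notifications/resources/updated");
```

Here `firstOnly` is 'notifications-resources/updated' and `allSlashes` is 'notifications-resources-updated'.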
* client/http-custom-headers: base64 regex (.+) -> (.*) so empty string round-trips
The spec doesn't forbid base64-encoding values that don't strictly need it.
A client that always wraps would send '=?base64??=' for empty_val: ''. The
'.+' wouldn't match, so the harness would compare '=?base64??=' === ''
literally and FAIL a compliant client. '(.*)' lets the empty payload decode
to '' and match.
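
A small sketch of the two patterns side by side. The constant and function names are hypothetical; the patterns are written case-sensitively, in line with the earlier change in this commit.

```typescript
import { Buffer } from "node:buffer";

// Old pattern: requires at least one payload character.
const WRAPPER_NON_EMPTY = /^=\?base64\?(.+)\?=$/;
// New pattern: also accepts an empty payload.
const WRAPPER_ANY = /^=\?base64\?(.*)\?=$/;

// Hypothetical decoder: unwrap and decode, or return the value literally.
function decodeHeaderValue(raw: string, pattern: RegExp = WRAPPER_ANY): string {
  const match = pattern.exec(raw);
  if (!match) return raw;
  return Buffer.from(match[1], "base64").toString("utf8");
}

const withOld = decodeHeaderValue("=?base64??=", WRAPPER_NON_EMPTY);
const withNew = decodeHeaderValue("=?base64??=");
```

With the old pattern the wrapper is returned literally as '=?base64??='; with the new one it decodes to the empty string.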
* client/http-custom-headers: emit SUCCESS for no-mirror-unannotated
This check only pushed on FAILURE; a correct client produced no row at all,
so the pass was invisible in reports and couldn't be counted toward coverage.
Same-slug SUCCESS/FAILURE pair is the repo convention.
* client/http-custom-headers: drop redundant optional-present check
checkParamHeader('Verbose', ...) already pushes a FAILURE when the header is
missing. The explicit 'optional-present' block re-tests the same condition
under a second check id, doubling the row.
* server/http-standard-headers: defaultArgs must use real number/boolean, not strings
defaultArgs feeds the JSON-RPC body. With Record<string, string> + '0'/'false',
a server that validates inputSchema rejects on type mismatch — which is also a
400. Every header-rejection check then false-PASSes (server rejected for the
wrong reason), and server-accepts-valid-base64 false-FAILs a compliant server
because it never gets past schema validation.
Also handle 'integer' (JSON Schema distinguishes it from 'number').
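
A sketch of deriving typed defaults from a tool's inputSchema. The function name, the simplified schema types, and the placeholder string value are assumptions for illustration; the point is only that numbers and booleans are emitted as real JSON values, not strings.

```typescript
// Simplified view of the parts of a JSON Schema this sketch needs.
type SchemaProperty = { type?: string };
type InputSchema = {
  properties?: Record<string, SchemaProperty>;
  required?: string[];
};

// Hypothetical helper: build a value of the declared type for every
// required parameter so the JSON-RPC body passes schema validation.
function buildDefaultArgs(schema: InputSchema): Record<string, unknown> {
  const args: Record<string, unknown> = {};
  for (const name of schema.required ?? []) {
    const type = schema.properties?.[name]?.type;
    switch (type) {
      case "number":
      case "integer": // JSON Schema treats 'integer' as its own type
        args[name] = 0;
        break;
      case "boolean":
        args[name] = false;
        break;
      default:
        args[name] = "default"; // placeholder string, an assumption
    }
  }
  return args;
}
```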
* server/http-standard-headers: createAcceptanceCheck must also assert no JSON-RPC error in body
A server can return HTTP 200 with {error: {code: -32001, ...}} in the body.
Status-only acceptance lets that through as a pass.
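
In sketch form, acceptance then needs both conditions. `isAccepted` is a hypothetical name, and the sketch assumes the response body has already been parsed as JSON.

```typescript
// Hypothetical predicate: a response counts as accepted only if the HTTP
// status is 2xx AND the parsed body carries no JSON-RPC error member.
function isAccepted(status: number, body: unknown): boolean {
  const okStatus = status >= 200 && status < 300;
  const hasJsonRpcError =
    typeof body === "object" && body !== null && "error" in body;
  return okStatus && !hasJsonRpcError;
}
```

A 200 response whose body is `{ error: { code: -32001 } }` is therefore reported as not accepted.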
* use DRAFT_PROTOCOL_VERSION constant instead of 'DRAFT-2026-v1' literal
The literal will rot when the draft cycle rolls over. types.ts already exports
DRAFT_PROTOCOL_VERSION for this.
* sep-2243.yaml: spec-first rewrite + move to src/seps/
Regenerated from the SEP-2243 spec diff (transports.mdx + tools.mdx) rather
than from the scenario implementations: 21 check rows + 4 excluded vs the
previous 46 + 2. Differences:
- one check id per spec sentence (test variants go in 'details', not new ids)
- 'text:' quotes the spec sentence verbatim instead of paraphrasing
- '-reject-status' (MUST 400) split from '-reject-error-code' (SHOULD -32001
for standard headers, MUST for custom)
- rows with no spec backing dropped (whitespace, base64-padding/chars, prefix/
suffix-literal, control-char-name)
- two more SHOULD excludes (log-warning, no-sensitive-params)
- check ids use sep-2243-<slug> convention
- check: key first, excludes grouped at bottom (matches sep-2164)
Moved to src/seps/ to match #272 layout (this branch predates it; will
reconcile cleanly on rebase).
* add negative test for HttpStandardHeadersScenario
Self-contained vitest that POSTs initialize with and without Mcp-Method,
asserts FAILURE/SUCCESS on the pinned check id, and proves getChecks() is
idempotent. No example client/server file needed for this one because the
scenario *is* the server — the test crafts the bad request directly.
This is the negative-test half of AGENTS.md §Testing your scenario; the
everything-client passing-example half is still TODO.
* server/http-standard-headers: severity fixes for whitespace + malformed-base64 tests
These five tests assert behavior the SEP-2243 text doesn't pin down. Kept
because they're useful consistency signals across SDKs, but adjusted so they
don't FAIL spec-compliant servers:
- server-accepts-whitespace-header-value: this IS a MUST, but per RFC 9110
§5.5 (field parsing MUST exclude OWS), not SEP-2243. Kept as FAILURE,
specReference now cites RFC 9110.
- server-rejects-invalid-base64-padding / -chars: downgraded to INFO.
SEP-2243 says only 'MUST decode them accordingly' without picking RFC 4648
strict vs lenient. Node's Buffer.from() accepts unpadded/dirty input and
decodes to a matching value (server accepts); .NET's Convert.FromBase64String
throws (server rejects). Either is currently compliant. WARNING would force
Tier-1 SDKs into non-default strict decoding (#245); INFO records the
behavior without prescribing it, so cross-SDK divergence is visible
without tier impact.
- server-literal-missing-base64-prefix / -suffix: unchanged. The wrapper
syntax is '=?base64?{x}?=' complete; treating partial wrappers as literal
is the natural reading.
Plumbing: createRejectionCheck gains an optional failSeverity param;
testBase64Case gains a 'reject-info' mode. (This is also the hook for the
larger 400-MUST / -32001-SHOULD split that still needs doing.)
* server/http-standard-headers: split rejection check into status (MUST) + error-code (SHOULD/MUST)
SEP-2243 §Server Validation: 'servers MUST return HTTP status 400 ... and
SHOULD include a JSON-RPC error response using ... -32001'. So a server that
returns 400 with -32600 (or no error body) is compliant for standard headers.
The single conflated check FAILed it.
For custom headers (§Server Behavior for Custom Headers) -32001 IS a MUST,
so both halves stay FAILURE there.
createRejectionCheck → createRejectionChecks returning two rows: '<id>' for
the 400 status and '<id>-error-code' for -32001. testCase (standard) sets
errorCodeSeverity=WARNING; testBase64Case/testMissingCustomHeader (custom)
keep FAILURE; the two malformed-base64 INFO probes set both halves to INFO.
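
The two-row split can be sketched like this. createRejectionChecks is the name used above, but the signature and row shape here are assumptions made for illustration.

```typescript
type Severity = "FAILURE" | "WARNING" | "INFO";

interface CheckRow {
  id: string;
  status: "SUCCESS" | Severity;
}

// Assumed signature: one row for the HTTP status, one for the error code.
function createRejectionChecks(
  id: string,
  httpStatus: number,
  errorCode: number | undefined,
  errorCodeSeverity: Severity = "FAILURE",
): CheckRow[] {
  return [
    // The 400 status is a MUST in both the standard and custom cases.
    { id, status: httpStatus === 400 ? "SUCCESS" : "FAILURE" },
    // -32001 is a SHOULD for standard headers (caller passes WARNING)
    // and a MUST for custom headers (default FAILURE).
    {
      id: `${id}-error-code`,
      status: errorCode === -32001 ? "SUCCESS" : errorCodeSeverity,
    },
  ];
}
```

For a standard-header case, a server answering 400 with -32600 then yields SUCCESS on the status row and only a WARNING on the error-code row.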
* rename check ids to sep-2243-* prefix; kebab-case throughout
Aligns with the sep-2243.yaml traceability ids and the repo convention
(sep-2164-*, sep-2207-*). Variant suffixes are kept (per-method, per-tool)
so per-case visibility isn't lost; the YAML's row id is a prefix of the
emitted ids, which is fine — extra ids not in the YAML are allowed.
Also fixes the mixed hyphen/underscore in reject-invalid-tool ids
(toolName has underscores; replace to hyphens).
* client: extract BaseHttpScenario to client/http-base.ts; both client scenarios extend it
http-standard-headers.ts had its own copy of the start/stop/handleRequest
boilerplate (~100 lines) that http-custom-headers.ts already abstracted as
BaseHttpScenario. Moved the abstract class to a shared file; both scenario
files now import it. HttpStandardHeadersScenario implements handlePost()
instead of handleRequest(); its handleInitialize() uses the base
sendInitialize() with the resources/prompts capability flags it needs.
Net: 465 → 361 lines for http-standard-headers.ts; the third inline copy of
the same HTTP-server scaffold is gone.
* use req.setEncoding('utf8') instead of per-chunk Buffer.toString()
Per-chunk .toString() corrupts a multi-byte UTF-8 char that straddles a TCP
chunk boundary (e.g. 0xC3 | 0xBC decodes to two replacement chars instead of
'ü'). The custom-headers scenario sends '日本語'/'naïve' in the body, so this
is reachable in principle. setEncoding('utf8') makes 'data' emit strings with
boundary handling done by Node's StringDecoder.
Fixed in both places: BaseHttpScenario.handleRequest (request body) and
sendRawRequest (response body).
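
The boundary problem can be reproduced without a socket by splitting the two bytes of 'ü' by hand. This sketch uses Node's StringDecoder directly, which is the mechanism the message above attributes to setEncoding('utf8'); the variable names are illustrative.

```typescript
import { Buffer } from "node:buffer";
import { StringDecoder } from "node:string_decoder";

// 'ü' is 0xC3 0xBC in UTF-8; simulate a chunk boundary between the bytes.
const bytes = Buffer.from("ü", "utf8");
const chunks = [bytes.subarray(0, 1), bytes.subarray(1)];

// Per-chunk decoding: each half is invalid on its own.
const perChunk = chunks.map((chunk) => chunk.toString("utf8")).join("");

// Stateful decoding: the decoder buffers the incomplete sequence.
const decoder = new StringDecoder("utf8");
const streamed =
  chunks.map((chunk) => decoder.write(chunk)).join("") + decoder.end();
```

Here `perChunk` is two U+FFFD replacement characters, while `streamed` is 'ü'.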
* fix: migrate SEP-2243 scenarios from specVersions to source (introducedIn/removedIn)
The introducedIn/removedIn refactor (#265) replaced the specVersions array
with a source property on all scenario interfaces. The SEP-2243 scenarios
still used the old specVersions field, causing matchesSpecVersion() to
receive undefined and crash in spec-version.test.ts.
Replace specVersions with source = { introducedIn: DRAFT_PROTOCOL_VERSION }
in BaseHttpScenario and the two server ClientScenario classes. Remove the
now-unused SpecVersion imports.
* server/http-standard-headers: malformed-base64 padding/chars back to FAILURE
Per discussion on #259: the SEP-2243 conformance-test-case table is the
approved source of truth and lists these as reject cases, so they're
FAILURE-level even though the spec body itself only says 'MUST decode them
accordingly'. SDKs whose stdlib base64 is lenient (Node, browsers) will need
to validate before decoding; if that's burdensome we'll revisit.
Removes the now-unused 'reject-info' mode and statusSeverity/INFO plumbing
introduced for these two probes.
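
For SDKs with a lenient stdlib decoder, validating before decoding might look like the sketch below. The function name and the exact strictness profile (RFC 4648 standard alphabet, padding required) are assumptions; this commit message does not say which profile the conformance table implies.

```typescript
import { Buffer } from "node:buffer";

// Standard alphabet, length a multiple of 4, '=' padding required.
const STRICT_BASE64 =
  /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

// Hypothetical helper: returns the decoded string, or null when the
// payload would only decode under lenient rules.
function decodeStrictBase64(payload: string): string | null {
  if (!STRICT_BASE64.test(payload)) return null;
  return Buffer.from(payload, "base64").toString("utf8");
}

// For comparison: Node's decoder accepts the unpadded form as-is.
const lenient = Buffer.from("aGVsbG8", "base64").toString("utf8");
```

With this check, 'aGVsbG8=' decodes to 'hello', while the unpadded 'aGVsbG8' and the dirty 'aGV!sbG8=' are rejected even though Buffer.from() alone would accept them.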
---------
Co-authored-by: Tarek Mahmoud Sayed <tarekms@microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <tarekms@ntdev.microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
Co-authored-by: Paul Carleton <paulc@anthropic.com>
1 parent acc70de commit 39f34b2
8 files changed
Lines changed: 2490 additions & 4 deletions
File tree
- src
- scenarios
- client
- server
- seps