Commit 10bf837

docs(tasks,mrtr): scenario READMEs for upstream porting
Restructured around ClientScenario classes (one row per class with check-list under it) rather than per-numbered-test slugs. Documents fixture requirements, env vars, open spec questions, and the wire-format diff for each suite. Per AGENTS.md, severity follows spec keyword (MUST/MUST NOT → FAILURE, SHOULD/SHOULD NOT → WARNING). The READMEs explain why some checks emit INFO rather than FAILURE (optional emission paths per SEP-2322).
1 parent 95da20d commit 10bf837

2 files changed

Lines changed: 308 additions & 0 deletions

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# SEP-2322 MRTR: Server Conformance

Tests any MCP server that implements the SEP-2322 ephemeral
Multi Round-Trip Request (MRTR) flow on `tools/call`: the
`IncompleteResult` → retry-with-`inputResponses` → `ToolResult`
contract that lets a tool gather elicitation / sampling / roots input
without creating a task envelope.

## Specs covered

| SEP      | What it adds                                                                                                   | Where it shows up            |
| -------- | -------------------------------------------------------------------------------------------------------------- | ---------------------------- |
| SEP-2322 | Ephemeral MRTR: `resultType` discriminator, `inputRequests` / `inputResponses` keyed maps, `requestState` token | every check                  |
| SEP-2663 | MRTR → Tasks composition (final round returns `CreateTaskResult`)                                              | mrtr-08 (SKIPPED, see below) |

## ClientScenario classes

### `mrtr-ephemeral-flow` (`ephemeral-flow.ts`)

A single scenario covers the full ephemeral MRTR contract, per the
AGENTS.md "fewer scenarios, more checks" rule. A server that
implemented elicitation round-trips but not sampling round-trips would
be incoherent, so all the checks are bundled into one scenario.

| Check                                    | What it tests                                                                                                                     |
| ---------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `mrtr-basic-elicitation-round-trip`      | Round 1 returns `IncompleteResult` with `elicitation/create`; round 2 completes with the answer reflected                         |
| `mrtr-sampling-round-trip`               | Same flow with `sampling/createMessage`                                                                                           |
| `mrtr-roots-list-round-trip`             | Same flow with `roots/list`                                                                                                       |
| `mrtr-request-state-round-trip`          | When server emits `requestState`, it's a non-empty string and the server validates the echo                                       |
| `mrtr-multiple-input-requests-one-round` | A single `IncompleteResult` MAY carry `inputRequests` for `elicitation/create` + `sampling/createMessage` + `roots/list` together |
| `mrtr-multi-round-flow`                  | A handler MAY take 2+ rounds; each round mints a fresh `requestState`; final result reflects answers from every round             |
| `mrtr-wrong-input-key-rerequests`        | When client sends a wrong `inputResponses` key, server SHOULD re-request via `IncompleteResult` rather than erroring              |
| `mrtr-tasks-composition`                 | **SKIPPED**, see "Open issues" below                                                                                              |

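The round-trip contract exercised by these checks can be sketched as a small client-side loop. This is a minimal illustration, not the suite's code: only the field names `resultType`, `inputRequests`, `inputResponses`, and `requestState` come from this README. The type names, `driveMrtr`, `fakeTool`, the `"input_required"` literal, and the exact nesting of the retry arguments are assumptions made for the example, and the transport is kept synchronous so the block runs standalone (a real client would await an HTTP call).

```typescript
// Assumption: SEP-2322 draft discriminator value (see "Open issues").
const INCOMPLETE = "input_required";

interface InputRequest { method: string; params?: unknown }
interface IncompleteResult {
  resultType: typeof INCOMPLETE;
  inputRequests: Record<string, InputRequest>;
  requestState?: string;
}
interface CompleteResult {
  resultType?: string;
  content: { type: "text"; text: string }[];
}
type CallResult = IncompleteResult | CompleteResult;

// Hypothetical shape of what the client adds to a retried tools/call.
interface RetryArgs {
  inputResponses?: Record<string, unknown>;
  requestState?: string;
}
type CallTool = (retry: RetryArgs) => CallResult;

function isIncomplete(r: CallResult): r is IncompleteResult {
  return r.resultType === INCOMPLETE;
}

// Call the tool, answer every input request under the same key,
// echo requestState, and retry until a complete result arrives.
function driveMrtr(
  callTool: CallTool,
  answer: (key: string, req: InputRequest) => unknown,
  maxRounds = 5,
): CompleteResult {
  let retry: RetryArgs = {};
  for (let round = 0; round < maxRounds; round++) {
    const result = callTool(retry);
    if (!isIncomplete(result)) return result;
    const inputResponses: Record<string, unknown> = {};
    for (const key of Object.keys(result.inputRequests)) {
      inputResponses[key] = answer(key, result.inputRequests[key]);
    }
    retry = { inputResponses, requestState: result.requestState };
  }
  throw new Error(`no complete result after ${maxRounds} rounds`);
}

// Hypothetical in-memory server: two input rounds, then a final result.
function fakeTool(retry: RetryArgs): CallResult {
  if (!retry.requestState) {
    return {
      resultType: INCOMPLETE,
      inputRequests: { name: { method: "elicitation/create" } },
      requestState: "round-1",
    };
  }
  if (retry.requestState === "round-1") {
    const name = retry.inputResponses ? retry.inputResponses["name"] : undefined;
    return {
      resultType: INCOMPLETE,
      inputRequests: { summary: { method: "sampling/createMessage" } },
      requestState: `round-2:${String(name)}`, // fresh state each round
    };
  }
  const earlierName = retry.requestState.split(":")[1];
  const summary = retry.inputResponses ? retry.inputResponses["summary"] : undefined;
  return { content: [{ type: "text", text: `${earlierName} / ${String(summary)}` }] };
}

const final = driveMrtr(fakeTool, (key) => (key === "name" ? "Ada" : "short summary"));
console.log(final.content[0].text);
```

The fake server carries the first answer forward inside `requestState`, which is one way a stateless server could make the final result reflect every round.
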
## Required server fixtures

The fixture server MUST register these tools:

| Tool                                     | Behavior                                                                                    |
| ---------------------------------------- | ------------------------------------------------------------------------------------------- |
| `test_tool_with_elicitation`             | One `elicitation/create` round, completes with answer reflected                             |
| `test_incomplete_result_sampling`        | One `sampling/createMessage` round                                                          |
| `test_incomplete_result_list_roots`      | One `roots/list` round                                                                      |
| `test_incomplete_result_request_state`   | Exercises `requestState` validation; final result includes `state-ok` to confirm validation |
| `test_incomplete_result_multiple_inputs` | Emits 3+ `inputRequests` of different methods in one round                                  |
| `test_incomplete_result_multi_round`     | Drives 2+ MRTR rounds, final result references every answer                                 |
| `test_incomplete_result_elicitation`     | Emits inputRequest for `user_name`; server re-requests on wrong-key responses               |

The fixture can be implemented in any language; one example reference
implementation lives at
[`panyam/mcpkit/examples/mrtr`](https://github.com/panyam/mcpkit/tree/main/examples/mrtr).

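As an illustration of the last row, a fixture handler that re-requests on a wrong key might look like the sketch below. The tool name and the `user_name` key come from the table above; the handler signature, the result types, the `"input_required"` literal, the prompt text, and treating the response as an opaque value are assumptions for the example, not the reference implementation.

```typescript
// Assumption: SEP-2322 draft discriminator value (see "Open issues").
const INCOMPLETE = "input_required";

interface ToolCallArgs {
  inputResponses?: Record<string, unknown>;
}
interface IncompleteResult {
  resultType: typeof INCOMPLETE;
  inputRequests: Record<string, { method: string; params?: unknown }>;
}
interface CompleteResult {
  content: { type: "text"; text: string }[];
}
type FixtureResult = IncompleteResult | CompleteResult;

// Hypothetical handler for `test_incomplete_result_elicitation`.
function elicitationFixture(args: ToolCallArgs): FixtureResult {
  const response = args.inputResponses ? args.inputResponses["user_name"] : undefined;
  if (response === undefined) {
    // First round, or the client answered under a different key:
    // ask again instead of returning an error.
    return {
      resultType: INCOMPLETE,
      inputRequests: {
        user_name: {
          method: "elicitation/create",
          params: { message: "What is your name?" }, // hypothetical prompt
        },
      },
    };
  }
  return { content: [{ type: "text", text: `Hello, ${String(response)}` }] };
}

// Helper for the usage lines below: which keys did the server ask for?
function requestedKeys(r: FixtureResult): string[] {
  return "inputRequests" in r ? Object.keys(r.inputRequests) : [];
}

const firstRound = elicitationFixture({});
const wrongKey = elicitationFixture({ inputResponses: { wrong_key: "Ada" } });
const rightKey = elicitationFixture({ inputResponses: { user_name: "Ada" } });
console.log(requestedKeys(firstRound), requestedKeys(wrongKey), requestedKeys(rightKey));
```
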
## Running

```bash
# Against an already-running server
MRTR_SERVER_URL=http://localhost:8080/mcp \
  npx vitest run src/scenarios/server/mrtr/all-scenarios.test.ts

# Auto-spawn a fixture in beforeAll
MRTR_SERVER_URL=http://localhost:18093/mcp \
  MRTR_SERVER_CMD="/path/to/mrtr-server --port 18093" \
  npx vitest run src/scenarios/server/mrtr/all-scenarios.test.ts
```

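One way a test file could turn these two variables into a run plan is sketched below. Only the variable names `MRTR_SERVER_URL` and `MRTR_SERVER_CMD` come from this README; `resolveMrtrTarget`, the `MrtrTarget` type, the choice to treat a missing URL as an error, and the naive whitespace splitting of the command are assumptions, and the actual suite may behave differently.

```typescript
// Hypothetical run plan derived from the environment.
interface MrtrTarget {
  url: string;
  // Present only when the suite should spawn the fixture itself.
  spawnCmd?: string[];
}

function resolveMrtrTarget(env: Record<string, string | undefined>): MrtrTarget {
  const url = env["MRTR_SERVER_URL"];
  if (!url) {
    // Assumption: the sketch treats the URL as required.
    throw new Error("MRTR_SERVER_URL is not set");
  }
  const cmd = env["MRTR_SERVER_CMD"];
  if (!cmd || cmd.trim() === "") {
    return { url }; // attach to an already-running server
  }
  // Naive split on whitespace; quoted arguments are not handled here.
  return { url, spawnCmd: cmd.trim().split(/\s+/) };
}

const attachOnly = resolveMrtrTarget({ MRTR_SERVER_URL: "http://localhost:8080/mcp" });
const autoSpawn = resolveMrtrTarget({
  MRTR_SERVER_URL: "http://localhost:18093/mcp",
  MRTR_SERVER_CMD: "/path/to/mrtr-server --port 18093",
});
console.log(attachOnly, autoSpawn);
```
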
## Open issues

### `mrtr-tasks-composition` deferred

SEP-2663 commit `451f5e1` (Apr 30) made the MRTR → Tasks composition
flow normative: a `tools/call` MAY exchange `IncompleteResult` rounds
to gather input, then return `CreateTaskResult` to go async on a
subsequent round. Two blockers prevent enabling the check today:

1. **Spec watch: discriminator value.** SEP-2322 (MRTR base) and
   SEP-2663 (Tasks Extension) currently disagree on the wire value for
   the "needs more input" discriminator: SEP-2322's draft uses
   `"input_required"`, SEP-2663's draft uses `"incomplete"`. Awaiting
   alignment between the SEP authors. The current literal lives in
   `MRTR_INCOMPLETE_RESULT_TYPE` (helpers.ts) so it's a one-line flip
   when the spec converges.

2. **Reference-impl gap.** The natural server-side implementation
   pattern for tasks (mint task up-front, run handler in a goroutine /
   async task) means the handler's `IncompleteResult` signal isn't
   visible to the middleware in time: by the time the handler returns
   `IsIncomplete`, the `CreateTaskResult` is already on the wire. SDKs
   in any language need an inverted middleware pattern that runs the
   first round synchronously and only spins up the task once the
   handler signals async-promotion.
   ([panyam/mcpkit issue 347](https://github.com/panyam/mcpkit/issues/347)
   tracks this for one example impl; SDKs in any language hit the
   same architectural choice.)

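The inverted middleware pattern from blocker 2 can be sketched as follows. This is an illustration of the ordering only, under stated assumptions: `HandlerOutcome`, `WireShape`, `invertedMiddleware`, `exampleHandler`, the `path` key, and the in-memory `pendingTasks` registry are hypothetical, the `shape` tag stands in for whatever each result looks like on the wire, and deferred work is stored rather than actually scheduled so the block stays synchronous.

```typescript
interface Args {
  inputResponses?: Record<string, unknown>;
}

// What the handler tells the middleware after running one round.
type HandlerOutcome =
  | { kind: "incomplete"; inputRequests: Record<string, { method: string }> }
  | { kind: "complete"; text: string }
  | { kind: "promote"; run: () => string }; // handler asks to go async

// Which result type would be written to the wire (payloads simplified).
type WireShape =
  | { shape: "IncompleteResult"; inputRequests: Record<string, { method: string }> }
  | { shape: "ToolResult"; text: string }
  | { shape: "CreateTaskResult"; taskId: string };

const pendingTasks: Record<string, () => string> = {};
let nextTaskId = 1;

function invertedMiddleware(
  handler: (args: Args) => HandlerOutcome,
  args: Args,
): WireShape {
  // Run the round synchronously BEFORE any task exists, so an
  // incomplete signal can still be returned as IncompleteResult.
  const outcome = handler(args);
  switch (outcome.kind) {
    case "incomplete":
      return { shape: "IncompleteResult", inputRequests: outcome.inputRequests };
    case "complete":
      return { shape: "ToolResult", text: outcome.text };
    case "promote": {
      // Only now is a task minted; a real runtime would schedule run().
      const taskId = `task-${nextTaskId++}`;
      pendingTasks[taskId] = outcome.run;
      return { shape: "CreateTaskResult", taskId };
    }
  }
}

// Hypothetical handler: needs a path first, then goes async.
function exampleHandler(args: Args): HandlerOutcome {
  const path = args.inputResponses ? args.inputResponses["path"] : undefined;
  if (path === undefined) {
    return { kind: "incomplete", inputRequests: { path: { method: "elicitation/create" } } };
  }
  return { kind: "promote", run: () => `indexed ${String(path)}` };
}

const round1 = invertedMiddleware(exampleHandler, {});
const tasksAfterRound1 = Object.keys(pendingTasks).length;
const round2 = invertedMiddleware(exampleHandler, { inputResponses: { path: "/tmp" } });
console.log(round1.shape, tasksAfterRound1, round2.shape);
```

In the up-front pattern the task id would be allocated before `handler` runs, which is why the incomplete signal arrives too late there.
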
The check is registered with `status: 'SKIPPED'` so it's discoverable
but doesn't fail conformance runs. When both blockers resolve, remove
the SKIPPED short-circuit in `ephemeral-flow.ts` Check 8.

## Design notes

### Why the MRTR scenarios share helpers with `tasks/`

`MRTR_INCOMPLETE_RESULT_TYPE`, the result-type predicates
(`isIncompleteResult`, `isCompleteResult`), and the elicitation/sampling/
roots mocks live in `mrtr/helpers.ts`. The raw-fetch primitives
(`initRawSession`, `rawRequest`) are imported from the sibling
`../tasks/helpers` because both scenario sets share the same wire-shape
problem (SDK Zod schemas strip extension fields). When the upstream
SDK gains schemas for SEP-2322 / SEP-2663 shapes, those import paths
collapse back into the SDK.
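
For orientation, the constant and predicates named above could take roughly the following form. The three identifiers are from this README; the bodies are a guess, not the contents of `mrtr/helpers.ts`. In particular, the `"input_required"` value is the SEP-2322 draft literal quoted under "Open issues", and treating an absent `resultType` or a `"complete"` value as complete is an assumption. The predicates take `unknown` because they operate on raw JSON rather than SDK-parsed objects.

```typescript
// Assumption: SEP-2322 draft value. Changing this one literal is the
// "one-line flip" if the spec settles on a different value.
const MRTR_INCOMPLETE_RESULT_TYPE = "input_required";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// True when the raw JSON result asks the client for more input.
function isIncompleteResult(result: unknown): boolean {
  return isRecord(result) && result["resultType"] === MRTR_INCOMPLETE_RESULT_TYPE;
}

// Assumption: a result with no resultType, or with the hypothetical
// value "complete", is treated as a finished tool result.
function isCompleteResult(result: unknown): boolean {
  if (!isRecord(result)) return false;
  const resultType = result["resultType"];
  return resultType === undefined || resultType === "complete";
}

console.log(
  isIncompleteResult({ resultType: "input_required", inputRequests: {} }),
  isCompleteResult({ content: [] }),
);
```
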