Python-SDK counterpart to typescript-sdk#1701. Seven ways to write the same weather-lookup tool, so the diff between files is the argument.
Unlike the TS demos, the lowlevel plumbing here is real — each option is
an actual mcp.server.Server that round-trips IncompleteResult through the
wire protocol. The invariant test at the bottom asserts they all produce
identical client-observed behaviour.
If you just want to see what an MRTR lowlevel handler looks like without the comparison framing, read these first:
basic.py— the simple-tool equivalent. OneIncompleteResult, one retry, done. ~130 lines, half of which are comments explaining the two moves every MRTR handler makes.basic_multiround.py— the ADO-rules SEP example. Two rounds, withrequest_statecarrying accumulated context across the retry so any server instance can handle any round.
Both are runnable end-to-end against the in-memory client:
uv run python -m mrtr_options.basic
uv run python -m mrtr_options.basic_multiround| Server infra | Pre-MRTR client | MRTR client |
|---|---|---|
| Can hold SSE | E by default; A/C/D if you opt in | MRTR |
| MRTR-only (horizontally scaled) | E by necessity | MRTR |
Both rows work for old clients — version negotiation succeeds,
tools/list is complete, tools that don't elicit are unaffected. Only
elicitation inside a tool is unavailable. Bottom-left isn't "unresolvable";
it's "E is the only option." Top-left is "E, unless you choose to carry SSE
infra." The rows collapse for E, which is why it's the SDK default.
| Author writes | SDK does | Hidden re-entry | Server state | Old client gets | |
|---|---|---|---|---|---|
| E | MRTR-native only | Nothing | No | None | Result w/ default, or error |
| A | MRTR-native only | Retry-loop over SSE | Yes, safe | SSE connection | Full elicitation |
| B | await elicit() |
Exception → IncompleteResult |
Yes, unsafe | None | Full elicitation |
| C | One handler, if version branch |
Version accessor | No | SSE (old-client arm) | Full elicitation |
| D | Two handlers | Picks by version | No | SSE (old-client arm) | Full elicitation |
| F | MRTR-native + ctx.once wraps |
once() guard in request_state |
No | None | (same as E) |
| G | Step functions + .build() |
Step-tracking in request_state | No | None | (same as E) |
| H | await ctx.elicit() (linear) |
Holds coroutine frame in memory | No | Coroutine frame | (same as E) |
"Hidden re-entry" = the handler function is invoked more than once for a single logical tool call, and the author can't tell from the source text.
A is safe because MRTR-native code has the re-entry guard (if not prefs: return IncompleteResult(...)) visible in source even though the
loop is hidden.
B is unsafe because await elicit() looks like a suspension point but
is actually a re-entry point on MRTR sessions — see the audit_log
landmine in that file.
A–E are about the dual-path axis (old client vs new). F and G address a
different axis: even in a pure-MRTR world, the naive handler shape has a
footgun. Code above the if not prefs guard runs on every retry. If that
code is a DB write or HTTP POST, it executes N times for N-round
elicitation. Nothing enforces putting side-effects below the guard —
safety depends on the developer knowing the convention. The analogy from
SDK-WG review: the naive MRTR handler is de-facto GOTO.
F (MrtrCtx.once) keeps the monolithic handler but wraps side-effects
in an idempotency guard. ctx.once("audit", lambda: audit_log(...)) checks
request_state — if the key is marked executed, skip. Opt-in: an unwrapped
mutation still fires twice. The footgun is made visually distinct, which
is reviewable.
G (ToolBuilder) decomposes the handler into named step functions.
incomplete_step may return IncompleteResult or data; end_step receives
everything and runs exactly once. There is no "above the guard" zone because
there is no guard — the SDK's step-tracking is the guard. Side-effects go in
end_step, structurally unreachable until all elicitations complete.
Both depend on request_state integrity. The demos use plain base64-JSON;
a real SDK MUST HMAC-sign the blob, or the client can forge step-done
markers and skip the guards. Per-session key derived from initialize keeps
it stateless. Without signing, the safety story is advisory.
E is the SDK default. A horizontally-scaled server gets E for free — it's the only thing that works on that infra. A server that can hold SSE also gets E by default, and opts into A/C/D only if serving old-client elicitation is worth the extra infra dependency.
A vs E is the core tension. Same author-facing code (MRTR-native), the
only difference is whether old clients get elicitation. A requires shipping
sse_retry_shim; E requires nothing. A also carries a deployment-time
hazard E doesn't: the shim calls real SSE under the hood, so on MRTR-only
infra it fails at runtime when an old client connects — a constraint that
lives nowhere near the tool code.
B is zero-migration but breaks silently for anything non-idempotent above the await. Not a ship target.
C vs D is factoring: one function with a branch vs two functions with a dispatcher. Both put the dual-path burden on the tool author.
F vs G is the footgun-prevention trade. F is minimal — one line per
side-effect, composes with any handler shape. G is structural —
double-execution impossible for end_step, but costs two function defs
per tool. Likely SDK answer: ship F as a primitive on the context, ship G
as an opt-in builder, recommend G for multi-round tools and F for
single-question tools.
H (linear continuation) is the Option B footgun, fixed. Handler code
reads exactly like the SSE era — await ctx.elicit() is a genuine
suspension point, side-effects above it fire once — because the coroutine
frame is held in memory across rounds. The trade: server is stateful
within a single tool call (frame keyed by request_state), so
horizontally-scaled deployments need sticky routing on the token. Same
operational shape as A's SSE hold but without the long-lived connection.
Use for migrating existing SSE-era tools without rewriting, or when the
linear style is genuinely clearer than guard-first. Don't use if you need
true statelessness — E/F/G encode everything in request_state itself.
tests/server/experimental/test_mrtr_options.py parametrises all seven
servers against the same Client + elicitation_callback, asserting
identical output. The footgun test measures audit_count to prove F and G
hold the side-effect to one.
- Persistent/Tasks workflow —
ServerTaskContextalready doesinput_required; MRTR integration is a separate PR mrtrOnlyclient flag — trivial to add, not demoed- requestState HMAC signing — called out in code comments