Skip to content

Commit 4df7d2f

Browse files
jerm-droclaude
andcommitted
Address review feedback from yrobla
- Fix priority_order.index() crash for unranked backends in default preset - Inline FIND_TOOL_SCHEMA and CALL_TOOL_SCHEMA in default preset sketch - Add when_unavailable parameter to elicit() for non-elicitation clients - Make preset/script/scriptFile mutual exclusion a hard validation error - Resolve error handling open question: MCP-standard isError response dicts - Keep existing decorators in Phase 1 POC, delete in later phases - Clarify session scope: runs once per session creation or Redis restore - Restructure rollout: safe capabilities first, optimizer + authz together - Add warning about handler dispatch bypassing Cedar authz (#4374) - Reference PR #4385 as interim fix for optimizer + authz bypass - Resolve authz open question: ship with optimizer, defer exact design - Add thv vmcp list-presets command Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 5e20c09 commit 4df7d2f

File tree

1 file changed

+88
-23
lines changed

1 file changed

+88
-23
lines changed

rfcs/THV-0060-starlark-programmable-middleware.md

Lines changed: 88 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -183,6 +183,8 @@ for name, backend in backends().items():
183183
publish(meta, fn)
184184
```
185185

186+
**Note on authorization**: Handlers returned from `backend.tools()` dispatch directly to the backend, bypassing Cedar authz middleware. This means a handler saved in a dict (as in the optimizer pattern) can invoke tools the current user isn't authorized to use — the same root cause as [#4374](https://github.com/stacklok/toolhive/issues/4374). See [Interaction with authorization](#interaction-with-authorization) for details. The implementation plan addresses this by shipping optimizer + authz integration together in Phase 3.
187+
186188
#### Handling name collisions across backends
187189

188190
Because the script sees which backend each tool comes from, it can handle collisions explicitly:
@@ -201,6 +203,22 @@ for name, backend in backends().items():
201203
publish(meta, fn)
202204
```
203205

206+
#### Error handling
207+
208+
When a handler invokes a backend tool and it fails, the Go engine catches the error and returns an MCP-standard `isError` response dict: `{"isError": true, "content": [{"type": "text", "text": "..."}]}`. The script is not halted — the error is a value the script can inspect and react to. This enables decorator-style error recovery as plain Starlark:
209+
210+
```python
211+
def with_fallback(fn, fallback_fn):
212+
def wrapper(args):
213+
result = fn(args)
214+
if result.get("isError"):
215+
return fallback_fn(args)
216+
return result
217+
return wrapper
218+
```
219+
220+
Retry wrappers, default responses, and other error policies are all expressible as userland decorators using the same mechanism.
221+
204222
#### Decorating handlers
205223

206224
Since handlers are just functions, decoration is plain function wrapping:
@@ -256,7 +274,7 @@ These are Go-implemented functions exposed to Starlark scripts.
256274
| Built-in | Signature | Description |
257275
|----------|-----------|-------------|
258276
| `search_index(tools)` | `search_index(list[(metadata, handler)]) → SearchIndex` | Builds a semantic search index over the tool list. Returns an object with `.search(query) → list[dict]`. |
259-
| `elicit(message, schema)` | `elicit(message, schema={}) → struct(action, content)` | Prompts the user for a decision via MCP elicitation. |
277+
| `elicit(message, schema, when_unavailable)` | `elicit(message, schema={}, when_unavailable) → struct(action, content)` | Prompts the user for a decision via MCP elicitation. `when_unavailable` is required and controls behavior when the client doesn't support elicitation: `"accept"` (permit the operation), `"reject"` (deny it), or `"error"` (halt the script). |
260278
| `config()` | `config() → dict` | Returns the vMCP config fields (`aggregation`, `optimizer`, etc.) as a read-only dict. Used by the `default` preset. |
261279
| `log(message)` | `log(message) → None` | Emits a structured audit log entry. |
262280

@@ -291,10 +309,11 @@ An important question is: how do people who don't want to write Starlark still u
291309
**Answer: presets.** A preset is a named, built-in Starlark script that replicates the behavior of today's config knobs. Presets are transparent — users can inspect the underlying Starlark source and fork it when they need customization:
292310

293311
```bash
312+
thv vmcp list-presets
294313
thv vmcp show-preset default
295314
```
296315

297-
This prints the Starlark source. A user who needs 90% of a preset's behavior can copy it, modify the 10% they need, and use `sessionInit.script` or `sessionInit.scriptFile` instead.
316+
`list-presets` shows available presets. `show-preset` prints the Starlark source for a given preset. A user who needs 90% of a preset's behavior can copy it, modify the 10% they need, and use `sessionInit.script` or `sessionInit.scriptFile` instead.
298317

299318
#### Built-in presets
300319

@@ -306,9 +325,41 @@ A single `default` preset handles all existing config knobs. When no `sessionIni
306325

307326
#### Sketch of the `default` preset
308327

309-
The `default` preset is the most complex part of this RFC — it must faithfully replicate the behavior of the existing config-driven system. Below is a sketch of what this script looks like. The exact implementation will be validated during the POC phase.
328+
The `default` preset is the most complex part of this RFC — it must faithfully replicate the behavior of the existing config-driven system. The sketch below shows what we're aiming for. The exact nature of the built-ins — particularly for more distant phases like authz integration — is open to discussion and doesn't need to be resolved in this RFC.
310329

311330
```python
331+
# --- Optimizer tool schemas (used when optimizer mode is enabled) ---
332+
FIND_TOOL_SCHEMA = {
333+
"type": "object",
334+
"properties": {
335+
"tool_description": {
336+
"type": "string",
337+
"description": "Description of the task or capability needed (e.g. 'web search', 'analyze CSV file'). Used for semantic similarity matching.",
338+
},
339+
"tool_keywords": {
340+
"type": "array",
341+
"items": {"type": "string"},
342+
"description": "Optional keywords for BM25 text search to narrow results. Combined with tool_description for hybrid search.",
343+
},
344+
},
345+
"required": ["tool_description"],
346+
}
347+
348+
CALL_TOOL_SCHEMA = {
349+
"type": "object",
350+
"properties": {
351+
"tool_name": {
352+
"type": "string",
353+
"description": "The name of the tool to execute (obtain from find_tool results).",
354+
},
355+
"parameters": {
356+
"type": "object",
357+
"description": "Arguments required by the tool. Must match the tool's input schema from find_tool.",
358+
},
359+
},
360+
"required": ["tool_name", "parameters"],
361+
}
362+
312363
cfg = config()
313364
agg = cfg.get("aggregation", {})
314365
opt = cfg.get("optimizer", None)
@@ -367,13 +418,20 @@ for backend_name, backend in backends().items():
367418
)
368419
elif strategy == "priority":
369420
existing_backend = seen_names[meta.name]
421+
if backend_name not in priority_order or existing_backend not in priority_order:
422+
continue # unranked backends don't win collisions
370423
if priority_order.index(backend_name) > priority_order.index(existing_backend):
371424
continue # lower priority, skip
372425
# else: higher priority, will overwrite
373426

374427
seen_names[meta.name] = backend_name
375428
all_published.append((meta, fn))
376429

430+
# --- Enforce Cedar authorization policies (Phase 3) ---
431+
# Filters the tool set so handlers can only dispatch to authorized tools.
432+
# This ensures the optimizer pattern doesn't bypass authz (#4374).
433+
all_published = enforce_cedar_policies(all_published)
434+
377435
# --- Optimizer mode: publish find_tool/call_tool instead of raw tools ---
378436
if opt:
379437
index = search_index(all_published)
@@ -442,7 +500,7 @@ sequenceDiagram
442500
Authz-->>Agent: CallToolResult
443501
```
444502

445-
The script runs **once** per session, not per request. `publish()` calls build up the tool set. The resulting `(metadata, handler)` pairs are used to construct the `MultiSession`, which handles all subsequent `tools/list` and `tools/call` requests. The authz middleware sits between the agent and the session, filtering and gating requests at runtime.
503+
The script runs **once** per session — either when a session is created or when it's restored from Redis. The Starlark state is in-memory and not serialized, but can be recreated by re-running the script. Even though the script output could technically be shared across sessions today, running it once per session is a deliberate choice — it enables safe adoption of user-centric built-ins like `current_user()` in the future without requiring an architectural change. `publish()` calls build up the tool set. The resulting `(metadata, handler)` pairs are used to construct the `MultiSession`, which handles all subsequent `tools/list` and `tools/call` requests. The authz middleware sits between the agent and the session, filtering and gating requests at runtime.
446504

447505
#### Where this fits in the architecture
448506

@@ -509,7 +567,7 @@ type SessionInitConfig struct {
509567

510568
#### Existing config fields
511569

512-
`aggregation`, `compositeToolRefs`, and `optimizer` remain on `Config`. The `default` preset reads these fields and produces identical behavior. When a custom `sessionInit.script` or `sessionInit.scriptFile` is set, these fields are ignored (if both are set, vMCP logs a warning).
570+
`aggregation`, `compositeToolRefs`, and `optimizer` remain on `Config`. The `default` preset reads these fields and produces identical behavior. When a custom `sessionInit.script` or `sessionInit.scriptFile` is set, these fields are ignored. Setting more than one of `preset`, `script`, or `scriptFile` is a validation error — vMCP rejects the configuration at load time.
513571

514572

515573
## Security Considerations
@@ -663,31 +721,40 @@ A fast, rough POC to validate the high-level design. The goal is to prove the pr
663721
- Implement `backends()`, `publish()`, `metadata()` built-ins in the Starlark engine
664722
- Session factory runs the script and constructs `MultiSession` from `publish()` results
665723
- Implement the `default` preset that reads existing config knobs (`aggregation`, `optimizer`, etc.)
666-
- **Delete** the existing decorator-based code that supports these config knobs today (optimizer, filter, composite tools decorators)
724+
- Run the Starlark engine alongside existing decorators for comparison testing
667725
- All existing tests must pass **except** those that test legacy composite tools (`compositeTools`, `compositeToolRefs`)
668726
- Update this RFC with any findings — design changes, missing built-ins, edge cases discovered
669727

670-
### Phase 2: Production implementation
728+
### Phase 2: Safe capabilities
671729

672-
Take the learnings from the POC and implement for real. This is the production-quality version with proper error handling, tests, and documentation.
730+
Ship the capabilities that don't interact with the authz boundary. These are the "safe" features that can be validated independently.
673731

674-
- Extend the Starlark engine from THV-0051 with production-quality `backends()`, `publish()`, `metadata()` built-ins
675-
- Port `search_index()` from current optimizer implementation
732+
- Production-quality `backends()`, `publish()`, `metadata()` built-ins
733+
- Name resolution, filtering, and overrides via the `default` preset
734+
- Rate limiting integration
676735
- `thv vmcp show-preset` command to inspect built-in presets
677736
- Config model: `sessionInit.preset`, `sessionInit.script`, `sessionInit.scriptFile`
678-
- Preset equivalence tests: verify the `default` preset produces identical behavior to the old config-driven system
679-
- Remove optimizer, filter, and composite tools decorators — the session init script is the single implementation
737+
- Preset equivalence tests for the capabilities in scope
738+
- Remove the decorator code for features replaced in this phase
739+
740+
### Phase 3: Optimizer + authz integration
741+
742+
Ship the optimizer and authz capabilities together so the relationship between them is explicit. Today, the optimizer bypasses Cedar because handlers dispatch directly to backends ([#4374](https://github.com/stacklok/toolhive/issues/4374), interim fix in [PR #4385](https://github.com/stacklok/toolhive/pull/4385)). By shipping them together, the script can enforce Cedar policies on the tool set before the optimizer builds its dispatch table (e.g. `enforce_cedar_policies(all_published)`).
680743

681-
### Phase 3: Deprecate composite tools
744+
- `search_index()` built-in ported from current optimizer implementation
745+
- Authz built-in (e.g. `enforce_cedar_policies()`) that filters the `(metadata, handler)` list
746+
- Updated `default` preset with optimizer + authz integration
747+
- Preset equivalence tests for optimizer behavior
748+
- Remove remaining decorator code
749+
- The exact design of the authz built-ins will be detailed in a follow-up RFC
750+
751+
### Phase 4: Deprecate composite tools, ship and document
682752

683753
- Mark `compositeTools` and `compositeToolRefs` as deprecated
684754
- Log deprecation warnings when these fields are used
685755
- Document migration path from declarative composite tools to Starlark scripts
686-
687-
### Phase 4: Ship and document
688-
689756
- E2E tests for custom scripts in K8s via ConfigMap
690-
- Documentation: user guide, built-in reference, composite tools migration guide, advanced use cases
757+
- Documentation: user guide, built-in reference, migration guide, advanced use cases
691758

692759
New built-in functions like `scrub_pii()` and `check_rate_limit()` are out of scope for this RFC. The programming model makes them straightforward to add as follow-up work.
693760

@@ -709,18 +776,15 @@ New built-in functions like `scrub_pii()` and `check_rate_limit()` are out of sc
709776

710777
## Open Questions
711778

712-
1. **Error handling in handler functions**: When a handler function invokes a backend tool and it fails, what should happen? Options include: (a) the handler returns an error dict to the agent, (b) a `with_fallback(fn, fallback_fn)` pattern for decorator-style error recovery, (c) preset parameters for common error policies (retry N times, fall back to a default response).
713-
714-
2. **Should authz decisions move into Starlark?** Authorization (Cedar) and session initialization (Starlark) remain entirely separate systems. This RFC reduces config knob interactions significantly and makes most of them explicit, but the "who sees what?" question still requires reasoning across both systems. The interaction between the optimizer and Cedar illustrates the problem: enabling the optimizer replaces real tool names with `find_tool` / `call_tool`, which silently breaks Cedar policies that reference the original names ([stacklok/toolhive#4373](https://github.com/stacklok/toolhive/issues/4373))); and `find_tool` returns tools the caller isn't authorized to use, because Cedar gates `tools/call` but doesn't filter search results inside a handler ([stacklok/toolhive#4374](https://github.com/stacklok/toolhive/issues/4374)). Neither system is aware of the other. Pulling authz decisions into the script (e.g., a `current_user()` built-in combined with policy logic) would unify the model but raises questions about Cedar's role and the trust boundary. Worth exploring once the base programming model is proven.
715-
716-
3. **Sessionless MCP requests**: What happens when MCP supports requests without sessions? Do we have to run this heavy script on every request? We could actually run the script once at startup, since it does not depend on request-time information. However, if we fold in authz concerns from above, then `current_user()` will be request-time information. We could cheat around this by recommending all logic which depends on `current_user()` be placed at the end of the script. When that's encountered during startup, we block and restore the state on each request. Alternatively, we could support two different scripts. One for initialization and one per-request.
779+
1. **Sessionless MCP requests**: What happens when MCP supports requests without sessions? Do we have to run this heavy script on every request? We could actually run the script once at startup, since it does not depend on request-time information. However, if we fold in authz concerns from above, then `current_user()` will be request-time information. We could cheat around this by recommending all logic which depends on `current_user()` be placed at the end of the script. When that's encountered during startup, we block and restore the state on each request. Alternatively, we could support two different scripts. One for initialization and one per-request.
717780

718781
## References
719782

720783
- [THV-0051: Starlark Scripted Tools](./THV-0051-starlark-scripted-tools.md) — original Starlark RFC
721784
- [THV-0057: Rate Limiting](./THV-0057-rate-limiting.md) — rate limiting mechanism
722785
- [THV-0017: Dynamic Webhook Middleware](./THV-0017-dynamic-webhook-middleware.md) — external webhook integration
723786
- [stacklok-epics#213](https://github.com/stacklok/stacklok-epics/issues/213) — Dynamic Webhook Middleware epic
787+
- [stacklok/toolhive#4385](https://github.com/stacklok/toolhive/pull/4385) — Interim fix for optimizer + authz bypass (#4374)
724788
- [Optimizer discoverability discussion](https://stacklok.slack.com/archives/C09L9QF47EU/p1774392171855569) — Slack thread
725789
- [Starlark Language Specification](https://github.com/bazelbuild/starlark/blob/master/spec.md)
726790
- [starlark-go Implementation](https://github.com/google/starlark-go)
@@ -789,6 +853,7 @@ def with_approval_gate(fn, tool_name):
789853
decision = elicit(
790854
"Tool '%s' may modify data. Approve?" % tool_name,
791855
schema={"type": "object", "properties": {"reason": {"type": "string"}}},
856+
when_unavailable="reject",
792857
)
793858
if decision.action != "accept":
794859
return {"error": "Declined by user"}
@@ -953,7 +1018,7 @@ def dispatch(args):
9531018
# Elicitation gate for non-readonly tools
9541019
meta = tool_metadata.get(tool_name)
9551020
if meta and not meta.annotations.get("readOnly", False):
956-
decision = elicit("Approve call to '%s'?" % tool_name)
1021+
decision = elicit("Approve call to '%s'?" % tool_name, when_unavailable="reject")
9571022
if decision.action != "accept":
9581023
return {"error": "Declined"}
9591024

0 commit comments

Comments
 (0)