You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: rfcs/THV-0060-starlark-programmable-middleware.md
+88-23Lines changed: 88 additions & 23 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -183,6 +183,8 @@ for name, backend in backends().items():
183
183
publish(meta, fn)
184
184
```
185
185
186
+
**Note on authorization**: Handlers returned from `backend.tools()` dispatch directly to the backend, bypassing Cedar authz middleware. This means a handler saved in a dict (as in the optimizer pattern) can invoke tools the current user isn't authorized to use — the same root cause as [#4374](https://github.com/stacklok/toolhive/issues/4374). See [Interaction with authorization](#interaction-with-authorization) for details. The implementation plan addresses this by shipping optimizer + authz integration together in Phase 3.
187
+
186
188
#### Handling name collisions across backends
187
189
188
190
Because the script sees which backend each tool comes from, it can handle collisions explicitly:
@@ -201,6 +203,22 @@ for name, backend in backends().items():
201
203
publish(meta, fn)
202
204
```
203
205
206
+
#### Error handling
207
+
208
+
When a handler invokes a backend tool and it fails, the Go engine catches the error and returns an MCP-standard `isError` response dict: `{"isError": true, "content": [{"type": "text", "text": "..."}]}`. The script is not halted — the error is a value the script can inspect and react to. This enables decorator-style error recovery as plain Starlark:
209
+
210
+
```python
211
+
defwith_fallback(fn, fallback_fn):
212
+
defwrapper(args):
213
+
result = fn(args)
214
+
if result.get("isError"):
215
+
return fallback_fn(args)
216
+
return result
217
+
return wrapper
218
+
```
219
+
220
+
Retry wrappers, default responses, and other error policies are all expressible as userland decorators using the same mechanism.
221
+
204
222
#### Decorating handlers
205
223
206
224
Since handlers are just functions, decoration is plain function wrapping:
@@ -256,7 +274,7 @@ These are Go-implemented functions exposed to Starlark scripts.
256
274
| Built-in | Signature | Description |
257
275
|----------|-----------|-------------|
258
276
|`search_index(tools)`|`search_index(list[(metadata, handler)]) → SearchIndex`| Builds a semantic search index over the tool list. Returns an object with `.search(query) → list[dict]`. |
259
-
|`elicit(message, schema)`|`elicit(message, schema={}) → struct(action, content)`| Prompts the user for a decision via MCP elicitation. |
277
+
|`elicit(message, schema, when_unavailable)`|`elicit(message, schema={}, when_unavailable) → struct(action, content)`| Prompts the user for a decision via MCP elicitation. `when_unavailable` is required and controls behavior when the client doesn't support elicitation: `"accept"` (permit the operation), `"reject"` (deny it), or `"error"` (halt the script). |
260
278
|`config()`|`config() → dict`| Returns the vMCP config fields (`aggregation`, `optimizer`, etc.) as a read-only dict. Used by the `default` preset. |
261
279
|`log(message)`|`log(message) → None`| Emits a structured audit log entry. |
262
280
@@ -291,10 +309,11 @@ An important question is: how do people who don't want to write Starlark still u
291
309
**Answer: presets.** A preset is a named, built-in Starlark script that replicates the behavior of today's config knobs. Presets are transparent — users can inspect the underlying Starlark source and fork it when they need customization:
292
310
293
311
```bash
312
+
thv vmcp list-presets
294
313
thv vmcp show-preset default
295
314
```
296
315
297
-
This prints the Starlark source. A user who needs 90% of a preset's behavior can copy it, modify the 10% they need, and use `sessionInit.script` or `sessionInit.scriptFile` instead.
316
+
`list-presets` shows available presets. `show-preset`prints the Starlark source for a given preset. A user who needs 90% of a preset's behavior can copy it, modify the 10% they need, and use `sessionInit.script` or `sessionInit.scriptFile` instead.
298
317
299
318
#### Built-in presets
300
319
@@ -306,9 +325,41 @@ A single `default` preset handles all existing config knobs. When no `sessionIni
306
325
307
326
#### Sketch of the `default` preset
308
327
309
-
The `default` preset is the most complex part of this RFC — it must faithfully replicate the behavior of the existing config-driven system. Below is a sketch of what this script looks like. The exact implementation will be validated during the POC phase.
328
+
The `default` preset is the most complex part of this RFC — it must faithfully replicate the behavior of the existing config-driven system. The sketch below shows what we're aiming for. The exact nature of the built-ins — particularly for more distant phases like authz integration — is open to discussion and doesn't need to be resolved in this RFC.
310
329
311
330
```python
331
+
# --- Optimizer tool schemas (used when optimizer mode is enabled) ---
332
+
FIND_TOOL_SCHEMA= {
333
+
"type": "object",
334
+
"properties": {
335
+
"tool_description": {
336
+
"type": "string",
337
+
"description": "Description of the task or capability needed (e.g. 'web search', 'analyze CSV file'). Used for semantic similarity matching.",
338
+
},
339
+
"tool_keywords": {
340
+
"type": "array",
341
+
"items": {"type": "string"},
342
+
"description": "Optional keywords for BM25 text search to narrow results. Combined with tool_description for hybrid search.",
343
+
},
344
+
},
345
+
"required": ["tool_description"],
346
+
}
347
+
348
+
CALL_TOOL_SCHEMA= {
349
+
"type": "object",
350
+
"properties": {
351
+
"tool_name": {
352
+
"type": "string",
353
+
"description": "The name of the tool to execute (obtain from find_tool results).",
354
+
},
355
+
"parameters": {
356
+
"type": "object",
357
+
"description": "Arguments required by the tool. Must match the tool's input schema from find_tool.",
358
+
},
359
+
},
360
+
"required": ["tool_name", "parameters"],
361
+
}
362
+
312
363
cfg = config()
313
364
agg = cfg.get("aggregation", {})
314
365
opt = cfg.get("optimizer", None)
@@ -367,13 +418,20 @@ for backend_name, backend in backends().items():
367
418
)
368
419
elif strategy =="priority":
369
420
existing_backend = seen_names[meta.name]
421
+
if backend_name notin priority_order or existing_backend notin priority_order:
422
+
continue# unranked backends don't win collisions
370
423
if priority_order.index(backend_name) > priority_order.index(existing_backend):
371
424
continue# lower priority, skip
372
425
# else: higher priority, will overwrite
373
426
374
427
seen_names[meta.name] = backend_name
375
428
all_published.append((meta, fn))
376
429
430
+
# --- Enforce Cedar authorization policies (Phase 3) ---
431
+
# Filters the tool set so handlers can only dispatch to authorized tools.
432
+
# This ensures the optimizer pattern doesn't bypass authz (#4374).
# --- Optimizer mode: publish find_tool/call_tool instead of raw tools ---
378
436
if opt:
379
437
index = search_index(all_published)
@@ -442,7 +500,7 @@ sequenceDiagram
442
500
Authz-->>Agent: CallToolResult
443
501
```
444
502
445
-
The script runs **once** per session, not per request. `publish()` calls build up the tool set. The resulting `(metadata, handler)` pairs are used to construct the `MultiSession`, which handles all subsequent `tools/list` and `tools/call` requests. The authz middleware sits between the agent and the session, filtering and gating requests at runtime.
503
+
The script runs **once** per session — either when a session is created or when it's restored from Redis. The Starlark state is in-memory and not serialized, but can be recreated by re-running the script. Even though the script output could technically be shared across sessions today, running it once per session is a deliberate choice — it enables safe adoption of user-centric built-ins like `current_user()` in the future without requiring an architectural change. `publish()` calls build up the tool set. The resulting `(metadata, handler)` pairs are used to construct the `MultiSession`, which handles all subsequent `tools/list` and `tools/call` requests. The authz middleware sits between the agent and the session, filtering and gating requests at runtime.
446
504
447
505
#### Where this fits in the architecture
448
506
@@ -509,7 +567,7 @@ type SessionInitConfig struct {
509
567
510
568
#### Existing config fields
511
569
512
-
`aggregation`, `compositeToolRefs`, and `optimizer` remain on `Config`. The `default` preset reads these fields and produces identical behavior. When a custom `sessionInit.script` or `sessionInit.scriptFile` is set, these fields are ignored (if both are set, vMCP logs a warning).
570
+
`aggregation`, `compositeToolRefs`, and `optimizer` remain on `Config`. The `default` preset reads these fields and produces identical behavior. When a custom `sessionInit.script` or `sessionInit.scriptFile` is set, these fields are ignored. Setting more than one of `preset`, `script`, or `scriptFile` is a validation error — vMCP rejects the configuration at load time.
513
571
514
572
515
573
## Security Considerations
@@ -663,31 +721,40 @@ A fast, rough POC to validate the high-level design. The goal is to prove the pr
663
721
- Implement `backends()`, `publish()`, `metadata()` built-ins in the Starlark engine
664
722
- Session factory runs the script and constructs `MultiSession` from `publish()` results
665
723
- Implement the `default` preset that reads existing config knobs (`aggregation`, `optimizer`, etc.)
666
-
-**Delete** the existing decorator-based code that supports these config knobs today (optimizer, filter, composite tools decorators)
724
+
-Run the Starlark engine alongside existing decorators for comparison testing
667
725
- All existing tests must pass **except** those that test legacy composite tools (`compositeTools`, `compositeToolRefs`)
668
726
- Update this RFC with any findings — design changes, missing built-ins, edge cases discovered
669
727
670
-
### Phase 2: Production implementation
728
+
### Phase 2: Safe capabilities
671
729
672
-
Take the learnings from the POC and implement for real. This is the production-quality version with proper error handling, tests, and documentation.
730
+
Ship the capabilities that don't interact with the authz boundary. These are the "safe" features that can be validated independently.
673
731
674
-
- Extend the Starlark engine from THV-0051 with production-quality `backends()`, `publish()`, `metadata()` built-ins
675
-
- Port `search_index()` from current optimizer implementation
- Preset equivalence tests: verify the `default` preset produces identical behavior to the old config-driven system
679
-
- Remove optimizer, filter, and composite tools decorators — the session init script is the single implementation
737
+
- Preset equivalence tests for the capabilities in scope
738
+
- Remove the decorator code for features replaced in this phase
739
+
740
+
### Phase 3: Optimizer + authz integration
741
+
742
+
Ship the optimizer and authz capabilities together so the relationship between them is explicit. Today, the optimizer bypasses Cedar because handlers dispatch directly to backends ([#4374](https://github.com/stacklok/toolhive/issues/4374), interim fix in [PR #4385](https://github.com/stacklok/toolhive/pull/4385)). By shipping them together, the script can enforce Cedar policies on the tool set before the optimizer builds its dispatch table (e.g. `enforce_cedar_policies(all_published)`).
680
743
681
-
### Phase 3: Deprecate composite tools
744
+
-`search_index()` built-in ported from current optimizer implementation
745
+
- Authz built-in (e.g. `enforce_cedar_policies()`) that filters the `(metadata, handler)` list
746
+
- Updated `default` preset with optimizer + authz integration
747
+
- Preset equivalence tests for optimizer behavior
748
+
- Remove remaining decorator code
749
+
- The exact design of the authz built-ins will be detailed in a follow-up RFC
750
+
751
+
### Phase 4: Deprecate composite tools, ship and document
682
752
683
753
- Mark `compositeTools` and `compositeToolRefs` as deprecated
684
754
- Log deprecation warnings when these fields are used
685
755
- Document migration path from declarative composite tools to Starlark scripts
686
-
687
-
### Phase 4: Ship and document
688
-
689
756
- E2E tests for custom scripts in K8s via ConfigMap
690
-
- Documentation: user guide, built-in reference, composite tools migration guide, advanced use cases
757
+
- Documentation: user guide, built-in reference, migration guide, advanced use cases
691
758
692
759
New built-in functions like `scrub_pii()` and `check_rate_limit()` are out of scope for this RFC. The programming model makes them straightforward to add as follow-up work.
693
760
@@ -709,18 +776,15 @@ New built-in functions like `scrub_pii()` and `check_rate_limit()` are out of sc
709
776
710
777
## Open Questions
711
778
712
-
1.**Error handling in handler functions**: When a handler function invokes a backend tool and it fails, what should happen? Options include: (a) the handler returns an error dict to the agent, (b) a `with_fallback(fn, fallback_fn)` pattern for decorator-style error recovery, (c) preset parameters for common error policies (retry N times, fall back to a default response).
713
-
714
-
2. **Should authz decisions move into Starlark?** Authorization (Cedar) and session initialization (Starlark) remain entirely separate systems. This RFC reduces config knob interactions significantly and makes most of them explicit, but the "who sees what?" question still requires reasoning across both systems. The interaction between the optimizer and Cedar illustrates the problem: enabling the optimizer replaces real tool names with `find_tool` / `call_tool`, which silently breaks Cedar policies that reference the original names ([stacklok/toolhive#4373](https://github.com/stacklok/toolhive/issues/4373))); and `find_tool` returns tools the caller isn't authorized to use, because Cedar gates `tools/call` but doesn't filter search results inside a handler ([stacklok/toolhive#4374](https://github.com/stacklok/toolhive/issues/4374)). Neither system is aware of the other. Pulling authz decisions into the script (e.g., a `current_user()` built-in combined with policy logic) would unify the model but raises questions about Cedar's role and the trust boundary. Worth exploring once the base programming model is proven.
715
-
716
-
3.**Sessionless MCP requests**: What happens when MCP supports requests without sessions? Do we have to run this heavy script on every request? We could actually run the script once at startup, since it does not depend on request-time information. However, if we fold in authz concerns from above, then `current_user()` will be request-time information. We could cheat around this by recommending all logic which depends on `current_user()` be placed at the end of the script. When that's encountered during startup, we block and restore the state on each request. Alternatively, we could support two different scripts. One for initialization and one per-request.
779
+
1.**Sessionless MCP requests**: What happens when MCP supports requests without sessions? Do we have to run this heavy script on every request? We could actually run the script once at startup, since it does not depend on request-time information. However, if we fold in authz concerns from above, then `current_user()` will be request-time information. We could cheat around this by recommending all logic which depends on `current_user()` be placed at the end of the script. When that's encountered during startup, we block and restore the state on each request. Alternatively, we could support two different scripts. One for initialization and one per-request.
717
780
718
781
## References
719
782
720
783
-[THV-0051: Starlark Scripted Tools](./THV-0051-starlark-scripted-tools.md) — original Starlark RFC
0 commit comments