Commit ce7c6ff

docs(proposal): streaming evaluation
1 parent 956cf1c commit ce7c6ff

1 file changed

Lines changed: 140 additions & 0 deletions

# Streaming evaluation

Status: Proposed (draft PR, design doc only).
Tracking: ROADMAP.md → Longer term.

## Motivation

The body-aware policy proposal (`body-aware-policies.md`) buffers the
full request body up to `max_body_bytes` before evaluating. That's
fine for small JSON payloads. It falls down for:

- Large file uploads. Buffering 50 MB to test "is this user allowed
  to POST to this path" is wasteful when the decision doesn't need
  the body at all.
- gRPC and protobuf streams where the policy may only need the first
  few framed messages.
- Latency-sensitive paths where blocking on the full body adds tens
  of milliseconds before the decision is even possible.

For policies that don't reference `input.body`, buffering should not
happen at all. For policies that reference only a prefix
(`input.body.action`, `input.body.user_id`), buffering should stop
as soon as the prefix is decidable. Streaming evaluation makes those
optimizations possible.

## Goals

1. Static AST analysis at configure time to classify each policy as
   one of:

   | Class                  | Body buffering required  |
   | ---------------------- | ------------------------ |
   | No body refs           | None (decide on headers) |
   | Body refs, prefix-only | Until prefix resolved    |
   | Body refs, full-tree   | Up to `max_body_bytes`   |

1. For prefix-only policies: a streaming JSON parser path that lets
   `evaluate` decide partway through the body when the referenced
   prefix is fully resolved. The rest of the body is allowed to flow
   through unbuffered.
1. For no-body-refs policies: skip `proxy_on_request_body` entirely.
   The decision is locked in at headers time.
1. The streaming path is opt-in. Policies that want strict, full-body
   eval continue to use the buffered path; the analysis classifies
   conservatively (when in doubt, full-tree).

## Non-goals

- Streaming response bodies. Same idea, but tracked separately under
  `response-side-policies.md` extensions.
- Decisions on partial header sets. Headers are atomic in proxy-wasm;
  no streaming there.
- Refusing the body mid-stream after data has already been forwarded.
  zopa decides allow/deny *before* the body forwards (Envoy buffers
  internally up to its own configured limits).

## Design sketch

### AST classifier

A static walk over the configured policy AST that records every
`input.body...` ref encountered:

```zig
const BodyDeps = struct {
    refs_body: bool = false, // at least one `input.body...` ref was seen
    refs_paths: std.ArrayList([]const []const u8), // prefix tree
    refs_whole: bool = false, // body referenced as a unit
};

fn classifyBodyDeps(module: *const ast.Module) BodyDeps;
```

`refs_whole = true` happens when the policy uses `input.body` directly
(not just a sub-path) or when iteration would require the full
parsed object. In that case, fall back to the buffered path.

### Streaming JSON parser

`src/json.zig` gains an iterative path that emits `(path, value)`
events as the body streams in:

```zig
pub const StreamEvent = union(enum) {
    enter_object: []const []const u8, // path so far, one segment per key
    leave_object: void,
    field: struct { path: []const []const u8, value: Value },
    done: void,
};
```

The streaming evaluator subscribes to events and binds resolved refs
into `input` lazily. As soon as every ref in the policy's prefix
set is resolved, evaluation can run.

### Decision short-circuit

Once the streaming evaluator reaches a decision, it tells the host:

- allow → return `Continue` from `proxy_on_request_body` for the
  current chunk and stop subscribing to body events.
- deny → call `proxy_send_local_response(403)` and return `Pause`.

If the body finishes before the prefix set is fully resolved (the
caller didn't include the expected field), treat the missing path as
undefined (deny-by-default per Rego semantics).

## API impact

- New plugin config field `streaming: { enabled: bool, max_buffer:
  size }`. Opt-in at first; the default becomes `enabled: true` once
  the implementation has been stable for at least one full release.
- Per-context state grows to hold the partial input being assembled.
- AST schema unchanged.

## Test plan

- Unit tests for `classifyBodyDeps` covering each class of policy.
- Streaming parser unit tests: feed a JSON document byte by byte and
  assert that events fire in order.
- Integration test: a policy that references only `input.body.action`
  decides before the full payload is delivered. Verify that
  evaluation latency is independent of payload size.
- Negative test: a policy that needs `input.body.items[*].sku` falls
  back to the buffered path even with `streaming.enabled: true`.

## Open questions

- The streaming parser nearly doubles the size of `src/json.zig`.
  Worth running a sizing experiment before committing: does
  `--release=small` keep zopa.wasm under 80 KB with both paths
  present?
- How aggressive is the prefix analysis? A conservative pass classifies
  more policies as "full-tree" and forfeits the optimization. A
  precise pass requires reasoning about `some` / `every` over body
  arrays, which is non-trivial.
- Should the streaming path also feed Envoy's body forwarding so
  large uploads don't pause? Probably yes, but the proxy-wasm body
  ABI semantics need a careful read first.
