feat: response body mode framework for streaming support by noyitz · Pull Request #169 · llm-d/llm-d-inference-payload-processor

noyitz · 2026-06-15T21:43:37Z

Summary

Add BodyProcessingMode type (Skip, Chunks, Full) and ResponseBodyRequirement interface for response plugins to declare their body access needs
Pre-compute NeedsResponseBuffering per profile at startup based on plugin declarations
When NeedsResponseBuffering=false for the selected profile, ack each response body chunk immediately — enabling real-time client streaming
Plugins that don't implement the interface default to Full (backward compatible)

Body Processing Modes

Mode	Behavior	Use case
`Skip`	Plugin doesn't need response body	Headers-only plugins
`Chunks`	Plugin processes chunks as they stream (must implement `ChunkProcessor`)	Metering, logging
`Full`	Framework buffers entire response before calling plugin	API translation, body mutation

How it works

At startup, the config loader iterates each profile's response plugins and checks their BodyProcessingMode() declaration:

If ANY plugin declares Full → NeedsResponseBuffering=true for that profile
Otherwise → NeedsResponseBuffering=false, chunks are acked immediately

This is a per-profile decision, so different profiles can have different buffering behavior.

Test plan

Unit tests for computeResponseBuffering covering all mode combinations
Tests for validation: Chunks without ChunkProcessor implementation fails at startup
Multi-profile test verifying independent buffering decisions

github-actions · 2026-06-15T21:43:50Z

⚠️ Large PR detected

Your PR is large. Please consider breaking it into multiple PRs.

The do-not-merge/hold label has been added and can be removed by the reviewers based on their judgement.

github-actions · 2026-06-16T21:35:24Z

⚠️ Large PR detected

Your PR is large. Please consider breaking it into multiple PRs.

The do-not-merge/hold label has been added and can be removed by the reviewers based on their judgement.

Add ChunkProcessor interface that plugins declaring BodyChunked must implement. The framework validates this at startup and pre-computes the list of ChunkProcessors per profile. When streaming (no buffering), each response body chunk is passed through all ChunkProcessors before being acked to Envoy, enabling in-flight chunk transformation. Depends on llm-d#169 Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

github-actions · 2026-06-17T16:11:35Z

⚠️ Large PR detected

Your PR is large. Please consider breaking it into multiple PRs.

The do-not-merge/hold label has been added and can be removed by the reviewers based on their judgement.

nirrozenbaum · 2026-06-21T18:51:21Z

+
+## Design
+
+### Response Body Mode Declaration


I'd like to propose an alternative.
instead of making plugins declare if they require full body or can operate on chunks, I propose the following:

go to the file pkg/frameworkinterfacerequesthandling. split the ResponseProcessor interface into two interfaces:

type ResponseBodyProcessor interface { plugin.Plugin // ProcessResponseBody runs the ResponseBodyProcessor plugin. // ResponseBodyProcessor can mutate the headers and/or the body of the response. ProcessResponseBody(ctx context.Context, cycleState *plugin.CycleState, response *InferenceResponse) error } type ResponseChunkProcessor interface { plugin.Plugin // ProcessResponseChunk runs the ResponseChunkProcessor plugin. // ResponseChunkProcessor can operate on a single chunk and can mutate the headers and/or the body of the response. ProcessResponseChunk(ctx context.Context, cycleState *plugin.CycleState, responseChunk *InferenceResponseChunk) error // InferenceRequestChunk doesn't exist. need to see if the response can be reused or a new type is needed.

then, Profile should be extended to include both ResponseChunkProcessor[] AND ResponseBodyProcessor[].

Done — reworked the PR to follow your approach. Split ResponseProcessor into:

ResponseProcessor — processes full buffered body (existing interface, unchanged)

ResponseChunkProcessor — processes individual chunks without buffering (new)

The config loader sorts response plugins into the right list based on which interface they implement. Profile now has ResponsePlugins, ResponseChunkProcessors, and NeedsResponseBuffering (computed automatically from len(ResponsePlugins) > 0).

Removed the BodyProcessingMode enum and ResponseBodyRequirement interface entirely. Proposal doc removed too.

nirrozenbaum · 2026-06-21T18:52:14Z

 		return nil, err
 	}

+	computeResponseBuffering(profiles, logger)


this doesn't require a separate function. if you follow the proposal I've added, this can be folded into buildProfiles function, where we need to maintain for each profile a boolean field RequireResponseBuffering: bool (which is actually a result of if len(ResponseBodyProcessor) > 0, so maybe we don't even need the extra field).

Done — folded into buildProfiles. The buffering flag is computed inline: theProfile.NeedsResponseBuffering = len(theProfile.ResponsePlugins) > 0

nirrozenbaum · 2026-06-21T18:55:09Z

+Plugins that don't implement `ResponseBodyRequirement` get a warning:
+```
+INFO  Response plugin does not declare ResponseBodyRequirement, defaulting to BodyFull  profile=default plugin=legacy-plugin/legacy
+```
+


we don't need this. llm-d IPP is a new repo and hasn't been through the process of a formal release yet. so we don't need to support backward compatibility, which can simplify a bit.

Removed. No backward compat shims.

nirrozenbaum · 2026-06-21T18:57:06Z

+
+## Future: Dynamic Mode Override
+
+Envoy ext_proc supports `mode_override` in responses. A future enhancement could override `response_body_mode` to `NONE` per-request when no response plugin in the selected profile needs the body at all. This is out of scope for the initial implementation.


this is very interesting. I didn't know this option exist.
can you open an issue about this so we can track it and make sure it goes into next version?
100% agree this is not first priority.

Will open a tracking issue for the FULL_DUPLEX_STREAMED per-chunk envoy mode.

…cessing Split response processing into two interfaces per review feedback: - ResponseProcessor: processes the complete buffered response body (existing) - ResponseChunkProcessor: processes individual chunks without buffering (new) The Profile now carries both processor lists and a NeedsResponseBuffering flag. The config loader sorts response plugins into the right list based on which interface they implement. A plugin can implement both. When ResponseProcessor plugins are present → buffer the full response. When only ResponseChunkProcessors are present → stream chunks through. When neither is present → buffer (backward compatible). Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

github-actions Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 15, 2026

noyitz force-pushed the feat/response-body-mode-framework branch from bf93834 to 381fe43 Compare June 17, 2026 16:11

nirrozenbaum removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2026

nirrozenbaum reviewed Jun 21, 2026

View reviewed changes

noyitz mentioned this pull request Jun 22, 2026

docs: design proposal 044 — plugin body mode capabilities #167

Closed

noyitz force-pushed the feat/response-body-mode-framework branch from 381fe43 to 5f2053c Compare June 22, 2026 12:36

github-actions Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: response body mode framework for streaming support#169

feat: response body mode framework for streaming support#169
noyitz wants to merge 1 commit into
llm-d:mainfrom
noyitz:feat/response-body-mode-framework

noyitz commented Jun 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

nirrozenbaum Jun 21, 2026 •

edited

Loading

Uh oh!

noyitz Jun 22, 2026

Uh oh!

nirrozenbaum Jun 21, 2026 •

edited

Loading

Uh oh!

noyitz Jun 22, 2026

Uh oh!

nirrozenbaum Jun 21, 2026

Uh oh!

noyitz Jun 22, 2026

Uh oh!

nirrozenbaum Jun 21, 2026

Uh oh!

noyitz Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		## Future: Dynamic Mode Override

		Envoy ext_proc supports `mode_override` in responses. A future enhancement could override `response_body_mode` to `NONE` per-request when no response plugin in the selected profile needs the body at all. This is out of scope for the initial implementation.

Conversation

noyitz commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Body Processing Modes

How it works

Test plan

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

nirrozenbaum Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

noyitz Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

noyitz Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

noyitz Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

noyitz Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

noyitz commented Jun 15, 2026 •

edited

Loading

nirrozenbaum Jun 21, 2026 •

edited

Loading

nirrozenbaum Jun 21, 2026 •

edited

Loading