|
| 1 | +<!-- maggus-id: 3f975d46-4413-4b50-9ab0-e20f2029ff36 --> |
| 2 | +# Feature 001: Fix HTTP/3 Integration Test Failures — Stream Multiplexing |
| 3 | + |
| 4 | +## Introduction |
| 5 | + |
| 6 | +14 of 131 HTTP/3 integration tests fail with `OperationCanceledException` / timeouts, and 1 test fails with data corruption (byte mismatch at offset 20365 in a 60KB echo). The root cause is that `Http30ConnectionStage` serializes all HTTP requests — it pulls one request from the Akka.Streams app inlet, encodes it, then waits for the response before pulling the next. This prevents QUIC stream multiplexing despite the protocol layer (`StateMachine`, `StreamTracker`) and transport layer (`QuicTransportStateMachine`) fully supporting up to 100 concurrent streams. |
| 7 | + |
| 8 | +The data corruption is a secondary issue: when concurrent streams eventually do activate (e.g. via reconnect), `ProcessFrameData()` passes a single `streamId` for all frames decoded from one network buffer, potentially routing DATA frames to the wrong stream state. |
| 9 | + |
| 10 | +### Architecture Context |
| 11 | + |
| 12 | +- **Components involved:** `Http30ConnectionStage` (Streams layer), `StateMachine` + `StreamTracker` (Protocol/Http3 layer) |
| 13 | +- **No new components introduced** — this is a minimal fix to the existing connection stage |
| 14 | +- **Protocol layer is correct** — `CanAcceptRequest` already checks `Tracker.CanOpenStream()` which allows `ActiveStreamCount < MaxConcurrentStreams` (default 100) |
| 15 | +- **H2 reference:** `Http20ConnectionStage` has identical `TryPullRequest()` but works because TCP multiplexes frames on a single connection; HTTP/3 uses per-request QUIC streams |
| 16 | + |
| 17 | +## Goals |
| 18 | + |
| 19 | +- All 131 HTTP/3 integration tests pass (currently 14 failures + 1 error) |
| 20 | +- `Http30ConnectionStage` supports concurrent request submission matching the protocol layer's capacity |
| 21 | +- No regressions in HTTP/1.x or HTTP/2 test suites |
| 22 | +- New stage-level tests verify concurrent request pulling behavior |
| 23 | + |
| 24 | +## Tasks |
| 25 | + |
| 26 | +### TASK-001-001: Fix Request Serialization in Http30ConnectionStage |
| 27 | +**Description:** As a developer using TurboHTTP, I want HTTP/3 requests to be multiplexed over concurrent QUIC streams so that parallel requests don't time out waiting for serial processing. |
| 28 | + |
| 29 | +**Token Estimate:** ~40k tokens |
| 30 | +**Predecessors:** none |
| 31 | +**Successors:** TASK-001-003, TASK-001-004 |
| 32 | +**Parallel:** yes — can run alongside TASK-001-002 |
| 33 | +**Model:** opus — core architectural change to Akka.Streams stage logic |
| 34 | + |
| 35 | +**File:** `src/TurboHTTP/Streams/Stages/Http30ConnectionStage.cs` |
| 36 | + |
| 37 | +**Change details:** |
| 38 | + |
| 39 | +The current `TryPullRequest()` (lines 376-384) correctly gates on `_sm.CanAcceptRequest`, which delegates to `StreamTracker.CanOpenStream()` (checks `ActiveStreamCount < MaxConcurrentStreams`). The pull-one-at-a-time behavior is actually correct Akka.Streams semantics — you can only `Pull()` once per demand cycle. |
| 40 | + |
| 41 | +The real issue is that `OnAppPush()` encodes one request and calls `TryPullRequest()`, which issues another `Pull(_stage._inApp)`. This is correct — each push/pull cycle processes one request. The problem is that the stage **never re-pulls** after a stream completes (freeing a slot), and `FlushResponses()` only re-pulls `_inServer`, not `_inApp`. |
| 42 | + |
| 43 | +**Fix approach:** |
| 44 | +1. In `FlushResponses()` — after emitting responses, call `TryPullRequest()` to fill freed stream slots |
| 45 | +2. In `OnServerPush()` for `QuicCloseItem { Kind: RequestStreamComplete }` — already calls `TryPullRequest()` (line 189), verify this path works correctly |
| 46 | +3. Ensure `OnNetworkPull()` also attempts to pull requests after preface emission |
| 47 | +4. In `ProcessFrameData()` — the existing `TryPullRequest()` at line 291 should trigger re-pulling after each frame batch; verify `CanAcceptRequest` returns true when streams have been freed by `AssembleResponse()` |
| 48 | + |
| 49 | +**Key verification:** After `EmitResponse()` is called in `StateMachine`, `ReturnStreamState()` must call `Tracker.OnStreamClosed()` which decrements `ActiveStreamCount`, making `CanOpenStream()` return true again. |
| 50 | + |
| 51 | +**Acceptance Criteria:** |
| 52 | +- [x] `TryPullRequest()` is called after response emission in `FlushResponses()` |
| 53 | +- [x] `StateMachine.ReturnStreamState()` correctly calls `Tracker.OnStreamClosed()` to free stream slots |
| 54 | +- [x] Multiple requests can be in-flight concurrently (verified by logging or test) |
| 55 | +- [x] Single-request scenarios still work (no regression) |
| 56 | +- [x] Build compiles without warnings |
| 57 | + |
| 58 | +### TASK-001-002: Fix Stream ID Routing in ProcessFrameData |
| 59 | +**Description:** As a developer sending large HTTP/3 responses, I want response body data to be correctly routed to the originating stream so that response bodies are not corrupted. |
| 60 | + |
| 61 | +**Token Estimate:** ~35k tokens |
| 62 | +**Predecessors:** none |
| 63 | +**Successors:** TASK-001-004 |
| 64 | +**Parallel:** yes — can run alongside TASK-001-001 |
| 65 | +**Model:** opus — requires understanding of QUIC stream multiplexing and frame routing |
| 66 | + |
| 67 | +**File:** `src/TurboHTTP/Streams/Stages/Http30ConnectionStage.cs` |
| 68 | + |
| 69 | +**Change details:** |
| 70 | + |
| 71 | +In `ProcessFrameData()` (lines 265-291), the `streamId` parameter is passed to `AssembleResponse()` for ALL frames decoded from a single network buffer. This is correct because in HTTP/3, each QUIC stream carries exactly one HTTP request/response pair — so all frames from one `Http3InputTaggedItem` belong to the same stream. |
| 72 | + |
| 73 | +However, the fallback path at lines 251-256 passes `streamId: 0` for untagged `NetworkBuffer` items: |
| 74 | +```csharp |
| 75 | +if (item is NetworkBuffer rawBuffer) |
| 76 | +{ |
| 77 | + ProcessFrameData(rawBuffer, streamId: 0); |
| 78 | + return; |
| 79 | +} |
| 80 | +``` |
| 81 | + |
| 82 | +**Fix approach:** |
| 83 | +1. Investigate whether untagged `NetworkBuffer` items can arrive during HTTP/3 operation — if yes, they need stream ID resolution |
| 84 | +2. Check `Http3InputTaggedItem.StreamId` — verify it correctly carries the QUIC stream ID for every tagged item |
| 85 | +3. If the corruption is caused by `streamId: 0` fallback, either: |
| 86 | + - Remove the fallback (HTTP/3 should always use tagged items), or |
| 87 | + - Add stream ID extraction from the frame header itself |
| 88 | +4. Verify `StateMachine.DecodeServerData()` handles partial frame reassembly per-stream (not mixing buffers across streams) |
| 89 | + |
| 90 | +**Files to inspect:** |
| 91 | +- `src/TurboHTTP/Protocol/Http3/StateMachine.cs` — `DecodeServerData()`, `AssembleResponse()` |
| 92 | +- `src/TurboHTTP/Protocol/Http3/StreamState.cs` — `AppendBody()`, `TakeBodyOwnership()` |
| 93 | +- `src/TurboHTTP/Internal/Messages.cs` — `Http3InputTaggedItem` definition |
| 94 | + |
| 95 | +**Acceptance Criteria:** |
| 96 | +- [x] Untagged `NetworkBuffer` items either don't occur in H3 or have proper stream ID routing |
| 97 | +- [x] `AssembleResponse()` always receives the correct QUIC stream ID |
| 98 | +- [x] 60KB echo test passes without data corruption |
| 99 | +- [x] No buffer use-after-free or double-dispose scenarios |
| 100 | +- [x] Build compiles without warnings |
| 101 | + |
| 102 | +### TASK-001-003: Add Stage-Level Concurrent Request Tests |
| 103 | +**Description:** As a maintainer, I want stage-level tests that verify `Http30ConnectionStage` can handle concurrent requests so that multiplexing regressions are caught early. |
| 104 | + |
| 105 | +**Token Estimate:** ~60k tokens |
| 106 | +**Predecessors:** TASK-001-001 |
| 107 | +**Successors:** TASK-001-004 |
| 108 | +**Parallel:** no — depends on the fixed stage from TASK-001-001 |
| 109 | + |
| 110 | +**New file:** `src/TurboHTTP.StreamTests/Http3/Http30ConnectionConcurrencySpec.cs` |
| 111 | + |
| 112 | +**Test patterns (following existing H2 patterns from `Http2ConnectionBackpressureSpec.cs`):** |
| 113 | + |
| 114 | +1. **Test: Multiple requests should be pulled concurrently** |
| 115 | + - Wire `Http30ConnectionStage` with `ManualPublisherProbe<IInputItem>` (server) and `ManualSubscriberProbe<IOutputItem>` (network) |
| 116 | + - Push 3 requests into app inlet |
| 117 | + - Assert that all 3 are encoded and emitted to network outlet before any response arrives |
| 118 | + - Verify stream IDs are distinct (0, 4, 8) |
| 119 | + |
| 120 | +2. **Test: Freed stream slots should accept new requests** |
| 121 | + - Fill concurrent stream capacity |
| 122 | + - Complete one stream (send `QuicCloseItem { Kind: RequestStreamComplete }`) |
| 123 | + - Assert that a new request is pulled from app inlet |
| 124 | + |
| 125 | +3. **Test: Responses should be correlated to correct requests** |
| 126 | + - Send 3 concurrent requests |
| 127 | + - Respond to stream 4 first, then stream 0, then stream 8 |
| 128 | + - Assert each response's `RequestMessage` matches the original request |
| 129 | + |
| 130 | +**Reference files:** |
| 131 | +- `src/TurboHTTP.StreamTests/Http2/Http2ConnectionBackpressureSpec.cs` — probe patterns |
| 132 | +- `src/TurboHTTP.StreamTests/Http2/Http2ConnectionTestHelper.cs` — frame serialization helpers |
| 133 | +- `src/TurboHTTP.StreamTests/StreamTestBase.cs` — base class with ActorSystem + Materializer |
| 134 | + |
| 135 | +**Acceptance Criteria:** |
| 136 | +- [x] Test class follows spec naming: `sealed class Http30ConnectionConcurrencySpec`, BDD method names |
| 137 | +- [x] `[Trait("RFC", "RFC9114-6.1")]` for stream multiplexing traceability |
| 138 | +- [x] `[Fact(Timeout = 5000)]` on all tests |
| 139 | +- [x] At least 3 test methods covering: concurrent pull, slot reuse, response correlation |
| 140 | +- [x] All new tests pass |
| 141 | +- [x] Max 500 lines per test class |
| 142 | + |
| 143 | +### TASK-001-004: Integration Test Validation |
| 144 | +**Description:** As a maintainer, I want to confirm all HTTP/3 integration tests pass and no regressions exist in other protocols. |
| 145 | + |
| 146 | +**Token Estimate:** ~15k tokens |
| 147 | +**Predecessors:** TASK-001-001, TASK-001-002, TASK-001-003 |
| 148 | +**Successors:** none |
| 149 | +**Parallel:** no — final validation gate |
| 150 | + |
| 151 | +**Steps:** |
| 152 | +1. Run H3 integration tests: `dotnet run --project TurboHTTP.IntegrationTests/TurboHTTP.IntegrationTests.csproj -- -trait "Category=Http3"` |
| 153 | +2. Analyze any remaining failures — categorize as: |
| 154 | + - Fixed by TASK-001-001 (concurrent request failures) |
| 155 | + - Fixed by TASK-001-002 (data corruption) |
| 156 | + - New issues requiring follow-up |
| 157 | +3. Run unit tests: `dotnet test TurboHTTP.Tests/TurboHTTP.Tests.csproj` |
| 158 | +4. Run stream tests: `dotnet test TurboHTTP.StreamTests/TurboHTTP.StreamTests.csproj` |
| 159 | + |
| 160 | +**Acceptance Criteria:** |
| 161 | +- [ ] All 131 HTTP/3 integration tests pass (0 failures, 0 errors) |
| 162 | +- [ ] Unit tests pass with no regressions |
| 163 | +- [ ] Stream tests pass with no regressions |
| 164 | +- [ ] No new compiler warnings introduced |
| 165 | + |
| 166 | +## Task Dependency Graph |
| 167 | + |
| 168 | +``` |
| 169 | +TASK-001-001 (request serialization) ──→ TASK-001-003 (stage tests) ──→ TASK-001-004 (validation) |
| 170 | +TASK-001-002 (stream ID routing) ────────────────────────────────┘ |
| 171 | +``` |
| 172 | + |
| 173 | +| Task | Estimate | Predecessors | Parallel | Model | |
| 174 | +|------|----------|--------------|----------|-------| |
| 175 | +| TASK-001-001 | ~40k | none | yes (with 002) | opus | |
| 176 | +| TASK-001-002 | ~35k | none | yes (with 001) | opus | |
| 177 | +| TASK-001-003 | ~60k | 001 | no | — | |
| 178 | +| TASK-001-004 | ~15k | 001, 002, 003 | no | — | |
| 179 | + |
| 180 | +**Total estimated tokens:** ~150k |
| 181 | + |
| 182 | +## Functional Requirements |
| 183 | + |
| 184 | +- FR-1: `Http30ConnectionStage` must pull new requests from the app inlet whenever `StateMachine.CanAcceptRequest` returns true, even when other requests are in-flight |
| 185 | +- FR-2: `StreamTracker.OnStreamClosed()` must be called when a response is emitted, freeing the stream slot for reuse |
| 186 | +- FR-3: `ProcessFrameData()` must route all frames to their correct HTTP/3 stream using the QUIC stream ID from `Http3InputTaggedItem` |
| 187 | +- FR-4: Untagged `NetworkBuffer` items must not silently use `streamId: 0` if they contain response data for a specific stream |
| 188 | +- FR-5: Response bodies must be assembled per-stream without cross-stream data mixing |
| 189 | +- FR-6: New stage-level tests must verify concurrent request pulling, slot reuse, and response correlation |
| 190 | + |
| 191 | +## Non-Goals |
| 192 | + |
| 193 | +- No refactoring of shared patterns between H2 and H3 connection stages (scope: 1A) |
| 194 | +- No changes to `Http20ConnectionStage` — H2 tests already pass |
| 195 | +- No changes to the protocol layer (`StateMachine`, `StreamTracker`) — they already support concurrency |
| 196 | +- No changes to the transport layer (`QuicTransportStateMachine`) — it already supports per-stream QUIC connections |
| 197 | +- No keep-alive ping support for H3 (that's a separate feature) |
| 198 | +- No performance optimization beyond fixing correctness |
| 199 | + |
| 200 | +## Technical Considerations |
| 201 | + |
| 202 | +- **Akka.Streams Pull semantics:** `Pull()` can only be called once per inlet per demand cycle. The fix works within this constraint — `TryPullRequest()` gates on `!HasBeenPulled(_stage._inApp)`, so it naturally waits for the previous push before pulling again. The key is calling `TryPullRequest()` from more code paths (after response emission, after stream completion). |
| 203 | +- **Thread safety:** All `GraphStageLogic` callbacks run on a single thread — no synchronization needed within the stage. |
| 204 | +- **Buffer lifecycle:** `NetworkBuffer` items must not be accessed after `Dispose()`. Verify that `ProcessFrameData()` doesn't dispose buffers that are still referenced by `StreamState.AppendBody()`. |
| 205 | +- **Stream ID allocation:** HTTP/3 client-initiated bidirectional streams use IDs 0, 4, 8, 12... (RFC 9114 Section 6.1). Tests should verify these specific IDs. |
| 206 | + |
| 207 | +## Success Metrics |
| 208 | + |
| 209 | +- 131/131 HTTP/3 integration tests pass (currently 116/131) |
| 210 | +- 0 data corruption errors in body echo tests |
| 211 | +- No regressions in HTTP/1.x and HTTP/2 test suites |
| 212 | +- Stage-level tests provide early detection of multiplexing regressions |
| 213 | + |
| 214 | +## Open Questions |
| 215 | + |
| 216 | +*None — all questions resolved.* |
0 commit comments