Skip to content

Commit ed46d3a

Browse files
committed
Fix
1 parent 64f7d1e commit ed46d3a

153 files changed

Lines changed: 7064 additions & 5130 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.maggus/features/feature_001.md

Lines changed: 216 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,216 @@
1+
<!-- maggus-id: 3f975d46-4413-4b50-9ab0-e20f2029ff36 -->
2+
# Feature 001: Fix HTTP/3 Integration Test Failures — Stream Multiplexing
3+
4+
## Introduction
5+
6+
14 of 131 HTTP/3 integration tests fail with `OperationCanceledException` / timeouts, and 1 test fails with data corruption (byte mismatch at offset 20365 in a 60KB echo). The root cause is that `Http30ConnectionStage` serializes all HTTP requests — it pulls one request from the Akka.Streams app inlet, encodes it, then waits for the response before pulling the next. This prevents QUIC stream multiplexing despite the protocol layer (`StateMachine`, `StreamTracker`) and transport layer (`QuicTransportStateMachine`) fully supporting up to 100 concurrent streams.
7+
8+
The data corruption is a secondary issue: when concurrent streams eventually do activate (e.g. via reconnect), `ProcessFrameData()` passes a single `streamId` for all frames decoded from one network buffer, potentially routing DATA frames to the wrong stream state.
9+
10+
### Architecture Context
11+
12+
- **Components involved:** `Http30ConnectionStage` (Streams layer), `StateMachine` + `StreamTracker` (Protocol/Http3 layer)
13+
- **No new components introduced** — this is a minimal fix to the existing connection stage
14+
- **Protocol layer is correct**`CanAcceptRequest` already checks `Tracker.CanOpenStream()` which allows `ActiveStreamCount < MaxConcurrentStreams` (default 100)
15+
- **H2 reference:** `Http20ConnectionStage` has identical `TryPullRequest()` but works because TCP multiplexes frames on a single connection; HTTP/3 uses per-request QUIC streams
16+
17+
## Goals
18+
19+
- All 131 HTTP/3 integration tests pass (currently 14 failures + 1 error)
20+
- `Http30ConnectionStage` supports concurrent request submission matching the protocol layer's capacity
21+
- No regressions in HTTP/1.x or HTTP/2 test suites
22+
- New stage-level tests verify concurrent request pulling behavior
23+
24+
## Tasks
25+
26+
### TASK-001-001: Fix Request Serialization in Http30ConnectionStage
27+
**Description:** As a developer using TurboHTTP, I want HTTP/3 requests to be multiplexed over concurrent QUIC streams so that parallel requests don't time out waiting for serial processing.
28+
29+
**Token Estimate:** ~40k tokens
30+
**Predecessors:** none
31+
**Successors:** TASK-001-003, TASK-001-004
32+
**Parallel:** yes — can run alongside TASK-001-002
33+
**Model:** opus — core architectural change to Akka.Streams stage logic
34+
35+
**File:** `src/TurboHTTP/Streams/Stages/Http30ConnectionStage.cs`
36+
37+
**Change details:**
38+
39+
The current `TryPullRequest()` (lines 376-384) correctly gates on `_sm.CanAcceptRequest`, which delegates to `StreamTracker.CanOpenStream()` (checks `ActiveStreamCount < MaxConcurrentStreams`). The pull-one-at-a-time behavior is actually correct Akka.Streams semantics — you can only `Pull()` once per demand cycle.
40+
41+
The real issue is that `OnAppPush()` encodes one request and calls `TryPullRequest()`, which issues another `Pull(_stage._inApp)`. This is correct — each push/pull cycle processes one request. The problem is that the stage **never re-pulls** after a stream completes (freeing a slot), and `FlushResponses()` only re-pulls `_inServer`, not `_inApp`.
42+
43+
**Fix approach:**
44+
1. In `FlushResponses()` — after emitting responses, call `TryPullRequest()` to fill freed stream slots
45+
2. In `OnServerPush()` for `QuicCloseItem { Kind: RequestStreamComplete }` — already calls `TryPullRequest()` (line 189), verify this path works correctly
46+
3. Ensure `OnNetworkPull()` also attempts to pull requests after preface emission
47+
4. In `ProcessFrameData()` — the existing `TryPullRequest()` at line 291 should trigger re-pulling after each frame batch; verify `CanAcceptRequest` returns true when streams have been freed by `AssembleResponse()`
48+
49+
**Key verification:** After `EmitResponse()` is called in `StateMachine`, `ReturnStreamState()` must call `Tracker.OnStreamClosed()` which decrements `ActiveStreamCount`, making `CanOpenStream()` return true again.
50+
51+
**Acceptance Criteria:**
52+
- [x] `TryPullRequest()` is called after response emission in `FlushResponses()`
53+
- [x] `StateMachine.ReturnStreamState()` correctly calls `Tracker.OnStreamClosed()` to free stream slots
54+
- [x] Multiple requests can be in-flight concurrently (verified by logging or test)
55+
- [x] Single-request scenarios still work (no regression)
56+
- [x] Build compiles without warnings
57+
58+
### TASK-001-002: Fix Stream ID Routing in ProcessFrameData
59+
**Description:** As a developer sending large HTTP/3 responses, I want response body data to be correctly routed to the originating stream so that response bodies are not corrupted.
60+
61+
**Token Estimate:** ~35k tokens
62+
**Predecessors:** none
63+
**Successors:** TASK-001-004
64+
**Parallel:** yes — can run alongside TASK-001-001
65+
**Model:** opus — requires understanding of QUIC stream multiplexing and frame routing
66+
67+
**File:** `src/TurboHTTP/Streams/Stages/Http30ConnectionStage.cs`
68+
69+
**Change details:**
70+
71+
In `ProcessFrameData()` (lines 265-291), the `streamId` parameter is passed to `AssembleResponse()` for ALL frames decoded from a single network buffer. This is correct because in HTTP/3, each QUIC stream carries exactly one HTTP request/response pair — so all frames from one `Http3InputTaggedItem` belong to the same stream.
72+
73+
However, the fallback path at lines 251-256 passes `streamId: 0` for untagged `NetworkBuffer` items:
74+
```csharp
75+
if (item is NetworkBuffer rawBuffer)
76+
{
77+
ProcessFrameData(rawBuffer, streamId: 0);
78+
return;
79+
}
80+
```
81+
82+
**Fix approach:**
83+
1. Investigate whether untagged `NetworkBuffer` items can arrive during HTTP/3 operation — if yes, they need stream ID resolution
84+
2. Check `Http3InputTaggedItem.StreamId` — verify it correctly carries the QUIC stream ID for every tagged item
85+
3. If the corruption is caused by `streamId: 0` fallback, either:
86+
- Remove the fallback (HTTP/3 should always use tagged items), or
87+
- Add stream ID extraction from the frame header itself
88+
4. Verify `StateMachine.DecodeServerData()` handles partial frame reassembly per-stream (not mixing buffers across streams)
89+
90+
**Files to inspect:**
91+
- `src/TurboHTTP/Protocol/Http3/StateMachine.cs``DecodeServerData()`, `AssembleResponse()`
92+
- `src/TurboHTTP/Protocol/Http3/StreamState.cs``AppendBody()`, `TakeBodyOwnership()`
93+
- `src/TurboHTTP/Internal/Messages.cs``Http3InputTaggedItem` definition
94+
95+
**Acceptance Criteria:**
96+
- [x] Untagged `NetworkBuffer` items either don't occur in H3 or have proper stream ID routing
97+
- [x] `AssembleResponse()` always receives the correct QUIC stream ID
98+
- [x] 60KB echo test passes without data corruption
99+
- [x] No buffer use-after-free or double-dispose scenarios
100+
- [x] Build compiles without warnings
101+
102+
### TASK-001-003: Add Stage-Level Concurrent Request Tests
103+
**Description:** As a maintainer, I want stage-level tests that verify `Http30ConnectionStage` can handle concurrent requests so that multiplexing regressions are caught early.
104+
105+
**Token Estimate:** ~60k tokens
106+
**Predecessors:** TASK-001-001
107+
**Successors:** TASK-001-004
108+
**Parallel:** no — depends on the fixed stage from TASK-001-001
109+
110+
**New file:** `src/TurboHTTP.StreamTests/Http3/Http30ConnectionConcurrencySpec.cs`
111+
112+
**Test patterns (following existing H2 patterns from `Http2ConnectionBackpressureSpec.cs`):**
113+
114+
1. **Test: Multiple requests should be pulled concurrently**
115+
- Wire `Http30ConnectionStage` with `ManualPublisherProbe<IInputItem>` (server) and `ManualSubscriberProbe<IOutputItem>` (network)
116+
- Push 3 requests into app inlet
117+
- Assert that all 3 are encoded and emitted to network outlet before any response arrives
118+
- Verify stream IDs are distinct (0, 4, 8)
119+
120+
2. **Test: Freed stream slots should accept new requests**
121+
- Fill concurrent stream capacity
122+
- Complete one stream (send `QuicCloseItem { Kind: RequestStreamComplete }`)
123+
- Assert that a new request is pulled from app inlet
124+
125+
3. **Test: Responses should be correlated to correct requests**
126+
- Send 3 concurrent requests
127+
- Respond to stream 4 first, then stream 0, then stream 8
128+
- Assert each response's `RequestMessage` matches the original request
129+
130+
**Reference files:**
131+
- `src/TurboHTTP.StreamTests/Http2/Http2ConnectionBackpressureSpec.cs` — probe patterns
132+
- `src/TurboHTTP.StreamTests/Http2/Http2ConnectionTestHelper.cs` — frame serialization helpers
133+
- `src/TurboHTTP.StreamTests/StreamTestBase.cs` — base class with ActorSystem + Materializer
134+
135+
**Acceptance Criteria:**
136+
- [x] Test class follows spec naming: `sealed class Http30ConnectionConcurrencySpec`, BDD method names
137+
- [x] `[Trait("RFC", "RFC9114-6.1")]` for stream multiplexing traceability
138+
- [x] `[Fact(Timeout = 5000)]` on all tests
139+
- [x] At least 3 test methods covering: concurrent pull, slot reuse, response correlation
140+
- [x] All new tests pass
141+
- [x] Max 500 lines per test class
142+
143+
### TASK-001-004: Integration Test Validation
144+
**Description:** As a maintainer, I want to confirm all HTTP/3 integration tests pass and no regressions exist in other protocols.
145+
146+
**Token Estimate:** ~15k tokens
147+
**Predecessors:** TASK-001-001, TASK-001-002, TASK-001-003
148+
**Successors:** none
149+
**Parallel:** no — final validation gate
150+
151+
**Steps:**
152+
1. Run H3 integration tests: `dotnet run --project TurboHTTP.IntegrationTests/TurboHTTP.IntegrationTests.csproj -- -trait "Category=Http3"`
153+
2. Analyze any remaining failures — categorize as:
154+
- Fixed by TASK-001-001 (concurrent request failures)
155+
- Fixed by TASK-001-002 (data corruption)
156+
- New issues requiring follow-up
157+
3. Run unit tests: `dotnet test TurboHTTP.Tests/TurboHTTP.Tests.csproj`
158+
4. Run stream tests: `dotnet test TurboHTTP.StreamTests/TurboHTTP.StreamTests.csproj`
159+
160+
**Acceptance Criteria:**
161+
- [ ] All 131 HTTP/3 integration tests pass (0 failures, 0 errors)
162+
- [ ] Unit tests pass with no regressions
163+
- [ ] Stream tests pass with no regressions
164+
- [ ] No new compiler warnings introduced
165+
166+
## Task Dependency Graph
167+
168+
```
169+
TASK-001-001 (request serialization) ──→ TASK-001-003 (stage tests) ──→ TASK-001-004 (validation)
170+
TASK-001-002 (stream ID routing) ────────────────────────────────┘
171+
```
172+
173+
| Task | Estimate | Predecessors | Parallel | Model |
174+
|------|----------|--------------|----------|-------|
175+
| TASK-001-001 | ~40k | none | yes (with 002) | opus |
176+
| TASK-001-002 | ~35k | none | yes (with 001) | opus |
177+
| TASK-001-003 | ~60k | 001 | no ||
178+
| TASK-001-004 | ~15k | 001, 002, 003 | no ||
179+
180+
**Total estimated tokens:** ~150k
181+
182+
## Functional Requirements
183+
184+
- FR-1: `Http30ConnectionStage` must pull new requests from the app inlet whenever `StateMachine.CanAcceptRequest` returns true, even when other requests are in-flight
185+
- FR-2: `StreamTracker.OnStreamClosed()` must be called when a response is emitted, freeing the stream slot for reuse
186+
- FR-3: `ProcessFrameData()` must route all frames to their correct HTTP/3 stream using the QUIC stream ID from `Http3InputTaggedItem`
187+
- FR-4: Untagged `NetworkBuffer` items must not silently use `streamId: 0` if they contain response data for a specific stream
188+
- FR-5: Response bodies must be assembled per-stream without cross-stream data mixing
189+
- FR-6: New stage-level tests must verify concurrent request pulling, slot reuse, and response correlation
190+
191+
## Non-Goals
192+
193+
- No refactoring of shared patterns between H2 and H3 connection stages (scope: 1A)
194+
- No changes to `Http20ConnectionStage` — H2 tests already pass
195+
- No changes to the protocol layer (`StateMachine`, `StreamTracker`) — they already support concurrency
196+
- No changes to the transport layer (`QuicTransportStateMachine`) — it already supports per-stream QUIC connections
197+
- No keep-alive ping support for H3 (that's a separate feature)
198+
- No performance optimization beyond fixing correctness
199+
200+
## Technical Considerations
201+
202+
- **Akka.Streams Pull semantics:** `Pull()` can only be called once per inlet per demand cycle. The fix works within this constraint — `TryPullRequest()` gates on `!HasBeenPulled(_stage._inApp)`, so it naturally waits for the previous push before pulling again. The key is calling `TryPullRequest()` from more code paths (after response emission, after stream completion).
203+
- **Thread safety:** All `GraphStageLogic` callbacks run on a single thread — no synchronization needed within the stage.
204+
- **Buffer lifecycle:** `NetworkBuffer` items must not be accessed after `Dispose()`. Verify that `ProcessFrameData()` doesn't dispose buffers that are still referenced by `StreamState.AppendBody()`.
205+
- **Stream ID allocation:** HTTP/3 client-initiated bidirectional streams use IDs 0, 4, 8, 12... (RFC 9114 Section 6.1). Tests should verify these specific IDs.
206+
207+
## Success Metrics
208+
209+
- 131/131 HTTP/3 integration tests pass (currently 116/131)
210+
- 0 data corruption errors in body echo tests
211+
- No regressions in HTTP/1.x and HTTP/2 test suites
212+
- Stage-level tests provide early detection of multiplexing regressions
213+
214+
## Open Questions
215+
216+
*None — all questions resolved.*

0 commit comments

Comments
 (0)