fix: 客户端断连时取消上游请求并停止重试#4500
Conversation
WalkthroughThe changes implement context propagation through the relay system. A new helper function detects when the client request context is done, causing retry decisions to short-circuit. HTTP request construction is updated to inherit the incoming Gin request context, allowing cancellation and timeouts to propagate to upstream calls. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
relay/channel/api_request.go (1)
532-554:⚠️ Potential issue | 🟡 MinorAsync task submission: genuine orphan upstream task risk on client disconnect.
For
DoTaskApiRequest, tying the upstream submit RPC toc.Request.Context()creates a vulnerability in task endpoints (Midjourney, Suno, Sora, Kling, etc.). If the client disconnects after the upstream provider accepts/creates the task but before we complete reading the response body inDoResponse, we will:
- Have the HTTP response received successfully (task created upstream).
- Cancel the context while
io.ReadAll(resp.Body)is executing inDoResponse(e.g., task/sora/adaptor.go:229).- Fail to extract and persist
UpstreamTaskID, sotask.Insert()is skipped (relay_task.go:96-97 gates insertion ontaskErr == nil).- Leave a real, billable task running on the upstream provider with no local handle for polling, cancellation, or refund.
Recommended fix: Use
context.WithTimeout(context.Background(), <bounded-duration>)for task submission instead ofc.Request.Context(), so the submit RPC completes independently of client disconnection. Polling and streaming paths can still respect client cancellation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@relay/channel/api_request.go` around lines 532 - 554, DoTaskApiRequest currently uses c.Request.Context() when creating the upstream submit request, which can be cancelled by client disconnect and cause orphaned upstream tasks; change DoTaskApiRequest to create a new independent context using context.WithTimeout(context.Background(), <bounded-duration-or-config>) (e.g., taskSubmitTimeout) for the upstream submission and use that ctx in http.NewRequestWithContext and any related GetBody closure, while leaving downstream polling/streaming logic (DoResponse, polling paths) to continue honoring c.Request.Context() for cancellation; ensure the new timeout duration is configurable/constant and propagate that ctx only for the submit path so BuildRequestURL/BuildRequestHeader/doRequest use the independent submit ctx.
🧹 Nitpick comments (2)
relay/channel/api_request_test.go (1)
105-133: Good cancellation test — assertions are tight and race-free.
http.Client.Dochecks the request context before dialing, so a pre-cancelled context reliably yieldscontext.Canceledwithout ever reaching the upstream handler — makingcalled.Load() == falsea stable assertion. Skippingt.Parallel()here is the right call givenservice.InitHttpClient()mutates global state.One small enhancement worth considering: assert that
errors.Is(err, context.Canceled)(after unwrapping the wrapped error) so a future regression that swallows the cancel cause and returns a generic "do request failed" still gets caught.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@relay/channel/api_request_test.go` around lines 105 - 133, Enhance TestDoApiRequestUsesClientRequestContext to assert the cancellation cause: after calling DoApiRequest (the test currently requires an error and nil resp), unwrap the returned error if necessary and use errors.Is(err, context.Canceled) to verify the failure was due to the pre-cancelled request context; update the assertions in the test (TestDoApiRequestUsesClientRequestContext) to include this check so regressions that lose the cancel cause are detected while still verifying the upstream handler (in the upstream httptest server) is not called.controller/relay.go (1)
318-355: Helper placement and short-circuit logic look correct.Returning
falsefrom bothshouldRetryandshouldRetryTaskRelayimmediately whenc.Request.Context().Err() != nilcorrectly avoids burning additional channels after the downstream client has gone away. The nil guards inclientRequestDonealso keep the helper safe for any future call site wherecorc.Requestmight be nil.Minor optional simplification: since gin.Context implements
context.Context(Gin ≥ v1.8.0), the helper body could bec != nil && c.Err() != nil. Functionally equivalent in this codebase — feel free to keep as-is for explicitness about which context is being checked.Based on learnings:
gin.Context implements context.Context interface since Gin v1.8.0, providing the methods Deadline(), Done(), Err(), and Value().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controller/relay.go` around lines 318 - 355, Update the clientRequestDone helper to use the gin.Context's built-in context methods by checking c != nil && c.Err() != nil (instead of accessing c.Request.Context()), and adjust any related call sites such as shouldRetry and shouldRetryTaskRelay to rely on the revised clientRequestDone; this keeps the short-circuit behavior but simplifies the helper by leveraging gin.Context's context.Interface methods (Deadline/Done/Err/Value).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@relay/channel/api_request.go`:
- Around line 532-554: DoTaskApiRequest currently uses c.Request.Context() when
creating the upstream submit request, which can be cancelled by client
disconnect and cause orphaned upstream tasks; change DoTaskApiRequest to create
a new independent context using context.WithTimeout(context.Background(),
<bounded-duration-or-config>) (e.g., taskSubmitTimeout) for the upstream
submission and use that ctx in http.NewRequestWithContext and any related
GetBody closure, while leaving downstream polling/streaming logic (DoResponse,
polling paths) to continue honoring c.Request.Context() for cancellation; ensure
the new timeout duration is configurable/constant and propagate that ctx only
for the submit path so BuildRequestURL/BuildRequestHeader/doRequest use the
independent submit ctx.
---
Nitpick comments:
In `@controller/relay.go`:
- Around line 318-355: Update the clientRequestDone helper to use the
gin.Context's built-in context methods by checking c != nil && c.Err() != nil
(instead of accessing c.Request.Context()), and adjust any related call sites
such as shouldRetry and shouldRetryTaskRelay to rely on the revised
clientRequestDone; this keeps the short-circuit behavior but simplifies the
helper by leveraging gin.Context's context.Interface methods
(Deadline/Done/Err/Value).
In `@relay/channel/api_request_test.go`:
- Around line 105-133: Enhance TestDoApiRequestUsesClientRequestContext to
assert the cancellation cause: after calling DoApiRequest (the test currently
requires an error and nil resp), unwrap the returned error if necessary and use
errors.Is(err, context.Canceled) to verify the failure was due to the
pre-cancelled request context; update the assertions in the test
(TestDoApiRequestUsesClientRequestContext) to include this check so regressions
that lose the cancel cause are detected while still verifying the upstream
handler (in the upstream httptest server) is not called.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c02771ec-6610-4965-b6ec-6fa91594aa18
📒 Files selected for processing (4)
controller/relay.gocontroller/relay_retry_test.gorelay/channel/api_request.gorelay/channel/api_request_test.go
Important
📝 变更描述 / Description
本 PR 将客户端请求的
Context传递给上游 HTTP 请求,并在客户端请求已经取消时停止 relay retry。具体改动:
DoApiRequest/DoFormRequest/DoTaskApiRequest使用http.NewRequestWithContext(c.Request.Context(), ...)创建上游请求;shouldRetry/shouldRetryTaskRelay在客户端请求已取消时直接返回false,避免对已断开的下游继续切换渠道重试;为什么这样能生效:
当前流式处理在客户端断开时会停止向下游读取/写入,但上游 HTTP 请求未继承客户端 context,导致某些情况下上游仍继续生成和计费。将客户端 context 绑定到上游请求后,客户端主动关闭连接、SDK
stream.close()/client.close()、网络断开或超时取消都会传递给上游连接,从而避免上游继续生成对下游不可见的尾部内容,也避免断连后继续重试浪费上游额度。风险/滥用场景说明
该问题可能被恶意客户端或中间代理利用:攻击者可以构造流式 Responses / function-call / structured-output 请求,让模型先输出用户可见答案或可用的工具参数,再输出较长的隐藏尾部字段、padding、reason 或其他不会展示给最终用户的内容。攻击者在拿到前半部分有效结果后主动关闭下游流。
在未传播客户端 context 的情况下,new-api 会认为下游已经断开并停止处理当前流,但真实上游请求可能继续生成隐藏尾部内容并产生费用;由于 new-api 没有继续读取完整上游流,也可能拿不到最终
usage事件,造成上下游计费差异。本 PR 不公开完整利用脚本,仅说明触发模式;如维护者需要完整 PoC,建议通过 Security Advisory 私下共享。
🚀 变更类型 / Type of change
🔗 关联任务 / Related Issue
✅ 提交前检查项 / Checklist
📸 运行证明 / Proof of Work
本地已运行:
Summary by CodeRabbit
Release Notes