Skip to content

fix: 客户端断连时取消上游请求并停止重试#4500

Open
VonEquinox wants to merge 1 commit intoQuantumNous:mainfrom
VonEquinox:fix/cancel-upstream-on-client-disconnect
Open

fix: 客户端断连时取消上游请求并停止重试#4500
VonEquinox wants to merge 1 commit intoQuantumNous:mainfrom
VonEquinox:fix/cancel-upstream-on-client-disconnect

Conversation

@VonEquinox
Copy link
Copy Markdown

@VonEquinox VonEquinox commented Apr 27, 2026

⚠️ 提交说明 / PR Notice

Important

  • 这是人工整理后的简洁 PR 描述,避免公开完整可复现利用脚本;完整 PoC 如需提供,建议走 Security Advisory 私下沟通。

📝 变更描述 / Description

本 PR 将客户端请求的 Context 传递给上游 HTTP 请求,并在客户端请求已经取消时停止 relay retry。

具体改动:

  • DoApiRequest / DoFormRequest / DoTaskApiRequest 使用 http.NewRequestWithContext(c.Request.Context(), ...) 创建上游请求;
  • 当下游客户端断开或取消请求时,上游 HTTP 请求会随客户端 context 一起取消;
  • shouldRetry / shouldRetryTaskRelay 在客户端请求已取消时直接返回 false,避免对已断开的下游继续切换渠道重试;
  • 补充单元测试,覆盖已取消客户端 context 下不应继续请求上游、不应 retry。

为什么这样能生效:

当前流式处理在客户端断开时会停止向下游读取/写入,但上游 HTTP 请求未继承客户端 context,导致某些情况下上游仍继续生成和计费。将客户端 context 绑定到上游请求后,客户端主动关闭连接、SDK stream.close() / client.close()、网络断开或超时取消都会传递给上游连接,从而避免上游继续生成对下游不可见的尾部内容,也避免断连后继续重试浪费上游额度。

风险/滥用场景说明

该问题可能被恶意客户端或中间代理利用:攻击者可以构造流式 Responses / function-call / structured-output 请求,让模型先输出用户可见答案或可用的工具参数,再输出较长的隐藏尾部字段、padding、reason 或其他不会展示给最终用户的内容。攻击者在拿到前半部分有效结果后主动关闭下游流。

在未传播客户端 context 的情况下,new-api 会认为下游已经断开并停止处理当前流,但真实上游请求可能继续生成隐藏尾部内容并产生费用;由于 new-api 没有继续读取完整上游流,也可能拿不到最终 usage 事件,造成上下游计费差异。

本 PR 不公开完整利用脚本,仅说明触发模式;如维护者需要完整 PoC,建议通过 Security Advisory 私下共享。

🚀 变更类型 / Type of change

  • 🐛 Bug 修复 (Bug fix) - 客户端断连时上游请求未取消,可能造成上游继续计费和无意义重试
  • ✨ 新功能 (New feature)
  • ⚡ 性能优化 / 重构 (Refactor)
  • 📝 文档更新 (Documentation)

🔗 关联任务 / Related Issue

✅ 提交前检查项 / Checklist

  • 人工确认: 我已亲自整理并撰写此描述,没有直接粘贴未经处理的 AI 输出。
  • 非重复提交: 我已搜索现有 Issues 与 PRs,确认相关问题存在但当前修复路径不同于 fix: 流式请求客户端断连后继续读取上游响应,避免 usage 丢失 #4464
  • Bug fix 说明: 本 PR 关联客户端断连导致上游继续处理/计费的问题,并采用取消上游请求而非忽略客户端断连的策略。
  • 变更理解: 我已理解这些更改的工作原理及可能影响。
  • 范围聚焦: 本 PR 未包含与当前任务无关的代码改动。
  • 本地验证: 已在本地运行并通过测试或手动验证,维护者可以据此复核结果。
  • 安全合规: 代码中无敏感凭据;完整 PoC 不在公开 PR 描述中披露。

📸 运行证明 / Proof of Work

本地已运行:

go test ./relay/channel ./controller
# ok  github.com/QuantumNous/new-api/relay/channel
# ok  github.com/QuantumNous/new-api/controller

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved request cancellation handling: the system now correctly stops processing, prevents unnecessary retries, and cancels upstream requests when a client request is cancelled or times out, improving resource efficiency.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 27, 2026

Walkthrough

The changes implement context propagation through the relay system. A new helper function detects when the client request context is done, causing retry decisions to short-circuit. HTTP request construction is updated to inherit the incoming Gin request context, allowing cancellation and timeouts to propagate to upstream calls.

Changes

Cohort / File(s) Summary
Retry Logic with Context Detection
controller/relay.go
Added clientRequestDone helper function to check if Gin request context is finished. Updated retry decision functions to short-circuit and return false when the client request context has completed.
Retry Suppression Tests
controller/relay_retry_test.go
New test file with two tests verifying that shouldRetry and shouldRetryTaskRelay return false when the client request context is already canceled/done.
HTTP Request Context Propagation
relay/channel/api_request.go
Updated DoApiRequest, DoFormRequest, and DoTaskApiRequest to use http.NewRequestWithContext with c.Request.Context(), enabling upstream request cancellation to inherit from incoming request context.
API Request Context Tests
relay/channel/api_request_test.go
New test file introducing contextTestAdaptor test double and TestDoApiRequestUsesClientRequestContext to verify that upstream HTTP calls are canceled when the incoming request context is already done.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 Context flows through relays swift,
No retries when requests drift,
Cancellation propagates with care,
Upstream calls now know to spare! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title in Chinese describes the core fix: canceling upstream requests and stopping retries when the client disconnects. This accurately reflects the main changes across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
relay/channel/api_request.go (1)

532-554: ⚠️ Potential issue | 🟡 Minor

Async task submission: genuine orphan upstream task risk on client disconnect.

For DoTaskApiRequest, tying the upstream submit RPC to c.Request.Context() creates a vulnerability in task endpoints (Midjourney, Suno, Sora, Kling, etc.). If the client disconnects after the upstream provider accepts/creates the task but before we complete reading the response body in DoResponse, we will:

  1. Have the HTTP response received successfully (task created upstream).
  2. Cancel the context while io.ReadAll(resp.Body) is executing in DoResponse (e.g., task/sora/adaptor.go:229).
  3. Fail to extract and persist UpstreamTaskID, so task.Insert() is skipped (relay_task.go:96-97 gates insertion on taskErr == nil).
  4. Leave a real, billable task running on the upstream provider with no local handle for polling, cancellation, or refund.

Recommended fix: Use context.WithTimeout(context.Background(), <bounded-duration>) for task submission instead of c.Request.Context(), so the submit RPC completes independently of client disconnection. Polling and streaming paths can still respect client cancellation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@relay/channel/api_request.go` around lines 532 - 554, DoTaskApiRequest
currently uses c.Request.Context() when creating the upstream submit request,
which can be cancelled by client disconnect and cause orphaned upstream tasks;
change DoTaskApiRequest to create a new independent context using
context.WithTimeout(context.Background(), <bounded-duration-or-config>) (e.g.,
taskSubmitTimeout) for the upstream submission and use that ctx in
http.NewRequestWithContext and any related GetBody closure, while leaving
downstream polling/streaming logic (DoResponse, polling paths) to continue
honoring c.Request.Context() for cancellation; ensure the new timeout duration
is configurable/constant and propagate that ctx only for the submit path so
BuildRequestURL/BuildRequestHeader/doRequest use the independent submit ctx.
🧹 Nitpick comments (2)
relay/channel/api_request_test.go (1)

105-133: Good cancellation test — assertions are tight and race-free.

http.Client.Do checks the request context before dialing, so a pre-cancelled context reliably yields context.Canceled without ever reaching the upstream handler — making called.Load() == false a stable assertion. Skipping t.Parallel() here is the right call given service.InitHttpClient() mutates global state.

One small enhancement worth considering: assert that errors.Is(err, context.Canceled) (after unwrapping the wrapped error) so a future regression that swallows the cancel cause and returns a generic "do request failed" still gets caught.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@relay/channel/api_request_test.go` around lines 105 - 133, Enhance
TestDoApiRequestUsesClientRequestContext to assert the cancellation cause: after
calling DoApiRequest (the test currently requires an error and nil resp), unwrap
the returned error if necessary and use errors.Is(err, context.Canceled) to
verify the failure was due to the pre-cancelled request context; update the
assertions in the test (TestDoApiRequestUsesClientRequestContext) to include
this check so regressions that lose the cancel cause are detected while still
verifying the upstream handler (in the upstream httptest server) is not called.
controller/relay.go (1)

318-355: Helper placement and short-circuit logic look correct.

Returning false from both shouldRetry and shouldRetryTaskRelay immediately when c.Request.Context().Err() != nil correctly avoids burning additional channels after the downstream client has gone away. The nil guards in clientRequestDone also keep the helper safe for any future call site where c or c.Request might be nil.

Minor optional simplification: since gin.Context implements context.Context (Gin ≥ v1.8.0), the helper body could be c != nil && c.Err() != nil. Functionally equivalent in this codebase — feel free to keep as-is for explicitness about which context is being checked.

Based on learnings: gin.Context implements context.Context interface since Gin v1.8.0, providing the methods Deadline(), Done(), Err(), and Value().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controller/relay.go` around lines 318 - 355, Update the clientRequestDone
helper to use the gin.Context's built-in context methods by checking c != nil &&
c.Err() != nil (instead of accessing c.Request.Context()), and adjust any
related call sites such as shouldRetry and shouldRetryTaskRelay to rely on the
revised clientRequestDone; this keeps the short-circuit behavior but simplifies
the helper by leveraging gin.Context's context.Interface methods
(Deadline/Done/Err/Value).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@relay/channel/api_request.go`:
- Around line 532-554: DoTaskApiRequest currently uses c.Request.Context() when
creating the upstream submit request, which can be cancelled by client
disconnect and cause orphaned upstream tasks; change DoTaskApiRequest to create
a new independent context using context.WithTimeout(context.Background(),
<bounded-duration-or-config>) (e.g., taskSubmitTimeout) for the upstream
submission and use that ctx in http.NewRequestWithContext and any related
GetBody closure, while leaving downstream polling/streaming logic (DoResponse,
polling paths) to continue honoring c.Request.Context() for cancellation; ensure
the new timeout duration is configurable/constant and propagate that ctx only
for the submit path so BuildRequestURL/BuildRequestHeader/doRequest use the
independent submit ctx.

---

Nitpick comments:
In `@controller/relay.go`:
- Around line 318-355: Update the clientRequestDone helper to use the
gin.Context's built-in context methods by checking c != nil && c.Err() != nil
(instead of accessing c.Request.Context()), and adjust any related call sites
such as shouldRetry and shouldRetryTaskRelay to rely on the revised
clientRequestDone; this keeps the short-circuit behavior but simplifies the
helper by leveraging gin.Context's context.Interface methods
(Deadline/Done/Err/Value).

In `@relay/channel/api_request_test.go`:
- Around line 105-133: Enhance TestDoApiRequestUsesClientRequestContext to
assert the cancellation cause: after calling DoApiRequest (the test currently
requires an error and nil resp), unwrap the returned error if necessary and use
errors.Is(err, context.Canceled) to verify the failure was due to the
pre-cancelled request context; update the assertions in the test
(TestDoApiRequestUsesClientRequestContext) to include this check so regressions
that lose the cancel cause are detected while still verifying the upstream
handler (in the upstream httptest server) is not called.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c02771ec-6610-4965-b6ec-6fa91594aa18

📥 Commits

Reviewing files that changed from the base of the PR and between e36d191 and 799caf7.

📒 Files selected for processing (4)
  • controller/relay.go
  • controller/relay_retry_test.go
  • relay/channel/api_request.go
  • relay/channel/api_request_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant