Skip to content

fix(ttstream): ttstream should not recycle connection when Recv Timeout with DisableCancelRemote=true#1934

Closed
DMwangnima wants to merge 1 commit into
cloudwego:mainfrom
DMwangnima:fix/ttstream_recv_conn_reuse
Closed

fix(ttstream): ttstream should not recycle connection when Recv Timeout with DisableCancelRemote=true#1934
DMwangnima wants to merge 1 commit into
cloudwego:mainfrom
DMwangnima:fix/ttstream_recv_conn_reuse

Conversation

@DMwangnima
Copy link
Copy Markdown
Contributor

@DMwangnima DMwangnima commented Mar 25, 2026

What type of PR is this?

fix

Check the PR title.

  • This PR title match the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:
zh(optional):

  1. Bug 触发场景
    客户端使用 TTStream 进行流式 RPC 调用,配置 RecvTimeout + DisableCancelRemote=true 时触发。

  2. 触发原因
    当 Recv 超时触发后,ctxDoneCallback 会被触发,执行 s.close 关闭 Stream,并复用 Stream 对应的连接。

但当 DisableCancelRemote=true 时,这个 Stream 的生命周期不应该结束,直接复用会导致一个连接上出现两个 Stream 的数据,违背了一个连接一个 Stream 的原则。

  1. 主要变更
  • 重构 Pipe:构造时绑定 stream ctx 和 callback,用于控制感知整个 stream 的生命周期变化;新增 ReadCtx(perReadCtx, perReadCallback, items) 方法,在 waitCtx 中通过独立的 select 分支分别监听 stream ctx 和 per-read ctx,彻底分离两种超时来源。
  • 解耦 per-read timeout context:RecvMsg 中单次 Recv 超时改为 context.WithTimeout(context.Background(), recvTimeout),不再派生自 stream ctx,避免超时来源混淆。
  • 拆分 callback:新增 recvTimeoutCallback 专门处理 Recv 超时;原 ctxDoneCallback 仅处理流级别 cancel/timeout,使用新增的 newStreamTimeoutException 返回准确的流超时错误。
  • 新增 Stream 超时异常:错误码 12015,错误描述 stream timeout, timeout in ctx: %+v
  • CtxDoneCallback 签名改为返回 error,使 callback 中的具体错误能直接传播给调用方。
  • container 部分全部移动至 ttstream/internal/container 下
  • 修改 streaming 接口注释,明确 Recv/Send 报错后不应再次调用;即使再次调用后也会返回相同的错误

(Optional) Which issue(s) this PR fixes:

(optional) The PR that updates user documentation:

@DMwangnima DMwangnima requested review from a team as code owners March 25, 2026 13:00
@DMwangnima DMwangnima force-pushed the fix/ttstream_recv_conn_reuse branch from 4813494 to 72fb453 Compare March 25, 2026 13:01
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 93.27731% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.94%. Comparing base (bfa27f3) to head (3696079).
⚠️ Report is 34 commits behind head on main.

Files with missing lines Patch % Lines
pkg/remote/trans/ttstream/client_handler.go 50.00% 1 Missing and 1 partial ⚠️
...g/remote/trans/ttstream/internal/container/pipe.go 96.49% 1 Missing and 1 partial ⚠️
pkg/remote/trans/ttstream/stream_client.go 91.30% 1 Missing and 1 partial ⚠️
pkg/remote/trans/ttstream/stream_reader.go 85.71% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1934      +/-   ##
==========================================
+ Coverage   61.37%   61.94%   +0.56%     
==========================================
  Files         388      391       +3     
  Lines       35063    30182    -4881     
==========================================
- Hits        21521    18695    -2826     
+ Misses      12247    10202    -2045     
+ Partials     1295     1285      -10     
Flag Coverage Δ
integration 52.01% <86.55%> (+1.51%) ⬆️
unit 52.19% <89.91%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@DMwangnima DMwangnima force-pushed the fix/ttstream_recv_conn_reuse branch 10 times, most recently from acd91df to 3696079 Compare March 31, 2026 06:44
@DMwangnima DMwangnima force-pushed the fix/ttstream_recv_conn_reuse branch 2 times, most recently from 338bd94 to 4fda48f Compare April 22, 2026 08:55
…fix ttstream connection reuse

Previously, RecvMsg created a child context from the stream ctx for recv timeout.
This coupled stream-level timeout with per-recv timeout, causing incorrect behavior:
- When DisableCancelRemote=true, stream ctx expiration was misidentified as recv timeout,
  preventing the RST frame from being sent and breaking connection reuse.
- When stream ctx expired before recv timeout, the error was reported as recv timeout
  instead of stream timeout.

Changes:
- Redesign Pipe to support two-level ctx: a pipe-level ctx (stream lifecycle) and a
  per-read ctx (individual recv timeout), each with its own callback
- Separate stream timeout (error 12015) from recv timeout (error 12014) so they trigger
  independent close/cancel logic
- Fix Pipe.ReadCtx to drain remaining items after detecting closed state, preventing
  data loss in Write+Close race
- Move container package to internal to prevent external dependency
- Update streaming interface comments to accurately describe client/server side
  RecvMsg/SendMsg semantics and error caching behavior
@DMwangnima DMwangnima force-pushed the fix/ttstream_recv_conn_reuse branch from 4fda48f to aca6d89 Compare April 22, 2026 09:01
@DMwangnima DMwangnima closed this Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant