Skip to content

CBL-8071: Do not retry replicator connections if WebSocket being closed abnormally#2475

Merged
jianminzhao merged 2 commits into
release/3.2from
cbl-8071
Apr 30, 2026
Merged

CBL-8071: Do not retry replicator connections if WebSocket being closed abnormally#2475
jianminzhao merged 2 commits into
release/3.2from
cbl-8071

Conversation

@jianminzhao
Copy link
Copy Markdown
Contributor

@jianminzhao jianminzhao commented Apr 28, 2026

The Abnormal Close occurs after the client is already decided to close the socket but fails to receive response from the server. We don't want to retry replicating on this condition.

We fix it by a change in WebSocket, that it now treats a connection closure that is not fully complete only due to a pending final server acknowledgement (i.e. CLOSE frame was already sent).

…ed abnormally

The Abnormal Close occurs after the client is already decided to close the socket but fails to receive response from the server. We don't want to retry replicating on this condition.

We fix it by a change in WebSocket, that it now treats a connection closure that is  not fully complete only due to a pending final server acknowledgement (i.e. CLOSE frame was already sent).
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 28, 2026

This is a release branch and commits are restricted.

Please confirm this PR is one of the following:

  • A response to a customer ask
  • A change per our security policy
  • A non-functional change (i.e. changes needed for building an older version)
  • A change that has been granted an exception (please comment)

@cbl-bot
Copy link
Copy Markdown

cbl-bot commented Apr 28, 2026

Code Coverage Results:

Type Percentage
branches 67.9
functions 79.29
instantiations 35.13
lines 79.38
regions 75.22

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts WebSocket close-status mapping to avoid treating an in-progress client-initiated close (CLOSE sent, no peer ACK) as an “abnormal” close that would trigger replicator retry behavior.

Changes:

  • Update WebSocketImpl::onClose to report kCodeNormal (instead of kCodeAbnormal) when the close was initiated by the client (_closeSent) but not fully acknowledged (!_closeReceived) and the underlying disconnect is otherwise considered clean.
  • Add a replicator API test covering “peer going away” both before CLOSE is sent and after CLOSE is sent.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
Replicator/tests/ReplicatorAPITest.cc Adds a regression test for peer “going away” behavior before/after a client-initiated close.
Networking/WebSockets/WebSocketImpl.cc Changes close code mapping to avoid reporting abnormal close when CLOSE was already sent by the client.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Networking/WebSockets/WebSocketImpl.cc Outdated
Comment on lines 542 to 544
// if (!expected) then _closeSent => !_closeReceived
if ( !expected ) status.code = _closeSent ? kCodeNormal : kCodeAbnormal;
else if ( !_closeMessage )
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comment is logically incorrect: !expected does not imply _closeSent => !_closeReceived (it can also mean neither CLOSE was sent nor received). Consider updating the comment and making the intent explicit in code (e.g., branch specifically on _closeSent && !_closeReceived for the "client already initiated close" case, otherwise keep abnormal).

Suggested change
// if (!expected) then _closeSent => !_closeReceived
if ( !expected ) status.code = _closeSent ? kCodeNormal : kCodeAbnormal;
else if ( !_closeMessage )
if ( !expected ) {
// Treat only the "client already initiated close, but no CLOSE was received"
// case as normal; all other unexpected close states remain abnormal.
status.code = (_closeSent && !_closeReceived) ? kCodeNormal : kCodeAbnormal;
} else if ( !_closeMessage )

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed with the copilot comment that the comment is a bit confusing. I have also mistakenly think that it's an obsolute code comment that got left behind.

Comment on lines +891 to +895
// Give some time for Replicator::_stop to be called, but before timeout in WebSocketImpl
// to not get Timeout error.
std::this_thread::sleep_for(1s);
// WebSocket will treat it as Normal Close
c4socket_closed(c4socket, {WebSocketDomain, websocket::kCodeGoingAway});
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test relies on a fixed sleep_for(1s) to ensure the CLOSE frame has been sent before calling c4socket_closed. This is timing-dependent and can be flaky on slow/loaded CI machines. Prefer synchronizing on an observable event (e.g., have factory.write signal when it sees the CLOSE write after c4repl_stop, then trigger c4socket_closed), or poll with a bounded timeout until the expected write happens.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could agree with this comment. For a quick mitigate, I think we could bump the timeout a little bit. However, if we are not seeing the test to be flaky anywhere, we can leave it as is.

Comment on lines +891 to +895
// Give some time for Replicator::_stop to be called, but before timeout in WebSocketImpl
// to not get Timeout error.
std::this_thread::sleep_for(1s);
// WebSocket will treat it as Normal Close
c4socket_closed(c4socket, {WebSocketDomain, websocket::kCodeGoingAway});
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could agree with this comment. For a quick mitigate, I think we could bump the timeout a little bit. However, if we are not seeing the test to be flaky anywhere, we can leave it as is.

Comment thread Networking/WebSockets/WebSocketImpl.cc Outdated
Comment on lines 542 to 544
// if (!expected) then _closeSent => !_closeReceived
if ( !expected ) status.code = _closeSent ? kCodeNormal : kCodeAbnormal;
else if ( !_closeMessage )
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed with the copilot comment that the comment is a bit confusing. I have also mistakenly think that it's an obsolute code comment that got left behind.

@pasin pasin changed the title CBL-8071: Do not retry replicator connections if WebSocket being clos… CBL-8071: Do not retry replicator connections if WebSocket being closed abnormally Apr 29, 2026
@github-actions
Copy link
Copy Markdown

This is a release branch and commits are restricted.

Please confirm this PR is one of the following:

  • A response to a customer ask
  • A change per our security policy
  • A non-functional change (i.e. changes needed for building an older version)
  • A change that has been granted an exception (please comment)

@jianminzhao jianminzhao merged commit c8b2b18 into release/3.2 Apr 30, 2026
9 of 10 checks passed
@jianminzhao jianminzhao deleted the cbl-8071 branch April 30, 2026 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants