Is there an existing issue for the same bug?
Branch Name
3.0-dev
Commit ID
b79773d
Other Environment Information
- Hardware parameters:
- OS type:
- Others: multi-CN freetier environment; requester CN observed `bind closed`, `cannot find lockservice address`, and remote lock/unlock failures while the owner CN still held the row lock.
Actual Behavior
A remote lock can hang indefinitely after the requester loses a specific remote bind. The owner CN still treats the remote txn as a valid lock holder, so waiters remain blocked even though the requester side has already lost the bind and cannot complete the remote lock/unlock path.
Expected Behavior
Remote lock requests should not hang indefinitely when the response path breaks, and the owner CN should release stale remote holders once the corresponding bind heartbeat is lost.
Steps to Reproduce
- Run a multi-CN cluster with remote row locking enabled.
- Let one CN acquire a remote row lock on a table owned by another CN.
- Break the requester-side bind / routing state so the requester logs `bind closed` and later `cannot find lockservice address` for that remote lock.
- Observe that the owner CN still reports the original remote txn as the lock holder and waiters remain blocked for a long time.
Additional information
Root cause analysis shows two protocol gaps:
- `handleRemoteLock` / `handleForwardLock` wrote the response through an async one-way path after the owner had already taken the lock, so the requester could stay stuck in `morpc.Future.Get()`.
- Remote lock keepalive was effectively tracked at service granularity, so a lost bind could leave a stale remote holder on the owner CN.
The fix uses bounded synchronous response writes for remote lock results, tracks bind-level remote heartbeats in orphan detection, and sends `KeepRemoteLock` heartbeats per bind, so that heartbeats for different tables on the same peer no longer overwrite one another.