Problem
When INSERT INTO ... SELECT ... writes to multiple replicas, if one node channel is slow and times out during close_wait, it gets cancelled but NOT marked as failed. This causes:
- close_wait returns OK even though a node was cancelled
- FE is unaware of the failure, commits the transaction
- PUBLISH_VERSION task is sent to ALL nodes including the cancelled one
- Cancelled node can't find the rowset → publish fails
- Data stays COMMITTED but not VISIBLE for a long time (30+ minutes until retry)
Root Cause
In IndexChannel::close_wait() (vtablet_writer.cpp), when unfinished node channels are cancelled due to timeout, mark_as_failed() is not called. FE receives no error tablet info for the cancelled replicas.
Fix
After cancelling unfinished node channels in close_wait timeout:
- Call mark_as_failed() to record failed tablets
- Call check_intolerable_failure(): if failures exceed tolerance, fail the entire load
- Call set_error_tablet_in_state() to propagate error info to FE
With this information:
- FE skips failed replicas during PUBLISH_VERSION
- Data becomes visible immediately on healthy replicas
- The background TabletScheduler auto-repairs the failed replica
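
The three calls listed under Fix can be sketched as below. This is a minimal, self-contained illustration and not the actual code in vtablet_writer.cpp: the NodeChannel and IndexChannel structs, their fields, the simplified signatures, and the handle_close_wait_timeout() helper are all hypothetical. The tolerance rule (a tablet is lost once failed replicas reach (num_replicas + 1) / 2) is an assumption chosen to match the scenarios in this description, and the early return on intolerable failure is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one node channel (one BE receiving data).
struct NodeChannel {
    int64_t node_id = 0;
    std::vector<int64_t> tablet_ids;  // tablets this node holds a replica of
    bool finished = false;            // true once the node acknowledged close
    bool cancelled = false;
};

// Hypothetical, simplified stand-in for IndexChannel.
struct IndexChannel {
    int num_replicas = 3;
    std::vector<NodeChannel> channels;
    // tablet_id -> ids of nodes whose replica of that tablet failed
    std::map<int64_t, std::set<int64_t>> failed_by_tablet;
    // (tablet_id, node_id) pairs that would be reported to FE
    std::set<std::pair<int64_t, int64_t>> error_tablets_for_fe;

    // Record every tablet replica hosted on the failed node.
    void mark_as_failed(const NodeChannel& ch) {
        for (int64_t tablet_id : ch.tablet_ids) {
            failed_by_tablet[tablet_id].insert(ch.node_id);
        }
    }

    // Assumed rule: failure is intolerable once any tablet has lost
    // (num_replicas + 1) / 2 or more of its replicas.
    bool check_intolerable_failure() const {
        assert(num_replicas > 0);
        const std::size_t limit = static_cast<std::size_t>((num_replicas + 1) / 2);
        for (const auto& entry : failed_by_tablet) {
            if (entry.second.size() >= limit) return true;
        }
        return false;
    }

    // Copy the failed (tablet, node) pairs into the set reported to FE.
    void set_error_tablet_in_state() {
        for (const auto& entry : failed_by_tablet) {
            for (int64_t node_id : entry.second) {
                error_tablets_for_fe.insert(std::make_pair(entry.first, node_id));
            }
        }
    }

    // Called when close_wait times out. Returns true if the load can still succeed.
    bool handle_close_wait_timeout() {
        for (NodeChannel& ch : channels) {
            if (ch.finished) continue;
            ch.cancelled = true;  // behaviour before the fix stopped here
            mark_as_failed(ch);   // the fix: also record the failure
        }
        if (check_intolerable_failure()) return false;  // fail the entire load
        set_error_tablet_in_state();                    // let FE skip these replicas
        return true;
    }
};
```

With three channels for one tablet and a single unfinished channel, handle_close_wait_timeout() in this sketch returns true and reports one error replica; with two unfinished channels it returns false.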
Behavior after fix
| Scenario | Replicas | Result |
|---|---|---|
| 3 replicas, 1 timeout | 2/3 success | ✅ Publish succeeds, failed replica auto-repairs |
| 3 replicas, 2 timeout | 1/3 success | ❌ Load fails, user gets error |
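
The two scenarios in the table can be reproduced with a small model of the decision made once the error replicas are known. This is only a sketch: FE is not written in C++, and PublishPlan, plan_publish(), and the majority rule (successful replicas must reach num_replicas / 2 + 1) are hypothetical names and assumptions used to mirror the table, not the real FE logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical result of deciding what to do with one tablet's replicas.
struct PublishPlan {
    bool load_succeeds = false;
    std::vector<int64_t> publish_targets;     // backends that get PUBLISH_VERSION
    std::vector<int64_t> replicas_to_repair;  // backends left to background repair
};

// Split replicas into healthy and failed, then apply an assumed majority rule.
PublishPlan plan_publish(const std::vector<int64_t>& replica_backends,
                         const std::set<int64_t>& failed_backends) {
    assert(!replica_backends.empty());
    PublishPlan plan;
    for (int64_t backend : replica_backends) {
        if (failed_backends.count(backend) > 0) {
            plan.replicas_to_repair.push_back(backend);
        } else {
            plan.publish_targets.push_back(backend);
        }
    }
    // Assumption: a strict majority of replicas must have succeeded.
    const std::size_t quorum = replica_backends.size() / 2 + 1;
    plan.load_succeeds = plan.publish_targets.size() >= quorum;
    if (!plan.load_succeeds) {
        // A failed load publishes nothing and leaves nothing to repair.
        plan.publish_targets.clear();
        plan.replicas_to_repair.clear();
    }
    return plan;
}
```

Under these assumptions, replicas on backends 1, 2, 3 with backend 3 failed yield a successful plan that publishes to 1 and 2 and leaves 3 for repair, while failures on 2 and 3 yield a failed load.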