Code of Conduct
Search before asking
Describe the feature
When a task is killed because its stage was cancelled, because another attempt of the same task succeeded, or for some other reason, the `AddBlockEvent` handling and `sendShuffleData` keep running.
Although `needCancelRequest` may cancel some of the work, the `AddBlockEvent`s in the thread pool's blocking queue still hold the shuffle block data, and so do the RPC requests that have already been issued but are still waiting for a response.
This causes three problems:
- We free all memory once the task is killed, but the shuffle block data held by the async threads still occupies memory.
- Many useless runnables belonging to the killed task are still running or waiting to be executed.
- Currently `checkBlockSendResult` cannot be interrupted. When the task killed because of speculation is the last one in the shuffle map stage, it blocks the scheduling of the next reduce stage.
Motivation
No response
Describe the solution
- Cancel all runnables that are waiting to be executed or are blocked waiting for an RPC callback.
- Interrupt `checkBlockSendResult` immediately.
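
The two steps above could be sketched roughly as follows. This is a minimal illustration only, not the project's actual code: `CancellableSendTracker`, `track`, `ack`, `cancelAll` and `awaitAllAcked` are hypothetical names, and the sketch assumes each send is submitted to an `ExecutorService` so that a `Future` is available per send. `awaitAllAcked` stands in for the role of `checkBlockSendResult` and returns as soon as the task is cancelled instead of waiting for the full timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: tracks the async send work of a single task attempt.
public class CancellableSendTracker {
    private final List<Future<?>> pending = new ArrayList<>();
    private int unacked = 0;          // sends whose result has not arrived yet
    private boolean cancelled = false;

    // Register one submitted send so it can be cancelled later.
    public synchronized void track(Future<?> send) {
        pending.add(send);
        unacked++;
    }

    // Called from the send callback when one block send has completed.
    public synchronized void ack() {
        unacked--;
        notifyAll();
    }

    // Called when the task is killed: cancels queued sends, interrupts
    // running ones, and wakes up any thread blocked in awaitAllAcked.
    public synchronized int cancelAll() {
        cancelled = true;
        int n = 0;
        for (Future<?> f : pending) {
            if (f.cancel(true)) {
                n++;
            }
        }
        pending.clear();
        notifyAll();
        return n;
    }

    // Waits until every send is acked, the timeout expires, or the task is
    // cancelled. Returns true only if all sends were acked.
    public synchronized boolean awaitAllAcked(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (unacked > 0 && !cancelled) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // timed out
                }
                wait(remainingMs);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        return !cancelled && unacked == 0;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CancellableSendTracker tracker = new CancellableSendTracker();
        CountDownLatch neverOpened = new CountDownLatch(1); // simulates a stuck RPC
        for (int i = 0; i < 4; i++) {
            tracker.track(pool.submit(() -> {
                try {
                    neverOpened.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        System.out.println("cancelled sends: " + tracker.cancelAll());
        System.out.println("all acked: " + tracker.awaitAllAcked(5000));
        pool.shutdownNow();
    }
}
```

In this sketch the task-kill path would call `cancelAll()` once, which both drops the queued work and unblocks the waiting thread, so the two bullets share a single entry point. Whether a real RPC client reacts to thread interruption depends on the client, so in-flight requests may need their own cancellation hook.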
Additional context
No response
Are you willing to submit PR?