[opt]Defer UCM KV Dump Waiting Until Request Completion#989
Conversation
|
I found one issue in the CP path: |
I think this does not apply to the CP path. UCMCPConnector does not override build_connector_meta(); its MRO is UCMCPConnector -> UCMLayerWiseConnector -> UCMDirectConnector, and UCMLayerWiseConnector does not override build_connector_meta either. Therefore CP uses UCMDirectConnector.build_connector_meta(), which already adds dump requests to scheduler-side _async_dump_req_ids before returning metadata. |
4507ee9 to
a1cd30f
Compare
ygwpz
left a comment
There was a problem hiding this comment.
Follow-up Review (2026-06-13)
I've reviewed the new commits pushed today. The previous concerns from 2026-06-07 have been addressed:
✅ Event handle release (line 930): The finally block in _wait_pending_dump_task ensures cleanup on all paths.
✅ State accumulation (line 958): _async_dump_req_ids is properly cleared in request_finished() (discard) and get_finished() (difference_update).
✅ Type safety (line 939): The isinstance check before getattr properly handles both KVConnectorMetadata | set[str] types.
However, I found 2 NEW issues that need attention:
See inline comments below for details on:
- Resource leak when
store.check()fails in_poll_pending_dump_tasks - Partial task handling logic in
get_finished
Overall, the deferred wait mechanism is well-designed, but these edge cases need handling to prevent resource leaks and ensure correctness.
Purpose
Move UCM KV dump completion waiting out of
wait_for_save()so model execution is no longer blocked by async dump tasks, while still preserving delayed block release correctness for finished requests.This also fixes the non-HMA
MultiConnectorpath by forwardingrequest_finished()from the outerUCMConnectorto the actual inner connector, ensuring requests that UCM saves asynchronously are properly marked for delayed free before they are returned fromget_finished().Modifications
wait_for_save()to submit dump tasks without blocking on task completion.get_finished()for requests that have finished and require delayed block release.wait_for_save()path because the timing is no longer accurate after deferring completion.UCMConnector.request_finished()forwarding for the non-HMA path soMultiConnectorcan correctly count UCM async saves.origin/developand resolved conflicts with the new layerwise / pipeline store metrics changes.Test
Tested on 8*NPU on A3 for Minimax2.7
direct connector
16K concurrancy 8 TTFT decrease(3.6605->3.2271)
64K concurrancy 4 TTFT decrease(13.0390->11.4577)
layerwise connector
slight improve almost the same