Commit 42afd8b
[Fix] Fix dump err when 0 < len(load_blocks) < fully hit (#927)
## Purpose
Fix the dump failure that occurs when len(load_blocks) < fully_hit
in the external hit scenario with **use_lite=true**. The failure is
caused by len(ucm_block_ids) < len(vllm_block_ids) during
wait_for_save. This discrepancy arises because, in
**_generate_dispatch_meta**, the **new_tokens** parameter still includes
external hits, even though req_meta.token_processed has already
accounted for those tokens.
## Modifications
- Modify func **get_num_new_matched_tokens** of UCMLiteConnector: use
hbm_hit_tokens as request_meta.token_processed, because the function
needs to return 0 hits in external storage.
- Modify func **_generate_dispatch_meta** of UCMLiteConnector: generate
dump block infos only when req_meta.token_processed + new_tokens
>= total_hit_tokens.
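
The gating condition in the second modification can be sketched as follows. This is only an illustration under stated assumptions, not the connector's actual code: `should_generate_dump` is a hypothetical helper, and the three integer arguments stand in for `req_meta.token_processed`, `new_tokens`, and `total_hit_tokens` as named above.

```python
def should_generate_dump(token_processed: int,
                         new_tokens: int,
                         total_hit_tokens: int) -> bool:
    """Hypothetical helper mirroring the condition described above.

    Dump block infos are generated only once the tokens already processed
    plus the newly scheduled tokens reach the total number of hit tokens.
    """
    return token_processed + new_tokens >= total_hit_tokens


# Illustrative numbers only (not taken from the commit).
# 128 tokens already counted, 512 hit tokens in total:
partial = should_generate_dump(token_processed=128, new_tokens=256,
                               total_hit_tokens=512)
caught_up = should_generate_dump(token_processed=128, new_tokens=384,
                                 total_hit_tokens=512)
```

With these made-up values, `partial` is `False` because 128 + 256 is still below 512, so no dump infos would be produced for that step. `caught_up` is `True` because 128 + 384 reaches 512.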
## Test
Tested with the online llmperf pipeline.

1 parent 3a11f16 commit 42afd8b
1 file changed
Lines changed: 11 additions & 1 deletion
(Diff body not captured. Per the line-number columns, 11 lines were added at new lines 1119-1129 and original line 1120 was removed.)