Commit 12f0890

fix eagle3 acc
Signed-off-by: Bo Deng <deemod@nvidia.com>
1 parent 558260d commit 12f0890

1 file changed

Lines changed: 6 additions & 0 deletions

tensorrt_llm/_torch/disaggregation/transceiver.py

@@ -178,6 +178,12 @@ def _create_kv_slice(
                 groups.append(np.array([], dtype=np.int64))
                 continue
             block_ids = adapter.get_block_ids(req, idx, lg)
+            # Limit to prompt_len blocks, matching C++ cacheFormatter behavior.
+            # Extra blocks from num_extra_kv_tokens (speculative decoding) have
+            # uninitialized KV data and must not be transferred.
+            prompt_blocks = (req.prompt_len + tpb - 1) // tpb
+            if block_ids.size > prompt_blocks:
+                block_ids = block_ids[:prompt_blocks]
             window_size = lg.sliding_window_size

             if window_size is not None:
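
The added lines compute a ceiling division and then truncate the block list. A minimal standalone sketch of that arithmetic follows. It assumes `tpb` means tokens per block, uses a plain Python list in place of the NumPy array returned by `adapter.get_block_ids`, and introduces the hypothetical helper names `prompt_block_count` and `limit_to_prompt_blocks`, which are not part of TensorRT-LLM.

```python
def prompt_block_count(prompt_len: int, tokens_per_block: int) -> int:
    """Ceiling division: blocks needed to hold prompt_len tokens."""
    if tokens_per_block <= 0:
        raise ValueError("tokens_per_block must be positive")
    return (prompt_len + tokens_per_block - 1) // tokens_per_block


def limit_to_prompt_blocks(block_ids, prompt_len, tokens_per_block):
    """Drop trailing block IDs beyond those that cover the prompt."""
    n = prompt_block_count(prompt_len, tokens_per_block)
    # Only slice when extra blocks are present, mirroring the diff's guard.
    return block_ids[:n] if len(block_ids) > n else block_ids


# Hypothetical request: 70 prompt tokens, 32 tokens per block, and one
# extra block allocated for speculative-decoding tokens.
allocated = [10, 11, 12, 13]
kept = limit_to_prompt_blocks(allocated, prompt_len=70, tokens_per_block=32)
print(kept)  # [10, 11, 12]
```

With 70 prompt tokens and 32 tokens per block, three blocks cover the prompt, so the fourth block is dropped. When the prompt length is an exact multiple of the block size, the `+ tokens_per_block - 1` term adds no extra block.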
