hetero pd2 #1980
| remote_engine_id, len(meta.local_block_ids), | ||
| len(meta.remote_block_ids)) | ||
| if self.use_host_buffer: | ||
| is_hetero = True |
What's the plan for setting this variable?
We need the flag to enable _recving_metadata.
Are you going to add it as an env flag?
Check whether this can be moved to hpu_model_runner.
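A minimal sketch of the env-flag option raised in this thread, in case it helps the discussion. The variable name `EXAMPLE_HETERO_PD` and both helper functions are hypothetical and not part of this PR or of any existing configuration; the only thing taken from the diff is that `is_hetero` is currently set when `use_host_buffer` is true.

```python
import os

# Hypothetical variable name, used only for illustration in this sketch.
HETERO_ENV_VAR = "EXAMPLE_HETERO_PD"


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_is_hetero(use_host_buffer: bool) -> bool:
    """Enable the hetero path only with a host buffer AND an explicit opt-in."""
    return use_host_buffer and env_flag(HETERO_ENV_VAR)
```

With this shape the hetero path stays off by default, and turning it on is an explicit choice instead of a side effect of `use_host_buffer`.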
| self.vllm_config = vllm_config | ||
| self.block_size = vllm_config.cache_config.block_size | ||
| self.block_factor = 8 # A100.block_size/G2.block_size |
Is it going to be a hardcoded value?
It's hardcoded for now; that may be OK since this number won't change.
It's better to check the block size on both sides.
We can check whether remote.block_size matches the expected value. We don't know remote.block_size here because the handshake occurs afterwards.
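A sketch of the check discussed in this thread, assuming the remote block size becomes available once the handshake has completed. The function names and the example sizes in the usage note are assumptions for illustration; only the hardcoded factor of 8 comes from the diff.

```python
def derive_block_factor(local_block_size: int, remote_block_size: int) -> int:
    """Return the integer ratio between the larger and the smaller block size."""
    if local_block_size <= 0 or remote_block_size <= 0:
        raise ValueError("block sizes must be positive")
    larger = max(local_block_size, remote_block_size)
    smaller = min(local_block_size, remote_block_size)
    if larger % smaller != 0:
        raise ValueError(
            f"block sizes {local_block_size} and {remote_block_size} "
            "are not integer multiples of each other")
    return larger // smaller


def check_block_factor(local_block_size: int, remote_block_size: int,
                       expected_factor: int = 8) -> int:
    """Fail loudly if the negotiated sizes disagree with the hardcoded factor."""
    factor = derive_block_factor(local_block_size, remote_block_size)
    if factor != expected_factor:
        raise ValueError(
            f"expected block factor {expected_factor}, got {factor}")
    return factor
```

For example, `check_block_factor(128, 16)` returns 8, while `check_block_factor(64, 16)` raises `ValueError`. Running this right after the handshake would keep the hardcoded value for now and still catch a mismatched peer.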
| "Rank %s, get_finished: %s requests done sending " | ||
| "and %s requests done recving", self.tp_rank, | ||
| len(done_sending), len(done_recving)) | ||
| #import remote_pdb; remote_pdb.set_trace() |
Can you add a check that the remote is GPU attention?
| if num_local_blocks < num_remote_blocks: | ||
| remote_block_ids = remote_block_ids[-num_local_blocks:] | ||
| #if num_local_blocks < num_remote_blocks: | ||
| # remote_block_ids = remote_block_ids[-num_local_blocks:] |
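For reference, a standalone sketch of what the trim in this hunk does when it is active. The helper name is hypothetical; the slicing expression is the one from the diff. One detail worth keeping in mind if the trim is ever restored: in Python, `ids[-0:]` is the same as `ids[0:]`, so an empty local list would keep every remote block unless it is handled explicitly.

```python
def trim_remote_block_ids(local_block_ids: list[int],
                          remote_block_ids: list[int]) -> list[int]:
    """Keep only the last len(local_block_ids) remote block ids."""
    num_local_blocks = len(local_block_ids)
    num_remote_blocks = len(remote_block_ids)
    if num_local_blocks == 0:
        # Guard: remote_block_ids[-0:] would return the whole list.
        return []
    if num_local_blocks < num_remote_blocks:
        return remote_block_ids[-num_local_blocks:]
    # Nothing to trim; return a copy so callers cannot alias the input.
    return list(remote_block_ids)
```

The sketch compares raw block counts only. It does not account for a block factor between the two sides, which is a separate question for the hetero case.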
No description provided.