feat: support in-batch prefix cache.#1240
feat: support in-batch prefix cache.#1240Clement-Wang26 wants to merge 1 commit intojd-opensource:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new configuration option, enable_in_batch_prefix_cache, to control whether admitted prefill full blocks are cached into the prefix cache. This involves adding the flag definition, declaration, and integration into the Options struct across various core components and schedulers. A new overloaded cache method is added to BlockManagerPool and KVCacheManager to support this functionality, along with new tests for BlockManagerPoolTest and ContinuousSchedulerTest. Review comments indicate that the use of FLAGS_ global variables for this new setting, as well as enable_prefix_cache, deviates from the style guide which prefers configuration through the Options struct. Additionally, the new flag is registered in two different help categories, which could lead to confusion, and a DCHECK in block_manager_pool.cpp might need to be a CHECK for release build robustness.
| } | ||
|
|
||
| void BlockManagerPool::cache(Sequence* sequence, size_t num_tokens) { | ||
| DCHECK(sequence != nullptr); |
There was a problem hiding this comment.
Using DCHECK for sequence != nullptr might be insufficient if sequence being nullptr would lead to a critical error or crash in a release build. DCHECKs are typically compiled out in release builds. If this condition is vital for correctness in production, consider changing it to a CHECK to ensure it's always active.
50d59d6 to
0418622
Compare
0418622 to
108984f
Compare
…ource#1240) Add the ILU ProcessGroup constructor overload that accepts local_rank and explicit group_ranks, so the DiT process-group creation path matches the expected signature and builds successfully.
No description provided.