[sync] sync to upstream 03c4c4a#1597
Signed-off-by: AlpinDale <alpindale@gmail.com>
Code Review
This pull request introduces a wide range of features, refactorings, and bug fixes. Key additions include support for Suffix Decoding, prefix caching for Mamba models, and ORCA endpoint load metrics. There are also significant refactorings, such as lazy loading for tool parsers and updates to the KV cache connector factory for better backward compatibility. I've identified one issue in the Mamba model configuration: a comment about the default chunk size disagrees with the code, which could mislead readers.
# Since Mamba1 does not have a chunk notion
# we use a default chunk size of 1024.
if chunk_size is None:
    chunk_size = 2048
There is a discrepancy between the comment and the code. The comment states that the default chunk size is 1024, but the code sets it to 2048. This can be misleading for developers. Please update the comment to match the code for consistency.
Suggested change:
  # Since Mamba1 does not have a chunk notion
- # we use a default chunk size of 1024.
+ # we use a default chunk size of 2048.
  if chunk_size is None:
      chunk_size = 2048
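One way to keep the comment and the default from drifting apart again is to name the value once and refer to it from both places. The sketch below is hypothetical: `MAMBA1_DEFAULT_CHUNK_SIZE` and `resolve_mamba1_chunk_size` are illustrative names, not part of the upstream Mamba configuration code.

```python
from typing import Optional

# Hypothetical constant: defining the default once means a comment can cite
# the name instead of repeating a literal that may go stale.
MAMBA1_DEFAULT_CHUNK_SIZE = 2048


def resolve_mamba1_chunk_size(chunk_size: Optional[int] = None) -> int:
    """Return the chunk size to use for a Mamba1 model (hypothetical helper).

    Mamba1 has no native chunk notion, so when no value is configured we
    fall back to MAMBA1_DEFAULT_CHUNK_SIZE.
    """
    if chunk_size is None:
        return MAMBA1_DEFAULT_CHUNK_SIZE
    return chunk_size


print(resolve_mamba1_chunk_size())     # 2048
print(resolve_mamba1_chunk_size(512))  # 512
```

With the literal confined to a single assignment, changing the default later requires touching only one line, and no comment needs to repeat the number.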