feat: Add prefix cache benchmark #268
Merged
copybara-service[bot] merged 1 commit into `main` on May 7, 2025
This commit introduces a new benchmark to test the performance of prefix caching in JetStream.

The benchmark (`benchmark_prefix_cache.sh`) allows testing with various prompt lengths and common prefix lengths. It uses a new mock dataset generated by `load_mock_prefix_cache_test_input_requests` in `benchmark_serving.py`, which creates prompts sharing common prefixes of varying lengths based on a normal distribution.

Key changes include:

- New script `benchmarks/benchmark_prefix_cache.sh` to orchestrate prefix cache benchmark runs.
- Added `PrefixCacheTestTokenizer` for simple character-to-ordinal tokenization, suitable for controlled prefix testing.
- Implemented `load_mock_prefix_cache_test_input_requests` in `benchmark_serving.py` to generate test data with shared prefixes.
- Added `prefix_cache_test` as a dataset option and a `--prefix-cache-test-common-len` argument to `benchmark_serving.py`.
- Updated `benchmarks/README.md` with instructions on how to run the new prefix cache benchmark.
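The diff itself is not reproduced here, so the following is only a minimal sketch of the two ideas described above: character-to-ordinal tokenization, and mock prompts whose shared-prefix length is drawn from a normal distribution. The names `OrdinalTokenizer` and `make_shared_prefix_prompts`, their parameters, and the clamping behavior are all hypothetical and are not taken from `benchmark_serving.py`.

```python
import random
from typing import List


class OrdinalTokenizer:
    """Hypothetical stand-in: one token per character, using its code point."""

    def encode(self, text: str) -> List[int]:
        return [ord(ch) for ch in text]

    def decode(self, token_ids: List[int]) -> str:
        return "".join(chr(t) for t in token_ids)


def make_shared_prefix_prompts(
    num_prompts: int,
    prompt_len: int,
    common_len_mean: float,
    common_len_std: float,
    seed: int = 0,
) -> List[str]:
    """Build prompts of fixed length that share a prefix with one base prompt.

    Assumption: the shared-prefix length is sampled per prompt from a normal
    distribution and clamped to [0, prompt_len]. The real benchmark may
    sample or clamp differently.
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    base = "".join(rng.choice(alphabet) for _ in range(prompt_len))

    prompts = []
    for _ in range(num_prompts):
        common_len = int(round(rng.gauss(common_len_mean, common_len_std)))
        common_len = max(0, min(prompt_len, common_len))
        # The random suffix can match the base by chance, so the actual
        # shared prefix is at least common_len characters long.
        suffix = "".join(
            rng.choice(alphabet) for _ in range(prompt_len - common_len)
        )
        prompts.append(base[:common_len] + suffix)
    return prompts
```

Because each character maps to exactly one token in this sketch, a prompt of N characters yields exactly N tokens, which makes the shared-prefix length in tokens easy to control when exercising a prefix cache.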