
feat: Add prefix cache benchmark #268

Merged
copybara-service[bot] merged 1 commit into main from yuyan-prefix-cache-benchmark on May 7, 2025


@yuyanpeng-google
Collaborator

This commit introduces a new benchmark to test the performance of prefix caching in JetStream.

The benchmark (`benchmark_prefix_cache.sh`) allows testing with various prompt lengths and common prefix lengths. It uses a new mock dataset generated by `load_mock_prefix_cache_test_input_requests` in `benchmark_serving.py`, which creates prompts that share common prefixes whose lengths follow a normal distribution.
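
A minimal sketch of such a generator follows. It assumes that each prompt's common-prefix length is drawn from a normal distribution, clamped to the prompt length, and that one character counts as one token. `make_prefix_cache_prompts`, `MockRequest`, the default standard deviation (a quarter of the mean), and the uppercase-tail trick are hypothetical choices made for illustration. This is not the actual `load_mock_prefix_cache_test_input_requests`.

```python
import random
import string
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MockRequest:
  """Hypothetical request record: the prompt and its sampled common-prefix length."""

  prompt: str
  common_prefix_len: int


def make_prefix_cache_prompts(
    num_prompts: int,
    prompt_len: int,
    mean_common_len: float,
    std: Optional[float] = None,
    seed: int = 0,
) -> List[MockRequest]:
  """Builds fixed-length prompts whose shared prefix length is ~ Normal(mean, std)."""
  rng = random.Random(seed)
  if std is None:
    # Assumed default spread; the real benchmark may use a different one.
    std = mean_common_len / 4
  # One base string that every prompt takes its shared prefix from.
  base = "".join(rng.choice(string.ascii_lowercase) for _ in range(prompt_len))
  requests = []
  for _ in range(num_prompts):
    # Sample a prefix length, then clamp it into [0, prompt_len].
    k = round(rng.gauss(mean_common_len, std))
    k = max(0, min(prompt_len, k))
    # The tail uses a disjoint alphabet (uppercase), so each prompt matches
    # `base` for exactly k characters before diverging.
    tail = "".join(
        rng.choice(string.ascii_uppercase) for _ in range(prompt_len - k)
    )
    requests.append(MockRequest(prompt=base[:k] + tail, common_prefix_len=k))
  return requests


if __name__ == "__main__":
  for req in make_prefix_cache_prompts(4, prompt_len=16, mean_common_len=8, seed=42):
    print(req.common_prefix_len, req.prompt)
```

Drawing the tail from a disjoint alphabet keeps each prompt's overlap with the base string at exactly the sampled length. As a result, the reusable prefix is known in advance for every request.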

Key changes include:

- New script `benchmarks/benchmark_prefix_cache.sh` to orchestrate prefix cache benchmark runs.
- Added `PrefixCacheTestTokenizer` for simple character-to-ordinal tokenization, suitable for controlled prefix testing.
- Implemented `load_mock_prefix_cache_test_input_requests` in `benchmark_serving.py` to generate test data with shared prefixes.
- Added `prefix_cache_test` as a dataset option and `--prefix-cache-test-common-len` argument to `benchmark_serving.py`.
- Updated `benchmarks/README.md` with instructions on how to run the new prefix cache benchmark.
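
Character-to-ordinal tokenization means that a shared character prefix is also a shared token prefix of the same length. This property is what makes the common-prefix length controllable in the benchmark. The sketch below illustrates the idea. `OrdinalTokenizer` and `shared_prefix_tokens` are hypothetical names, and the real `PrefixCacheTestTokenizer` may expose a different interface.

```python
from typing import List


class OrdinalTokenizer:
  """Hypothetical stand-in for PrefixCacheTestTokenizer: one token per character."""

  def encode(self, text: str) -> List[int]:
    # Each character becomes its Unicode code point, so token count == len(text).
    return [ord(ch) for ch in text]

  def decode(self, token_ids: List[int]) -> str:
    # Inverse mapping: code points back to characters.
    return "".join(chr(t) for t in token_ids)


def shared_prefix_tokens(a: List[int], b: List[int]) -> int:
  """Returns how many leading token ids two sequences have in common."""
  n = 0
  for x, y in zip(a, b):
    if x != y:
      break
    n += 1
  return n


tok = OrdinalTokenizer()
print(shared_prefix_tokens(tok.encode("abcX"), tok.encode("abcY")))  # prints 3
```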

@yuyanpeng-google force-pushed the yuyan-prefix-cache-benchmark branch from 3fe08db to bbfb5bd on May 7, 2025 09:56
Collaborator

@vipannalla left a comment


Looks good

@github-actions[bot] added the pull ready label on May 7, 2025 (This label is needed if we want the copybara service to auto sync it to g3.)
@copybara-service[bot] merged commit 4aafd76 into main on May 7, 2025
6 checks passed
@copybara-service[bot] deleted the yuyan-prefix-cache-benchmark branch on May 7, 2025 16:49

Labels

pull ready This label is needed if we want the copybara service to auto sync it to g3.


2 participants