Add W-TinyLFU eviction policy with cost-aware admission #680
Open
a1amit wants to merge 1 commit into zilliztech:main from
Conversation
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: a1amit. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing the approval command in a comment.
Collaborator

Welcome @a1amit! It looks like this is your first PR to zilliztech/GPTCache 🎉
Implement a W-TinyLFU eviction policy that combines frequency-based admission filtering with cost-weighted eviction decisions, targeting LLM caching workloads where response regeneration costs vary widely.

Architecture (following Caffeine's design):
- Window LRU (1%): absorbs burst traffic
- TinyLFU admission gate: Count-Min Sketch + Bloom doorkeeper
- Segmented main LRU (99%): probation (20%) + protected (80%)

Cost-aware extension: when enabled, admission multiplies frequency by response token count, preferring to retain expensive entries.

Components:
- count_min_sketch.py: 4-bit packed counters with periodic aging
- doorkeeper.py: Bloom filter to reject one-hit-wonders
- segmented_lru.py: two-tier LRU with promotion/demotion
- wtinylfu_eviction.py: orchestrator implementing EvictionBase

Registered as name="wtinylfu" in the eviction factory. Tunable via window_pct, probation_pct, cost_aware, and CMS parameters.

No new external dependencies (numpy + stdlib only). 32 unit tests covering all components and algorithm properties. Usage examples added to examples/eviction/.

Signed-off-by: Amit Abramovich <amitnoa.av@gmail.com>
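The commit message describes count_min_sketch.py as using 4-bit packed counters with periodic aging. A minimal sketch of that idea, assuming Caffeine-style nibble packing and halving-based aging; the class and parameter names here are hypothetical, not the actual file contents:

```python
import numpy as np

class CountMinSketch4Bit:
    """Count-Min Sketch with 4-bit counters packed two per byte.

    Illustrative sketch only: names and defaults are assumptions,
    not the PR's actual count_min_sketch.py.
    """

    def __init__(self, width=2048, depth=4, sample_factor=10):
        self.width = width
        self.depth = depth
        # Two 4-bit counters per byte -> width // 2 bytes per row.
        self.table = np.zeros((depth, width // 2), dtype=np.uint8)
        self.additions = 0
        self.sample_size = sample_factor * width  # aging threshold

    def _index(self, key, row):
        return hash((row, key)) % self.width

    def _get(self, row, idx):
        byte = int(self.table[row, idx // 2])
        return (byte >> 4) if idx % 2 else (byte & 0x0F)

    def _set(self, row, idx, val):
        byte = int(self.table[row, idx // 2])
        if idx % 2:
            byte = (byte & 0x0F) | (val << 4)
        else:
            byte = (byte & 0xF0) | val
        self.table[row, idx // 2] = byte

    def increment(self, key):
        for row in range(self.depth):
            idx = self._index(key, row)
            count = self._get(row, idx)
            if count < 15:  # 4-bit counters saturate at 15
                self._set(row, idx, count + 1)
        self.additions += 1
        if self.additions >= self.sample_size:
            self._age()

    def estimate(self, key):
        return min(self._get(row, self._index(key, row))
                   for row in range(self.depth))

    def _age(self):
        # Halve all counters at once: shifting each byte right by one
        # and masking with 0x77 halves both packed nibbles.
        self.table = ((self.table >> 1) & 0x77).astype(np.uint8)
        self.additions //= 2
```

The `& 0x77` mask clears the bit that would otherwise leak from the high nibble into the low nibble during the shift, so both packed counters are halved independently in a single vectorized pass.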
a1amit force-pushed from 7d2adb8 to c2b3768 (compare)
Author

I'd appreciate it if one of you could take a look at this when you have a moment. Thanks, and have a great weekend!
☑️ Command disallowed due to command restrictions in the Mergify configuration.
Summary
Adds a W-TinyLFU eviction policy (name="wtinylfu") that combines frequency-based admission filtering with optional cost-weighted eviction, targeting LLM caching workloads where response regeneration costs vary by orders of magnitude.

This addresses the roadmap item: "Support more complicated eviction policies".
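Part of the admission filter described above is the "doorkeeper": per the component list, a Bloom filter that rejects one-hit-wonders before they reach the frequency sketch. A minimal sketch of that idea, with hypothetical names; the actual doorkeeper.py may differ:

```python
class Doorkeeper:
    """Small Bloom filter in front of the frequency sketch.

    A key's first sighting only sets bits here; only keys seen a
    second time are counted in the sketch, so one-hit-wonders never
    consume counter space. Illustrative sketch, not the PR's code.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            yield hash((i, key)) % self.num_bits

    def put(self, key):
        """Mark key as seen; return True if it was already present."""
        seen = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not (self.bits[byte] >> bit) & 1:
                seen = False
                self.bits[byte] |= 1 << bit
        return seen

    def reset(self):
        """Clear the filter, typically on each sketch aging cycle,
        to bound the false-positive rate."""
        self.bits = bytearray(len(self.bits))
```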
Architecture (following Caffeine's design)
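The segmented main cache in this architecture (probation and protected tiers, with promotion on hit and demotion on overflow, per the commit message) can be sketched as follows; names are hypothetical and this is not the actual segmented_lru.py:

```python
from collections import OrderedDict

class SegmentedLRU:
    """Two-tier LRU: new entries enter probation; a hit promotes an
    entry to protected; when protected overflows, its LRU entry is
    demoted back to probation. Illustrative sketch of the design
    described in the PR, not the actual implementation.
    """

    def __init__(self, maxsize, protected_pct=0.8):
        self.protected_cap = max(1, int(maxsize * protected_pct))
        self.probation = OrderedDict()
        self.protected = OrderedDict()

    def put(self, key, value):
        # Admitted entries always start in probation.
        self.probation[key] = value
        self.probation.move_to_end(key)

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Hit in probation: promote to protected.
            value = self.probation.pop(key)
            self.protected[key] = value
            if len(self.protected) > self.protected_cap:
                # Demote protected's LRU entry back to probation.
                old_key, old_val = self.protected.popitem(last=False)
                self.probation[old_key] = old_val
            return value
        return None

    def victim(self):
        """The eviction candidate: probation's LRU entry."""
        if self.probation:
            return next(iter(self.probation))
        return next(iter(self.protected), None)
```

Keeping victims in probation means an entry must prove reuse (a second hit) before it can displace anything in the protected tier.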
Cost-aware extension
When cost_aware=True (the default), the admission decision multiplies each candidate's frequency estimate by its response token count (set via the set_cost() API), biasing eviction toward retaining expensive entries. This is an additive extension; the existing put/get/policy interface is unchanged.

Files changed

- gptcache/manager/eviction/wtinylfu_eviction.py: orchestrator (implements EvictionBase)
- gptcache/manager/eviction/count_min_sketch.py: 4-bit packed Count-Min Sketch with periodic aging
- gptcache/manager/eviction/doorkeeper.py: Bloom-filter doorkeeper
- gptcache/manager/eviction/segmented_lru.py: segmented main LRU
- gptcache/manager/eviction/manager.py: factory registration
- tests/unit_tests/eviction/test_*.py: unit tests
- examples/eviction/wtinylfu_eviction.py: usage example
- examples/README.md, README.md: documentation updates

Usage
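As an illustration of the cost-weighted admission decision described above (frequency estimate multiplied by response token count), here is a self-contained sketch; admit() and its arguments are hypothetical helpers, not the actual GPTCache API:

```python
def admit(candidate_key, victim_key, sketch, costs, cost_aware=True):
    """TinyLFU admission: keep the window candidate only if its
    (optionally cost-weighted) score beats the main-cache victim's.

    Hypothetical helper: `sketch` is anything with estimate(),
    `costs` maps keys to response token counts (defaulting to 1).
    """
    cand_score = sketch.estimate(candidate_key)
    vict_score = sketch.estimate(victim_key)
    if cost_aware:
        cand_score *= costs.get(candidate_key, 1)
        vict_score *= costs.get(victim_key, 1)
    return cand_score > vict_score
```

With cost weighting off this reduces to plain TinyLFU (pure frequency comparison); with it on, a rarely repeated but expensive-to-regenerate response can out-score a frequent cheap one.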
Design decisions
- numpy (already in requirements) + stdlib only; no new external dependencies

Benchmark results
Full benchmarks (synthetic Zipfian + LMSYS-Chat-1M real data, 3 trials each) are available in the deliverables folder (not part of the commit, to reduce noise): figures and raw JSON results. Key results vs. the LRU baseline: +74% token savings on the synthetic Zipfian workload (cs=50), +94% on real LMSYS-Chat-1M conversations (cs=50, threshold=0.80).
References
Test plan

- Unit tests pass (pytest tests/unit_tests/eviction/ -v)
- Usage example runs (examples/eviction/)
- examples/README.md updated
- README.md roadmap updated