Add W-TinyLFU eviction policy with cost-aware admission #680
Open
a1amit wants to merge 1 commit into zilliztech:main from
Conversation
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: a1amit. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing the approval command in a comment.
Collaborator

Welcome @a1amit! It looks like this is your first PR to zilliztech/GPTCache 🎉
Implement a W-TinyLFU eviction policy that combines frequency-based admission filtering with cost-weighted eviction decisions, targeting LLM caching workloads where response regeneration costs vary widely.

Architecture (following Caffeine's design):
- Window LRU (1%): absorbs burst traffic
- TinyLFU admission gate: Count-Min Sketch + Bloom doorkeeper
- Segmented main LRU (99%): probation (20%) + protected (80%)

Cost-aware extension: when enabled, admission multiplies frequency by response token count, preferring to retain expensive entries.

Components:
- count_min_sketch.py: 4-bit packed counters with periodic aging
- doorkeeper.py: Bloom filter to reject one-hit-wonders
- segmented_lru.py: two-tier LRU with promotion/demotion
- wtinylfu_eviction.py: orchestrator implementing EvictionBase

Registered as name="wtinylfu" in the eviction factory. Tunable via window_pct, probation_pct, cost_aware, and CMS parameters.

No new external dependencies (numpy + stdlib only). 32 unit tests covering all components and algorithm properties. Usage examples added to examples/eviction/.

Signed-off-by: Amit Abramovich <amitnoa.av@gmail.com>
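The commit message describes count_min_sketch.py as using 4-bit packed counters with periodic aging. A minimal sketch of that idea, assuming Caffeine-style nibble packing and halving-based aging; the class and parameter names here are hypothetical, not the actual file contents:

```python
import numpy as np

class CountMinSketch4Bit:
    """Count-Min Sketch with 4-bit counters packed two per byte.

    Illustrative sketch only: names and defaults are assumptions,
    not the PR's actual count_min_sketch.py.
    """

    def __init__(self, width=2048, depth=4, sample_factor=10):
        self.width = width
        self.depth = depth
        # Two 4-bit counters per byte -> width // 2 bytes per row.
        self.table = np.zeros((depth, width // 2), dtype=np.uint8)
        self.additions = 0
        self.sample_size = sample_factor * width  # aging threshold

    def _index(self, key, row):
        return hash((row, key)) % self.width

    def _get(self, row, idx):
        byte = int(self.table[row, idx // 2])
        return (byte >> 4) if idx % 2 else (byte & 0x0F)

    def _set(self, row, idx, val):
        byte = int(self.table[row, idx // 2])
        if idx % 2:
            byte = (byte & 0x0F) | (val << 4)
        else:
            byte = (byte & 0xF0) | val
        self.table[row, idx // 2] = byte

    def increment(self, key):
        for row in range(self.depth):
            idx = self._index(key, row)
            count = self._get(row, idx)
            if count < 15:  # 4-bit counters saturate at 15
                self._set(row, idx, count + 1)
        self.additions += 1
        if self.additions >= self.sample_size:
            self._age()

    def estimate(self, key):
        return min(self._get(row, self._index(key, row))
                   for row in range(self.depth))

    def _age(self):
        # Halve all counters at once: shifting each byte right by one
        # and masking with 0x77 halves both packed nibbles.
        self.table = ((self.table >> 1) & 0x77).astype(np.uint8)
        self.additions //= 2
```

The `& 0x77` mask clears the bit that would otherwise leak from the high nibble into the low nibble during the shift, so both packed counters are halved independently in a single vectorized pass.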
a1amit force-pushed from 7d2adb8 to c2b3768 (compare)
Author

I'd appreciate it if one of you could take a look at this when you have a moment. Thanks, and have a great weekend!
☑️ Command disallowed due to command restrictions in the Mergify configuration.
Summary
Adds a W-TinyLFU eviction policy (name="wtinylfu") that combines frequency-based admission filtering with optional cost-weighted eviction, targeting LLM caching workloads where response regeneration costs vary by orders of magnitude.

This addresses the roadmap item: "Support more complicated eviction policies".
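Part of the admission filter described above is the "doorkeeper": per the component list, a Bloom filter that rejects one-hit-wonders before they reach the frequency sketch. A minimal sketch of that idea, with hypothetical names; the actual doorkeeper.py may differ:

```python
class Doorkeeper:
    """Small Bloom filter in front of the frequency sketch.

    A key's first sighting only sets bits here; only keys seen a
    second time are counted in the sketch, so one-hit-wonders never
    consume counter space. Illustrative sketch, not the PR's code.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            yield hash((i, key)) % self.num_bits

    def put(self, key):
        """Mark key as seen; return True if it was already present."""
        seen = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not (self.bits[byte] >> bit) & 1:
                seen = False
                self.bits[byte] |= 1 << bit
        return seen

    def reset(self):
        """Clear the filter, typically on each sketch aging cycle,
        to bound the false-positive rate."""
        self.bits = bytearray(len(self.bits))
```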
Architecture (following Caffeine's design)
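The segmented main cache in this architecture (probation and protected tiers, with promotion on hit and demotion on overflow, per the commit message) can be sketched as follows; names are hypothetical and this is not the actual segmented_lru.py:

```python
from collections import OrderedDict

class SegmentedLRU:
    """Two-tier LRU: new entries enter probation; a hit promotes an
    entry to protected; when protected overflows, its LRU entry is
    demoted back to probation. Illustrative sketch of the design
    described in the PR, not the actual implementation.
    """

    def __init__(self, maxsize, protected_pct=0.8):
        self.protected_cap = max(1, int(maxsize * protected_pct))
        self.probation = OrderedDict()
        self.protected = OrderedDict()

    def put(self, key, value):
        # Admitted entries always start in probation.
        self.probation[key] = value
        self.probation.move_to_end(key)

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Hit in probation: promote to protected.
            value = self.probation.pop(key)
            self.protected[key] = value
            if len(self.protected) > self.protected_cap:
                # Demote protected's LRU entry back to probation.
                old_key, old_val = self.protected.popitem(last=False)
                self.probation[old_key] = old_val
            return value
        return None

    def victim(self):
        """The eviction candidate: probation's LRU entry."""
        if self.probation:
            return next(iter(self.probation))
        return next(iter(self.protected), None)
```

Keeping victims in probation means an entry must prove reuse (a second hit) before it can displace anything in the protected tier.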
Cost-aware extension
When cost_aware=True (the default), the admission decision multiplies each candidate's frequency estimate by its response token count (set via the set_cost() API), biasing eviction toward retaining expensive entries. This is an additive extension; the existing put/get/policy interface is unchanged.

Files changed

- gptcache/manager/eviction/wtinylfu_eviction.py: orchestrator (implements EvictionBase)
- gptcache/manager/eviction/count_min_sketch.py: 4-bit packed Count-Min Sketch with periodic aging
- gptcache/manager/eviction/doorkeeper.py: Bloom-filter doorkeeper
- gptcache/manager/eviction/segmented_lru.py: segmented main LRU
- gptcache/manager/eviction/manager.py: factory registration
- tests/unit_tests/eviction/test_*.py: unit tests
- examples/eviction/wtinylfu_eviction.py: usage example
- examples/README.md, README.md: documentation updates

Usage
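As an illustration of the cost-weighted admission decision described above (frequency estimate multiplied by response token count), here is a self-contained sketch; admit() and its arguments are hypothetical helpers, not the actual GPTCache API:

```python
def admit(candidate_key, victim_key, sketch, costs, cost_aware=True):
    """TinyLFU admission: keep the window candidate only if its
    (optionally cost-weighted) score beats the main-cache victim's.

    Hypothetical helper: `sketch` is anything with estimate(),
    `costs` maps keys to response token counts (defaulting to 1).
    """
    cand_score = sketch.estimate(candidate_key)
    vict_score = sketch.estimate(victim_key)
    if cost_aware:
        cand_score *= costs.get(candidate_key, 1)
        vict_score *= costs.get(victim_key, 1)
    return cand_score > vict_score
```

With cost weighting off this reduces to plain TinyLFU (pure frequency comparison); with it on, a rarely repeated but expensive-to-regenerate response can out-score a frequent cheap one.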
Design decisions
- numpy (already in requirements) + stdlib only; no new external dependencies

Benchmark results
Full benchmarks (synthetic Zipfian + LMSYS-Chat-1M real data, 3 trials each) are available in the deliverables folder (not part of the commit, to reduce noise): figures and raw JSON results. Key results vs. the LRU baseline: +74% token savings on the synthetic Zipfian workload (cs=50), +94% on real LMSYS-Chat-1M conversations (cs=50, threshold=0.80).
References
Test plan

- Unit tests pass (pytest tests/unit_tests/eviction/ -v)
- Usage example runs (examples/eviction/)
- examples/README.md updated
- README.md roadmap updated