
Add W-TinyLFU eviction policy with cost-aware admission#680

Open
a1amit wants to merge 1 commit into zilliztech:main from a1amit:pr/wtinylfu-eviction


a1amit commented Apr 2, 2026

Summary

Adds a W-TinyLFU eviction policy (name="wtinylfu") that combines frequency-based admission filtering with optional cost-weighted eviction, targeting LLM caching workloads where response regeneration costs vary by orders of magnitude.

This addresses the roadmap item: "Support more complicated eviction policies".

Architecture (following Caffeine's design)

  • Window LRU (1% of capacity): absorbs burst/recency traffic
  • TinyLFU admission gate: a 4-bit Count-Min Sketch plus a Bloom filter doorkeeper that rejects one-hit wonders
  • Segmented main LRU (99%): probation (20% of main) + protected (80% of main), with promotion on hit
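The three stages can be pictured with a toy model. This is a minimal sketch, not the PR's implementation: the class name `ToyWTinyLFU` is hypothetical, and an exact `collections.Counter` stands in for the Count-Min Sketch + doorkeeper so that the control flow (window, then admission gate, then probation/protected) stays visible. Splitting capacity with floor rounding is also an assumption.

```python
from collections import Counter, OrderedDict


class ToyWTinyLFU:
    """Hypothetical toy model: window LRU -> TinyLFU gate -> segmented main LRU.

    Assumption: an exact Counter replaces the Count-Min Sketch + doorkeeper,
    so frequencies here are exact and never aged.
    """

    def __init__(self, capacity: int, window_pct: float = 1.0, probation_pct: float = 20.0):
        if capacity < 3:
            raise ValueError("capacity must be at least 3")
        self.window_cap = max(1, int(capacity * window_pct / 100))
        self.main_cap = capacity - self.window_cap
        # Probation keeps at least one slot, so a full main cache always has a victim.
        self.protected_cap = self.main_cap - max(1, int(self.main_cap * probation_pct / 100))
        self.window, self.probation, self.protected = OrderedDict(), OrderedDict(), OrderedDict()
        self.freq = Counter()

    def __contains__(self, key) -> bool:
        return key in self.window or key in self.probation or key in self.protected

    def access(self, key) -> bool:
        """Record one access to `key`; return True on a cache hit."""
        self.freq[key] += 1
        for segment in (self.window, self.protected):
            if key in segment:
                segment.move_to_end(key)  # refresh LRU position
                return True
        if key in self.probation:
            # Promotion on hit: probation -> protected.
            del self.probation[key]
            self.protected[key] = None
            if len(self.protected) > self.protected_cap:
                # Protected overflow: demote its LRU entry back to probation.
                demoted, _ = self.protected.popitem(last=False)
                self.probation[demoted] = None
            return True
        # Miss: every new key enters the window first.
        self.window[key] = None
        if len(self.window) > self.window_cap:
            candidate, _ = self.window.popitem(last=False)
            self._admit(candidate)
        return False

    def _admit(self, candidate) -> None:
        if len(self.probation) + len(self.protected) < self.main_cap:
            self.probation[candidate] = None  # main has room: admit unconditionally
            return
        victim = next(iter(self.probation))  # probation LRU entry
        if self.freq[candidate] > self.freq[victim]:
            del self.probation[victim]
            self.probation[candidate] = None
        # Otherwise the TinyLFU gate rejects the candidate and it is dropped.


if __name__ == "__main__":
    cache = ToyWTinyLFU(100)
    for _ in range(5):
        for k in range(50):
            cache.access(k)  # 50 hot keys, 5 accesses each
    for k in range(1000, 2000):
        cache.access(k)  # one pass over 1000 one-hit wonders
    print(all(k in cache for k in range(50)))  # True: the scan did not flush the hot set
```

The demo illustrates the design choice behind the admission gate: once the main region is full, a scan of keys seen only once cannot displace entries with higher frequency, whereas a plain LRU of the same size would have evicted all 50 hot keys.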

Cost-aware extension

When cost_aware=True (the default), the admission decision multiplies each candidate's frequency estimate by its response token count (supplied through the set_cost() API), biasing eviction toward retaining expensive entries. This is an additive extension: the existing put/get/policy interface is unchanged.
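A minimal sketch of that comparison, assuming (hypothetically) that a key whose cost was never set counts as cost 1 and that the candidate must strictly beat the victim, as in plain TinyLFU. `CostTable` and `should_admit` are illustrative names, not the PR's API; only `set_cost()` is named in the PR, and its real signature may differ.

```python
class CostTable:
    """Hypothetical per-key cost store mirroring the PR's set_cost() idea."""

    def __init__(self, default_cost: float = 1.0):
        self.default_cost = default_cost  # assumption: an unknown cost counts as 1
        self._costs = {}

    def set_cost(self, key, tokens: float) -> None:
        self._costs[key] = tokens

    def cost(self, key) -> float:
        return self._costs.get(key, self.default_cost)


def should_admit(candidate, victim, freq, costs: CostTable, cost_aware: bool = True) -> bool:
    """Admit the window evictee only if its (optionally cost-weighted) score beats the victim's."""
    def score(key):
        return freq(key) * costs.cost(key) if cost_aware else freq(key)
    return score(candidate) > score(victim)


if __name__ == "__main__":
    freqs = {"long_answer": 2, "short_answer": 5}
    costs = CostTable()
    costs.set_cost("long_answer", 1500)  # response token counts
    costs.set_cost("short_answer", 100)
    print(should_admit("long_answer", "short_answer", freqs.get, costs))         # True: 3000 > 500
    print(should_admit("long_answer", "short_answer", freqs.get, costs, False))  # False: 2 < 5
```

The example shows the intended effect: a rarely requested but expensive response can win admission over a frequently requested cheap one, which frequency-only TinyLFU would reject.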

Files changed

File                                              Lines    Description
gptcache/manager/eviction/wtinylfu_eviction.py    222      Main policy (extends EvictionBase)
gptcache/manager/eviction/count_min_sketch.py     113      4-bit packed CMS with periodic aging
gptcache/manager/eviction/doorkeeper.py           66       Bloom filter for one-hit-wonder rejection
gptcache/manager/eviction/segmented_lru.py        99       Two-tier LRU with promotion/demotion
gptcache/manager/eviction/manager.py              +7       Factory registration
tests/unit_tests/eviction/test_*.py               4 files  32 unit tests
examples/eviction/wtinylfu_eviction.py            92       Usage examples
examples/README.md                                +27      Eviction section added
README.md                                         +1       Updated roadmap checklist
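To make the count_min_sketch.py and doorkeeper.py rows concrete, here is a minimal pure-Python sketch of a TinyLFU frequency estimator in the style of the TinyLFU paper and Caffeine's frequency sketch: 4-bit saturating counters packed 16 per 64-bit word, a Bloom-filter doorkeeper that absorbs first sightings, and periodic aging that halves every counter. It does not reproduce the PR's code. The class names, the blake2b-based hashing, the table sizes, and halving the sample counter on reset are all assumptions, and this version sticks to the stdlib rather than numpy.

```python
import hashlib


def _indexes(key: str, count: int, modulus: int, salt: bytes) -> list:
    """Derive `count` (<= 8) indexes in [0, modulus) from one blake2b digest."""
    digest = hashlib.blake2b(key.encode(), digest_size=8 * count, salt=salt).digest()
    return [int.from_bytes(digest[8 * i:8 * i + 8], "little") % modulus for i in range(count)]


class PackedCountMinSketch:
    """Count-Min Sketch with 4-bit saturating counters, 16 per 64-bit word."""

    RESET_MASK = 0x7777_7777_7777_7777  # low 3 bits of every nibble

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.rows = [[0] * ((width + 15) // 16) for _ in range(depth)]

    def _cells(self, key: str):
        # One (row, word, bit shift) triple per hash row.
        for row, col in enumerate(_indexes(key, self.depth, self.width, b"cms")):
            word, slot = divmod(col, 16)
            yield row, word, 4 * slot

    def increment(self, key: str) -> None:
        for row, word, shift in self._cells(key):
            if ((self.rows[row][word] >> shift) & 0xF) < 15:  # saturate at 15
                self.rows[row][word] += 1 << shift

    def estimate(self, key: str) -> int:
        return min((self.rows[r][w] >> s) & 0xF for r, w, s in self._cells(key))

    def halve(self) -> None:
        # Aging: shifting a word right by 1 halves each nibble; the mask drops
        # the bit that leaked in from the neighbouring nibble.
        self.rows = [[(w >> 1) & self.RESET_MASK for w in row] for row in self.rows]


class Doorkeeper:
    """Bloom filter that absorbs the first occurrence of each key."""

    def __init__(self, bits: int = 8192, hashes: int = 3):
        self.bits, self.hashes, self.bitset = bits, hashes, 0

    def __contains__(self, key: str) -> bool:
        return all((self.bitset >> i) & 1 for i in _indexes(key, self.hashes, self.bits, b"door"))

    def add(self, key: str) -> None:
        for i in _indexes(key, self.hashes, self.bits, b"door"):
            self.bitset |= 1 << i

    def clear(self) -> None:
        self.bitset = 0


class TinyLFU:
    """Frequency estimator: doorkeeper + packed CMS + periodic aging."""

    def __init__(self, sample_size: int = 10_000):
        self.sketch, self.doorkeeper = PackedCountMinSketch(), Doorkeeper()
        self.sample_size, self.additions = sample_size, 0

    def record(self, key: str) -> None:
        if key in self.doorkeeper:
            self.sketch.increment(key)
        else:
            self.doorkeeper.add(key)  # one-hit wonders never reach the sketch
        self.additions += 1
        if self.additions >= self.sample_size:
            self.sketch.halve()
            self.doorkeeper.clear()
            self.additions //= 2  # assumption: age the sample counter too

    def estimate(self, key: str) -> int:
        return self.sketch.estimate(key) + (1 if key in self.doorkeeper else 0)
```

The doorkeeper is what keeps one-hit wonders cheap: their first access only sets a few Bloom-filter bits, and the 4-bit counters are touched only from the second access on.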

Usage

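from gptcache.embedding import Onnx
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.manager.eviction import EvictionBase

onnx = Onnx()  # embedding model; provides the vector dimension used below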

data_manager = get_data_manager(
    cache_base=CacheBase("sqlite"),
    vector_base=VectorBase("faiss", dimension=onnx.dimension),
    eviction_base=EvictionBase(
        "wtinylfu",
        maxsize=200,
        clean_size=50,
        cost_aware=True,       # enable token-cost weighting
        window_pct=1.0,        # window cache % (default: 1%)
        probation_pct=20.0,    # probation segment % of main (default: 20%)
    ),
)
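As a worked example of how these percentages might translate into segment sizes for maxsize=200, here is a small helper. The name `segment_sizes` is hypothetical, and the floor rounding and one-slot minimums are assumptions; the PR's exact rounding may differ.

```python
def segment_sizes(maxsize: int, window_pct: float = 1.0, probation_pct: float = 20.0) -> dict:
    """Hypothetical helper: split capacity into window / probation / protected."""
    window = max(1, int(maxsize * window_pct / 100))     # window_pct is % of total
    main = maxsize - window
    probation = max(1, int(main * probation_pct / 100))  # probation_pct is % of main
    return {"window": window, "probation": probation, "protected": main - probation}


print(segment_sizes(200))  # {'window': 2, 'probation': 39, 'protected': 159}
```

With the defaults, a 200-entry cache therefore devotes only about 2 slots to recency and keeps almost 80% of its capacity for entries that have proven themselves with a repeat hit.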

Design decisions

  • No new dependencies: uses only numpy (already in requirements) + stdlib
  • Backward compatible: adds a new factory path without modifying existing policies
  • Configurable: all major parameters exposed with paper-recommended defaults
  • Hash-DoS defense: when a candidate does not beat the victim but its estimated frequency is >= 6, it is still admitted with ~1/128 probability (matches Caffeine)
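For reference, the rule in the last bullet can be sketched as follows. The threshold of 6 and the 1-in-128 random acceptance mirror the admission logic in Caffeine's BoundedLocalCache; the function name `admit` and the injected `random.Random` are illustrative, and how the PR combines this rule with cost weighting is not shown here.

```python
import random

ADMIT_HASHDOS_THRESHOLD = 6  # same value Caffeine uses


def admit(candidate_freq: int, victim_freq: int, rng: random.Random) -> bool:
    """TinyLFU admission with a Caffeine-style hash-flooding escape hatch."""
    if candidate_freq > victim_freq:
        return True
    if candidate_freq >= ADMIT_HASHDOS_THRESHOLD:
        # A warm candidate that lost to the victim still wins about 1 time in 128,
        # so colliding keys cannot keep a hot victim permanently in place.
        return (rng.getrandbits(32) & 127) == 0
    return False
```

Restricting the random acceptance to warm candidates keeps its cost to the hit rate small: cold candidates are still rejected deterministically.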

Benchmark results

Full benchmarks (synthetic Zipfian workload and real LMSYS-Chat-1M data, 3 trials each) are available in the deliverables folder, together with figures and raw JSON results; that folder is not part of the commit, to reduce noise. Key results vs. the LRU baseline: +74% token savings on the synthetic Zipfian workload (cs=50) and +94% on real LMSYS-Chat-1M conversations (cs=50, threshold=0.80).


Test plan

  • 32 new unit tests passing (pytest tests/unit_tests/eviction/ -v)
  • All 38 eviction tests passing (32 new + 6 existing)
  • Usage examples added to examples/eviction/
  • examples/README.md updated
  • Root README.md roadmap updated

Note: Targeting main since dev has not been updated since September 2024 and recent PRs (#668, #669) merge directly to main.

sre-ci-robot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: a1amit
To complete the pull request process, please assign cxie after the PR has been reviewed.
You can assign the PR to them by writing /assign @cxie in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot (Collaborator)

Welcome @a1amit! It looks like this is your first PR to zilliztech/GPTCache 🎉

Implement a W-TinyLFU eviction policy that combines frequency-based
admission filtering with cost-weighted eviction decisions, targeting
LLM caching workloads where response regeneration costs vary widely.

Architecture (following Caffeine's design):
- Window LRU (1%): absorbs burst traffic
- TinyLFU admission gate: Count-Min Sketch + Bloom doorkeeper
- Segmented main LRU (99%): probation (20%) + protected (80%)

Cost-aware extension: when enabled, admission multiplies frequency
by response token count, preferring to retain expensive entries.

Components:
- count_min_sketch.py: 4-bit packed counters with periodic aging
- doorkeeper.py: Bloom filter to reject one-hit-wonders
- segmented_lru.py: two-tier LRU with promotion/demotion
- wtinylfu_eviction.py: orchestrator implementing EvictionBase

Registered as name="wtinylfu" in the eviction factory.
Tunable via window_pct, probation_pct, cost_aware, and CMS parameters.
No new external dependencies (uses numpy + stdlib only).

32 unit tests covering all components and algorithm properties.
Usage examples added to examples/eviction/.

Signed-off-by: Amit Abramovich <amitnoa.av@gmail.com>
a1amit force-pushed the pr/wtinylfu-eviction branch from 7d2adb8 to c2b3768 on April 2, 2026 14:39
mergify bot added the dco-passed label and removed the needs-dco label on Apr 2, 2026
a1amit (Author) commented Apr 2, 2026

I'd appreciate it if one of you could take a look at this when you have a moment.

@cxie @xiaofan-luan @SimFG

Thanks, and have a great weekend!

mergify bot commented Apr 2, 2026

queue

☑️ Command disallowed due to command restrictions in the Mergify configuration.

  • sender-permission >= write
