
Commit 9660cda

revert the adaptation cache as it didn't improve performance
1 parent ffd5d39 commit 9660cda

File tree

3 files changed: +45 -173 lines


README.md

Lines changed: 17 additions & 32 deletions
@@ -1,4 +1,4 @@
-# multicache - Adaptive Multi-Tier Cache
+# multicache - High-Performance Multi-Tier Cache

 <img src="media/logo-small.png" alt="multicache logo" width="256">

@@ -8,22 +8,15 @@
 <br clear="right">

-multicache is a high-performance cache for Go that automatically adapts to your workload. It combines **multiple eviction strategies** that switch based on access patterns, with an optional **multi-tier architecture** for persistence.
+multicache is a high-performance cache for Go implementing the **S3-FIFO** algorithm from the SOSP'23 paper ["FIFO queues are all you need for cache eviction"](https://s3fifo.com/). It combines **best-in-class hit rates**, **multi-threaded** scalability, and an optional **multi-tier architecture** for persistence.

-## Why "multi"?
-
-### Multiple Adaptive Strategies
+**Our philosophy**: Hit rate matters most (cache misses are expensive), then throughput (handle load), then single-threaded latency. We aim to excel at all three.

-multicache monitors ghost hit rates (how often evicted keys return) and automatically selects the optimal eviction strategy:
+## Why "multi"?

-| Mode | Trigger | Strategy | Workload |
-|------|---------|----------|----------|
-| 0 | Ghost rate <1% | Pure recency | Scan-heavy (unique keys) |
-| 1 | Ghost rate 1-22% | Balanced S3-FIFO | Mixed access patterns |
-| 2 | Ghost rate 7-12% | Frequency-biased | Repeated hot keys |
-| 3 | Ghost rate ≥23% | Clock-like second-chance | High temporal locality |
+### Multi-Threaded Performance

-No tuning required - the cache learns your workload and adapts.
+Designed for high-concurrency workloads with dynamic sharding (up to 2048 shards) that scales with `GOMAXPROCS`. At 32 threads, multicache delivers **185M+ QPS** for GetOrSet operations.

 ### Multi-Tier Architecture

@@ -35,7 +28,7 @@ Stack fast in-memory caching with durable persistence:
 └─────────────────┬───────────────────┘
 ┌─────────────────▼───────────────────┐
-│ Memory Cache (microseconds)         │ ← L1: S3-FIFO with adaptive modes
+│ Memory Cache (microseconds)         │ ← L1: S3-FIFO eviction
 └─────────────────┬───────────────────┘
                   │ async write / sync read
 ┌─────────────────▼───────────────────┐
@@ -54,12 +47,12 @@ All backends support optional S2 or Zstd compression via [`pkg/store/compress`](
 ## Features

-- **Best-in-class performance** - 7ns reads, 100M+ QPS single-threaded
-- **Adaptive S3-FIFO eviction** - Better hit-rates than LRU ([learn more](https://s3fifo.com/))
+- **Best-in-class hit rates** - S3-FIFO beats LRU by 5%+ on real traces ([learn more](https://s3fifo.com/))
+- **Multi-threaded throughput** - 185M+ QPS at 32 threads, scales with core count
+- **Low latency** - 7ns reads, 100M+ QPS single-threaded, zero-allocation updates
 - **Thundering herd prevention** - `GetSet` deduplicates concurrent loads
 - **Per-item TTL** - Optional expiration
 - **Graceful degradation** - Cache works even if persistence fails
-- **Zero allocation updates** - Minimal GC pressure

 ## Usage

@@ -108,7 +101,7 @@ cache, _ := multicache.NewTiered(p)
 ## Performance against the Competition

-multicache prioritizes high hit-rates and low read latency. We have our own built-in `make bench` that asserts cache dominance:
+multicache prioritizes **hit rate** first, **multi-threaded throughput** second, and **single-threaded latency** third, but aims to excel at all three. We have our own built-in `make bench` that asserts cache dominance:

 ```
 >>> TestLatencyNoEviction: Latency - No Evictions (Set cycles within cache size) (go test -run=TestLatencyNoEviction -v)
@@ -202,24 +195,16 @@ Want even more comprehensive benchmarks? See https://github.com/tstromberg/gocac
 ## Implementation Notes

-### S3-FIFO Enhancements
-
-multicache implements the S3-FIFO algorithm from SOSP'23 with these optimizations:
-
-1. **Dynamic Sharding** - 1-2048 independent shards for concurrent workloads
-2. **Bloom Filter Ghosts** - Two rotating Bloom filters (vs storing keys), 10-100x less memory
-3. **Lazy Ghost Checks** - Only check ghosts at capacity, saving 5-9% latency during warmup
+multicache implements the S3-FIFO algorithm from the SOSP'23 paper with these optimizations for production use:
+
+1. **Dynamic Sharding** - Up to 2048 shards (capped at 2× GOMAXPROCS) for concurrent workloads
+2. **Bloom Filter Ghosts** - Two rotating Bloom filters instead of storing keys, 10-100× less memory
+3. **Lazy Ghost Checks** - Only check ghosts at capacity, saving latency during warmup
 4. **Intrusive Lists** - Zero-allocation queue operations
 5. **Fast-path Hashing** - Specialized `int`/`string` hashing via wyhash
+6. **Higher Frequency Cap** - Max freq=7 (vs paper's 3) for better hot/warm discrimination

-### Adaptive Mode Details
-
-Mode switching uses **hysteresis** to prevent oscillation. Mode 2 (frequency-biased) requires a 7-12% ghost rate to enter, but stays active while the rate is 5-22%.
-
-Additional tuning beyond the paper:
-- **Adaptive queue sizing** - Small queue is 20% for caches ≤32K, 15% for ≤128K, 10% for larger
-- **Ghost frequency boost** - Returning items start with freq=1 instead of 0
-- **Higher frequency cap** - Max freq=7 (vs 3) for better hot/warm discrimination
+The core algorithm follows the paper closely: items enter the small queue, get promoted to main after 2+ accesses, and evicted items are tracked in a ghost queue to inform future admissions.

 ## License
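For context on what the commit reverts *to*: the pure S3-FIFO flow the new README text describes (new keys enter a small queue, promotion to main after two accesses, evicted keys remembered in a ghost structure) can be sketched as a toy, single-threaded model. Everything here (`toyS3FIFO`, its fields and methods) is illustrative and is not multicache's API; the real implementation shards, uses intrusive lists, and bounds the ghost set with rotating Bloom filters. Main-queue eviction is plain FIFO in this toy, where the paper uses a second-chance scan.

```go
package main

import "fmt"

// toyS3FIFO models one shard: a small FIFO for new keys, a main FIFO for
// promoted keys, and a ghost set remembering keys evicted from small.
type toyS3FIFO struct {
	capacity int
	small    []string        // new entries, FIFO order (oldest at index 0)
	main     []string        // promoted entries
	freq     map[string]int  // access count per resident key
	ghost    map[string]bool // evicted-from-small keys (unbounded in this toy)
}

func newToy(capacity int) *toyS3FIFO {
	return &toyS3FIFO{capacity: capacity, freq: map[string]int{}, ghost: map[string]bool{}}
}

func (c *toyS3FIFO) get(k string) bool {
	if _, ok := c.freq[k]; !ok {
		return false
	}
	c.freq[k]++
	return true
}

func (c *toyS3FIFO) set(k string) {
	if _, ok := c.freq[k]; ok {
		return // already resident
	}
	for len(c.small)+len(c.main) >= c.capacity {
		c.evict()
	}
	c.freq[k] = 0
	if c.ghost[k] {
		c.main = append(c.main, k) // returning key: admit straight to main
	} else {
		c.small = append(c.small, k)
	}
}

// evict runs one step: the head of small is promoted if it was accessed at
// least twice, otherwise evicted and remembered in ghost; if small is empty,
// the head of main is dropped.
func (c *toyS3FIFO) evict() {
	if len(c.small) > 0 {
		k := c.small[0]
		c.small = c.small[1:]
		if c.freq[k] >= 2 {
			c.main = append(c.main, k)
			return
		}
		delete(c.freq, k)
		c.ghost[k] = true
		return
	}
	k := c.main[0]
	c.main = c.main[1:]
	delete(c.freq, k)
}

func main() {
	c := newToy(4)
	c.set("a")
	c.get("a")
	c.get("a") // "a" now has freq=2: promotable
	c.set("b")
	c.set("c")
	c.set("d")
	c.set("e") // over capacity: "a" is promoted, "b" is evicted to ghost
	fmt.Println(c.get("a"), c.ghost["b"]) // prints "true true"
}
```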

benchmarks/cmd/mem_multicache/main.go

Lines changed: 3 additions & 6 deletions
@@ -20,8 +20,7 @@ func main() {
 	valSize := flag.Int("valSize", 1024, "value size")
 	flag.Parse()

-	//nolint:revive // explicit GC required for accurate memory benchmarking
-	runtime.GC()
+	runtime.GC() //nolint:revive // call-to-gc: explicit GC required for accurate memory benchmarking
 	debug.FreeOSMemory()

 	cache := multicache.New[string, []byte](multicache.Size(*capacity))
@@ -37,11 +36,9 @@ func main() {
 	keepAlive = cache

-	//nolint:revive // explicit GC required for accurate memory benchmarking
-	runtime.GC()
+	runtime.GC() //nolint:revive // call-to-gc: explicit GC required for accurate memory benchmarking
 	time.Sleep(100 * time.Millisecond)
-	//nolint:revive // explicit GC required for accurate memory benchmarking
-	runtime.GC()
+	runtime.GC() //nolint:revive // call-to-gc: explicit GC required for accurate memory benchmarking
 	debug.FreeOSMemory()

 	var mem runtime.MemStats

s3fifo.go

Lines changed: 25 additions & 135 deletions
@@ -3,17 +3,13 @@ package multicache
 import (
 	"fmt"
 	"math/bits"
-	"os"
 	"runtime"
 	"sync"
 	"sync/atomic"
 	"time"
 	"unsafe"
 )

-// debugAdaptive enables adaptive mode debug output when MULTICACHE_DEBUG=1.
-var debugAdaptive = os.Getenv("MULTICACHE_DEBUG") == "1"
-
 // wyhash constants for fast string hashing.
 const (
 	wyp0 = 0xa0761d6478bd642f
@@ -65,8 +61,8 @@ const maxFreq = 7
 // Each shard is an independent S3-FIFO instance with its own queues and lock.
 //
 // Algorithm per shard:
-//   - Small queue (S): 10-20% of shard capacity, for new entries
-//   - Main queue (M): 80-90% of shard capacity, for promoted entries
+//   - Small queue (S): ~10% of shard capacity, for new entries
+//   - Main queue (M): ~90% of shard capacity, for promoted entries
 //   - Ghost queue (G): Tracks evicted keys (no data)
 //
 // On cache miss:
@@ -115,25 +111,13 @@ type shard[K comparable, V any] struct {
 	hasher func(K) uint64

 	capacity int
-	smallCap int

 	// Free list for reducing allocations
 	freeEntries *entry[K, V]

 	// Warmup: during initial fill, admit everything without eviction checks
 	warmupComplete bool

-	// Adaptive mode detection based on ghost hit rate with hysteresis:
-	//   - Mode 0 (scan-heavy, ghost rate < 1%): pure recency, skip ghost tracking
-	//   - Mode 1 (balanced): lenient promotion (freq > 0)
-	//   - Mode 2 (frequency-heavy): strict promotion (freq > 1)
-	//     Entry: ghost rate 7-12%. Stay: 5-22%. Exit: <5% or >=23%
-	//   - Mode 3 (clock-like, ghost rate >= 23%): all items to main, second-chance
-	insertions            uint32
-	ghostHits             uint32
-	adaptiveMode          uint8  // 0=scan/recency, 1=balanced, 2=frequency-heavy
-	adaptiveMinInsertions uint32 // min insertions before adaptive kicks in

 	// Parent pointer for global capacity tracking
 	parent *s3fifo[K, V]
 }
@@ -235,18 +219,7 @@ func newS3FIFO[K comparable, V any](cfg *config) *s3fifo[K, V] {
 		cache.keyIsString = true
 	}

-	// Adaptive small queue ratio: larger for smaller caches to allow more frequency accumulation.
-	// S3-FIFO paper recommends 10%, but smaller caches need more room for frequency estimation.
-	var smallRatio float64
-	switch {
-	case capacity <= 32768: // ≤32K: 20% small queue
-		smallRatio = 0.20
-	case capacity <= 131072: // ≤128K: 15% small queue
-		smallRatio = 0.15
-	default: // >128K: 10% small queue (paper recommendation)
-		smallRatio = 0.10
-	}
-	// Ghost queue at 100% matches reference implementation for better hit rate.
+	// Ghost queue at 100% of shard capacity matches reference implementation.
 	const ghostRatio = 1.0

 	// Prepare hasher for Bloom filter
@@ -284,20 +257,15 @@ func newS3FIFO[K comparable, V any](cfg *config) *s3fifo[K, V] {
 	}

 	for i := range nshards {
-		smallCap := max(int(float64(shardCap)*smallRatio), 1)
 		ghostCap := max(int(float64(shardCap)*ghostRatio), 1)
-		minIns := uint32(max(shardCap, 256)) //nolint:gosec // G115: shardCap bounded by capacity/nshards
 		cache.shards[i] = &shard[K, V]{
-			capacity:              shardCap,
-			smallCap:              smallCap,
-			ghostCap:              ghostCap,
-			entries:               make(map[K]*entry[K, V], shardCap),
-			ghostActive:           newBloomFilter(ghostCap, 0.00001),
-			ghostAging:            newBloomFilter(ghostCap, 0.00001),
-			hasher:                hasher,
-			adaptiveMinInsertions: minIns,
-			adaptiveMode:          1, // Start in balanced mode
-			parent:                cache,
+			capacity:    shardCap,
+			ghostCap:    ghostCap,
+			entries:     make(map[K]*entry[K, V], shardCap),
+			ghostActive: newBloomFilter(ghostCap, 0.00001),
+			ghostAging:  newBloomFilter(ghostCap, 0.00001),
+			hasher:      hasher,
+			parent:      cache,
 		}
 	}

@@ -497,78 +465,10 @@ func (s *shard[K, V]) set(key K, value V, expiryNano int64) {
 	// Lazily check ghost only if at capacity (when eviction matters)
 	// This saves 2× bloom filter checks + hash computation when cache isn't full
 	if full {
-		// Track insertions for adaptive mode detection
-		s.insertions++
-
-		// In scan/recency mode (mode=0), skip ghost checks entirely
-		switch s.adaptiveMode {
-		case 0:
-			ent.inSmall = true
-		case 3:
-			// Clock mode: all items go directly to main queue with freq=1
-			// This mimics clock's second-chance behavior for high-recency workloads
-			ent.inSmall = false
-			ent.freq.Store(1)
-			// Still track ghost for mode detection
-			h := s.hasher(key)
-			if s.ghostActive.Contains(h) || s.ghostAging.Contains(h) {
-				s.ghostHits++
-			}
-		default:
-			// Check if key is in ghost (Bloom filter)
-			h := s.hasher(key)
-			inGhost := s.ghostActive.Contains(h) || s.ghostAging.Contains(h)
-			ent.inSmall = !inGhost
-
-			// Track ghost hits and apply frequency boost
-			if inGhost {
-				s.ghostHits++
-				// Ghost Freq Boost: Items returning from ghost start with freq=1
-				// This rewards items that proved popularity, but actual re-accesses build more frequency
-				ent.freq.Store(1)
-			}
-		}
-
-		// Adaptive mode detection: check every 256 insertions after warmup
-		// Mode 0: scan-heavy (ghost rate < 1%) - pure recency
-		// Mode 1: balanced - lenient promotion (freq > 0)
-		// Mode 2: frequency-heavy - strict promotion (freq > 1)
-		// Mode 3: clock-like (ghost rate >= 23%) - all items to main
-		//
-		// Hysteresis prevents oscillation: entry thresholds differ from exit.
-		// Enter mode 2: 7-12%. Stay in mode 2: 5-22%. Exit: <5% or >=23%.
-		if s.insertions >= s.adaptiveMinInsertions && s.insertions&0xFF == 0 {
-			rate := s.ghostHits * 100 / s.insertions // percentage
-			prev := s.adaptiveMode
-
-			// Apply hysteresis: current mode affects switching thresholds
-			switch {
-			case rate < 1:
-				s.adaptiveMode = 0 // Scan-heavy: use pure recency
-			case rate >= 23:
-				s.adaptiveMode = 3 // Very high recency: clock-like behavior
-			case s.adaptiveMode == 2:
-				// In mode 2: use wider band (5-22%) to prevent oscillation
-				if rate < 5 {
-					s.adaptiveMode = 1
-				}
-				// rate >= 23 handled above
-			default:
-				// Not in mode 2: use narrow entry band (7-12%)
-				if rate >= 7 && rate <= 12 {
-					s.adaptiveMode = 2
-				} else {
-					s.adaptiveMode = 1
-				}
-			}
-			if debugAdaptive && s.adaptiveMode != prev {
-				fmt.Printf("[multicache] mode %d→%d (ghost=%d%%, cap=%d)\n",
-					prev, s.adaptiveMode, rate, s.capacity)
-			}
-			// Reset counters for next period
-			s.insertions = 0
-			s.ghostHits = 0
-		}
+		// Pure S3-FIFO: check ghost and route accordingly
+		h := s.hasher(key)
+		inGhost := s.ghostActive.Contains(h) || s.ghostAging.Contains(h)
+		ent.inSmall = !inGhost

 		// Evict one entry to make room
 		if s.main.len > 0 && s.small.len <= s.capacity/10 {
@@ -619,37 +519,27 @@ func (s *shard[K, V]) delete(key K) {
 }

 // evictFromSmall evicts an entry from the small queue.
-// Promotion threshold adapts based on workload characteristics:
-//   - Mode 0 (scan): always promote (pure recency).
-//   - Mode 1 (balanced): need freq > 0 (one access).
-//   - Mode 2 (frequency): need freq > 1 (two accesses).
+// Pure S3-FIFO: promote if freq >= 2 (accessed at least twice).
 func (s *shard[K, V]) evictFromSmall() {
 	mainCap := (s.capacity * 9) / 10 // 90% for main queue

-	// Adaptive promotion threshold based on detected workload type
-	thresh := uint32(s.adaptiveMode) // 0, 1, or 2
-
 	for s.small.len > 0 {
 		e := s.small.head
 		s.small.remove(e)

-		if e.freq.Load() < thresh {
-			// Not accessed enough - evict
+		if e.freq.Load() < 2 {
+			// Not accessed enough - evict and track in ghost
 			k := e.key
 			delete(s.entries, k)

-			// Track in ghost queue only if not in scan/recency mode
-			// (scan-heavy workloads don't benefit from ghost tracking)
-			if s.adaptiveMode > 0 {
-				h := s.hasher(k)
-				if !s.ghostActive.Contains(h) {
-					s.ghostActive.Add(h)
-				}
-				// Rotate filters when active is full (provides approximate FIFO)
-				if s.ghostActive.entries >= s.ghostCap {
-					s.ghostAging.Reset()
-					s.ghostActive, s.ghostAging = s.ghostAging, s.ghostActive
-				}
+			h := s.hasher(k)
+			if !s.ghostActive.Contains(h) {
+				s.ghostActive.Add(h)
+			}
+			// Rotate filters when active is full (provides approximate FIFO)
+			if s.ghostActive.entries >= s.ghostCap {
+				s.ghostAging.Reset()
+				s.ghostActive, s.ghostAging = s.ghostAging, s.ghostActive
 			}

 			s.putEntry(e)
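The two-filter ghost rotation that survives this revert can be illustrated with exact hash sets standing in for the Bloom filters (the admission semantics are the same; a real Bloom filter adds false positives in exchange for 10-100× less memory). `ghostQueue` and its methods are hypothetical names for this sketch, not multicache's API:

```go
package main

import "fmt"

// ghostQueue approximates a FIFO of evicted-key hashes with two rotating
// sets: new hashes go into active; when active fills, the old aging set is
// discarded and the two swap roles. A hash is remembered for between one
// and two rotations, bounding memory while approximating FIFO order.
type ghostQueue struct {
	capacity      int
	active, aging map[uint64]bool
}

func newGhostQueue(capacity int) *ghostQueue {
	return &ghostQueue{
		capacity: capacity,
		active:   map[uint64]bool{},
		aging:    map[uint64]bool{},
	}
}

func (g *ghostQueue) Add(h uint64) {
	g.active[h] = true
	if len(g.active) >= g.capacity {
		g.aging = map[uint64]bool{} // drop the oldest generation
		g.active, g.aging = g.aging, g.active
	}
}

// Contains reports whether h was added within the last one-to-two rotations.
func (g *ghostQueue) Contains(h uint64) bool {
	return g.active[h] || g.aging[h]
}

func main() {
	g := newGhostQueue(3)
	for h := uint64(1); h <= 6; h++ {
		g.Add(h) // two rotations happen along the way
	}
	fmt.Println(g.Contains(1), g.Contains(6)) // prints "false true"
}
```

The swap-after-reset mirrors the `ghostActive, ghostAging = ghostAging, ghostActive` line kept by the diff above: the freshly emptied map becomes the new active generation.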
