
Commit 9660cda

revert the adaptation cache as it didn't improve performance
1 parent ffd5d39 commit 9660cda

File tree

3 files changed: +45 -173 lines


README.md

Lines changed: 17 additions & 32 deletions
@@ -1,4 +1,4 @@
-# multicache - Adaptive Multi-Tier Cache
+# multicache - High-Performance Multi-Tier Cache

 <img src="media/logo-small.png" alt="multicache logo" width="256">

@@ -8,22 +8,15 @@
 <br clear="right">

-multicache is a high-performance cache for Go that automatically adapts to your workload. It combines **multiple eviction strategies** that switch based on access patterns, with an optional **multi-tier architecture** for persistence.
+multicache is a high-performance cache for Go implementing the **S3-FIFO** algorithm from the SOSP'23 paper ["FIFO queues are all you need for cache eviction"](https://s3fifo.com/). It combines **best-in-class hit rates**, **multi-threaded** scalability, and an optional **multi-tier architecture** for persistence.

-## Why "multi"?
-
-### Multiple Adaptive Strategies
+**Our philosophy**: Hit rate matters most (cache misses are expensive), then throughput (handle load), then single-threaded latency. We aim to excel at all three.

-multicache monitors ghost hit rates (how often evicted keys return) and automatically selects the optimal eviction strategy:
+## Why "multi"?

-| Mode | Trigger | Strategy | Workload |
-|------|---------|----------|----------|
-| 0 | Ghost rate <1% | Pure recency | Scan-heavy (unique keys) |
-| 1 | Ghost rate 1-22% | Balanced S3-FIFO | Mixed access patterns |
-| 2 | Ghost rate 7-12% | Frequency-biased | Repeated hot keys |
-| 3 | Ghost rate ≥23% | Clock-like second-chance | High temporal locality |
+### Multi-Threaded Performance

-No tuning required - the cache learns your workload and adapts.
+Designed for high-concurrency workloads with dynamic sharding (up to 2048 shards) that scales with `GOMAXPROCS`. At 32 threads, multicache delivers **185M+ QPS** for GetOrSet operations.

 ### Multi-Tier Architecture

@@ -35,7 +28,7 @@ Stack fast in-memory caching with durable persistence:
 └─────────────────┬───────────────────┘
 ┌─────────────────▼───────────────────┐
-│ Memory Cache (microseconds)         │ ← L1: S3-FIFO with adaptive modes
+│ Memory Cache (microseconds)         │ ← L1: S3-FIFO eviction
 └─────────────────┬───────────────────┘
                   │ async write / sync read
 ┌─────────────────▼───────────────────┐
@@ -54,12 +47,12 @@ All backends support optional S2 or Zstd compression via [`pkg/store/compress`](
 ## Features

-- **Best-in-class performance** - 7ns reads, 100M+ QPS single-threaded
-- **Adaptive S3-FIFO eviction** - Better hit-rates than LRU ([learn more](https://s3fifo.com/))
+- **Best-in-class hit rates** - S3-FIFO beats LRU by 5%+ on real traces ([learn more](https://s3fifo.com/))
+- **Multi-threaded throughput** - 185M+ QPS at 32 threads, scales with core count
+- **Low latency** - 7ns reads, 100M+ QPS single-threaded, zero-allocation updates
 - **Thundering herd prevention** - `GetSet` deduplicates concurrent loads
 - **Per-item TTL** - Optional expiration
 - **Graceful degradation** - Cache works even if persistence fails
-- **Zero allocation updates** - Minimal GC pressure

 ## Usage

@@ -108,7 +101,7 @@ cache, _ := multicache.NewTiered(p)
 ## Performance against the Competition

-multicache prioritizes high hit-rates and low read latency. We have our own built-in `make bench` that asserts cache dominance:
+multicache prioritizes **hit rate** first, **multi-threaded throughput** second, and **single-threaded latency** third, but aims to excel at all three. We have our own built-in `make bench` that asserts cache dominance:

 ```
 >>> TestLatencyNoEviction: Latency - No Evictions (Set cycles within cache size) (go test -run=TestLatencyNoEviction -v)
@@ -202,24 +195,16 @@ Want even more comprehensive benchmarks? See https://github.com/tstromberg/gocac
 ## Implementation Notes

-### S3-FIFO Enhancements
-
-multicache implements the S3-FIFO algorithm from SOSP'23 with these optimizations:
-
-1. **Dynamic Sharding** - 1-2048 independent shards for concurrent workloads
-2. **Bloom Filter Ghosts** - Two rotating Bloom filters (vs storing keys), 10-100x less memory
-3. **Lazy Ghost Checks** - Only check ghosts at capacity, saving 5-9% latency during warmup
+multicache implements the S3-FIFO algorithm from the SOSP'23 paper with these optimizations for production use:
+
+1. **Dynamic Sharding** - Up to 2048 shards (capped at 2× GOMAXPROCS) for concurrent workloads
+2. **Bloom Filter Ghosts** - Two rotating Bloom filters instead of storing keys, 10-100× less memory
+3. **Lazy Ghost Checks** - Only check ghosts at capacity, saving latency during warmup
 4. **Intrusive Lists** - Zero-allocation queue operations
 5. **Fast-path Hashing** - Specialized `int`/`string` hashing via wyhash
+6. **Higher Frequency Cap** - Max freq=7 (vs paper's 3) for better hot/warm discrimination

-### Adaptive Mode Details
-
-Mode switching uses **hysteresis** to prevent oscillation. Mode 2 (frequency-biased) requires a 7-12% ghost rate to enter, but stays active while the rate is 5-22%.
-
-Additional tuning beyond the paper:
-- **Adaptive queue sizing** - Small queue is 20% for caches ≤32K, 15% for ≤128K, 10% for larger
-- **Ghost frequency boost** - Returning items start with freq=1 instead of 0
-- **Higher frequency cap** - Max freq=7 (vs 3) for better hot/warm discrimination
+The core algorithm follows the paper closely: items enter the small queue, get promoted to main after 2+ accesses, and evicted items are tracked in a ghost queue to inform future admissions.

 ## License
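For context on what the commit reverts *to*: the pure S3-FIFO flow the new README text describes (new keys enter a small queue, promotion to main after two accesses, evicted keys remembered in a ghost structure) can be sketched as a toy, single-threaded model. Everything here (`toyS3FIFO`, its fields and methods) is illustrative and is not multicache's API; the real implementation shards, uses intrusive lists, and bounds the ghost set with rotating Bloom filters. Main-queue eviction is plain FIFO in this toy, where the paper uses a second-chance scan.

```go
package main

import "fmt"

// toyS3FIFO models one shard: a small FIFO for new keys, a main FIFO for
// promoted keys, and a ghost set remembering keys evicted from small.
type toyS3FIFO struct {
	capacity int
	small    []string        // new entries, FIFO order (oldest at index 0)
	main     []string        // promoted entries
	freq     map[string]int  // access count per resident key
	ghost    map[string]bool // evicted-from-small keys (unbounded in this toy)
}

func newToy(capacity int) *toyS3FIFO {
	return &toyS3FIFO{capacity: capacity, freq: map[string]int{}, ghost: map[string]bool{}}
}

func (c *toyS3FIFO) get(k string) bool {
	if _, ok := c.freq[k]; !ok {
		return false
	}
	c.freq[k]++
	return true
}

func (c *toyS3FIFO) set(k string) {
	if _, ok := c.freq[k]; ok {
		return // already resident
	}
	for len(c.small)+len(c.main) >= c.capacity {
		c.evict()
	}
	c.freq[k] = 0
	if c.ghost[k] {
		c.main = append(c.main, k) // returning key: admit straight to main
	} else {
		c.small = append(c.small, k)
	}
}

// evict runs one step: the head of small is promoted if it was accessed at
// least twice, otherwise evicted and remembered in ghost; if small is empty,
// the head of main is dropped.
func (c *toyS3FIFO) evict() {
	if len(c.small) > 0 {
		k := c.small[0]
		c.small = c.small[1:]
		if c.freq[k] >= 2 {
			c.main = append(c.main, k)
			return
		}
		delete(c.freq, k)
		c.ghost[k] = true
		return
	}
	k := c.main[0]
	c.main = c.main[1:]
	delete(c.freq, k)
}

func main() {
	c := newToy(4)
	c.set("a")
	c.get("a")
	c.get("a") // "a" now has freq=2: promotable
	c.set("b")
	c.set("c")
	c.set("d")
	c.set("e") // over capacity: "a" is promoted, "b" is evicted to ghost
	fmt.Println(c.get("a"), c.ghost["b"]) // prints "true true"
}
```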

benchmarks/cmd/mem_multicache/main.go

Lines changed: 3 additions & 6 deletions
@@ -20,8 +20,7 @@ func main() {
 	valSize := flag.Int("valSize", 1024, "value size")
 	flag.Parse()

-	//nolint:revive // explicit GC required for accurate memory benchmarking
-	runtime.GC()
+	runtime.GC() //nolint:revive // call-to-gc: explicit GC required for accurate memory benchmarking
 	debug.FreeOSMemory()

 	cache := multicache.New[string, []byte](multicache.Size(*capacity))
@@ -37,11 +36,9 @@ func main() {
 	keepAlive = cache

-	//nolint:revive // explicit GC required for accurate memory benchmarking
-	runtime.GC()
+	runtime.GC() //nolint:revive // call-to-gc: explicit GC required for accurate memory benchmarking
 	time.Sleep(100 * time.Millisecond)
-	//nolint:revive // explicit GC required for accurate memory benchmarking
-	runtime.GC()
+	runtime.GC() //nolint:revive // call-to-gc: explicit GC required for accurate memory benchmarking
 	debug.FreeOSMemory()

 	var mem runtime.MemStats

s3fifo.go

Lines changed: 25 additions & 135 deletions
@@ -3,17 +3,13 @@ package multicache
 import (
 	"fmt"
 	"math/bits"
-	"os"
 	"runtime"
 	"sync"
 	"sync/atomic"
 	"time"
 	"unsafe"
 )

-// debugAdaptive enables adaptive mode debug output when MULTICACHE_DEBUG=1.
-var debugAdaptive = os.Getenv("MULTICACHE_DEBUG") == "1"
-
 // wyhash constants for fast string hashing.
 const (
 	wyp0 = 0xa0761d6478bd642f
@@ -65,8 +61,8 @@ const maxFreq = 7
 // Each shard is an independent S3-FIFO instance with its own queues and lock.
 //
 // Algorithm per shard:
-//   - Small queue (S): 10-20% of shard capacity, for new entries
-//   - Main queue (M): 80-90% of shard capacity, for promoted entries
+//   - Small queue (S): ~10% of shard capacity, for new entries
+//   - Main queue (M): ~90% of shard capacity, for promoted entries
 //   - Ghost queue (G): Tracks evicted keys (no data)
 //
 // On cache miss:
@@ -115,25 +111,13 @@ type shard[K comparable, V any] struct {
 	hasher func(K) uint64

 	capacity int
-	smallCap int

 	// Free list for reducing allocations
 	freeEntries *entry[K, V]

 	// Warmup: during initial fill, admit everything without eviction checks
 	warmupComplete bool

-	// Adaptive mode detection based on ghost hit rate with hysteresis:
-	//   - Mode 0 (scan-heavy, ghost rate < 1%): pure recency, skip ghost tracking
-	//   - Mode 1 (balanced): lenient promotion (freq > 0)
-	//   - Mode 2 (frequency-heavy): strict promotion (freq > 1)
-	//     Entry: ghost rate 7-12%. Stay: 5-22%. Exit: <5% or >=23%
-	//   - Mode 3 (clock-like, ghost rate >= 23%): all items to main, second-chance
-	insertions            uint32
-	ghostHits             uint32
-	adaptiveMode          uint8  // 0=scan/recency, 1=balanced, 2=frequency-heavy
-	adaptiveMinInsertions uint32 // min insertions before adaptive kicks in

 	// Parent pointer for global capacity tracking
 	parent *s3fifo[K, V]
 }
@@ -235,18 +219,7 @@ func newS3FIFO[K comparable, V any](cfg *config) *s3fifo[K, V] {
 		cache.keyIsString = true
 	}

-	// Adaptive small queue ratio: larger for smaller caches to allow more frequency accumulation.
-	// S3-FIFO paper recommends 10%, but smaller caches need more room for frequency estimation.
-	var smallRatio float64
-	switch {
-	case capacity <= 32768: // ≤32K: 20% small queue
-		smallRatio = 0.20
-	case capacity <= 131072: // ≤128K: 15% small queue
-		smallRatio = 0.15
-	default: // >128K: 10% small queue (paper recommendation)
-		smallRatio = 0.10
-	}
-	// Ghost queue at 100% matches reference implementation for better hit rate.
+	// Ghost queue at 100% of shard capacity matches reference implementation.
 	const ghostRatio = 1.0

 	// Prepare hasher for Bloom filter
@@ -284,20 +257,15 @@ func newS3FIFO[K comparable, V any](cfg *config) *s3fifo[K, V] {
 	}

 	for i := range nshards {
-		smallCap := max(int(float64(shardCap)*smallRatio), 1)
 		ghostCap := max(int(float64(shardCap)*ghostRatio), 1)
-		minIns := uint32(max(shardCap, 256)) //nolint:gosec // G115: shardCap bounded by capacity/nshards
 		cache.shards[i] = &shard[K, V]{
-			capacity:              shardCap,
-			smallCap:              smallCap,
-			ghostCap:              ghostCap,
-			entries:               make(map[K]*entry[K, V], shardCap),
-			ghostActive:           newBloomFilter(ghostCap, 0.00001),
-			ghostAging:            newBloomFilter(ghostCap, 0.00001),
-			hasher:                hasher,
-			adaptiveMinInsertions: minIns,
-			adaptiveMode:          1, // Start in balanced mode
-			parent:                cache,
+			capacity:    shardCap,
+			ghostCap:    ghostCap,
+			entries:     make(map[K]*entry[K, V], shardCap),
+			ghostActive: newBloomFilter(ghostCap, 0.00001),
+			ghostAging:  newBloomFilter(ghostCap, 0.00001),
+			hasher:      hasher,
+			parent:      cache,
 		}
 	}

@@ -497,78 +465,10 @@ func (s *shard[K, V]) set(key K, value V, expiryNano int64) {
 	// Lazily check ghost only if at capacity (when eviction matters)
 	// This saves 2× bloom filter checks + hash computation when cache isn't full
 	if full {
-		// Track insertions for adaptive mode detection
-		s.insertions++
-
-		// In scan/recency mode (mode=0), skip ghost checks entirely
-		switch s.adaptiveMode {
-		case 0:
-			ent.inSmall = true
-		case 3:
-			// Clock mode: all items go directly to main queue with freq=1
-			// This mimics clock's second-chance behavior for high-recency workloads
-			ent.inSmall = false
-			ent.freq.Store(1)
-			// Still track ghost for mode detection
-			h := s.hasher(key)
-			if s.ghostActive.Contains(h) || s.ghostAging.Contains(h) {
-				s.ghostHits++
-			}
-		default:
-			// Check if key is in ghost (Bloom filter)
-			h := s.hasher(key)
-			inGhost := s.ghostActive.Contains(h) || s.ghostAging.Contains(h)
-			ent.inSmall = !inGhost
-
-			// Track ghost hits and apply frequency boost
-			if inGhost {
-				s.ghostHits++
-				// Ghost Freq Boost: Items returning from ghost start with freq=1
-				// This rewards items that proved popularity, but actual re-accesses build more frequency
-				ent.freq.Store(1)
-			}
-		}
-
-		// Adaptive mode detection: check every 256 insertions after warmup
-		// Mode 0: scan-heavy (ghost rate < 1%) - pure recency
-		// Mode 1: balanced - lenient promotion (freq > 0)
-		// Mode 2: frequency-heavy - strict promotion (freq > 1)
-		// Mode 3: clock-like (ghost rate >= 23%) - all items to main
-		//
-		// Hysteresis prevents oscillation: entry thresholds differ from exit.
-		// Enter mode 2: 7-12%. Stay in mode 2: 5-22%. Exit: <5% or >=23%.
-		if s.insertions >= s.adaptiveMinInsertions && s.insertions&0xFF == 0 {
-			rate := s.ghostHits * 100 / s.insertions // percentage
-			prev := s.adaptiveMode
-
-			// Apply hysteresis: current mode affects switching thresholds
-			switch {
-			case rate < 1:
-				s.adaptiveMode = 0 // Scan-heavy: use pure recency
-			case rate >= 23:
-				s.adaptiveMode = 3 // Very high recency: clock-like behavior
-			case s.adaptiveMode == 2:
-				// In mode 2: use wider band (5-22%) to prevent oscillation
-				if rate < 5 {
-					s.adaptiveMode = 1
-				}
-				// rate >= 23 handled above
-			default:
-				// Not in mode 2: use narrow entry band (7-12%)
-				if rate >= 7 && rate <= 12 {
-					s.adaptiveMode = 2
-				} else {
-					s.adaptiveMode = 1
-				}
-			}
-			if debugAdaptive && s.adaptiveMode != prev {
-				fmt.Printf("[multicache] mode %d→%d (ghost=%d%%, cap=%d)\n",
-					prev, s.adaptiveMode, rate, s.capacity)
-			}
-			// Reset counters for next period
-			s.insertions = 0
-			s.ghostHits = 0
-		}
+		// Pure S3-FIFO: check ghost and route accordingly
+		h := s.hasher(key)
+		inGhost := s.ghostActive.Contains(h) || s.ghostAging.Contains(h)
+		ent.inSmall = !inGhost

 		// Evict one entry to make room
 		if s.main.len > 0 && s.small.len <= s.capacity/10 {
@@ -619,37 +519,27 @@ func (s *shard[K, V]) delete(key K) {
 }

 // evictFromSmall evicts an entry from the small queue.
-// Promotion threshold adapts based on workload characteristics:
-//   - Mode 0 (scan): always promote (pure recency).
-//   - Mode 1 (balanced): need freq > 0 (one access).
-//   - Mode 2 (frequency): need freq > 1 (two accesses).
+// Pure S3-FIFO: promote if freq >= 2 (accessed at least twice).
 func (s *shard[K, V]) evictFromSmall() {
 	mainCap := (s.capacity * 9) / 10 // 90% for main queue

-	// Adaptive promotion threshold based on detected workload type
-	thresh := uint32(s.adaptiveMode) // 0, 1, or 2
-
 	for s.small.len > 0 {
 		e := s.small.head
 		s.small.remove(e)

-		if e.freq.Load() < thresh {
-			// Not accessed enough - evict
+		if e.freq.Load() < 2 {
+			// Not accessed enough - evict and track in ghost
 			k := e.key
 			delete(s.entries, k)

-			// Track in ghost queue only if not in scan/recency mode
-			// (scan-heavy workloads don't benefit from ghost tracking)
-			if s.adaptiveMode > 0 {
-				h := s.hasher(k)
-				if !s.ghostActive.Contains(h) {
-					s.ghostActive.Add(h)
-				}
-				// Rotate filters when active is full (provides approximate FIFO)
-				if s.ghostActive.entries >= s.ghostCap {
-					s.ghostAging.Reset()
-					s.ghostActive, s.ghostAging = s.ghostAging, s.ghostActive
-				}
+			h := s.hasher(k)
+			if !s.ghostActive.Contains(h) {
+				s.ghostActive.Add(h)
+			}
+			// Rotate filters when active is full (provides approximate FIFO)
+			if s.ghostActive.entries >= s.ghostCap {
+				s.ghostAging.Reset()
+				s.ghostActive, s.ghostAging = s.ghostAging, s.ghostActive
 			}

 			s.putEntry(e)
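The two-filter ghost rotation that survives this revert can be illustrated with exact hash sets standing in for the Bloom filters (the admission semantics are the same; a real Bloom filter adds false positives in exchange for 10-100× less memory). `ghostQueue` and its methods are hypothetical names for this sketch, not multicache's API:

```go
package main

import "fmt"

// ghostQueue approximates a FIFO of evicted-key hashes with two rotating
// sets: new hashes go into active; when active fills, the old aging set is
// discarded and the two swap roles. A hash is remembered for between one
// and two rotations, bounding memory while approximating FIFO order.
type ghostQueue struct {
	capacity      int
	active, aging map[uint64]bool
}

func newGhostQueue(capacity int) *ghostQueue {
	return &ghostQueue{
		capacity: capacity,
		active:   map[uint64]bool{},
		aging:    map[uint64]bool{},
	}
}

func (g *ghostQueue) Add(h uint64) {
	g.active[h] = true
	if len(g.active) >= g.capacity {
		g.aging = map[uint64]bool{} // drop the oldest generation
		g.active, g.aging = g.aging, g.active
	}
}

// Contains reports whether h was added within the last one-to-two rotations.
func (g *ghostQueue) Contains(h uint64) bool {
	return g.active[h] || g.aging[h]
}

func main() {
	g := newGhostQueue(3)
	for h := uint64(1); h <= 6; h++ {
		g.Add(h) // two rotations happen along the way
	}
	fmt.Println(g.Contains(1), g.Contains(6)) // prints "false true"
}
```

The swap-after-reset mirrors the `ghostActive, ghostAging = ghostAging, ghostActive` line kept by the diff above: the freshly emptied map becomes the new active generation.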
