I attempted to reproduce the hit rate results in your blog article, where you observed that TinyLFU had lower performance than S3Fifo. I ran your workloads against the reference implementation, Caffeine.
## Background
Caffeine uses an adaptive windowing technique, so this does not fully discount your observations about a static TinyLFU configuration. The paper recommended a 1% window as a starting point, since that works well in many important workloads such as database, search, and analytics. It concludes by showing workloads where a larger window is required and notes that adjusting it dynamically is left as future work. The subsequent work that evaluated adaptive approaches does appear to correct this deficiency, and it is the adaptive form that is most often used in practice (available in Java, Rust, Go, Python, and TypeScript).
Caffeine starts with a 1% window, monitors the hit rate over the TinyLFU sample period, and uses hill climbing to adjust the window size. The number of frequency counters is not known upfront, so, like a hash table, their capacity grows with the entry count. That resizing loses the popularity scores, and the policy may also have started at a poor window size. Thus, like a JIT compiler, the hit rate may be lower during this warmup period, but it quickly reaches peak performance, so the effect is noise except in very short runs. The admission decisions also vary slightly between runs due to jitter introduced to mitigate hash flooding attacks.
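For intuition, here is a minimal sketch of that hill-climbing loop, assuming a simple improved/regressed comparison per sample period; the class and field names are hypothetical, not Caffeine's internals:

```java
// Hypothetical sketch of the hill-climbing idea; the names are illustrative
// and Caffeine's actual climber is more sophisticated. Each sample period it
// compares the hit rate to the previous period's: keep moving the window in
// the same direction if it improved, otherwise reverse, and decay the step
// size so the configuration converges.
final class WindowClimber {
  private double stepSize = 0.0625; // fraction of capacity to move per period
  private double previousHitRate;
  private int direction = 1;        // +1 grows the window, -1 shrinks it

  /** Returns the window size change to apply for this sample period. */
  double adjust(double hitRate) {
    if (hitRate < previousHitRate) {
      direction = -direction;       // the last move hurt, so back off
    }
    previousHitRate = hitRate;
    stepSize *= 0.98;               // decay toward a stable window size
    return direction * stepSize;
  }
}
```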
## Analysis
I ran the two simulators side by side and compared their LRU hit rates to verify that the traces were being evaluated correctly. Caffeine's simulator does not have rich support for variable sizes (an entry's weight), as this is less common in Java because an object's heap usage is mostly hidden from the developer. There is a fork that extends this support in order to evaluate size-aware TinyLFU strategies (research paper), where the authors proposed an adaptation to further improve the hit rate. I hope to someday evaluate that proposal and incorporate the improvements.
I used libcachesim at 0f4d135 (current master) and Caffeine at cb5f75d (current master), with this patch to add support for the new trace formats.
For some traces the simulators report results for FIFO and LRU that differ by up to 0.25%. I determined that this is because libcachesim does not update an entry's size on a cache hit, yet it records the byte hit using the new weight.
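To make that divergence concrete, here is a hedged sketch (in Java rather than libcachesim's C) of the bookkeeping I would expect when a hit reports a new object size; all names are illustrative:

```java
// Illustrative sketch (not libcachesim's actual code) of the bookkeeping one
// would expect on a cache hit when the trace reports a new object size.
import java.util.HashMap;
import java.util.Map;

final class SizeAwareCache {
  private final Map<Long, Long> sizes = new HashMap<>();
  private long usedBytes;
  private long bytesHit;

  /** Records a hit for a resident key whose request now reports a new size. */
  void onHit(long key, long newSize) {
    long oldSize = sizes.get(key);   // the key is resident, since this is a hit
    bytesHit += newSize;             // the byte hit is credited at the new weight,
    usedBytes += newSize - oldSize;  // so the usage must be re-weighed to match
    sizes.put(key, newSize);         // persist the new size; skipping this step
                                     // explains the divergence described above
  }
}
```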
### trace.csv

```sh
./cachesim ../../data/trace.csv csv fifo,lru,qdlp,s3fifo,sieve 1gb -t "time-col=2, obj-id-col=5, obj-size-col=4"
```

|          | Hit Rate           | Byte Hit Rate      |
|----------|--------------------|--------------------|
| FIFO     | 36.64% (63.36% MR) | 26.77% (73.23% MR) |
| LRU      | 36.96% (63.04% MR) | 27.14% (72.86% MR) |
| QDLP     | 43.43% (56.57% MR) | 35.99% (64.01% MR) |
| S3Fifo   | 38.22% (61.78% MR) | 29.10% (70.90% MR) |
| Sieve    | 43.47% (56.53% MR) | 36.03% (63.97% MR) |
| Caffeine | 47.07% (52.93% MR) | 38.22% (61.78% MR) |

Caffeine leads Sieve (the best algorithm) by 3.60% HR / 2.19% BHR, and S3Fifo by 8.85% HR / 9.12% BHR.
### twitter_cluster52.csv

```sh
./cachesim ../../data/twitter_cluster52.csv csv fifo,lru,qdlp,s3fifo,sieve 1mb -t "time-col=1, obj-id-col=2, obj-size-col=3"
```

|          | Hit Rate           | Byte Hit Rate      |
|----------|--------------------|--------------------|
| FIFO     | 70.65% (29.35% MR) | 68.89% (31.11% MR) |
| LRU      | 73.13% (26.87% MR) | 71.53% (28.47% MR) |
| QDLP     | 75.63% (24.37% MR) | 74.82% (25.18% MR) |
| S3Fifo   | 76.66% (23.34% MR) | 75.99% (24.01% MR) |
| Sieve    | 74.86% (25.14% MR) | 73.88% (26.12% MR) |
| Caffeine | 75.91% (24.09% MR) | 75.09% (24.91% MR) |

Caffeine trails S3Fifo (the best algorithm) by 0.75% HR / 0.90% BHR, which is very different from the -10% difference observed in the blog article.
### hm_0.csv

```sh
./cachesim ../../data/hm_0.csv csv fifo,lru,qdlp,s3fifo,sieve 1gb -t "time-col=2, obj-id-col=5, obj-size-col=6"
```

|          | Hit Rate           | Byte Hit Rate      |
|----------|--------------------|--------------------|
| FIFO     | 72.86% (27.14% MR) | 59.12% (40.88% MR) |
| LRU      | 73.76% (26.24% MR) | 60.67% (39.33% MR) |
| QDLP     | 75.27% (24.73% MR) | 63.82% (36.18% MR) |
| S3Fifo   | 75.22% (24.78% MR) | 63.52% (36.48% MR) |
| Sieve    | 73.93% (26.07% MR) | 62.34% (37.66% MR) |
| Caffeine | 75.65% (24.35% MR) | 64.27% (35.73% MR) |

Caffeine leads QDLP (the best algorithm) by 0.38% HR / 0.45% BHR, and S3Fifo by 0.43% HR / 0.75% BHR.
## Multithreading
The article states that LRU is a scalability bottleneck due to reordering under a lock. This assertion may be misleading because many caches decouple their eviction policy from how they manage concurrency, which makes LRU's reordering characteristics irrelevant. That decoupling lets the cache designer consider a broader set of algorithms and data structures that may offer better time and space efficiency, or that enable additional features. For example, Caffeine scaled linearly to 380M reads/s on 16 cores (2015 hardware), while Segcache's FIFO policy was reported at 70M reads/s on 24 cores (2020 hardware). The benchmarks differ, but this shows that the policy's concurrency does not need to be the bottleneck, and designers can focus on other areas to optimize.
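As a sketch of that decoupling, in the BP-Wrapper style that Caffeine builds on, reads can be recorded into a lossy buffer and replayed against the policy under a try-lock; the class below is illustrative, not Caffeine's code:

```java
// Illustrative sketch (hypothetical names, not Caffeine's internals) of
// decoupling the eviction policy from concurrent readers, in the style of
// BP-Wrapper. Reads append to a bounded, lossy buffer instead of reordering
// a shared LRU chain under a lock; whichever thread wins the try-lock drains
// the buffer and replays the events against the policy single-threaded.
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

final class PolicyBuffer<K> {
  private final Queue<K> reads = new ArrayBlockingQueue<>(1024);
  private final ReentrantLock evictionLock = new ReentrantLock();

  /** Called on every cache read; never blocks and may drop events when full. */
  void recordRead(K key) {
    reads.offer(key); // lossy: dropping a sample barely affects the policy
    if (evictionLock.tryLock()) { // drain opportunistically, never contend
      try {
        for (K k; (k = reads.poll()) != null; ) {
          applyToPolicy(k); // e.g. bump a frequency counter or reorder a node
        }
      } finally {
        evictionLock.unlock();
      }
    }
  }

  private void applyToPolicy(K key) {
    // policy-specific maintenance, always executed under the eviction lock
  }
}
```

Because the read path never blocks on policy maintenance, the choice of eviction algorithm behind the buffer has little effect on the cache's concurrency profile.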
## Conclusion
On the workloads that this project considered exemplary and used for its policy designs, adaptive W-TinyLFU (Caffeine) demonstrates competitive performance. It is important to compare against reference implementations to avoid accidental differences; I have not attempted to debug your implementation to explain the degradation. I believe any static configuration will underperform in some scenario, and that designers should continue to explore ways to make more effective, adaptive choices.
P.S. This is filed as an issue since you disabled the discussions tab. You are welcome to close it after reading (self destructs in 10, 9, ...).