113 changes: 36 additions & 77 deletions README.md
@@ -1,65 +1,35 @@
# libCacheSim - building and running cache simulations
<p align="center">
<picture>
<img alt="libCacheSim" src="doc/assets/logo.jpg" width=55%>
</picture>
</p>

<h3 align="center">
A high-performance library for building and running cache simulations
</h3>

---

[![build](https://github.com/1a1a11a/libCacheSim/actions/workflows/build.yml/badge.svg)](https://github.com/1a1a11a/libCacheSim/actions/workflows/build.yml)
[![Python Release](https://github.com/1a1a11a/libCacheSim/actions/workflows/pypi-release.yml/badge.svg)](https://github.com/1a1a11a/libCacheSim/actions/workflows/pypi-release.yml)
[![NPM Release](https://github.com/1a1a11a/libCacheSim/actions/workflows/npm-release.yml/badge.svg)](https://github.com/1a1a11a/libCacheSim/actions/workflows/npm-release.yml)
[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/1a1a11a/libCacheSim/badge)](https://scorecard.dev/viewer/?uri=github.com/1a1a11a/libCacheSim)


<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->
- [libCacheSim - building and running cache simulations](#libcachesim---building-and-running-cache-simulations)
- [News](#news)
- [What is libCacheSim](#what-is-libcachesim)
- [libCacheSim features](#libcachesim-features)
- [Supported algorithms](#supported-algorithms)
- [Eviction algorithms](#eviction-algorithms)
- [Admission algorithms](#admission-algorithms)
- [Prefetching algorithms](#prefetching-algorithms)
- [Build and Install libCacheSim](#build-and-install-libcachesim)
- [One-line install](#one-line-install)
- [Install dependency](#install-dependency)
- [Build libCacheSim](#build-libcachesim)
- [Developer Setup](#developer-setup)
- [Pre-commit Hooks](#pre-commit-hooks)
- [Usage](#usage)
- [cachesim (a high-performance cache simulator)](#cachesim-a-high-performance-cache-simulator)
- [basic usage](#basic-usage)
- [Run a single cache simulation](#run-a-single-cache-simulation)
- [Run multiple cache simulations with different cache sizes](#run-multiple-cache-simulations-with-different-cache-sizes)
- [Debug cachesim](#debug-cachesim)
- [Plot miss ratio curve](#plot-miss-ratio-curve)
- [Trace analysis](#trace-analysis)
- [Miss ratio curves profiling](#miss-ratio-curves-profiling)
- [Using libCacheSim as a library](#using-libcachesim-as-a-library)
- [Extending libCacheSim (new algorithms and trace types)](#extending-libcachesim-new-algorithms-and-trace-types)
- [Python package](#python-package)
- [Simulation with python](#simulation-with-python)
- [Extending new algorithm](#extending-new-algorithm)
- [Open source cache traces](#open-source-cache-traces)
- [Contributions](#contributions)
- [Reference](#reference)
- [License](#license)
- [Related](#related)
<!-- TOC end -->


<!-- TOC --><a name="news"></a>
## News
* **2024 Oct**: **S3-FIFO** gets an upgrade! Please try out the new version (the old is now renamed to S3-FIFOv0).
* **2023 June**: **QDLP** is available now, see [our paper](https://dl.acm.org/doi/10.1145/3593856.3595887) for details.
* **2023 Oct**: **[S3-FIFO](https://dl.acm.org/doi/10.1145/3600006.3613147)** and **[SIEVE](https://sievecache.com)** are available! These are very simple algorithms that are very effective in reducing cache misses. Try them out in libCacheSim and in your production systems!
* **2024 Jan**: We compiled a list of open-source cache datasets at the bottom of this page.
---

<!-- TOC --><a name="what-is-libcachesim"></a>
## What is libCacheSim
* a high-performance **cache simulator** for running cache simulations.
* a high-performance and versatile trace analyzer for **analyzing different cache traces**.
* a high-performance **library** for building cache simulators.

---

<!-- TOC --><a name="libcachesim-features"></a>
## libCacheSim features
* **High performance** - over 20M requests/sec for a realistic trace replay.
* **High memory efficiency** - predictable and small memory footprint.
@@ -71,7 +41,6 @@
* **Efficient Miss Ratio Curve profiler** - quickly build highly accurate miss ratio curves on large-scale workloads; see [here](/doc/quickstart_mrcProfiler.md).
---

<!-- TOC --><a name="supported-algorithms"></a>
## Supported algorithms
cachesim supports the following algorithms:
### Eviction algorithms
@@ -103,9 +72,7 @@ cachesim supports the following algorithms:
---


<!-- TOC --><a name="build-and-install-libcachesim"></a>
## Build and Install libCacheSim
<!-- TOC --><a name="one-line-install"></a>
### One-line install
We provide some scripts for quick installation of libCacheSim.
```bash
@@ -115,13 +82,14 @@ If this does not work, please
1. let us know what system you are using and what error you get
2. read the following sections for self-installation.

<!-- TOC --><a name="install-dependency"></a>
<details>
<summary>Step-by-step installation guide </summary>

### Install dependency
libCacheSim uses the [cmake](https://cmake.org/) build system and has a few dependencies: [glib](https://developer.gnome.org/glib/), [tcmalloc](https://github.com/google/tcmalloc), and [zstd](https://github.com/facebook/zstd).
Please see [install.md](/doc/install.md) for instructions on how to install the dependencies.


<!-- TOC --><a name="build-libcachesim"></a>
### Build libCacheSim
cmake recommends an **out-of-source build**, so we build in a new directory:
```bash
@@ -137,10 +105,14 @@ cmake -G Ninja .. && ninja
[sudo] ninja install
popd
```
</details>


<details>
<summary> Developer setup </summary>

<!-- TOC --><a name="developer-setup"></a>
### Developer Setup
For developers, we provide tools to ensure code quality and consistent formatting:
If you contribute to libCacheSim, we provide tools to ensure code quality and consistent formatting:

#### Pre-commit Hooks
We provide a git pre-commit hook that runs linting checks before each commit, helping catch issues early:
@@ -157,22 +129,20 @@ The pre-commit hook:
- Prevents committing code with formatting, static analysis, or compiler issues
- Preserves logs for debugging in the `.lint-logs/` directory

</details>

---

<!-- TOC --><a name="usage"></a>
## Usage
<!-- TOC --><a name="cachesim-a-high-performance-cache-simulator"></a>
### cachesim (a high-performance cache simulator)
After building and installing libCacheSim, `cachesim` should be in the `_build/bin/` directory.
<!-- TOC --><a name="basic-usage"></a>
#### basic usage
```
./bin/cachesim trace_path trace_type eviction_algo cache_size [OPTION...]
```

Use `./bin/cachesim --help` to get more information.

<!-- TOC --><a name="run-a-single-cache-simulation"></a>
#### Run a single cache simulation
Run the example traces using the LRU eviction algorithm and a 1 GB cache size.

@@ -181,7 +151,6 @@ Run the example traces using the LRU eviction algorithm and a 1 GB cache size.
./bin/cachesim ../data/trace.vscsi vscsi lru 1gb
```

<!-- TOC --><a name="run-multiple-cache-simulations-with-different-cache-sizes"></a>
#### Run multiple cache simulations with different cache sizes
```bash
# Note that there is no space between the cache sizes
@@ -202,7 +171,6 @@ Run the example traces using the LRU eviction algorithm and a 1 GB cache size.

See [quick start cachesim](/doc/quickstart_cachesim.md) for more usages.

<!-- TOC --><a name="debug-cachesim"></a>
#### Debug cachesim
We provide a debug script to help you debug cachesim with GDB. For detailed usage instructions, see [debug guide](/doc/usage.md).

@@ -214,7 +182,6 @@ We provide a debug script to help you debug cachesim with GDB. For detailed usag
./scripts/debug.sh -- data/cloudPhysicsIO.vscsi vscsi lru,s3fifo 100mb,1gb
```

<!-- TOC --><a name="plot-miss-ratio-curve"></a>
#### Plot miss ratio curve
You can plot miss ratios of different algorithms and sizes, and plot the miss ratios over time.

@@ -235,7 +202,6 @@ python3 plot_appr_mrc.py MINI ../data/twitter_cluster52.vscsi vscsi s3fifo "0.00

---

<!-- TOC --><a name="trace-analysis"></a>
### Trace analysis
libCacheSim also has a trace analyzer that reports detailed statistics about a trace.
It is very fast and is designed to work with billions of requests.
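
To give a feel for the kind of statistics a trace analysis produces, here is a minimal plain-Python sketch. It is not libCacheSim's analyzer and does not use its API; the function name `summarize_trace`, the chosen metrics, and the toy trace are all assumptions made for illustration.

```python
from collections import Counter


def summarize_trace(obj_ids, top_fraction=0.1):
    """Compute a few basic statistics for a sequence of requested object IDs.

    Hypothetical helper for illustration only, not part of libCacheSim.
    """
    counts = Counter(obj_ids)
    n_req = sum(counts.values())
    n_obj = len(counts)
    if n_req == 0:
        return {"n_req": 0, "n_obj": 0,
                "one_hit_wonder_ratio": 0.0, "top_req_share": 0.0}

    # Objects requested exactly once ("one-hit wonders") gain nothing from caching.
    one_hit = sum(1 for c in counts.values() if c == 1)

    # Share of requests that go to the most popular `top_fraction` of objects.
    top_k = max(1, int(n_obj * top_fraction))
    top_req = sum(c for _, c in counts.most_common(top_k))

    return {
        "n_req": n_req,
        "n_obj": n_obj,
        "one_hit_wonder_ratio": one_hit / n_obj,
        "top_req_share": top_req / n_req,
    }


toy_trace = [1, 2, 1, 3, 1, 4, 2, 5, 1, 6]
stats = summarize_trace(toy_trace)
print(stats)
```

A real analyzer has to stream billions of requests, so it cannot hold a full `Counter` this casually; the sketch only shows what the numbers mean.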
@@ -244,7 +210,6 @@ See [trace analysis](/doc/quickstart_traceAnalyzer.md) for more details.

---

<!-- TOC --><a name="miss-ratio-curves-profiling"></a>
### Miss ratio curves profiling

Constructing fine-grained miss ratio curves for large-scale workloads is very demanding on CPU and memory resources. libCacheSim provides advanced miss ratio curves profiling tools to help you quickly build miss ratio curves for large-scale workloads. See [mrcProfiler](/doc/quickstart_mrcProfiler.md) for more details.
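
To make the cost concrete, here is a naive sketch of building a miss ratio curve: it replays the whole trace once per cache size. This is the brute-force approach that a profiler is meant to avoid, not how mrcProfiler works. The function names, the object-count-based cache size, and the toy trace are assumptions for illustration.

```python
from collections import OrderedDict


def lru_miss_ratio(obj_ids, cache_size):
    """Replay a trace against an LRU cache that holds `cache_size` objects."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    n_req = 0
    for obj_id in obj_ids:
        n_req += 1
        if obj_id in cache:
            cache.move_to_end(obj_id)  # hit: mark as most recently used
        else:
            misses += 1
            cache[obj_id] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used object
    return misses / n_req if n_req else 0.0


def miss_ratio_curve(obj_ids, cache_sizes):
    """Naive MRC: one full replay per cache size."""
    obj_ids = list(obj_ids)
    return {size: lru_miss_ratio(obj_ids, size) for size in cache_sizes}


toy_trace = [1, 2, 3, 1, 2, 3, 4, 1, 2, 3]
mrc = miss_ratio_curve(toy_trace, [1, 2, 3, 4])
for size, ratio in mrc.items():
    print(f"cache size {size}: miss ratio {ratio:.2f}")
```

With N requests and S cache sizes this costs O(N x S) work, which is why fine-grained curves on large traces call for a dedicated profiler.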
@@ -253,11 +218,13 @@ Constructing fine-grained miss ratio curves for large-scale workloads is very de

---

<!-- TOC --><a name="using-libcachesim-as-a-library"></a>
### Using libCacheSim as a library
libCacheSim can be used as a library for building cache simulators.
For example, you can build a cache cluster with consistent hashing or a multi-layer cache simulator.
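
As a rough sketch of the consistent-hashing idea behind such a cache cluster, the plain-Python snippet below routes object IDs to cache nodes on a hash ring. It does not call libCacheSim; the `HashRing` class, the node names, and the virtual-node count are hypothetical. In a real simulator each node would own its own cache instance.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes (illustration only)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed at `vnodes` points to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, obj_id) -> str:
        # The first ring point clockwise from the object's hash owns the object.
        idx = bisect.bisect_right(self._points, _hash(str(obj_id))) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-0", "cache-1", "cache-2"])
smaller = HashRing(["cache-0", "cache-1"])  # the same cluster after losing cache-2

keys = range(10_000)
moved = sum(1 for k in keys if ring.node_for(k) != smaller.node_for(k))
print(f"{moved} of {len(keys)} objects change nodes when cache-2 is removed")
```

Only objects that lived on the removed node are remapped, which is the property that makes consistent hashing attractive for cache clusters.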

<details>
<summary> See a code example </summary>

Here is a simplified example showing the basic APIs.
```c
#include <libCacheSim.h>
@@ -299,13 +266,13 @@ To run the executable,
```bash
./test.out
```
</details>

See [here](/doc/advanced_lib.md) for more details, and see the [example folder](/example) for examples of how to use libCacheSim, such as building a cache cluster with consistent hashing or a multi-layer cache simulator.

---


<!-- TOC --><a name="extending-libcachesim-new-algorithms-and-trace-types"></a>
### Extending libCacheSim (new algorithms and trace types)
libCacheSim supports *txt*, *csv*, and *binary* traces. We prefer binary traces because they allow libCacheSim to run faster, and the traces are more compact.

@@ -316,7 +283,6 @@ If you need to add a new trace type or a new algorithm, please see [here](/doc/a
We encourage users to check [deepWiki](https://deepwiki.com/1a1a11a/libCacheSim) for more detailed documentation.

---
<!-- TOC --><a name="python-package"></a>
## Python package

If performance is not your top priority, our Python binding offers an easier way to access the core features of libCacheSim.
@@ -340,6 +306,8 @@ print(f"Obj miss ratio: {obj_miss_ratio:.4f}, byte miss ratio: {byte_miss_ratio:
### Extending new algorithm

With the Python package, you can add a new algorithm to test your own eviction design **without any C/C++ compilation**.
<details>
<summary> See an example below </summary>

```python
import libcachesim as lcs
@@ -375,11 +343,11 @@ obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader)
print(f"Obj miss ratio: {obj_miss_ratio:.4f}, byte miss ratio: {byte_miss_ratio:.4f}")
```

</details>

See more information in [README.md](./libCacheSim-python/README.md) of the Python binding.

---
<!-- TOC --><a name="open-source-cache-traces"></a>
## Open source cache traces
In the [repo](/data/), there are sample traces in different formats (`csv`, `txt`, `vscsi`, and `oracleGeneral`). Note that the sampled traces are **very small** and __should not be used for evaluating different algorithms' miss ratios__. The full traces can be found either with the original release or the processed `oracleGeneral` format.

@@ -395,31 +363,22 @@ struct {
```
The compressed traces can be used with libCacheSim without decompression. libCacheSim also provides a `tracePrint` tool to print a trace in a human-readable format.

We provide a more comprehensive collection of cache datasets at [https://github.com/cacheMon/cache_dataset](https://github.com/cacheMon/cache_dataset).

| Dataset | Year | Type | Original release | OracleGeneral format |
|---------------|------|:---------:|:-----------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------:|
| Tencent Photo | 2018 | object | [link](http://iotta.snia.org/traces/parallel?only=27476) | [link](https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/tencentPhoto/) |
| WikiCDN | 2019 | object | [link](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Caching) | [link](https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/wiki/) |
| Tencent CBS | 2020 | block | [link](http://iotta.snia.org/traces/parallel?only=27917) | [link](https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/tencentBlock/) |
| Alibaba Block | 2020 | block | [link](https://github.com/alibaba/block-traces) | [link](https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/alibabaBlock/) |
| Twitter | 2020 | key-value | [link](https://github.com/twitter/cache-trace) | [link](https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/twitter/) |
| MetaKV | 2022 | key-value | [link](https://cachelib.org/docs/Cache_Library_User_Guides/Cachebench_FB_HW_eval/#list-of-traces) | [link](https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/metaKV/) |
| MetaCDN | 2023 | object | [link](https://cachelib.org/docs/Cache_Library_User_Guides/Cachebench_FB_HW_eval/#list-of-traces) | [link](https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/metaCDN/) |

Among the large number of traces, I recommend using the newer ones from Twitter (cluster52), Wiki, and Meta.


---
<!-- TOC --><a name="contributions"></a>
## Contributions
We gladly welcome pull requests.
Before making any large changes, we recommend opening an issue and discussing your proposed changes.
If the changes are minor, then feel free to make them without discussion.
This project adheres to Google's coding style. By participating, you are expected to uphold this code.

---
<!-- TOC --><a name="reference"></a>
## Reference
<details>
<summary> Please cite the following papers if you use libCacheSim. </summary>

```
@inproceedings{yang2020-workload,
author = {Juncheng Yang and Yao Yue and K. V. Rashmi},
@@ -455,16 +414,16 @@ This project adheres to Google's coding style. By participating, you are expecte
numpages = {10},
}
```
If you used libCacheSim in your research, please cite the above papers. And we welcome you to send us a link to your paper and add a reference to [references.md](references.md).
If you used libCacheSim in your research, please cite the above papers.

</details>

---


<!-- TOC --><a name="license"></a>
## License
See [LICENSE](LICENSE) for details.

<!-- TOC --><a name="related"></a>
## Related
* [PyMimircache](https://github.com/1a1a11a/PyMimircache): a Python-based cache trace analysis platform, now deprecated
---
Binary file added doc/assets/logo.jpg
Binary file added doc/assets/logo_circle.png