Commit beddbfa

Docs/update (#135)
* docs: expand design documentation with new sections on concurrency, metrics, error model, and weighted eviction
  - Added detailed documentation on concurrency strategies, outlining the design rationale for concurrent cache types and their usage.
  - Introduced a metrics section to explain the metrics infrastructure, including recording, snapshotting, and exporting metrics for observability.
  - Documented the error model, clarifying the panic vs. `Result` discipline and the handling of different error types.
  - Included a comprehensive overview of weighted eviction strategies, detailing the implementation of `WeightStore` and `ConcurrentWeightStore`.

  These additions enhance the overall documentation, providing clearer guidance on design principles and usage patterns for developers.
* docs: expand design documentation with new sections on benchmarking, hashing, sharding, serialization, and non-goals
  - Added a comprehensive section on benchmarking design, detailing the benchmark layers, goals, and artifact schema to enhance performance evaluation.
  - Introduced documentation on hashing and key identity, explaining hasher choices, key interning, and shard routing strategies.
  - Documented sharding design, outlining current sharded primitives, routing requirements, and capacity semantics for improved concurrency.
  - Included a section on serialization, clarifying the current serialization surface and future considerations for cache-state persistence.
  - Added a non-goals document to define explicit boundaries for cachekit's design, ensuring clarity on what the library does not aim to achieve.

  These additions significantly enhance the documentation, providing clearer guidance on design principles and usage patterns for developers.
* docs: update design documentation to reflect the addition of CAR policy and clarify policy counts
  - Updated the builder and dynamic dispatch documentation to indicate that cachekit now ships 18 implemented eviction policies, with CAR being a concrete policy not yet exposed through `CachePolicy` / `DynCache`.
  - Clarified the distinction between implemented policies and runtime-dispatch variants, ensuring accurate representation of the current state of the library.
  - Revised concurrency and trait hierarchy documentation to reflect the updated policy count, enhancing clarity for users regarding available features and capabilities.

  These changes improve the accuracy and comprehensiveness of the design documentation, aiding developers in understanding the current state of cachekit's policy implementations.
1 parent 11da219 commit beddbfa

13 files changed

Lines changed: 3587 additions & 52 deletions

docs/design/benchmarking.md

Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
# Benchmarking

> Status: design rationale for the benchmark suite under [`benches/`](../../benches)
> and shared benchmark support under [`bench-support/`](../../bench-support).
> Companion to [`design.md`](design.md) §10 and the benchmark reference docs.

cachekit benchmarks are designed to answer cache questions, not just produce
fast-looking numbers. A cache policy can be excellent on uniform keys and weak
under scans, or fast on micro-operations and poor at preserving hit rate. The
benchmark suite therefore separates micro-operation cost, policy effectiveness,
trace-shaped workloads, reporting, and machine-readable artifacts.

## Goals

- Compare policies under workload shapes that resemble real cache traffic.
- Keep measured loops free of allocator noise and dynamic dispatch.
- Produce both human-readable reports and stable JSON artifacts.
- Preserve enough metadata to reproduce a run: git commit, branch, dirty bit,
  rustc version, host triple, CPU model, capacity, universe, operations, seed.
- Make adding a policy or workload a registry edit, not a benchmark rewrite.

## Benchmark Layers

The benchmark suite has four layers:

| Layer | Files | Purpose |
|---|---|---|
| Criterion measurements | `benches/workloads.rs`, `benches/ops.rs`, `benches/comparison.rs`, `benches/policy/*.rs` | statistically sampled latency and throughput |
| Console reports | `benches/reports.rs` | fast, readable tables without Criterion overhead |
| JSON artifact runner | `benches/runner.rs` | structured output for docs, charts, CI, historical comparison |
| Shared support crate | `bench-support/` | policy registry, workloads, metrics, JSON schema, doc renderer |

This split is deliberate. Criterion is good for micro-benchmark statistics; the
artifact runner is good for automation; console reports are good while tuning a
policy locally. No single binary is forced to serve every audience.

## Monomorphic Policy Registry

Benchmarks iterate policies through `for_each_policy!` in
[`bench-support/src/registry.rs`](../../bench-support/src/registry.rs):

```rust,ignore
for_each_policy! {
    with |policy_id, display_name, make_cache| {
        let mut cache = make_cache(CAPACITY);
        // measured workload...
    }
}
```

The macro expands to one block per concrete policy type. This avoids dynamic
dispatch in the measured loop while keeping policy iteration centralized.
`POLICIES` in the same module provides presentation metadata (stable id,
display name, chart color) for renderers and reports.

The trade-off is that adding a policy touches the macro and metadata table. A
test (`policies_metadata_matches_macro`) keeps the two from drifting. This is
the same explicit-boilerplate-over-magic choice as `DynCache`: more arms in
source, fewer surprises in hot code.

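
The expansion pattern can be sketched with a small `macro_rules!` definition. This is an illustration under stated assumptions, not the macro in `registry.rs`: the cache types `LruStub` and `FifoStub`, the two-field `PolicyMeta`, and the helper `macro_policies` are hypothetical stand-ins, and the real registry covers more policies and carries more metadata.

```rust
// Hypothetical stand-ins for concrete cache types (not cachekit's policies).
pub struct LruStub { capacity: usize }
pub struct FifoStub { capacity: usize }

impl LruStub {
    pub fn new(capacity: usize) -> Self { Self { capacity } }
    pub fn capacity(&self) -> usize { self.capacity }
}

impl FifoStub {
    pub fn new(capacity: usize) -> Self { Self { capacity } }
    pub fn capacity(&self) -> usize { self.capacity }
}

/// Presentation metadata; the field set here is an assumption.
pub struct PolicyMeta {
    pub id: &'static str,
    pub display_name: &'static str,
}

pub const POLICIES: &[PolicyMeta] = &[
    PolicyMeta { id: "lru", display_name: "LRU" },
    PolicyMeta { id: "fifo", display_name: "FIFO" },
];

/// Pastes the caller's block once per concrete type. Each copy is compiled
/// against a single cache type, so no trait object is involved.
macro_rules! for_each_policy {
    (with |$id:ident, $name:ident, $make:ident| $body:block) => {{
        {
            let $id: &'static str = "lru";
            let $name: &'static str = "LRU";
            let $make = |cap: usize| LruStub::new(cap);
            $body
        }
        {
            let $id: &'static str = "fifo";
            let $name: &'static str = "FIFO";
            let $make = |cap: usize| FifoStub::new(cap);
            $body
        }
    }};
}

const CAPACITY: usize = 8;

/// Collects the (id, display name) pairs the macro visits, in order.
pub fn macro_policies() -> Vec<(&'static str, &'static str)> {
    let mut seen = Vec::new();
    for_each_policy! {
        with |policy_id, display_name, make_cache| {
            let cache = make_cache(CAPACITY);
            debug_assert_eq!(cache.capacity(), CAPACITY);
            seen.push((policy_id, display_name));
        }
    }
    seen
}
```

A drift test in this style would compare `macro_policies()` with the pairs listed in `POLICIES` and fail when one side gains an entry the other lacks.
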
## Workload Registry

Workload definitions live in `bench-support/src/registry.rs`; generators live in
[`bench-support/src/workload.rs`](../../bench-support/src/workload.rs). The
current standard workloads cover:

- Uniform random keys for raw overhead baselines.
- Hot-set access for explicit skew.
- Sequential scan for scan-pollution stress.
- Zipfian and scrambled Zipfian for power-law access.
- Latest / recency-biased access.
- Shifting hotspots and flash crowds for adaptation.
- Composite scan-resistance mixes.

[`docs/benchmarks/workloads.md`](../benchmarks/workloads.md) is the catalog. It
also contains a large roadmap of workloads that should not be confused with
implemented cases. New workloads should land first in the support crate, then in
the docs, then in reports.

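
To make the first three workload shapes concrete, here is a minimal sketch of seeded key generators. The `Workload` enum, the `generate` function, and the SplitMix64 PRNG are assumptions chosen so the example needs no external crates; the generators in `workload.rs` may be structured quite differently.

```rust
/// Small deterministic PRNG (SplitMix64), used only to keep the sketch
/// dependency-free.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self { Self(seed) }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical workload shapes, mirroring three entries in the list above.
pub enum Workload {
    /// Every key in the universe is equally likely.
    Uniform,
    /// `hot_percent` of accesses go to the first `hot_keys` keys.
    /// Requires `hot_keys < universe`.
    HotSet { hot_keys: u64, hot_percent: u64 },
    /// Keys 0, 1, 2, ... wrapping at the universe size.
    SequentialScan,
}

/// Produces `ops` keys in `0..universe`. The same seed always yields the same
/// sequence. Plain modulo is used for brevity, so a slight bias is accepted.
pub fn generate(workload: &Workload, universe: u64, ops: usize, seed: u64) -> Vec<u64> {
    let mut rng = SplitMix64::new(seed);
    (0..ops)
        .map(|i| match *workload {
            Workload::Uniform => rng.next_u64() % universe,
            Workload::HotSet { hot_keys, hot_percent } => {
                if rng.next_u64() % 100 < hot_percent {
                    rng.next_u64() % hot_keys
                } else {
                    hot_keys + rng.next_u64() % (universe - hot_keys)
                }
            }
            Workload::SequentialScan => (i as u64) % universe,
        })
        .collect()
}
```

Materializing the key sequence up front also keeps random-number generation out of the measured loop, in the same spirit as the value pre-allocation rule.
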
## Value Construction Discipline

`benches/runner.rs` pre-allocates one `Arc<u64>` per key in the universe and
passes a closure that returns `Arc::clone`:

```rust,ignore
fn preallocate_values() -> Vec<Arc<u64>> {
    (0..UNIVERSE).map(Arc::new).collect()
}
```

The rule is: **do not allocate values inside the measured operation loop**.
Allocating on every miss makes the benchmark measure the allocator and value
constructor, not the policy. A cheap `Arc::clone` isolates hit/miss behaviour,
eviction order, and policy metadata overhead.

This is especially important because policies store values differently:
`FastLru` stores `V` directly, while LRU / LFU / Heap-LFU use `Arc<V>` in some
paths. Pre-allocation keeps those representation differences from dominating
the benchmark.

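
A minimal sketch of how the pre-allocated values feed a measured loop. The `run_measured_loop` function, the `UNIVERSE` value, and the `HashMap` standing in for a cache are assumptions for illustration; the runner's actual loop is not shown in this document.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Assumed universe size for the sketch.
const UNIVERSE: u64 = 1_000;

/// One shared value per key, built before any timing starts.
pub fn preallocate_values() -> Vec<Arc<u64>> {
    (0..UNIVERSE).map(Arc::new).collect()
}

/// Hypothetical measured loop. A pre-sized `HashMap` stands in for a cache so
/// the example stays self-contained; it never evicts.
/// Every key must be smaller than `values.len()`.
pub fn run_measured_loop(keys: &[u64], values: &[Arc<u64>]) -> (u64, u64) {
    let mut cache: HashMap<u64, Arc<u64>> = HashMap::with_capacity(values.len());
    let (mut hits, mut misses) = (0u64, 0u64);
    for &key in keys {
        if cache.contains_key(&key) {
            hits += 1;
        } else {
            misses += 1;
            // A miss only bumps a reference count; no value is constructed
            // or heap-allocated inside the loop.
            cache.insert(key, Arc::clone(&values[key as usize]));
        }
    }
    (hits, misses)
}
```

Replacing `Arc::clone(&values[key as usize])` with `Arc::new(key)` would still produce the same hit and miss counts, but each miss would then pay for a heap allocation, which is the noise the rule is meant to exclude.
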
## Artifact Schema

`bench-support/src/json_results.rs` defines the stable JSON schema for results:

- `SCHEMA_VERSION` follows semantic schema rules.
- Major bumps remove or rename required fields.
- Minor bumps add optional fields.
- Renderers accept any artifact with a matching major.

Each `BenchmarkArtifact` contains:

- `metadata`: timestamp, git commit, branch, dirty bit, rustc, host, CPU,
  benchmark config.
- `results`: rows keyed by policy, workload, and `case_id`.
- `metrics`: optional typed sections for hit rate, throughput, latency,
  eviction, scan resistance, adaptation speed.

The schema is presentation-neutral. Markdown tables and charts are rendered
later by `bench-support/src/bin/render_docs.rs`, so measurement and presentation
can evolve independently.

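
The "matching major" rule can be sketched as a small compatibility check. This assumes a `major.minor` string encoding; the value `"1.2"` and the helpers `parse_version` and `renderer_accepts` are hypothetical, and the real `SCHEMA_VERSION` may be typed or formatted differently.

```rust
/// Hypothetical schema version; the real constant's value and type are not
/// given in this document.
pub const SCHEMA_VERSION: &str = "1.2";

/// Parses "major.minor" into a pair, returning `None` for any other shape.
pub fn parse_version(version: &str) -> Option<(u32, u32)> {
    let (major, minor) = version.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// A renderer accepts an artifact when the major versions match. The minor
/// version is ignored because minor bumps only add optional fields.
pub fn renderer_accepts(renderer_version: &str, artifact_version: &str) -> bool {
    match (parse_version(renderer_version), parse_version(artifact_version)) {
        (Some((renderer_major, _)), Some((artifact_major, _))) => {
            renderer_major == artifact_major
        }
        // Unparseable versions are rejected rather than guessed at.
        _ => false,
    }
}
```

Under this rule a renderer built against `1.2` would read artifacts written as `1.0` or `1.9`, and would reject `2.0`.
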
## Case IDs

Use `case_id::*` constants from `json_results.rs` instead of string literals:

- `hit_rate`
- `comprehensive`
- `scan_resistance`
- `adaptation`

This catches typos at compile time and prevents a result section from silently
disappearing from rendered docs. Adding a new case means adding a constant,
teaching the runner to populate it, and teaching the renderer how to display it.

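
A minimal sketch of what such a constants module and a renderer lookup could look like. The constant names (`HIT_RATE` and so on), the `ALL` slice, and `section_title` are assumptions; only the four string values come from the list above.

```rust
/// Hypothetical layout of the constants module; the names are assumed.
pub mod case_id {
    pub const HIT_RATE: &str = "hit_rate";
    pub const COMPREHENSIVE: &str = "comprehensive";
    pub const SCAN_RESISTANCE: &str = "scan_resistance";
    pub const ADAPTATION: &str = "adaptation";

    /// Every known case, so tests can check that renderers cover all of them.
    pub const ALL: &[&str] = &[HIT_RATE, COMPREHENSIVE, SCAN_RESISTANCE, ADAPTATION];
}

/// Hypothetical renderer lookup from a case id to a section title.
/// Unknown ids return `None` instead of being dropped silently.
pub fn section_title(id: &str) -> Option<&'static str> {
    match id {
        case_id::HIT_RATE => Some("Hit rate"),
        case_id::COMPREHENSIVE => Some("Comprehensive"),
        case_id::SCAN_RESISTANCE => Some("Scan resistance"),
        case_id::ADAPTATION => Some("Adaptation"),
        _ => None,
    }
}
```

Writing `case_id::HIT_RAET` fails to compile, whereas the literal `"hit_raet"` would compile and simply never match a rendered section.
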
## What Each Benchmark Answers

| Benchmark | Question |
|---|---|
| `ops.rs` | What is the raw cost of `get` / `insert` / policy-specific operations? |
| `workloads.rs` | Which policies preserve hit rate under standard workloads? |
| `comparison.rs` | How does cachekit compare with external crates (`lru`, `quick_cache`)? |
| `policy/*.rs` | What is the cost of each policy's unique operations? |
| `reports.rs` | What should a human inspect while tuning? |
| `runner.rs` | What should CI and docs consume? |

Do not overload one benchmark to answer all questions. If you need policy
micro-cost, use `ops.rs`; if you need hit rate under scans, use `workloads.rs`
or `runner.rs`.

## Reproducibility Rules

- Seed every workload. Default seed is 42 unless a benchmark is explicitly
  sweeping seeds.
- Record the git dirty bit. Dirty runs are useful locally but should not be
  published as release baselines without a note.
- Keep capacity, universe, and operation count visible in the artifact.
- Prefer `ScrambledZipfian` over raw `Zipfian` for cross-policy comparison when
  hardware prefetch could bias hot-key locality.
- Do not compare results across machines without CPU metadata. Tail latency and
  pointer-heavy policy cost are machine-sensitive.

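
The preference for `ScrambledZipfian` rests on one step: mapping a popularity rank to a key through a hash, so that the hottest ranks are not numerically adjacent keys. A minimal sketch of that step, assuming a SplitMix64-style mixer; the function name `scramble_rank` is hypothetical, and the generator in `workload.rs` may hash differently.

```rust
/// Maps a popularity rank (0 = hottest) to a key in `0..universe` by mixing
/// the rank's bits. The mapping is a pure function, so it is reproducible
/// without any seed. As with other hash-based scrambles, two ranks can land
/// on the same key.
pub fn scramble_rank(rank: u64, universe: u64) -> u64 {
    // SplitMix64 output mixer applied to the rank.
    let mut z = rank.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    (z ^ (z >> 31)) % universe
}

/// Applies the scramble to a whole sequence of sampled ranks.
pub fn scramble_all(ranks: &[u64], universe: u64) -> Vec<u64> {
    ranks.iter().map(|&rank| scramble_rank(rank, universe)).collect()
}
```

The popularity distribution is unchanged by this step; only the placement of hot keys in the key space moves, which is why it helps when comparing policies whose layouts react differently to key locality.
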
## CI and Documentation Flow

The docs pipeline runs the benchmark suite, writes
`target/benchmarks/<run-id>/results.json`, and renders
`docs/benchmarks/latest/` plus charts. Release-tag snapshots live under
`docs/benchmarks/vX.Y.Z/`.

Manual workflow:

```bash
cargo bench --bench runner
./scripts/update_benchmark_docs.sh
```

The script is the high-level path for refreshing published benchmark docs. Use
individual benches (`cargo bench --bench ops`, `cargo bench --bench reports -- scan`)
while developing a policy.

## Adding a Policy to Benchmarks

1. Add the policy to `for_each_policy!` with a concrete constructor.
2. Add matching `PolicyMeta` in `POLICIES`.
3. Run the registry drift test.
4. Run `cargo bench --bench reports -- hit_rate` for a quick sanity check.
5. Run `cargo bench --bench runner` before publishing docs.

Keep constructors comparable. If one policy needs `Arc<u64>` and another stores
`u64`, choose the value shape that preserves fairness and document the exception
in the registry comment.

## Adding a Workload

1. Implement the generator in `bench-support/src/workload.rs`.
2. Add a `WorkloadCase` in the registry with stable id and display name.
3. Add docs in [`docs/benchmarks/workloads.md`](../benchmarks/workloads.md).
4. Add renderer support if the workload needs a custom section.
5. Run at least one policy family expected to behave differently (for example,
   LRU vs S3-FIFO for scan-heavy workloads).

Do not add a workload just because it is mathematically interesting. It should
answer a policy-selection question.

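
As an illustration of a workload that answers a policy-selection question, the sketch below shows why a cyclic sequential scan is a useful stress case for LRU. `TinyLru` and `hit_rate` are hypothetical helpers written for this example (an O(n) toy, not cachekit's LRU), and S3-FIFO is deliberately left out to keep the sketch short.

```rust
use std::collections::VecDeque;

/// Toy LRU with linear-time lookup, for illustration only.
/// Assumes `capacity >= 1`.
pub struct TinyLru {
    capacity: usize,
    /// Front is the most recently used key, back is the eviction candidate.
    order: VecDeque<u64>,
}

impl TinyLru {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::with_capacity(capacity) }
    }

    /// Returns `true` on a hit. On a miss the key is inserted, evicting the
    /// least recently used key when the cache is full.
    pub fn access(&mut self, key: u64) -> bool {
        if let Some(pos) = self.order.iter().position(|&k| k == key) {
            self.order.remove(pos);
            self.order.push_front(key);
            true
        } else {
            if self.order.len() == self.capacity {
                self.order.pop_back();
            }
            self.order.push_front(key);
            false
        }
    }
}

/// Fraction of accesses that hit, for a fresh cache of the given capacity.
pub fn hit_rate(capacity: usize, keys: impl Iterator<Item = u64>) -> f64 {
    let mut cache = TinyLru::new(capacity);
    let (mut hits, mut total) = (0u64, 0u64);
    for key in keys {
        total += 1;
        if cache.access(key) {
            hits += 1;
        }
    }
    if total == 0 { 0.0 } else { hits as f64 / total as f64 }
}
```

With capacity 4, cycling over 5 keys makes every access a miss, because the key about to be requested is always the one LRU just evicted; cycling over 4 keys misses only during warm-up. A scan-resistant policy is expected to do better on the first case, which is the kind of difference a new workload should expose.
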
## Non-goals

- Benchmarks are not formal proofs of policy optimality.
- Benchmarks are not stable ABI. The JSON schema is versioned, but Criterion
  names and report formatting can change.
- Benchmarks do not hide hardware effects. They record enough metadata for the
  reader to judge them.
- Benchmarks do not replace fuzzing or invariant tests; they measure behaviour
  under selected workloads.

## See Also

- [Design overview](design.md) - §10 frames benchmarking at the principles level
- [Metrics](metrics.md) - recorder / snapshot / exporter split
- [Benchmark docs](../benchmarks/README.md)
- [Workload catalog](../benchmarks/workloads.md)
- [`bench-support/src/registry.rs`](../../bench-support/src/registry.rs)
- [`bench-support/src/json_results.rs`](../../bench-support/src/json_results.rs)
- [`benches/runner.rs`](../../benches/runner.rs)
