Commit 808aef5

docs: expand design documentation with new sections on benchmarking, hashing, sharding, serialization, and non-goals
- Added a comprehensive section on benchmarking design, detailing the benchmark layers, goals, and artifact schema to enhance performance evaluation.
- Introduced documentation on hashing and key identity, explaining hasher choices, key interning, and shard routing strategies.
- Documented sharding design, outlining current sharded primitives, routing requirements, and capacity semantics for improved concurrency.
- Included a section on serialization, clarifying the current serialization surface and future considerations for cache-state persistence.
- Added a non-goals document to define explicit boundaries for cachekit's design, ensuring clarity on what the library does not aim to achieve.

These additions significantly enhance the documentation, providing clearer guidance on design principles and usage patterns for developers.
1 parent afe12b4 commit 808aef5

7 files changed

Lines changed: 921 additions & 0 deletions

File tree

docs/design/benchmarking.md

Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
# Benchmarking

> Status: design rationale for the benchmark suite under [`benches/`](../../benches)
> and shared benchmark support under [`bench-support/`](../../bench-support).
> Companion to [`design.md`](design.md) §10 and the benchmark reference docs.

cachekit benchmarks are designed to answer cache questions, not just produce
fast-looking numbers. A cache policy can be excellent on uniform keys and weak
under scans, or fast on micro-operations and poor at preserving hit rate. The
benchmark suite therefore separates micro-operation cost, policy effectiveness,
trace-shaped workloads, reporting, and machine-readable artifacts.

## Goals

- Compare policies under workload shapes that resemble real cache traffic.
- Keep measured loops free of allocator noise and dynamic dispatch.
- Produce both human-readable reports and stable JSON artifacts.
- Preserve enough metadata to reproduce a run: git commit, branch, dirty bit,
  rustc version, host triple, CPU model, capacity, universe, operations, seed.
- Make adding a policy or workload a registry edit, not a benchmark rewrite.

## Benchmark Layers

The benchmark suite has four layers:

| Layer | Files | Purpose |
|---|---|---|
| Criterion measurements | `benches/workloads.rs`, `benches/ops.rs`, `benches/comparison.rs`, `benches/policy/*.rs` | statistically sampled latency and throughput |
| Console reports | `benches/reports.rs` | fast, readable tables without Criterion overhead |
| JSON artifact runner | `benches/runner.rs` | structured output for docs, charts, CI, historical comparison |
| Shared support crate | `bench-support/` | policy registry, workloads, metrics, JSON schema, doc renderer |

This split is deliberate. Criterion is good for micro-benchmark statistics; the
artifact runner is good for automation; console reports are good while tuning a
policy locally. No single binary is forced to serve every audience.

## Monomorphic Policy Registry

Benchmarks iterate policies through `for_each_policy!` in
[`bench-support/src/registry.rs`](../../bench-support/src/registry.rs):

```rust,ignore
for_each_policy! {
    with |policy_id, display_name, make_cache| {
        let mut cache = make_cache(CAPACITY);
        // measured workload...
    }
}
```

The macro expands to one block per concrete policy type. This avoids dynamic
dispatch in the measured loop while keeping policy iteration centralized.
`POLICIES` in the same module provides presentation metadata (stable id,
display name, chart color) for renderers and reports.

The trade-off is that adding a policy touches the macro and metadata table. A
test (`policies_metadata_matches_macro`) keeps the two from drifting. This is
the same explicit-boilerplate-over-magic choice as `DynCache`: more arms in
source, fewer surprises in hot code.

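The "one block per concrete policy type" expansion can be illustrated with a toy macro. This is a minimal sketch and not the real `for_each_policy!`: the `ToyCache` trait, the `FifoToy` and `LifoToy` types, and `for_each_toy_policy!` are hypothetical names invented for the example, and the real macro also passes a display name.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a cache policy; this is not cachekit's API.
trait ToyCache {
    fn new(capacity: usize) -> Self;
    fn insert(&mut self, key: u64);
    fn len(&self) -> usize;
}

struct FifoToy { cap: usize, keys: VecDeque<u64> }
struct LifoToy { cap: usize, keys: Vec<u64> }

impl ToyCache for FifoToy {
    fn new(capacity: usize) -> Self { FifoToy { cap: capacity, keys: VecDeque::new() } }
    fn insert(&mut self, key: u64) {
        // Evict the oldest key when full.
        if self.keys.len() == self.cap { self.keys.pop_front(); }
        self.keys.push_back(key);
    }
    fn len(&self) -> usize { self.keys.len() }
}

impl ToyCache for LifoToy {
    fn new(capacity: usize) -> Self { LifoToy { cap: capacity, keys: Vec::new() } }
    fn insert(&mut self, key: u64) {
        // Evict the newest key when full.
        if self.keys.len() == self.cap { self.keys.pop(); }
        self.keys.push(key);
    }
    fn len(&self) -> usize { self.keys.len() }
}

/// Expands the body once per concrete type, so every call in the body is
/// statically dispatched; no `dyn ToyCache` is involved.
macro_rules! for_each_toy_policy {
    (with |$id:ident, $make:ident| $body:block) => {{
        {
            let $id = "fifo";
            let $make = |cap: usize| <FifoToy as ToyCache>::new(cap);
            $body
        }
        {
            let $id = "lifo";
            let $make = |cap: usize| <LifoToy as ToyCache>::new(cap);
            $body
        }
    }};
}

/// Runs the same insert loop against every toy policy and reports final sizes.
fn run_all(capacity: usize, ops: u64) -> Vec<(&'static str, usize)> {
    let mut out = Vec::new();
    for_each_toy_policy! {
        with |policy_id, make_cache| {
            let mut cache = make_cache(capacity);
            for k in 0..ops { cache.insert(k); }
            out.push((policy_id, cache.len()));
        }
    }
    out
}
```

Calling `run_all(4, 10)` executes the loop body twice, once compiled against `FifoToy` and once against `LifoToy`. The cost is the one the section describes: each new policy needs another arm in the macro.
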
## Workload Registry

Workload definitions live in `bench-support/src/registry.rs`; generators live in
[`bench-support/src/workload.rs`](../../bench-support/src/workload.rs). The
current standard workloads cover:

- Uniform random keys for raw overhead baselines.
- Hot-set access for explicit skew.
- Sequential scan for scan-pollution stress.
- Zipfian and scrambled Zipfian for power-law access.
- Latest / recency-biased access.
- Shifting hotspots and flash crowds for adaptation.
- Composite scan-resistance mixes.

[`docs/benchmarks/workloads.md`](../benchmarks/workloads.md) is the catalog. It
also contains a large roadmap of workloads that should not be confused with
implemented cases. New workloads should land first in the support crate, then in
the docs, then in reports.

## Value Construction Discipline

`benches/runner.rs` pre-allocates one `Arc<u64>` per key in the universe and
passes a closure that returns `Arc::clone`:

```rust,ignore
fn preallocate_values() -> Vec<Arc<u64>> {
    (0..UNIVERSE).map(Arc::new).collect()
}
```

The rule is: **do not allocate values inside the measured operation loop**.
Allocating on every miss makes the benchmark measure the allocator and value
constructor, not the policy. A cheap `Arc::clone` isolates hit/miss behaviour,
eviction order, and policy metadata overhead.

This is especially important because policies store values differently:
`FastLru` stores `V` directly, while LRU / LFU / Heap-LFU use `Arc<V>` in some
paths. Pre-allocation keeps those representation differences from dominating
the benchmark.

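The pre-allocation rule can be sketched end to end as follows. This is an illustration under assumptions, not the code in `benches/runner.rs`: the `universe` parameter replaces the `UNIVERSE` constant, and `run_loop` is a hypothetical stand-in for a measured loop that asks for a value on every operation.

```rust
use std::sync::Arc;

/// Build one shared value per key up front, outside any timed region.
/// (`universe` is a parameter here; the original uses a constant.)
fn preallocate_values(universe: u64) -> Vec<Arc<u64>> {
    (0..universe).map(Arc::new).collect()
}

/// Hypothetical stand-in for a measured loop. Obtaining a value costs one
/// reference-count increment via `Arc::clone`, never a heap allocation.
fn run_loop(keys: &[u64], values: &[Arc<u64>]) -> u64 {
    let value_for = |key: u64| Arc::clone(&values[key as usize]);
    let mut checksum = 0u64;
    for &k in keys {
        let v = value_for(k); // cheap clone of a pre-built value
        checksum = checksum.wrapping_add(*v);
    }
    checksum
}
```

A caller would build `values` once with `preallocate_values`, start the timer, and only then call `run_loop`, so allocator cost stays outside the measurement.
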
## Artifact Schema

`bench-support/src/json_results.rs` defines the stable JSON schema for results:

- `SCHEMA_VERSION` follows semantic schema rules.
- Major bumps remove or rename required fields.
- Minor bumps add optional fields.
- Renderers accept any artifact with a matching major.

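The compatibility rule in the list above can be sketched as a small check. This is a sketch under assumptions: this document does not show how `SCHEMA_VERSION` is represented, so the `SchemaVersion` struct, the `"major.minor"` string form, and both function names are hypothetical.

```rust
/// Hypothetical version type; the real `SCHEMA_VERSION` representation
/// in `json_results.rs` is not shown in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SchemaVersion {
    major: u32,
    minor: u32,
}

/// Parse an assumed "major.minor" string, returning None on malformed input.
fn parse_version(s: &str) -> Option<SchemaVersion> {
    let (major, minor) = s.split_once('.')?;
    Some(SchemaVersion {
        major: major.parse().ok()?,
        minor: minor.parse().ok()?,
    })
}

/// A renderer accepts any artifact whose major version matches its own.
/// Minor bumps only add optional fields, so they never break a reader.
fn renderer_accepts(renderer: SchemaVersion, artifact: SchemaVersion) -> bool {
    renderer.major == artifact.major
}
```

Under this rule a renderer built for 2.0 still reads a 2.7 artifact and ignores the optional fields it does not know, but rejects a 3.0 artifact.
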
Each `BenchmarkArtifact` contains:

- `metadata`: timestamp, git commit, branch, dirty bit, rustc, host, CPU,
  benchmark config.
- `results`: rows keyed by policy, workload, and `case_id`.
- `metrics`: optional typed sections for hit rate, throughput, latency,
  eviction, scan resistance, adaptation speed.

The schema is presentation-neutral. Markdown tables and charts are rendered
later by `bench-support/src/bin/render_docs.rs`, so measurement and presentation
can evolve independently.

## Case IDs

Use `case_id::*` constants from `json_results.rs` instead of string literals:

- `hit_rate`
- `comprehensive`
- `scan_resistance`
- `adaptation`

This catches typos at compile time and prevents a result section from silently
disappearing from rendered docs. Adding a new case means adding a constant,
teaching the runner to populate it, and teaching the renderer how to display it.

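A rough sketch of why constants help: the string values below come from the list above, but the constant names, the `ALL` array, and the `section_title` renderer helper are assumptions made for illustration and may not match `json_results.rs`.

```rust
/// Hypothetical mirror of a `case_id` module. Only the string values are
/// taken from the design doc; the constant names are assumed.
mod case_id {
    pub const HIT_RATE: &str = "hit_rate";
    pub const COMPREHENSIVE: &str = "comprehensive";
    pub const SCAN_RESISTANCE: &str = "scan_resistance";
    pub const ADAPTATION: &str = "adaptation";
    pub const ALL: [&str; 4] = [HIT_RATE, COMPREHENSIVE, SCAN_RESISTANCE, ADAPTATION];
}

/// Hypothetical renderer lookup: maps a case id to a section heading.
/// A misspelled constant name fails to compile; a misspelled string
/// literal would fall through to `None` and the section would vanish.
fn section_title(id: &str) -> Option<&'static str> {
    match id {
        case_id::HIT_RATE => Some("Hit rate"),
        case_id::COMPREHENSIVE => Some("Comprehensive"),
        case_id::SCAN_RESISTANCE => Some("Scan resistance"),
        case_id::ADAPTATION => Some("Adaptation"),
        _ => None,
    }
}
```

Writing `case_id::HIT_RTE` is a compile error, whereas the literal `"hitrate"` compiles and silently produces no section.
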
## What Each Benchmark Answers

| Benchmark | Question |
|---|---|
| `ops.rs` | What is the raw cost of `get` / `insert` / policy-specific operations? |
| `workloads.rs` | Which policies preserve hit rate under standard workloads? |
| `comparison.rs` | How does cachekit compare with external crates (`lru`, `quick_cache`)? |
| `policy/*.rs` | What is the cost of each policy's unique operations? |
| `reports.rs` | What should a human inspect while tuning? |
| `runner.rs` | What should CI and docs consume? |

Do not overload one benchmark to answer all questions. If you need policy
micro-cost, use `ops.rs`; if you need hit rate under scans, use `workloads.rs`
or `runner.rs`.

## Reproducibility Rules

- Seed every workload. Default seed is 42 unless a benchmark is explicitly
  sweeping seeds.
- Record the git dirty bit. Dirty runs are useful locally but should not be
  published as release baselines without a note.
- Keep capacity, universe, and operation count visible in the artifact.
- Prefer `ScrambledZipfian` over raw `Zipfian` for cross-policy comparison when
  hardware prefetch could bias hot-key locality.
- Do not compare results across machines without CPU metadata. Tail latency and
  pointer-heavy policy cost are machine-sensitive.

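The "seed every workload" rule amounts to making the key stream a pure function of its inputs. The sketch below shows the idea with a self-contained SplitMix64 generator; it is not the generator used in `bench-support/src/workload.rs`, and `SplitMix64`, `uniform_keys`, and `DEFAULT_SEED` are names assumed for this example. Only the default value 42 comes from the rules above.

```rust
/// Default seed stated in the reproducibility rules.
const DEFAULT_SEED: u64 = 42;

/// Small deterministic PRNG (SplitMix64), used here only to keep the
/// example dependency-free; the real generators may use something else.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Uniform key stream over `0..universe` (universe must be non-zero).
/// The modulo introduces a slight bias, which is acceptable for a sketch.
fn uniform_keys(seed: u64, universe: u64, ops: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(seed);
    (0..ops).map(|_| rng.next_u64() % universe).collect()
}
```

Two calls to `uniform_keys(DEFAULT_SEED, universe, ops)` with the same arguments yield identical streams, so a run can be replayed from the seed, universe, and operation count recorded in the artifact.
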
## CI and Documentation Flow

The docs pipeline runs the benchmark suite, writes
`target/benchmarks/<run-id>/results.json`, and renders
`docs/benchmarks/latest/` plus charts. Release-tag snapshots live under
`docs/benchmarks/vX.Y.Z/`.

Manual workflow:

```bash
cargo bench --bench runner
./scripts/update_benchmark_docs.sh
```

The script is the high-level path for refreshing published benchmark docs. Use
individual benches (`cargo bench --bench ops`, `cargo bench --bench reports -- scan`)
while developing a policy.

## Adding a Policy to Benchmarks

1. Add the policy to `for_each_policy!` with a concrete constructor.
2. Add matching `PolicyMeta` in `POLICIES`.
3. Run the registry drift test.
4. Run `cargo bench --bench reports -- hit_rate` for a quick sanity check.
5. Run `cargo bench --bench runner` before publishing docs.

Keep constructors comparable. If one policy needs `Arc<u64>` and another stores
`u64`, choose the value shape that preserves fairness and document the exception
in the registry comment.

## Adding a Workload

1. Implement the generator in `bench-support/src/workload.rs`.
2. Add a `WorkloadCase` in the registry with stable id and display name.
3. Add docs in [`docs/benchmarks/workloads.md`](../benchmarks/workloads.md).
4. Add renderer support if the workload needs a custom section.
5. Run at least one policy family expected to behave differently (for example,
   LRU vs S3-FIFO for scan-heavy workloads).

Do not add a workload just because it is mathematically interesting. It should
answer a policy-selection question.

## Non-goals

- Benchmarks are not formal proofs of policy optimality.
- Benchmarks are not a stable ABI. The JSON schema is versioned, but Criterion
  names and report formatting can change.
- Benchmarks do not hide hardware effects. They record enough metadata for the
  reader to judge them.
- Benchmarks do not replace fuzzing or invariant tests; they measure behaviour
  under selected workloads.

## See Also

- [Design overview](design.md) - §10 frames benchmarking at the principles level
- [Metrics](metrics.md) - recorder / snapshot / exporter split
- [Benchmark docs](../benchmarks/README.md)
- [Workload catalog](../benchmarks/workloads.md)
- [`bench-support/src/registry.rs`](../../bench-support/src/registry.rs)
- [`bench-support/src/json_results.rs`](../../bench-support/src/json_results.rs)
- [`benches/runner.rs`](../../benches/runner.rs)

docs/design/design.md

Lines changed: 10 additions & 0 deletions
@@ -339,6 +339,16 @@ Design docs:
`MetricsCell`, Prometheus exporter, feature gating
- [Error model](error-model.md) — panic vs `Result` discipline,
four error types, debug-only invariant checks
- [Benchmarking](benchmarking.md) — benchmark layers, monomorphic policy
registry, JSON artifact schema, reproducibility rules
- [Hashing and key identity](hashing.md) — hasher choices, `KeyInterner`,
`ShardSelector`, HashDoS trade-offs
- [Sharding](sharding.md) — current sharded primitives, routing,
capacity semantics, roadmap for sharded caches
- [Serialization](serialization.md) — current `serde` surface, cache-state
persistence boundaries, TTL and hash-seed rules
- [Non-goals](non-goals.md) — explicit boundaries for what cachekit does
not try to be
- [TTL](ttl.md) — applied example of every principle above
- [Doc style guide](style-guide.md)
