
Commit 42c0b7e

TKorr and cursoragent committed
docs: add testing catalog for coverage taxonomy and gaps
Centralize test types, CI integration notes, and prioritized coverage gaps so contributors can find what exists and what is still missing without duplicating the policy spec matrix.

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 949e1c9 commit 42c0b7e

7 files changed

Lines changed: 217 additions & 1 deletion


CONTRIBUTING.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -198,6 +198,10 @@ Use conventional commit format for PR titles:

 ## Testing Guidelines

+### Testing overview
+
+For a full taxonomy of test types, what CacheKit runs today, coverage gaps, and suggested priorities, see the [Testing catalog](docs/testing/catalog.md).
+
 ### Unit Tests

 - Write tests for each public function
```

docs/index.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -59,5 +59,6 @@ Key features:
 ## Testing and Fuzzing

 - [Testing strategy](testing/testing.md)
+- [Testing catalog](testing/catalog.md) — test types, current coverage, and gaps
 - [Policy semantic testing](testing/static-analysis.md)
 - [Adding fuzz targets](testing/adding-fuzz-targets.md)
```

docs/testing/catalog.md

Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
# Testing Catalog

Reference for **testing types** used in CacheKit, types we do not use yet, and **coverage gaps** worth closing. For how to run tests and contributor workflows, see [Testing strategy](testing.md). For the policy semantic harness, see [Policy semantic testing](static-analysis.md) and the [policy spec matrix](specs/matrix.md).

## Overview

CacheKit combines several layers: unit tests close to the implementation, property tests and semantic oracles for eviction correctness, fuzz targets for data-structure robustness, Miri for undefined-behavior smoke checks, integration tests for concurrency and composition, and Criterion benchmarks for performance analysis.

Not every testing type applies to every component. Exact eviction policies benefit most from model-based dual-run tests; adaptive policies can only be checked against weaker legal-set oracles; concurrent wrappers need stress tests and thread-safety tooling; product-facing performance claims belong in benchmarks with optional regression gates.

```text
           fast / cheap                               slow / expensive
                 │                                            │
   unit ─────────┼── property ─── integration ─── fuzz ─── formal
                 │      │                          │          │
                 │ policy_semantics           Miri / Loom     │
                 │ (model dual-run)                           │
                 └──────── benchmarks / workload sim ─────────┘
```

## What CacheKit uses today

| Type | Purpose | Location / entrypoint | CI |
|------|---------|----------------------|-----|
| **Unit tests** | Specific behaviors and edge cases | `#[cfg(test)] mod tests` in `src/` | Partial: Miri runs filtered `--lib ds::` / `policy::` tests; full lib `mod tests` do not run in the main test job |
| **Property tests (in-module)** | DS invariants over random inputs | `#[cfg(test)] mod property_tests` in `src/ds/` and some policy modules | `PROPTEST_CASES=1000 cargo test --lib property_tests` (property-tests job) |
| **Policy semantic tests** | Eviction/residency oracles vs implementations | `tests/abstract_models/`, `tests/policy_semantics/` | `cargo test --test policy_semantics --all-features`; higher case count in the property-tests job |
| **Dual-impl equivalence** | Two implementations, same trace | `tests/policy_semantics/dual_impl_tests.rs` | Via `cargo test --test policy_semantics --all-features` |
| **Integration tests** | Multi-module wiring | `tests/lru_integration_test.rs`, `tests/ttl_integration_test.rs`, other `tests/*.rs` | `cargo test --tests --all-features` (main test job) |
| **Cross-policy invariants** | Library-wide consistency | `tests/policy_invariants.rs` | `cargo test --tests --all-features` |
| **Concurrency / stress** | Races and atomicity under load | `tests/*_concurrency.rs`, `tests/slab_concurrency.rs` | `cargo test --tests --all-features`; `lru_concurrency` / `slab_concurrency` need the `concurrency` feature |
| **O(1) regression** | Algorithmic complexity guards | `tests/performance_regression.rs` | `cargo test --tests --all-features` |
| **Fuzz tests** | Crashes and invariant violations on mutated input | `fuzz/fuzz_targets/` (27 targets, DS layer) | PR smoke run (60 s per target); continuous nightly run |
| **Miri** | Undefined-behavior smoke checks | Lib `ds::` / `policy::` (filtered) + `policy_semantics` `smoke_*` | Dedicated Miri job |
| **Doc tests** | Rustdoc examples compile and run | `///` examples in `src/` | `cargo test --doc --all-features` |
| **Example tests** | `examples/` compile and pass | `examples/` | `cargo test --examples --all-features` |
| **Benchmarks** | Throughput, hit rate, cross-policy comparison | `benches/`, `bench-support/` | `cargo bench` on `main` only |
| **Static analysis** | Style, lints, deps, security | clippy, rustfmt, `cargo audit`, `cargo deny` | Dedicated CI jobs |
| **Formal verification (manual)** | Reachable-state invariants on finite instances | `docs/testing/specs/formal/` (FIFO, LRU TLA+) | Not in CI; run `./scripts/run-tlc.sh` locally |
| **MSRV check** | Minimum supported Rust version | MSRV declared in `Cargo.toml` | Dedicated CI job |
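
Miri interprets code far more slowly than native execution, which is why the Miri job runs filtered, small tests. The sketch below shows the usual pattern with a hypothetical `ToyRing` standing in for a real `ds::` type. The operation count shrinks under `cfg!(miri)`, a cfg that `cargo miri test` sets, so a single `smoke_*` test can serve both the native job and the Miri job.

```rust
/// Hypothetical fixed-capacity ring buffer standing in for a `ds::` type.
pub struct ToyRing {
    slots: Vec<Option<u64>>,
    head: usize,
    len: usize,
}

impl ToyRing {
    pub fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        ToyRing { slots: vec![None; cap], head: 0, len: 0 }
    }

    /// Appends `v`; when full, overwrites the oldest value and returns it.
    pub fn push(&mut self, v: u64) -> Option<u64> {
        let cap = self.slots.len();
        if self.len == cap {
            let old = self.slots[self.head].replace(v);
            self.head = (self.head + 1) % cap;
            old
        } else {
            self.slots[(self.head + self.len) % cap] = Some(v);
            self.len += 1;
            None
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

/// Far fewer operations under Miri; the same invariants are checked either way.
pub const SMOKE_OPS: usize = if cfg!(miri) { 64 } else { 10_000 };

/// Drives the ring with a reproducible xorshift64 sequence and asserts the
/// length invariant after every push. Returns how many pushes overwrote a value.
pub fn smoke_ring(seed: u64) -> usize {
    let mut ring = ToyRing::with_capacity(8);
    let mut state = seed.max(1); // xorshift must not start at zero
    let mut overwritten = 0;
    for i in 0..SMOKE_OPS {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        if ring.push(state).is_some() {
            overwritten += 1;
        }
        assert_eq!(ring.len(), (i + 1).min(8));
    }
    overwritten
}

#[cfg(test)]
mod tests {
    #[test]
    fn smoke_ring_invariants() {
        assert_eq!(super::smoke_ring(42), super::SMOKE_OPS - 8);
    }
}
```

Only the iteration count changes under Miri, so a failure seen in the Miri job can be reproduced natively with the same code path.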

### Policy semantic coverage by tier

Canonical per-policy status: [matrix.md](specs/matrix.md).

| Tier | Harness mode | Policies | Oracle strength |
|------|--------------|----------|-----------------|
| Exact | DualRun + CrossModel | FIFO, LRU, Fast-LRU, LIFO, MRU, LFU, Heap-LFU, MFU, LRU-K | Strong — independent `reference/` models for all exact-tier policies |
| Mirror | DualRun | Clock, 2Q, SLRU, NRU | Medium — `exact/` models mirror implementation shape; specs still `stub` |
| Bounded | InvariantOnly | ARC, CAR, Clock-PRO, S3-FIFO | Weak — structural invariants only; victim legal sets deferred |
| Composed | DualRun + deadlines | TTL | Medium — `LruOccupancyModel` + TTL integration tests; spec still `stub` |
| Deferred | None | Random | None — no semantic harness |
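
The Exact tier's DualRun mode replays a single trace against both the implementation and an independent reference model, then compares every observable result. The following is a minimal std-only sketch. `SimpleCache`, `HashLru` (the implementation under test), and `VecLruModel` (the reference) are hypothetical names, and a seeded xorshift trace stands in for the proptest-generated traces that the real harness in `tests/policy_semantics/` uses.

```rust
use std::collections::HashMap;

/// Hypothetical interface shared by implementation and model.
pub trait SimpleCache {
    /// Returns the cached value on a hit.
    fn get(&mut self, key: u64) -> Option<u64>;
    /// Inserts or updates `key`; returns the evicted key, if any.
    fn put(&mut self, key: u64, value: u64) -> Option<u64>;
}

/// Implementation under test: recency tracked with a logical clock per entry.
pub struct HashLru { cap: usize, tick: u64, map: HashMap<u64, (u64, u64)> } // key -> (value, last tick)

impl HashLru {
    pub fn new(cap: usize) -> Self { assert!(cap > 0); HashLru { cap, tick: 0, map: HashMap::new() } }
}

impl SimpleCache for HashLru {
    fn get(&mut self, key: u64) -> Option<u64> {
        self.tick += 1;
        let tick = self.tick;
        self.map.get_mut(&key).map(|entry| { entry.1 = tick; entry.0 })
    }

    fn put(&mut self, key: u64, value: u64) -> Option<u64> {
        self.tick += 1;
        let tick = self.tick;
        if let Some(entry) = self.map.get_mut(&key) {
            *entry = (value, tick);
            return None;
        }
        let mut evicted = None;
        if self.map.len() >= self.cap {
            // Oldest tick = least recently used; ticks are unique, so there are no ties.
            evicted = self.map.iter().min_by_key(|(_, e)| e.1).map(|(k, _)| *k);
            if let Some(k) = evicted { self.map.remove(&k); }
        }
        self.map.insert(key, (value, tick));
        evicted
    }
}

/// Reference model: a Vec ordered from least to most recently used.
pub struct VecLruModel { cap: usize, entries: Vec<(u64, u64)> } // (key, value)

impl VecLruModel {
    pub fn new(cap: usize) -> Self { assert!(cap > 0); VecLruModel { cap, entries: Vec::new() } }
}

impl SimpleCache for VecLruModel {
    fn get(&mut self, key: u64) -> Option<u64> {
        let pos = self.entries.iter().position(|e| e.0 == key)?;
        let entry = self.entries.remove(pos);
        self.entries.push(entry); // a hit moves the key to the most-recent end
        Some(entry.1)
    }

    fn put(&mut self, key: u64, value: u64) -> Option<u64> {
        if let Some(pos) = self.entries.iter().position(|e| e.0 == key) {
            self.entries.remove(pos);
            self.entries.push((key, value));
            return None;
        }
        let evicted = if self.entries.len() >= self.cap { Some(self.entries.remove(0).0) } else { None };
        self.entries.push((key, value));
        evicted
    }
}

/// Replays one seeded trace against both caches; Err carries the first divergence.
pub fn dual_run<A: SimpleCache, B: SimpleCache>(mut imp: A, mut model: B, seed: u64, ops: usize) -> Result<usize, String> {
    let mut s = seed | 1;
    for step in 0..ops {
        s ^= s << 13; s ^= s >> 7; s ^= s << 17;
        let key = s % 16; // small key space forces both hits and evictions
        let (a, b) = if (s & 0x100) == 0 {
            (imp.get(key), model.get(key))
        } else {
            (imp.put(key, step as u64), model.put(key, step as u64))
        };
        if a != b {
            return Err(format!("step {step}, key {key}: impl {a:?} != model {b:?}"));
        }
    }
    Ok(ops)
}
```

Reporting the first step where the two sides diverge, rather than returning a bare boolean, keeps failures as debuggable as a shrunk proptest counterexample.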

### Concurrency test coverage

Three policies ship native `Concurrent*` wrappers today (LRU, FIFO, S3-FIFO). See [Concurrency](../design/concurrency.md).

Integration concurrency tests in `tests/` cover FIFO, LFU, LRU, LRU-K, NRU, and `ConcurrentSlabStore`. Only `lru_concurrency.rs` and `slab_concurrency.rs` require the `concurrency` feature; the other `*_concurrency.rs` files run unconditionally. `lru_concurrency.rs` exercises `ConcurrentLruCache`; FIFO, LFU, and NRU use ad-hoc `Arc<Mutex<…>>` wrappers around sequential cores rather than native `Concurrent*` types. `ConcurrentS3FifoCache` has in-module tests in `src/policy/s3_fifo.rs` but no dedicated `tests/*_concurrency.rs` file.
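
The `Arc<Mutex<…>>` pattern used by the FIFO, LFU, and NRU integration tests needs only std threads. In the sketch below, `ToyFifo` is a hypothetical sequential core, not CacheKit's FIFO type. Several writers insert disjoint keys under one lock, each step re-checks the core's invariants, and the final state is compared against what every interleaving must produce.

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sequential FIFO core; not thread-safe on its own.
pub struct ToyFifo { cap: usize, order: VecDeque<u64>, keys: HashSet<u64> }

impl ToyFifo {
    pub fn new(cap: usize) -> Self {
        ToyFifo { cap, order: VecDeque::new(), keys: HashSet::new() }
    }

    pub fn insert(&mut self, key: u64) {
        // Capacity 0 stores nothing; re-inserting a resident key is a no-op.
        if self.cap == 0 || !self.keys.insert(key) {
            return;
        }
        self.order.push_back(key);
        if self.order.len() > self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.keys.remove(&oldest);
            }
        }
    }

    pub fn len(&self) -> usize {
        self.order.len()
    }

    /// Invariant: queue and membership set agree and respect capacity.
    pub fn check(&self) -> bool {
        self.order.len() == self.keys.len()
            && self.order.len() <= self.cap
            && self.order.iter().all(|k| self.keys.contains(k))
    }
}

/// Spawns `threads` writers, each inserting `per_thread` keys disjoint from the
/// other threads' keys, and returns the core once all writers have joined.
pub fn stress(cap: usize, threads: u64, per_thread: u64) -> ToyFifo {
    let cache = Arc::new(Mutex::new(ToyFifo::new(cap)));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                for i in 0..per_thread {
                    let mut guard = cache.lock().unwrap();
                    guard.insert(t * per_thread + i);
                    assert!(guard.check(), "invariant broken mid-run");
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("writer thread panicked");
    }
    Arc::try_unwrap(cache).ok().expect("all clones dropped").into_inner().unwrap()
}
```

Every interleaving inserts the same set of distinct keys, so with more keys than capacity the final length must equal the capacity, however the threads were scheduled.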

### Performance regression coverage

`performance_regression.rs` verifies O(1) scaling for **LRU, LRU-K, LFU, Clock, and S3-FIFO** only. Other policies get correctness coverage from unit tests and semantic harnesses, but they have no complexity guard.
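
A scaling guard compares per-operation cost at two sizes instead of asserting absolute timings. This keeps it portable across CI machines. The sketch below uses a std `HashMap` as a stand-in for a cache under test. The sizes and operation counts are illustrative and are not taken from `performance_regression.rs`. Taking the best of several runs reduces scheduler noise, although wall-clock guards remain somewhat sensitive to heavily loaded runners.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Best-of-`runs` time for `ops` lookups in a map pre-filled with `n` keys.
pub fn lookup_time(n: u64, ops: u64, runs: u32) -> Duration {
    let map: HashMap<u64, u64> = (0..n).map(|k| (k, k)).collect();
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        let mut hits = 0u64;
        for i in 0..ops {
            // Multiplicative spread so lookups touch the whole key space.
            let key = i.wrapping_mul(2_654_435_761) % n;
            if black_box(map.get(&key)).is_some() {
                hits += 1;
            }
        }
        black_box(hits);
        best = best.min(start.elapsed());
    }
    best
}

/// Cost at `large` divided by cost at `small` for the same number of operations:
/// stays near 1.0 for O(1) lookups and grows roughly like large/small for O(n).
pub fn scaling_ratio(small: u64, large: u64) -> f64 {
    const OPS: u64 = 20_000;
    let t_small = lookup_time(small, OPS, 5).as_nanos().max(1) as f64;
    let t_large = lookup_time(large, OPS, 5).as_nanos().max(1) as f64;
    t_large / t_small
}
```

A guard built on this would assert that the ratio stays under a threshold tuned for the CI runners. Because the operation count is fixed, a linear-time lookup would push the ratio far above any reasonable threshold.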

### Fuzz coverage

All 27 fuzz targets exercise **internal data structures** (ClockRing, SlotArena, GhostList, etc.), not end-to-end cache policy APIs. See [Fuzzing in CI/CD](fuzzing-cicd.md) and [fuzz/README.md](../../fuzz/README.md).

---

## Testing types not used (or barely used)

The tables below list other testing types common in systems and Rust libraries. **Relevance** indicates how well each type fits CacheKit.

### Correctness and semantics

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Golden / snapshot tests** | Output matches a saved baseline | Not used | Low — semantic oracles are more stable for caches |
| **Model-based testing** | Random traces vs a reference model | Core of `policy_semantics` | **High** — already central |
| **Equivalence testing** | Two implementations, same behavior | `dual_impl_tests.rs` (LRU/Fast-LRU, Clock/ClockRing) | **High** — extend to more pairs |
| **Differential testing** | Output matches an external library on the same traces | Cross-library benches only | **Medium** — good for hit-rate sanity; not in CI today |
| **Formal verification** | All reachable states satisfy invariants | TLA+ for FIFO/LRU (manual) | **Medium** — complementary to proptest |
| **Symbolic / concolic execution** | Systematic path exploration | Not used | Low for this codebase size |
| **Mutation testing** | Tests catch injected bugs | Not used | **Medium** — validates harness strength |
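
Mutation testing asks whether the suite would notice a plausible bug. The hand-rolled sketch below uses a toy Vec-based LRU. A `mutant` flag injects a classic bug: a hit no longer refreshes recency, which silently turns LRU into FIFO. The harness "kills" the mutant if some trace in the test set makes the two variants disagree. Tools such as `cargo-mutants` automate this by generating mutants from the real source. The names here are illustrative only.

```rust
/// One access in a trace.
#[derive(Clone, Copy)]
pub enum Op {
    Get(u32),
    Put(u32),
}

/// Replays `trace` on a toy LRU with capacity `cap` (> 0) and returns the evicted
/// keys in order. With `mutant == true`, hits do not refresh recency (injected bug).
pub fn evictions(trace: &[Op], cap: usize, mutant: bool) -> Vec<u32> {
    assert!(cap > 0, "capacity must be non-zero");
    let mut order: Vec<u32> = Vec::new(); // front = least recently used
    let mut evicted = Vec::new();
    for op in trace {
        match *op {
            Op::Get(k) => {
                if let Some(pos) = order.iter().position(|&x| x == k) {
                    if !mutant {
                        let key = order.remove(pos);
                        order.push(key);
                    }
                }
            }
            Op::Put(k) => {
                if let Some(pos) = order.iter().position(|&x| x == k) {
                    order.remove(pos);
                } else if order.len() == cap {
                    evicted.push(order.remove(0));
                }
                order.push(k);
            }
        }
    }
    evicted
}

/// True if at least one trace distinguishes the mutant from the original.
pub fn mutant_killed(traces: &[Vec<Op>], cap: usize) -> bool {
    traces
        .iter()
        .any(|t| evictions(t, cap, false) != evictions(t, cap, true))
}
```

A trace made only of inserts cannot kill this mutant. That is the kind of harness weakness mutation testing is meant to expose: the traces must include hits followed by evictions.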

### Input exploration

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Property-based testing** | Invariants over random inputs | proptest throughout | **High** — already central |
| **Fuzz testing** | Panics and invariant breaks on arbitrary bytes | DS layer only | **High** — policy-level fuzz is a gap |
| **Regression corpora** | Saved minimal failing inputs | `tests/proptest-regressions/` | **Medium** — grow as new failures are shrunk |
| **Adversarial testing** | Hash-collision DoS, untrusted keys | Documented (`KeysAreTrusted`) | **Medium** — no dedicated suite |
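
An adversarial suite can force worst-case hashing without crafting real collisions. A `BuildHasher` whose hasher always returns the same value sends every key to the same bucket. The sketch below uses only std types, with `HashMap` standing in for a cache's index. It checks that results stay correct when keys are hostile; only speed should degrade. It does not model how CacheKit's `KeysAreTrusted` guidance is applied.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hasher that ignores its input, so every key lands in the same bucket.
#[derive(Default)]
pub struct ConstHasher;

impl Hasher for ConstHasher {
    fn finish(&self) -> u64 {
        0
    }
    fn write(&mut self, _bytes: &[u8]) {}
}

pub type CollidingMap<K, V> = HashMap<K, V, BuildHasherDefault<ConstHasher>>;

/// Inserts `n` keys under total collision, removes the even keys, and verifies
/// that every lookup still returns exactly the expected value.
pub fn collision_roundtrip(n: u64) -> bool {
    let mut map: CollidingMap<u64, u64> = CollidingMap::default();
    for k in 0..n {
        map.insert(k, k * 10);
    }
    for k in (0..n).step_by(2) {
        map.remove(&k);
    }
    map.len() as u64 == n / 2
        && (0..n).all(|k| {
            let expected = if k % 2 == 1 { Some(k * 10) } else { None };
            map.get(&k).copied() == expected
        })
}
```

Running the same operations with a colliding hasher and with the default hasher, and comparing the outputs, turns this into a differential check for any structure that takes a `BuildHasher` parameter.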

### Concurrency and runtime

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Concurrency / stress tests** | Races under threaded load | Partial (`*_concurrency.rs`) | **High** |
| **Miri** | UB detection under interpreted execution | Curated subset | **High** — extend with TTL smoke tests |
| **Loom / Shuttle** | Exhaustive (Loom) or randomized (Shuttle) interleavings on small models | Not used | **High** for `Concurrent*` wrappers |
| **Sanitizer CI (TSan/ASan)** | Runtime memory and thread bugs | Not in CI | **Medium** — pairs well with concurrency tests |

### Performance and capacity

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Micro-benchmarks** | Op latency and throughput | Criterion in `benches/` | **High** — already central |
| **Workload simulation** | Zipfian, scan, mixed R/W | `benches/workloads.rs` | **High** — exists as benchmarks only |
| **Complexity / scaling tests** | O(1) vs O(n) as size grows | 5 policies in `performance_regression.rs` | **High** — expand coverage |
| **Performance regression gates** | CI fails on perf drop vs baseline | Benches on `main` only | **Medium** — optional nightly/PR gate |
| **Memory / allocation profiling** | Allocations in hot paths | Not automated | **Medium** — aligns with design goals |
| **Soak / endurance tests** | Long-run leaks and drift | Not used | **Low–medium** — nightly fuzz partially covers |
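
Allocation checks for hot paths can be automated with a counting global allocator. The sketch below assumes the test binary can install its own `#[global_allocator]`. It also assumes the measured closure runs while no other thread allocates, because the counter is process-wide. `HashMap` lookups stand in for a cache's hit path.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request
/// (`realloc` and `alloc_zeroed` use the default methods, which call `alloc`).
pub struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations performed while `f` ran.
pub fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}
```

A hot-path test pre-sizes the structure outside the measured closure and then asserts `allocations_during(...) == 0` for the hit path. Growth and first-use allocations are deliberately kept out of the measurement.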

### Build, platform, and delivery

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Feature-matrix testing** | All `Cargo` feature combinations | `--all-features` in CI | **High** — partially in place (`--all-features` tests only one combination) |
| **Cross-platform testing** | Multiple OS/arch targets | ubuntu, macos, windows CI runners | **High** — in place |
| **Compatibility / semver tests** | Public API stable across releases | Not automated | **Medium** — once the API stabilizes |
| **Serialization round-trip** | Persist and restore cache state | Design docs; limited tests | Low until serde support ships |

### Security and robustness

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Panic safety tests** | Invariants restored after panic | Partial in `ConcurrentLruCache` tests | **Medium** |
| **Boundary tests** | capacity=0, capacity=1, empty cache | Partial in `policy_invariants.rs` | **Medium** — extend cross-policy |
| **DoS-resistance tests** | Untrusted keys and hasher choice | Documented, not a dedicated suite | **Medium** |
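
A panic-safety test injects a panic at a point where the cache calls user code, then checks that the cache is still usable and consistent. The sketch below uses a hypothetical `ToyCache::get_or_insert_with`. Its loader runs before any internal state is mutated, so a panicking loader cannot leave a half-inserted entry behind. `catch_unwind` with `AssertUnwindSafe` is the std mechanism for this. Note that the default panic hook still prints the panic message to stderr.

```rust
use std::collections::HashMap;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical bounded cache with a user-supplied loader.
pub struct ToyCache {
    cap: usize,
    map: HashMap<u64, String>,
}

impl ToyCache {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        ToyCache { cap, map: HashMap::new() }
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }

    /// Calls `load` before touching internal state, so a panic inside `load`
    /// leaves the cache exactly as it was.
    pub fn get_or_insert_with<F: FnOnce() -> String>(&mut self, key: u64, load: F) -> &String {
        if !self.map.contains_key(&key) {
            let value = load(); // may panic: nothing has been mutated yet
            if self.map.len() >= self.cap {
                // Toy policy: evict an arbitrary resident key to stay within capacity.
                let victim = self.map.keys().next().copied();
                if let Some(v) = victim {
                    self.map.remove(&v);
                }
            }
            self.map.insert(key, value);
        }
        &self.map[&key]
    }
}

/// Runs a loader that always panics and checks the cache afterwards.
pub fn panicking_loader_leaves_cache_intact() -> bool {
    let mut cache = ToyCache::new(2);
    cache.get_or_insert_with(1, || "one".to_string());
    let before = cache.len();
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        cache.get_or_insert_with(2, || panic!("loader failed"));
    }));
    result.is_err()
        && cache.len() == before
        && cache.get_or_insert_with(1, || unreachable!()) == "one"
        && cache.get_or_insert_with(2, || "two".to_string()) == "two"
}
```

The same shape applies to any callback a cache invokes, such as weighers or eviction listeners. The test should panic inside the callback and then assert that the cache's invariants still hold.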

---

## Known coverage gaps

Prioritized gaps between current practice and the testing goals in [Testing strategy](testing.md) and [Policy semantic testing](static-analysis.md).

### 1. Semantic oracle depth (highest impact)

- **Bounded policies** (ARC, CAR, Clock-PRO, S3-FIFO): `InvariantOnly` checks do not assert victim correctness. Future work: `OracleExpectation::Legal` legal-victim sets ([static-analysis.md](static-analysis.md)).
- **Mirror policies** (Clock, 2Q, SLRU, NRU): oracles mirror the implementation; independent `reference/` models and `reference` spec maturity are still pending.
- **Random**: no `PolicyModel` or `policy_semantics` harness ([matrix.md](specs/matrix.md)).
- **TTL**: integration tests exist, but there is no Miri smoke test yet and the composed-tier spec is still `stub`.
- **Path-sensitive traces**: documented as future harness work, not implemented.
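
When a policy may legitimately choose among several victims, the oracle can return the set of acceptable keys, and the test asserts membership instead of equality. The sketch below illustrates the idea with a hypothetical `ToyLfu` whose tie-breaking follows unspecified `HashMap` iteration order. `legal_victims` and `check_trace` are illustrative names and do not reflect the planned `OracleExpectation::Legal` API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical LFU: among minimum-frequency keys, the victim depends on
/// HashMap iteration order, so it is deliberately not unique.
pub struct ToyLfu { cap: usize, freq: HashMap<u64, u64> }

impl ToyLfu {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        ToyLfu { cap, freq: HashMap::new() }
    }

    /// Records an access; on a miss with a full cache, returns the evicted key.
    pub fn access(&mut self, key: u64) -> Option<u64> {
        if let Some(f) = self.freq.get_mut(&key) {
            *f += 1;
            return None;
        }
        let mut victim = None;
        if self.freq.len() == self.cap {
            let min = *self.freq.values().min().unwrap();
            let k = *self.freq.iter().find(|&(_, &f)| f == min).unwrap().0;
            self.freq.remove(&k);
            victim = Some(k);
        }
        self.freq.insert(key, 1);
        victim
    }

    pub fn freqs(&self) -> &HashMap<u64, u64> { &self.freq }
}

/// Oracle: every resident key with the minimum frequency is a legal victim.
pub fn legal_victims(freq: &HashMap<u64, u64>) -> HashSet<u64> {
    let min = freq.values().copied().min();
    freq.iter().filter(|&(_, &f)| Some(f) == min).map(|(&k, _)| k).collect()
}

/// Replays `trace`, checking each eviction against the legal set computed
/// just before the access. Returns the number of evictions on success.
pub fn check_trace(cap: usize, trace: &[u64]) -> Result<usize, String> {
    let mut cache = ToyLfu::new(cap);
    let mut evictions = 0;
    for &key in trace {
        let legal = legal_victims(cache.freqs());
        if let Some(v) = cache.access(key) {
            if !legal.contains(&v) {
                return Err(format!("evicted {v}, legal set {legal:?}"));
            }
            evictions += 1;
        }
    }
    Ok(evictions)
}
```

The oracle stays deterministic even though the implementation does not. This is the property that lets bounded-tier policies move from structural invariants to real victim checks.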

### 2. Concurrency

- Only LRU, FIFO, and S3-FIFO ship `Concurrent*` wrappers; 15 policies rely on external locking ([Concurrency](../design/concurrency.md)).
- No `tests/*_concurrency.rs` integration suite for `ConcurrentS3FifoCache` (in-module tests exist in `src/policy/s3_fifo.rs`).
- No concurrency tests for Clock, ARC, CAR, 2Q, SLRU, MFU, and others.
- FIFO/LFU/NRU integration concurrency tests use `Mutex` wrappers, not native `Concurrent*` APIs.

### 3. Performance regression breadth

- O(1) guards cover 5 of ~18 policies.
- Workload hit-rate and throughput regressions are not gated on PRs (benchmarks run on `main` only).

### 4. Fuzz scope

- All targets exercise DS primitives; none drive full `Cache` API sequences for a specific policy.
- Likely missing DS target: `expiration_index` (TTL path).
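
A policy-level fuzz target mostly needs two things: a decoder from raw bytes to cache operations, and invariant checks after every step. The std-only sketch below uses a hypothetical FIFO `OpCache` as the subject and an arbitrary byte layout of (opcode, key, value) triples. Inside `fuzz/fuzz_targets/` it would be wrapped as `fuzz_target!(|data: &[u8]| { run_ops(data); });`, the same macro form the existing targets use. Keeping the driver a plain function also lets saved corpus inputs be replayed from an ordinary test.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical FIFO cache used as the fuzzing subject.
pub struct OpCache { cap: usize, order: VecDeque<u8>, map: HashMap<u8, u32> }

impl OpCache {
    pub fn new(cap: usize) -> Self {
        OpCache { cap, order: VecDeque::new(), map: HashMap::new() }
    }

    fn put(&mut self, k: u8, v: u32) {
        if self.map.insert(k, v).is_none() {
            self.order.push_back(k);
            if self.order.len() > self.cap {
                if let Some(old) = self.order.pop_front() {
                    self.map.remove(&old);
                }
            }
        }
    }

    fn remove(&mut self, k: u8) {
        if self.map.remove(&k).is_some() {
            self.order.retain(|&x| x != k);
        }
    }

    fn assert_invariants(&self) {
        assert!(self.map.len() <= self.cap, "over capacity");
        assert_eq!(self.map.len(), self.order.len(), "index and queue disagree");
        assert!(self.order.iter().all(|k| self.map.contains_key(k)));
    }
}

/// Decodes `data` as a capacity byte followed by (opcode, key, value) triples,
/// applies each operation, and checks invariants. Returns the number applied.
pub fn run_ops(data: &[u8]) -> usize {
    let Some((&cap_byte, rest)) = data.split_first() else { return 0 };
    // Small capacities (including 0 and 1) keep boundary cases in play.
    let mut cache = OpCache::new((cap_byte % 8) as usize);
    let mut applied = 0;
    for chunk in rest.chunks_exact(3) {
        let (op, key, val) = (chunk[0] % 3, chunk[1], chunk[2] as u32);
        match op {
            0 => {
                cache.put(key, val);
                // A value just written must be readable unless capacity is 0.
                if cache.cap > 0 {
                    assert_eq!(cache.map.get(&key), Some(&val));
                }
            }
            1 => {
                let _ = cache.map.get(&key);
            }
            _ => cache.remove(key),
        }
        cache.assert_invariants();
        applied += 1;
    }
    applied
}
```

Because the fuzzer mutates the first byte as well, capacity becomes part of the explored input space instead of a fixed test parameter.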

### 5. Formal verification automation

- TLA+ specs exist for FIFO and LRU only; TLC runs are manual, not in CI.
- No trace-export bridge between TLC reachable states and Rust oracle traces ([formal/fifo/tlc.md](specs/formal/fifo/tlc.md)).

### 6. Cross-cutting integration

- **Weighted eviction**: unit tests in `store/weight.rs`; no integration tests ([Weighted eviction](../design/weighted-eviction.md)).
- **Builder / DynCache**: the TTL builder path is tested; the documented CAR builder gap has no test coverage ([Builder and runtime dispatch](../design/builder-and-dyn-dispatch.md)).
- **Cross-policy invariants**: `policy_invariants.rs` is thin (mostly capacity-0 edge cases).
- **Dual-impl equivalence**: only two pairs today (LRU/Fast-LRU, Clock/ClockRing).
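
One way to broaden `policy_invariants.rs` beyond capacity-0 cases is to write a single generic boundary check and run it against every policy. The sketch below uses a hypothetical `BoundaryCache` trait and two toy policies; CacheKit's real common cache trait is not assumed here. The capacity-1 expectation holds for these toys, but a policy with an admission filter might legitimately need a weaker check.

```rust
use std::collections::VecDeque;

/// Hypothetical minimal interface for cross-policy boundary checks.
pub trait BoundaryCache {
    fn with_capacity(cap: usize) -> Self;
    fn insert(&mut self, key: u32);
    fn contains(&self, key: u32) -> bool;
    fn len(&self) -> usize;
}

/// Toy policies: both keep keys oldest-first in a queue.
pub struct Fifo(usize, VecDeque<u32>);
pub struct Lru(usize, VecDeque<u32>);

/// Appends `key`, evicting the front when full; capacity 0 stores nothing.
fn push_bounded(cap: usize, q: &mut VecDeque<u32>, key: u32) {
    if cap == 0 {
        return;
    }
    if q.len() == cap {
        q.pop_front();
    }
    q.push_back(key);
}

impl BoundaryCache for Fifo {
    fn with_capacity(cap: usize) -> Self { Fifo(cap, VecDeque::new()) }
    fn insert(&mut self, key: u32) {
        // FIFO ignores re-inserts of resident keys.
        if !self.1.contains(&key) {
            push_bounded(self.0, &mut self.1, key);
        }
    }
    fn contains(&self, key: u32) -> bool { self.1.contains(&key) }
    fn len(&self) -> usize { self.1.len() }
}

impl BoundaryCache for Lru {
    fn with_capacity(cap: usize) -> Self { Lru(cap, VecDeque::new()) }
    fn insert(&mut self, key: u32) {
        // LRU moves a re-inserted key to the most-recent end.
        self.1.retain(|&k| k != key);
        push_bounded(self.0, &mut self.1, key);
    }
    fn contains(&self, key: u32) -> bool { self.1.contains(&key) }
    fn len(&self) -> usize { self.1.len() }
}

/// Empty-cache, capacity-0, and capacity-1 expectations shared by all policies.
pub fn check_boundaries<C: BoundaryCache>() -> Result<(), &'static str> {
    let empty = C::with_capacity(4);
    if empty.len() != 0 || empty.contains(1) {
        return Err("new cache is not empty");
    }
    let mut zero = C::with_capacity(0);
    zero.insert(1);
    if zero.len() != 0 || zero.contains(1) {
        return Err("capacity 0 stored a key");
    }
    let mut one = C::with_capacity(1);
    one.insert(1);
    one.insert(2);
    if one.len() != 1 || !one.contains(2) || one.contains(1) {
        return Err("capacity 1 must keep only the newest key");
    }
    Ok(())
}
```

Adding a policy then costs one line, `check_boundaries::<NewPolicy>()`, which keeps cross-policy coverage from thinning out as new policies land.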

### 7. Process and documentation maturity

- Several policies have harness tests, but their operational specs remain `stub` (mirror, bounded, and composed tiers).
- [Testing strategy](testing.md) describes in-module `property_tests` for each module, but most policies delegate property coverage to `policy_semantics` instead.
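- The main CI **test** job runs only `cargo test --tests --all-features`, so lib `mod tests` outside `property_tests` are listed as not executed there (lib coverage is partial, via Miri and the property-tests job). Cargo's `--tests` selection includes the library unittest target by default, so this gap holds only if the lib target opts out (for example with `test = false`); confirm against `Cargo.toml`.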

---

## Recommended additions (priority order)

1. **Legal-victim oracles for bounded policies** — upgrade invariant-only tests to semantic checks where victims are not unique.
2. **Independent reference models for mirror policies** — add cross-model drift guards like those for exact-tier LRU.
3. **End-to-end policy fuzz targets** — full `Cache` op sequences, complementing DS fuzz.
4. **Expand O(1) regression** — FIFO, Fast-LRU, Heap-LFU, MFU, and other hot policies.
5. **`tests/*_concurrency.rs` for `ConcurrentS3FifoCache`** (beyond in-module tests) and Miri smoke for TTL.
6. **Random policy harness** — even a statistical residency/uniformity oracle would help validate Random as a benchmarking baseline.
7. **Optional nightly TLC job** — FIFO/LRU formal specs as a non-blocking check.
8. **Loom or TSan on concurrency tests** — small-model or sanitizer coverage for `Concurrent*` wrappers.
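
The statistical oracle in recommendation 6 can start very small. It counts how often each resident slot is chosen as the victim and checks every count against the uniform expectation. The sketch below assumes a hypothetical seeded `XorShift` generator and `pick_victim` selector rather than CacheKit's Random policy. The relative tolerance is arbitrary; a production check would more likely use a chi-squared goodness-of-fit test.

```rust
/// Seeded xorshift64 generator so the check is reproducible.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // state must be non-zero
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Hypothetical random-eviction choice: an index among `len` residents.
pub fn pick_victim(rng: &mut XorShift, len: usize) -> usize {
    (rng.next_u64() % len as u64) as usize
}

/// Runs `trials` victim picks over `len` residents and reports whether every
/// slot's count lies within `tolerance` (0.2 = ±20%) of the uniform expectation.
pub fn looks_uniform(len: usize, trials: usize, tolerance: f64, seed: u64) -> bool {
    let mut rng = XorShift::new(seed);
    let mut counts = vec![0usize; len];
    for _ in 0..trials {
        counts[pick_victim(&mut rng, len)] += 1;
    }
    let expected = trials as f64 / len as f64;
    counts
        .iter()
        .all(|&c| (c as f64 - expected).abs() <= tolerance * expected)
}
```

With 80,000 trials over 8 slots, the expected count per slot is 10,000 and a ±20% band is very wide compared with sampling noise. The check therefore flags gross bias, such as a selector that never picks some slot, without being flaky.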

Update this catalog when you add a new test type, or when coverage or CI integration changes materially (including changes to `.github/workflows/ci.yml`).

---

## Related documentation

- [Testing strategy](testing.md) — philosophy, four layers, how to run tests
- [Policy semantic testing](static-analysis.md) — harness architecture and contributor checklist
- [Policy spec matrix](specs/matrix.md) — canonical per-policy harness index
- [Fuzzing in CI/CD](fuzzing-cicd.md) — fuzz pipeline and target discovery
- [Adding fuzz targets](adding-fuzz-targets.md) — contributor guide for new fuzz targets
- [tests/README.md](../../tests/README.md) — integration test layout
- [abstract_models README](../../tests/abstract_models/README.md) — oracle models and tiers
- [Benchmarking design](../design/benchmarking.md) — benchmark layers vs regression tests
- [Concurrency](../design/concurrency.md) — `Concurrent*` coverage and gaps

docs/testing/specs/README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -78,6 +78,7 @@ Success: no `SemanticOK` violation on the bundled config. Runbooks: [formal/fifo

 ## Related documentation

+- [Testing catalog](../catalog.md) — test types, current coverage, and gaps
 - [Policy matrix](matrix.md) — canonical index
 - [Policy specs by tier](policies/README.md)
 - [Spec template](template.md) — new policy skeleton
```

docs/testing/static-analysis.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -181,6 +181,7 @@ Use `op_strategy_no_evict()` when the policy lacks [`EvictingCache`](../../src/t

 ## Related documentation

+- [Testing catalog](catalog.md) — test types, current coverage, and gaps
 - [Operational policy specs](specs/README.md) — spec-first source of truth
 - [Abstract models README](../../tests/abstract_models/README.md) — directory layout and policy matrix
 - [Testing strategy](testing.md) — four test layers including policy semantics
```

docs/testing/testing.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -283,6 +283,7 @@ fuzz_target!(|data: &[u8]| {

 ## Related Documentation

+- [Testing catalog](catalog.md) — test types, current coverage, and gaps
 - [Contributing Guide](../../CONTRIBUTING.md)
 - [Fuzz Testing](../../fuzz/README.md)
 - [Benchmarking](../../benches/README.md)
```

tests/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 # Test Organization

-This directory contains all integration and regression tests for cachekit.
+This directory contains all integration and regression tests for cachekit. For a full map of test types, CI coverage, and gaps, see the [Testing catalog](../docs/testing/catalog.md).

 ## Test Files

```
