# Testing Catalog

Reference for **testing types** used in CacheKit, types we do not use yet, and **coverage gaps** worth closing. For how to run tests and contributor workflows, see [Testing strategy](testing.md). For the policy semantic harness, see [Policy semantic testing](static-analysis.md) and the [policy spec matrix](specs/matrix.md).

## Overview

CacheKit combines several layers: unit tests close to the implementation, property tests and semantic oracles for eviction correctness, fuzz targets for data-structure robustness, Miri for undefined-behavior smoke checks, integration tests for concurrency and composition, and Criterion benchmarks for performance analysis.

Not every testing type applies to every component. Exact eviction policies benefit most from model-based dual-run tests; adaptive policies need weaker legal-set oracles; concurrent wrappers need stress tests and thread-safety tooling; product-facing performance claims belong in benchmarks with optional regression gates.

```text
fast / cheap                                     slow / expensive
     │                                                  │
   unit ─────── property ─── integration ─── fuzz ─── formal
     │             │                          │         │
     │     policy_semantics              Miri / Loom    │
     │     (model dual-run)                             │
     └─────────── benchmarks / workload sim ────────────┘
```

## What CacheKit uses today

| Type | Purpose | Location / entrypoint | CI |
|------|---------|----------------------|-----|
| **Unit tests** | Specific behaviors and edge cases | `#[cfg(test)] mod tests` in `src/` | Partial — Miri runs a filtered `--lib` subset (`ds::`, `policy::`); the main test job does not run the full set of `mod tests` |
| **Property tests (in-module)** | DS invariants over random inputs | `#[cfg(test)] mod property_tests` in `src/ds/` and some policy modules | `PROPTEST_CASES=1000 cargo test --lib property_tests` (property-tests job) |
| **Policy semantic tests** | Eviction/residency oracles vs implementations | `tests/abstract_models/`, `tests/policy_semantics/` | `cargo test --test policy_semantics --all-features`; higher case count in property-tests job |
| **Dual-impl equivalence** | Two implementations, same trace | `tests/policy_semantics/dual_impl_tests.rs` | Via `cargo test --test policy_semantics --all-features` |
| **Integration tests** | Multi-module wiring | `tests/lru_integration_test.rs`, `tests/ttl_integration_test.rs`, other `tests/*.rs` | `cargo test --tests --all-features` (main test job) |
| **Cross-policy invariants** | Library-wide consistency | `tests/policy_invariants.rs` | `cargo test --tests --all-features` |
| **Concurrency / stress** | Races and atomicity under load | `tests/*_concurrency.rs` (including `slab_concurrency.rs`) | `cargo test --tests --all-features`; `lru_concurrency` and `slab_concurrency` need the `concurrency` feature |
| **O(1) regression** | Algorithmic complexity guards | `tests/performance_regression.rs` | `cargo test --tests --all-features` |
| **Fuzz tests** | Crashes and invariant violations on mutated input | `fuzz/fuzz_targets/` (27 targets, DS layer) | PR smoke (60s/target), nightly continuous |
| **Miri** | Undefined-behavior smoke checks | Lib `ds::` / `policy::` (filtered) + `policy_semantics` `smoke_*` | Dedicated Miri job |
| **Doc tests** | Rustdoc examples compile and run | `///` examples in `src/` | `cargo test --doc --all-features` |
| **Example tests** | `examples/` compile and pass | `examples/` | `cargo test --examples --all-features` |
| **Benchmarks** | Throughput, hit rate, cross-policy comparison | `benches/`, `bench-support/` | `cargo bench` on `main` only |
| **Static analysis** | Style, lints, deps, security | clippy, rustfmt, `cargo audit`, `cargo deny` | Dedicated CI jobs |
| **Formal verification (manual)** | Reachable-state invariants on finite instances | `docs/testing/specs/formal/` (FIFO, LRU TLA+) | Not in CI; run `./scripts/run-tlc.sh` locally |
| **MSRV check** | Builds on the minimum supported Rust version | `Cargo.toml` MSRV | Dedicated CI job |

### Policy semantic coverage by tier

Canonical per-policy status: [matrix.md](specs/matrix.md).

| Tier | Harness mode | Policies | Oracle strength |
|------|--------------|----------|-----------------|
| Exact | DualRun + CrossModel | FIFO, LRU, Fast-LRU, LIFO, MRU, LFU, Heap-LFU, MFU, LRU-K | Strong — independent `reference/` models for all exact-tier policies |
| Mirror | DualRun | Clock, 2Q, SLRU, NRU | Medium — `exact/` models mirror implementation shape; specs still `stub` |
| Bounded | InvariantOnly | ARC, CAR, Clock-PRO, S3-FIFO | Weak — structural invariants only; victim legal sets deferred |
| Composed | DualRun + deadlines | TTL | Medium — `LruOccupancyModel` + TTL integration tests; spec still `stub` |
| Deferred | — | Random | None — no semantic harness |
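For the Exact and Mirror tiers, DualRun means replaying one trace through the implementation and a model and comparing every observable result. The following is a minimal sketch of that idea that assumes nothing about the real harness: `FastLru` (recency ticks in a `BTreeMap`) and `ModelLru` (an O(n) ordered `Vec`) are hypothetical stand-ins, not CacheKit or `tests/policy_semantics/` types, and a xorshift trace generator replaces proptest so the example has no dependencies.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical implementation under test: LRU tracked with recency ticks.
/// `by_tick` maps tick -> key, so its first entry is always the LRU key.
struct FastLru {
    cap: usize,
    tick: u64,
    by_key: HashMap<u64, u64>,
    by_tick: BTreeMap<u64, u64>,
}

impl FastLru {
    fn new(cap: usize) -> Self {
        Self { cap, tick: 0, by_key: HashMap::new(), by_tick: BTreeMap::new() }
    }

    /// Accesses `key` (inserting on a miss); returns (hit, evicted key).
    fn access(&mut self, key: u64) -> (bool, Option<u64>) {
        self.tick += 1;
        if let Some(slot) = self.by_key.get_mut(&key) {
            let prev = *slot;
            *slot = self.tick;
            self.by_tick.remove(&prev);
            self.by_tick.insert(self.tick, key);
            return (true, None);
        }
        if self.cap == 0 {
            return (false, None);
        }
        let mut evicted = None;
        if self.by_key.len() == self.cap {
            if let Some((_, victim)) = self.by_tick.pop_first() {
                self.by_key.remove(&victim);
                evicted = Some(victim);
            }
        }
        self.by_key.insert(key, self.tick);
        self.by_tick.insert(self.tick, key);
        (false, evicted)
    }
}

/// Hypothetical reference model: keys ordered from LRU (front) to MRU (back).
/// O(n) per access, but simple enough to trust by inspection.
struct ModelLru {
    cap: usize,
    order: Vec<u64>,
}

impl ModelLru {
    fn access(&mut self, key: u64) -> (bool, Option<u64>) {
        if let Some(pos) = self.order.iter().position(|&k| k == key) {
            self.order.remove(pos);
            self.order.push(key);
            return (true, None);
        }
        if self.cap == 0 {
            return (false, None);
        }
        let evicted = if self.order.len() == self.cap { Some(self.order.remove(0)) } else { None };
        self.order.push(key);
        (false, evicted)
    }
}

/// Tiny deterministic PRNG (xorshift64) so a failing seed can be replayed.
fn next_rand(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Replays one random trace through both; returns the first divergent step, if any.
fn dual_run(cap: usize, key_space: u64, steps: usize, seed: u64) -> Option<usize> {
    let mut imp = FastLru::new(cap);
    let mut model = ModelLru { cap, order: Vec::new() };
    let mut rng = seed | 1; // xorshift state must be non-zero
    for step in 0..steps {
        let key = next_rand(&mut rng) % key_space;
        if imp.access(key) != model.access(key) {
            return Some(step);
        }
    }
    None
}

#[test]
fn fast_lru_matches_model() {
    for seed in 1..=50 {
        for cap in 0..=8 {
            assert_eq!(dual_run(cap, 12, 2_000, seed), None, "seed {seed}, cap {cap}");
        }
    }
}
```

Reporting the first divergent step together with the seed is what makes a failure reproducible; proptest adds input shrinking on top of the same comparison loop.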

### Concurrency test coverage

Three policies ship native `Concurrent*` wrappers today (LRU, FIFO, S3-FIFO). See [Concurrency](../design/concurrency.md).

Integration concurrency tests in `tests/` cover FIFO, LFU, LRU, LRU-K, NRU, and `ConcurrentSlabStore`. Only `lru_concurrency.rs` and `slab_concurrency.rs` require the `concurrency` feature; the other `*_concurrency.rs` files run unconditionally. `lru_concurrency.rs` exercises `ConcurrentLruCache`; FIFO, LFU, and NRU use ad-hoc `Arc<Mutex<…>>` wrappers around sequential cores rather than native `Concurrent*` types. `ConcurrentS3FifoCache` has in-module tests in `src/policy/s3_fifo.rs` but no dedicated `tests/*_concurrency.rs` file.
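The ad-hoc wrapper pattern described above looks roughly like the following sketch: a sequential core behind `Arc<Mutex<_>>`, driven from several threads, with a structural invariant checked after every operation. `FifoCore` and `stress` are hypothetical names for illustration only. Because the mutex serializes every call, a test of this shape exercises the sequential core under contention but not the internal synchronization of a native `Concurrent*` type.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sequential FIFO core (not a CacheKit type).
struct FifoCore {
    cap: usize,
    map: HashMap<u64, u64>,
    queue: VecDeque<u64>,
}

impl FifoCore {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), queue: VecDeque::new() }
    }

    fn insert(&mut self, key: u64, value: u64) {
        if self.cap == 0 {
            return;
        }
        if let Some(v) = self.map.get_mut(&key) {
            *v = value; // FIFO: updating a resident key keeps its queue position
            return;
        }
        if self.map.len() == self.cap {
            if let Some(oldest) = self.queue.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.queue.push_back(key);
        self.map.insert(key, value);
    }

    fn get(&self, key: u64) -> Option<u64> {
        self.map.get(&key).copied()
    }

    /// Structural invariant: map and queue agree, and size never exceeds capacity.
    fn check_invariants(&self) -> bool {
        self.map.len() <= self.cap
            && self.map.len() == self.queue.len()
            && self.queue.iter().all(|k| self.map.contains_key(k))
    }
}

/// Runs `threads` workers doing `ops` mixed gets/inserts each on one shared core,
/// then returns the core so the caller can inspect the final state.
fn stress(cap: usize, threads: u64, ops: u64) -> FifoCore {
    let cache = Arc::new(Mutex::new(FifoCore::new(cap)));
    let key_space = cap as u64 * 4 + 1;
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                for i in 0..ops {
                    let key = (t * 7 + i) % key_space;
                    let mut guard = cache.lock().unwrap();
                    if i % 3 == 0 {
                        let _ = guard.get(key);
                    } else {
                        guard.insert(key, t);
                    }
                    // Checked while the lock is still held, so no other thread interferes.
                    assert!(guard.check_invariants(), "invariant broken by thread {t}");
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    Arc::try_unwrap(cache).ok().expect("all workers joined").into_inner().unwrap()
}

#[test]
fn mutex_wrapped_fifo_survives_stress() {
    let cache = stress(16, 8, 2_000);
    assert!(cache.check_invariants());
    assert_eq!(cache.map.len(), 16);
}
```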

### Performance regression coverage

`performance_regression.rs` verifies O(1) scaling for **LRU, LRU-K, LFU, Clock, and S3-FIFO** only. Other policies get correctness coverage from unit tests and semantic harnesses but have no complexity guard.
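A complexity guard of this kind can be sketched as a timing ratio: measure per-operation cost at two cache sizes and require it to grow far less than the size ratio. This is an assumption about the general approach, not a description of `performance_regression.rs`; `Fifo`, `ns_per_op`, `scaling_ratio`, and the 10x threshold are hypothetical. Best-of-three timing and a pre-filled cache (so every timed insert evicts) reduce noise.

```rust
use std::collections::{HashMap, VecDeque};
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical O(1) FIFO cache, standing in for a real policy under test.
struct Fifo {
    cap: usize,
    map: HashMap<u64, u64>,
    order: VecDeque<u64>,
}

impl Fifo {
    fn with_capacity(cap: usize) -> Self {
        Self { cap, map: HashMap::with_capacity(cap), order: VecDeque::with_capacity(cap) }
    }

    fn insert(&mut self, key: u64, value: u64) {
        if self.cap == 0 {
            return;
        }
        if let Some(v) = self.map.get_mut(&key) {
            *v = value; // update in place; FIFO order is unchanged
            return;
        }
        if self.map.len() == self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.order.push_back(key);
        self.map.insert(key, value);
    }
}

/// Best-of-`runs` nanoseconds per insert on a cache pre-filled to `cap`,
/// so every timed insert takes the eviction path.
fn ns_per_op(cap: usize, ops: u64, runs: usize) -> f64 {
    let mut best = f64::INFINITY;
    for _ in 0..runs {
        let mut cache = Fifo::with_capacity(cap);
        for k in 0..cap as u64 {
            cache.insert(k, k);
        }
        let start = Instant::now();
        for i in 0..ops {
            // Keys at or above `cap` are always new, forcing one eviction per insert.
            cache.insert(black_box(cap as u64 + i), i);
        }
        let per_op = start.elapsed().as_nanos() as f64 / ops as f64;
        black_box(&cache);
        best = best.min(per_op);
    }
    best
}

/// Per-op cost at `large` divided by per-op cost at `small`. O(1) code stays
/// near 1; an accidental O(n) path grows roughly with `large / small`.
fn scaling_ratio(small: usize, large: usize) -> f64 {
    let ops = 200_000;
    ns_per_op(large, ops, 3) / ns_per_op(small, ops, 3).max(f64::MIN_POSITIVE)
}

#[test]
fn insert_stays_constant_time() {
    // 100x more entries; allow 10x slack for CPU-cache effects and CI noise.
    let ratio = scaling_ratio(1_000, 100_000);
    assert!(ratio < 10.0, "per-op cost grew {ratio:.1}x");
}
```

A loose threshold keeps such a guard from flaking on shared CI runners while still catching an O(n) regression, which at a 100x size ratio would show up as per-op cost growing on the order of 100x.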

### Fuzz coverage

All 27 fuzz targets exercise **internal data structures** (ClockRing, SlotArena, GhostList, etc.), not end-to-end cache policy APIs. See [Fuzzing in CI/CD](fuzzing-cicd.md) and [fuzz/README.md](../../fuzz/README.md).

---

## Testing types not used (or barely used)

The tables below survey testing types common in systems and Rust libraries, including several that CacheKit already relies on, for comparison. **Relevance** indicates how well each type fits CacheKit.

### Correctness and semantics

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Golden / snapshot tests** | Output matches a saved baseline | Not used | Low — semantic oracles are more stable for caches |
| **Model-based testing** | Random traces vs a reference model | Core of `policy_semantics` | **High** — already central |
| **Equivalence testing** | Two implementations, same behavior | `dual_impl_tests.rs` (LRU/Fast-LRU, Clock/ClockRing) | **High** — extend to more pairs |
| **Differential testing** | Output matches an external library on the same traces | Cross-library benches only | **Medium** — good for hit-rate sanity, not in CI today |
| **Formal verification** | All reachable states satisfy invariants | TLA+ for FIFO/LRU (manual) | **Medium** — complementary to proptest |
| **Symbolic / concolic execution** | Systematic path exploration | Not used | Low for this codebase size |
| **Mutation testing** | Tests catch injected bugs | Not used | **Medium** — validates harness strength |

### Input exploration

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Property-based testing** | Invariants over random inputs | proptest throughout | **High** — already central |
| **Fuzz testing** | Panics and invariant breaks on arbitrary bytes | DS layer only | **High** — policy-level fuzz is a gap |
| **Regression corpora** | Saved minimal failing inputs | `tests/proptest-regressions/` | **Medium** — commit new shrunk failures as they appear |
| **Adversarial testing** | Hash-collision DoS, untrusted keys | Documented (`KeysAreTrusted`) | **Medium** — no dedicated suite |

### Concurrency and runtime

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Concurrency / stress tests** | Races under threaded load | Partial (`*_concurrency.rs`) | **High** |
| **Miri** | UB detection under interpreted execution | Curated subset | **High** — extend to TTL smoke |
| **Loom / Shuttle** | Exhaustive interleavings on small models | Not used | **High** for `Concurrent*` wrappers |
| **Sanitizer CI (TSan/ASan)** | Runtime memory and thread bugs | Not in CI | **Medium** — pairs well with concurrency tests |

### Performance and capacity

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Micro-benchmarks** | Op latency and throughput | Criterion in `benches/` | **High** — already central |
| **Workload simulation** | Zipfian, scan, mixed R/W | `benches/workloads.rs` | **High** — benchmarks only |
| **Complexity / scaling tests** | O(1) vs O(n) as size grows | 5 policies in `performance_regression.rs` | **High** — expand coverage |
| **Performance regression gates** | CI fails on perf drop vs baseline | Benches on `main` only | **Medium** — optional nightly/PR gate |
| **Memory / allocation profiling** | Allocs in hot paths | Not automated | **Medium** — aligns with design goals |
| **Soak / endurance tests** | Long-run leaks and drift | Not used | **Low–medium** — nightly fuzz partially covers |

### Build, platform, and delivery

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Feature-matrix testing** | Cargo feature combinations | `--all-features` in CI (all features enabled at once, not each combination) | **High** — partially in place |
| **Cross-platform testing** | Multiple OS/arch targets | ubuntu, macos, windows | **High** — in place |
| **Compatibility / semver tests** | Public API stable across releases | Not automated | **Medium** — as API stabilizes |
| **Serialization round-trip** | Persist and restore cache state | Design docs; limited tests | **Low** until serde ships |

### Security and robustness

| Type | What it checks | CacheKit today | Relevance |
|------|----------------|----------------|-----------|
| **Panic safety tests** | Invariants restored after panic | Partial in `ConcurrentLruCache` tests | **Medium** |
| **Boundary tests** | capacity=0, capacity=1, empty cache | Partial in `policy_invariants.rs` | **Medium** — extend cross-policy |
| **DoS-resistance tests** | Untrusted keys and hasher choice | Documented, not a dedicated suite | **Medium** |

---

## Known coverage gaps

Prioritized gaps between current practice and the testing goals in [Testing strategy](testing.md) and [Policy semantic testing](static-analysis.md).

### 1. Semantic oracle depth (highest impact)

- **Bounded policies** (ARC, CAR, Clock-PRO, S3-FIFO): `InvariantOnly` checks do not assert victim correctness. Future work: `OracleExpectation::Legal` legal-victim sets ([static-analysis.md](static-analysis.md)).
- **Mirror policies** (Clock, 2Q, SLRU, NRU): oracles mirror the implementation; independent `reference/` models are pending, and specs have not yet reached `reference` maturity.
- **Random**: no `PolicyModel` or `policy_semantics` harness ([matrix.md](specs/matrix.md)).
- **TTL**: integration tests exist; Miri smoke coverage is incomplete and the composed-tier spec is still `stub`.
- **Path-sensitive traces**: documented as future harness work, not implemented.
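Where several victims are acceptable, the oracle can return a set and the check becomes membership rather than equality. A minimal sketch, assuming a hypothetical `Expected` enum (the planned `OracleExpectation::Legal` may be shaped differently) and a Clock-style oracle that does not track the hand position, so any resident key whose reference bit is clear is a legal victim.

```rust
use std::collections::BTreeSet;

/// Hypothetical oracle verdict: one exact victim, or a set of legal victims.
#[derive(Debug)]
enum Expected {
    Exact(u64),
    Legal(BTreeSet<u64>),
}

impl Expected {
    /// Does the implementation's actual victim satisfy the oracle?
    fn accepts(&self, victim: u64) -> bool {
        match self {
            Expected::Exact(k) => *k == victim,
            Expected::Legal(set) => set.contains(&victim),
        }
    }
}

/// Weak Clock-style oracle that ignores the hand position. `residents` holds
/// (key, reference bit). The hand stops at the first clear bit it meets, so any
/// unreferenced key is legal; if every bit is set, the sweep clears them all
/// and any key may be chosen.
fn clock_legal_victims(residents: &[(u64, bool)]) -> Expected {
    let unreferenced: BTreeSet<u64> = residents
        .iter()
        .filter(|&&(_, referenced)| !referenced)
        .map(|&(key, _)| key)
        .collect();
    if unreferenced.is_empty() {
        Expected::Legal(residents.iter().map(|&(key, _)| key).collect())
    } else {
        Expected::Legal(unreferenced)
    }
}

#[test]
fn victim_must_be_in_legal_set() {
    let expected = clock_legal_victims(&[(1, true), (2, false), (3, false)]);
    assert!(expected.accepts(2) && expected.accepts(3));
    assert!(!expected.accepts(1)); // a referenced key gets a second chance first
}
```

Exact-tier policies keep using `Expected::Exact`, so a single comparison site can serve both tiers.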

### 2. Concurrency

- Only LRU, FIFO, and S3-FIFO ship `Concurrent*` wrappers; 15 policies rely on external locking ([Concurrency](../design/concurrency.md)).
- No `tests/*_concurrency.rs` integration suite for `ConcurrentS3FifoCache` (in-module tests exist in `src/policy/s3_fifo.rs`).
- No concurrency tests for Clock, ARC, CAR, 2Q, SLRU, MFU, and others.
- FIFO/LFU/NRU integration concurrency tests use `Mutex` wrappers, not native `Concurrent*` APIs.

### 3. Performance regression breadth

- O(1) guards cover 5 of ~18 policies.
- Workload hit-rate and throughput regressions are not gated on PRs (benchmarks run on `main` only).

### 4. Fuzz scope

- All targets hit DS primitives, not full `Cache` API sequences per policy.
- Likely missing DS fuzz: `expiration_index` (TTL path).
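A policy-level target would decode the fuzzer's bytes into a sequence of cache operations and assert invariants after each one. The sketch below keeps decoding and checking in a plain function so it runs without `cargo-fuzz`; the op encoding, `decode`, `run_ops`, and the `ToyLru` stand-in for a real policy are all hypothetical.

```rust
/// Hypothetical cache operation decoded from fuzzer bytes.
#[derive(Debug, Clone, Copy)]
enum Op {
    Get(u8),
    Insert(u8, u8),
    Remove(u8),
}

/// Decodes bytes into ops: a tag byte picks the op, the next byte is the key
/// (folded into 16 keys to force collisions and evictions), and `Insert` takes
/// one more byte as the value. A trailing partial op is ignored.
fn decode(data: &[u8]) -> Vec<Op> {
    let mut ops = Vec::new();
    let mut i = 0;
    while i + 1 < data.len() {
        let key = data[i + 1] % 16;
        match data[i] % 3 {
            0 => {
                ops.push(Op::Get(key));
                i += 2;
            }
            1 if i + 2 < data.len() => {
                ops.push(Op::Insert(key, data[i + 2]));
                i += 3;
            }
            1 => break,
            _ => {
                ops.push(Op::Remove(key));
                i += 2;
            }
        }
    }
    ops
}

/// Toy LRU standing in for a real policy: entries ordered least recent first.
struct ToyLru {
    cap: usize,
    entries: Vec<(u8, u8)>,
}

impl ToyLru {
    fn get(&mut self, key: u8) -> Option<u8> {
        let pos = self.entries.iter().position(|&(k, _)| k == key)?;
        let entry = self.entries.remove(pos);
        self.entries.push(entry); // mark as most recently used
        Some(entry.1)
    }

    fn insert(&mut self, key: u8, value: u8) {
        if self.cap == 0 {
            return;
        }
        if let Some(pos) = self.entries.iter().position(|&(k, _)| k == key) {
            self.entries.remove(pos);
        } else if self.entries.len() == self.cap {
            self.entries.remove(0); // evict the least recently used entry
        }
        self.entries.push((key, value));
    }

    fn remove(&mut self, key: u8) -> Option<u8> {
        let pos = self.entries.iter().position(|&(k, _)| k == key)?;
        Some(self.entries.remove(pos).1)
    }
}

/// Fuzz-target body: replays decoded ops and panics on any invariant violation.
/// Returns the number of ops executed.
fn run_ops(data: &[u8]) -> usize {
    let cap = 4;
    let mut cache = ToyLru { cap, entries: Vec::new() };
    let ops = decode(data);
    for op in &ops {
        match *op {
            Op::Get(k) => {
                let _ = cache.get(k);
            }
            Op::Insert(k, v) => {
                cache.insert(k, v);
                assert_eq!(cache.get(k), Some(v), "inserted key must be readable");
            }
            Op::Remove(k) => {
                let _ = cache.remove(k);
                assert_eq!(cache.get(k), None, "removed key must be gone");
            }
        }
        assert!(cache.entries.len() <= cap, "capacity exceeded");
    }
    ops.len()
}
```

Under `fuzz/fuzz_targets/`, the target itself could then be a single call such as `libfuzzer_sys::fuzz_target!(|data: &[u8]| { run_ops(data); });`, with a real CacheKit policy in place of `ToyLru`.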

### 5. Formal verification automation

- TLA+ exists for FIFO and LRU only; TLC is manual, not CI.
- No trace-export bridge between TLC reachable states and Rust oracle traces ([formal/fifo/tlc.md](specs/formal/fifo/tlc.md)).
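Until TLC runs in CI, its core idea (visit every reachable state of a small instance and check an invariant in each) can be approximated in plain Rust. The sketch below is a breadth-first search over a FIFO cache with capacity 2 and keys {0, 1, 2}; the state encoding, `put`, and `explore` are hypothetical, and the sketch illustrates bounded exhaustive checking rather than replacing the TLA+ spec.

```rust
use std::collections::{HashSet, VecDeque};

/// State of a tiny FIFO cache: resident keys in insertion order (oldest first).
type State = Vec<u8>;

const CAPACITY: usize = 2;
const KEYS: [u8; 3] = [0, 1, 2];

/// Successor of `state` after `put(key)`; hits do not change FIFO order.
fn put(state: &State, key: u8) -> State {
    let mut next = state.clone();
    if !next.contains(&key) {
        if next.len() == CAPACITY {
            next.remove(0); // evict the oldest key
        }
        next.push(key);
    }
    next
}

/// Invariant every reachable state must satisfy: bounded size, no duplicates.
fn invariant_holds(state: &State) -> bool {
    let unique: HashSet<u8> = state.iter().copied().collect();
    state.len() <= CAPACITY && unique.len() == state.len()
}

/// Explores every state reachable from the empty cache; returns the number of
/// distinct states, or panics on the first invariant violation.
fn explore() -> usize {
    let mut seen: HashSet<State> = HashSet::new();
    let mut frontier: VecDeque<State> = VecDeque::new();
    let init: State = Vec::new();
    seen.insert(init.clone());
    frontier.push_back(init);
    while let Some(state) = frontier.pop_front() {
        assert!(invariant_holds(&state), "invariant violated in {state:?}");
        for key in KEYS {
            let next = put(&state, key);
            if seen.insert(next.clone()) {
                frontier.push_back(next);
            }
        }
    }
    seen.len()
}
```

For this instance the search visits 10 states: the empty cache, three single-key states, and six ordered pairs of distinct keys.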

### 6. Cross-cutting integration

- **Weighted eviction**: unit tests in `store/weight.rs`; no integration tests ([Weighted eviction](../design/weighted-eviction.md)).
- **Builder / DynCache**: TTL builder path tested; CAR builder gap documented but not covered ([Builder and runtime dispatch](../design/builder-and-dyn-dispatch.md)).
- **Cross-policy invariants**: `policy_invariants.rs` is thin (mostly capacity-0 edge cases).
- **Dual-impl equivalence**: only two pairs today (LRU/Fast-LRU, Clock/ClockRing).
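One way to thicken cross-policy boundary coverage is a single generic check instantiated for every policy. The sketch assumes a hypothetical minimal `Policy` trait and two toy policies (FIFO and LIFO via a const generic). CacheKit's real traits, constructors, and capacity-0 semantics may differ, and the capacity-1 expectation only holds for policies that always admit new keys.

```rust
use std::collections::HashMap;

/// Hypothetical minimal interface shared by every policy under test.
trait Policy: Sized {
    fn with_capacity(cap: usize) -> Self;
    fn insert(&mut self, key: u32, value: u32);
    fn get(&mut self, key: u32) -> Option<u32>;
    fn len(&self) -> usize;
}

/// Toy policy: evicts the oldest key (FIFO) or the newest key (LIFO).
struct Queue<const LIFO: bool> {
    cap: usize,
    order: Vec<u32>,
    map: HashMap<u32, u32>,
}

impl<const LIFO: bool> Policy for Queue<LIFO> {
    fn with_capacity(cap: usize) -> Self {
        Self { cap, order: Vec::new(), map: HashMap::new() }
    }

    fn insert(&mut self, key: u32, value: u32) {
        if self.cap == 0 {
            return; // assumed capacity-0 semantics: accept and drop
        }
        if let Some(v) = self.map.get_mut(&key) {
            *v = value;
            return;
        }
        if self.map.len() == self.cap {
            let victim = if LIFO { self.order.pop() } else { Some(self.order.remove(0)) };
            if let Some(k) = victim {
                self.map.remove(&k);
            }
        }
        self.order.push(key);
        self.map.insert(key, value);
    }

    fn get(&mut self, key: u32) -> Option<u32> {
        self.map.get(&key).copied()
    }

    fn len(&self) -> usize {
        self.map.len()
    }
}

/// Boundary expectations every policy should meet, whatever its eviction order.
fn check_boundaries<P: Policy>() {
    // Capacity 0: inserts are accepted, but nothing becomes resident.
    let mut zero = P::with_capacity(0);
    zero.insert(1, 10);
    assert_eq!(zero.len(), 0);
    assert_eq!(zero.get(1), None);

    // Empty cache: every lookup misses.
    let mut empty = P::with_capacity(4);
    assert_eq!(empty.len(), 0);
    assert_eq!(empty.get(1), None);

    // Capacity 1: a new key replaces the only resident (policies that always admit).
    let mut one = P::with_capacity(1);
    one.insert(1, 10);
    one.insert(2, 20);
    assert_eq!(one.len(), 1);
    assert_eq!(one.get(1), None);
    assert_eq!(one.get(2), Some(20));

    // Updating a resident key must not evict it.
    one.insert(2, 21);
    assert_eq!(one.get(2), Some(21));
}

#[test]
fn boundaries_hold_for_every_policy() {
    check_boundaries::<Queue<false>>(); // FIFO
    check_boundaries::<Queue<true>>(); // LIFO
}
```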

### 7. Process and documentation maturity

- Several policies have harness tests but operational specs remain `stub` (mirror, bounded, composed tiers).
- [Testing strategy](testing.md) describes in-module `property_tests` for every module, but most policies delegate to `policy_semantics` instead.
- Main CI **test** job runs `cargo test --tests --all-features` only; lib `mod tests` outside `property_tests` are not executed there (partial lib coverage via Miri and the property-tests job).

---

## Recommended additions (priority order)

1. **Legal-victim oracles for bounded policies** — upgrade invariant-only tests to semantic checks where victims are not unique.
2. **Independent reference models for mirror policies** — cross-model drift guards like those already in place for exact-tier LRU.
3. **End-to-end policy fuzz targets** — full `Cache` op sequences, complementing DS fuzz.
4. **Expand O(1) regression** — FIFO, Fast-LRU, Heap-LFU, MFU, and other hot policies.
5. **`tests/*_concurrency.rs` for `ConcurrentS3FifoCache`** (beyond in-module tests) and Miri smoke for TTL.
6. **Random policy harness** — even a statistical residency/uniformity oracle would help validate Random as a benchmarking baseline.
7. **Optional nightly TLC job** — FIFO/LRU formal specs as a non-blocking check.
8. **Loom or TSan on concurrency tests** — small-model or sanitizer coverage for `Concurrent*` wrappers.
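For the Random policy harness, a statistical oracle could replay many misses against a full cache, count how often each slot is evicted, and apply Pearson's chi-square goodness-of-fit test against a uniform distribution. In the sketch below, `RandomCache` and the SplitMix64 generator are hypothetical stand-ins. The threshold 24.32 is the 0.999 quantile of the chi-square distribution with 7 degrees of freedom (8 slots minus one), so a truly uniform picker fails about once per thousand seeds; a fixed seed keeps the test deterministic.

```rust
/// SplitMix64: a small, well-mixed PRNG; a fixed seed keeps the test reproducible.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical toy Random-eviction cache: on a miss with a full cache,
/// a uniformly chosen slot is overwritten.
struct RandomCache {
    cap: usize,
    slots: Vec<u64>,
    rng: SplitMix64,
}

impl RandomCache {
    /// Inserts `key`; returns the index of the evicted slot, if any.
    fn insert(&mut self, key: u64) -> Option<usize> {
        if self.cap == 0 || self.slots.contains(&key) {
            return None;
        }
        if self.slots.len() < self.cap {
            self.slots.push(key);
            return None;
        }
        let victim = (self.rng.next_u64() % self.cap as u64) as usize;
        self.slots[victim] = key;
        Some(victim)
    }
}

/// Counts evictions per slot over `misses` inserts of never-seen keys.
fn victim_histogram(cap: usize, misses: u64, seed: u64) -> Vec<u64> {
    let mut cache = RandomCache { cap, slots: Vec::new(), rng: SplitMix64(seed) };
    let mut counts = vec![0u64; cap];
    // The first `cap` keys fill the cache; every later key is a miss that evicts.
    for key in 0..cap as u64 + misses {
        if let Some(slot) = cache.insert(key) {
            counts[slot] += 1;
        }
    }
    counts
}

/// Pearson chi-square statistic of `counts` against a uniform expectation.
fn chi_square_uniform(counts: &[u64]) -> f64 {
    let total: u64 = counts.iter().sum();
    let expected = total as f64 / counts.len() as f64;
    counts
        .iter()
        .map(|&c| {
            let diff = c as f64 - expected;
            diff * diff / expected
        })
        .sum()
}

#[test]
fn random_victims_look_uniform() {
    let counts = victim_histogram(8, 80_000, 0xC0FFEE);
    assert_eq!(counts.iter().sum::<u64>(), 80_000);
    // 0.999 quantile of chi-square with 8 - 1 = 7 degrees of freedom.
    assert!(chi_square_uniform(&counts) < 24.32);
}
```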

When you add a new test type, or when coverage or CI integration changes materially (including changes to `.github/workflows/ci.yml`), update this catalog.

---

## Related documentation

- [Testing strategy](testing.md) — philosophy, four layers, how to run tests
- [Policy semantic testing](static-analysis.md) — harness architecture and contributor checklist
- [Policy spec matrix](specs/matrix.md) — canonical per-policy harness index
- [Fuzzing in CI/CD](fuzzing-cicd.md) — fuzz pipeline and target discovery
- [Adding fuzz targets](adding-fuzz-targets.md) — contributor guide for new fuzz targets
- [tests/README.md](../../tests/README.md) — integration test layout
- [abstract_models README](../../tests/abstract_models/README.md) — oracle models and tiers
- [Benchmarking design](../design/benchmarking.md) — benchmark layers vs regression tests
- [Concurrency](../design/concurrency.md) — `Concurrent*` coverage and gaps