Skip to content

Commit 949e1c9

Browse files
TKorrcursoragent
andauthored
Add policy semantic test harness with spec-first oracles
Introduces a test-side abstract interpreter for eviction policies: operational specs drive independent reference models and DS-shaped exact oracles, with `policy_semantics` integration tests that catch semantic drift before it reaches production caches. ### Harness architecture - **`tests/abstract_models/`** — shared `Op<K>` alphabet, `PolicyModel` trait, dual-run driver helpers, and tiered oracles (`exact/`, `reference/`, `bounded/`) - **`tests/policy_semantics/`** — proptest traces dual-running models against real implementations; cross-model tests where reference and exact formulations must agree - **Harness modes:** DualRun (exact/mirror/composed), CrossModel (reference vs exact), InvariantOnly (bounded policies) ### Reference models and cross-model coverage (exact tier) Independent naive reference oracles for all exact-tier policies, each cross-checked against the exact model on shared traces: FIFO, LRU, Fast-LRU (shared LRU reference), LIFO, LFU, MRU, Heap-LFU, MFU, LRU-K Impl fixes discovered during cross-model work: - **LFU** — tie-break uses earliest `last_admission` among min-frequency keys (not naive global FIFO) - **Heap-LFU** — `peek_lfu_key` uses `Ord` tie-break (was hash iteration order) - **MFU** — `peek_mfu_key` selects max valid heap entry (was arbitrary scan) ### Spec-first documentation - **Operational specs** under `docs/testing/specs/policies/` organized by tier (`exact/`, `mirror/`, `bounded/`, `composed/`) - **Canonical index** in `matrix.md` (maturity, harness mode, models, op strategy, traits) - **Shared fragments** in `_includes/` (harness `Op` mapping, spec maturity levels) - **Contributor flow:** spec → reference (optional) → exact → impl dual-run; checklist in `spec_harness.rs` and `static-analysis.md` ### TLA+ pilots (manual, not CI) - **FIFO** and **LRU** formal specs under `docs/testing/specs/formal/` - TLC runbooks, `run-tlc.sh` / `run-fifo-tlc.sh` / `run-lru-tlc.sh` - Contributor guide in `tla-guide.md` ### TTL integration - `ttl_integration_test.rs` wired through `LruOccupancyModel` for LRU-layer residency checks alongside deadline semantics ### Refactoring and hygiene - Feature-gate `exact/` and `bounded/` modules plus op strategies by `policy-*` features - Remove unwired helpers; document `allow(dead_code)` on shared harness root for partial multi-crate usage - DRY residency probes in `policy_semantics` tests; align README/strategy tables with actual dual-run vs invariant-only behavior - Reorganize flat `docs/testing/specs/` into tiered `policies/` + `formal/` layout; remove redirect stubs; update all cross-references ### Policy coverage snapshot | Tier | Policies | Harness | |------|----------|---------| | Exact | FIFO, LRU, Fast-LRU, LIFO, LFU, MRU, Heap-LFU, MFU, LRU-K | DualRun + CrossModel | | Mirror | Clock, 2Q, SLRU, NRU | DualRun (stub specs) | | Bounded | ARC, CAR, Clock-PRO, S3-FIFO | InvariantOnly | | Composed | TTL | DualRun + deadlines | ### Verification ```bash cargo test --test policy_semantics --all-features # 57 tests cargo test --test ttl_integration_test --features ttl ./scripts/run-fifo-tlc.sh && ./scripts/run-lru-tlc.sh # optional manual TLC ``` ### Deferred - `OracleExpectation::Legal` for bounded adaptive victims - Mirror-tier independent reference models - TLA+ in CI and trace-export cross-check - Random policy harness Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent fbb9a41 commit 949e1c9

97 files changed

Lines changed: 7997 additions & 76 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.github/workflows/ci.yml

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,7 @@ jobs:
8181
run: |
8282
cargo miri test --package cachekit --lib ds:: -- --skip concurrent --skip stress --skip large --skip performance --skip memory
8383
cargo miri test --package cachekit --lib policy:: -- --skip concurrent --skip stress --skip large --skip performance --skip memory
84+
cargo miri test --test policy_semantics --all-features smoke_ -- --test-threads=1
8485
env:
8586
MIRIFLAGS: -Zmiri-isolation-error=warn
8687

@@ -93,7 +94,9 @@ jobs:
9394
- uses: actions/checkout@v6
9495
- uses: actions-rust-lang/setup-rust-toolchain@v1
9596
- name: Run property tests with increased cases
96-
run: PROPTEST_CASES=1000 cargo test --lib property_tests
97+
run: |
98+
PROPTEST_CASES=1000 cargo test --lib property_tests
99+
PROPTEST_CASES=1000 cargo test --test policy_semantics --all-features
97100
env:
98101
RUST_BACKTRACE: 1
99102

.gitignore

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,3 +66,10 @@ docs/benchmarks/*/*
6666
!docs/benchmarks/*/charts.js
6767
!docs/benchmarks/*/index.md
6868
!docs/benchmarks/*/results.json
69+
70+
# TLA+ TLC run artifacts (manual model checking — see docs/testing/specs/formal/).
71+
# Spec sources (*.tla, *.cfg) stay tracked; TLC writes these on each run.
72+
docs/testing/specs/formal/**/states/
73+
docs/testing/specs/formal/**/*_TTrace_*.bin
74+
docs/testing/specs/formal/**/*_TTrace_*.tla
75+
docs/testing/specs/formal/**/*.tlc

CONTRIBUTING.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -216,6 +216,13 @@ Use conventional commit format for PR titles:
216216

217217
Consider using `proptest` for testing complex invariants.
218218

219+
### Policy Semantic Oracles
220+
221+
New eviction policies with deterministic semantics should include a
222+
reference model in `tests/abstract_models/` and proptest/smoke coverage
223+
in `tests/policy_semantics/`. See
224+
[Policy semantic testing](docs/testing/static-analysis.md).
225+
219226
### Test Organization
220227

221228
```rust

docs/design/trait-hierarchy.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -401,6 +401,12 @@ it. The `RecencyTracking` / `FrequencyTracking` / `HistoryTracking`
401401
naming established the convention; adding `WeightTracking` only when
402402
GDS lands keeps the surface honest.
403403

404+
## Testing
405+
406+
Policy semantic tests assert behavior through these capability traits
407+
(`VictimInspectable`, `RecencyTracking`, `EvictingCache`, etc.). See
408+
[Policy semantic testing](../testing/static-analysis.md).
409+
404410
## See also
405411

406412
- [Design overview](design.md) — §7 frames the layering at the

docs/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,4 +59,5 @@ Key features:
5959
## Testing and Fuzzing
6060

6161
- [Testing strategy](testing/testing.md)
62+
- [Policy semantic testing](testing/static-analysis.md)
6263
- [Adding fuzz targets](testing/adding-fuzz-targets.md)

docs/policies/README.md

Lines changed: 19 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -34,25 +34,25 @@ If you can only implement one “general purpose” policy for mixed workloads,
3434

3535
### Implemented Policies (CacheKit)
3636

37-
| Policy | Summary | Doc |
38-
|--------|---------|-----|
39-
| LRU | Strong default for temporal locality | [LRU doc](lru.md) |
40-
| MRU | Evicts most recent (niche: cyclic patterns) | [MRU doc](mru.md) |
41-
| SLRU | Segmented LRU with probation/protected | [SLRU doc](slru.md) |
42-
| LFU | Frequency-driven, stable hot sets | [LFU doc](lfu.md) |
43-
| Heap-LFU | LFU with heap eviction | [Heap-LFU doc](heap-lfu.md) |
44-
| MFU | Evicts highest frequency (niche/baseline) | [MFU doc](mfu.md) |
45-
| LRU-K | Scan-resistant recency | [LRU-K doc](lru-k.md) |
46-
| 2Q | Probation + protected queues | [2Q doc](2q.md) |
47-
| ARC | Adaptive recency/frequency balance | [ARC doc](arc.md) |
48-
| CAR | ARC-like with Clock (lower hit overhead) | [CAR doc](car.md) |
49-
| FIFO | Simple insertion-order (oldest first) | [FIFO doc](fifo.md) |
50-
| LIFO | Stack-based (newest first) | [LIFO doc](lifo.md) |
51-
| Clock | Approximate LRU | [Clock doc](clock.md) |
52-
| Clock-PRO | Scan-resistant Clock variant | [Clock-PRO doc](clock-pro.md) |
53-
| NRU | Coarse recency tracking | [NRU doc](nru.md) |
54-
| S3-FIFO | Scan-resistant FIFO | [S3-FIFO doc](s3-fifo.md) |
55-
| Random | Baseline: uniform random eviction | [Random doc](random.md) |
37+
| Policy | Summary | Semantic oracle | Doc |
38+
|--------|---------|-----------------|-----|
39+
| LRU | Strong default for temporal locality | exact | [LRU doc](lru.md) |
40+
| MRU | Evicts most recent (niche: cyclic patterns) | exact | [MRU doc](mru.md) |
41+
| SLRU | Segmented LRU with probation/protected | mirror | [SLRU doc](slru.md) |
42+
| LFU | Frequency-driven, stable hot sets | exact | [LFU doc](lfu.md) |
43+
| Heap-LFU | LFU with heap eviction | exact | [Heap-LFU doc](heap-lfu.md) |
44+
| MFU | Evicts highest frequency (niche/baseline) | exact | [MFU doc](mfu.md) |
45+
| LRU-K | Scan-resistant recency | exact | [LRU-K doc](lru-k.md) |
46+
| 2Q | Probation + protected queues | mirror | [2Q doc](2q.md) |
47+
| ARC | Adaptive recency/frequency balance | bounded | [ARC doc](arc.md) |
48+
| CAR | ARC-like with Clock (lower hit overhead) | bounded | [CAR doc](car.md) |
49+
| FIFO | Simple insertion-order (oldest first) | exact | [FIFO doc](fifo.md) |
50+
| LIFO | Stack-based (newest first) | exact | [LIFO doc](lifo.md) |
51+
| Clock | Approximate LRU | mirror | [Clock doc](clock.md) |
52+
| Clock-PRO | Scan-resistant Clock variant | bounded | [Clock-PRO doc](clock-pro.md) |
53+
| NRU | Coarse recency tracking | mirror | [NRU doc](nru.md) |
54+
| S3-FIFO | Scan-resistant FIFO | bounded | [S3-FIFO doc](s3-fifo.md) |
55+
| Random | Baseline: uniform random eviction | none | [Random doc](random.md) |
5656

5757
### Roadmap Policies (Planned)
5858

docs/testing/specs/README.md

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
# Operational policy specs
2+
3+
Human-readable specifications for eviction policies used as the **source of truth** for test-side oracles.
4+
5+
## Directory layout
6+
7+
```text
8+
docs/testing/specs/
9+
├── README.md, matrix.md, template.md, tla-guide.md # hub docs
10+
├── _includes/ # shared fragments (Op mapping, maturity levels)
11+
├── policies/ # human operational specs by tier
12+
│ ├── exact/ # FIFO, LRU, LFU, …
13+
│ ├── mirror/ # Clock, 2Q, …
14+
│ ├── bounded/ # ARC, S3-FIFO, …
15+
│ └── composed/ # TTL
16+
└── formal/ # TLA+ modules + TLC runbooks
17+
├── fifo/
18+
└── lru/
19+
```
20+
21+
**Canonical index:** [matrix.md](matrix.md) (maturity, harness mode, model paths per policy).
22+
23+
## Pipeline (all tiers)
24+
25+
```text
26+
policies/<tier>/<policy>.md
27+
→ reference/ PolicyModel (optional — independent formulation)
28+
→ exact/ PolicyModel (deque / DS-shaped oracle)
29+
→ policy_semantics dual-run vs implementation
30+
```
31+
32+
| Tier | Harness mode | Oracle |
33+
|------|--------------|--------|
34+
| Exact / mirror | DualRun | `exact/` `PolicyModel` vs impl |
35+
| Exact (policies with `reference/` rows) | CrossModel | `reference/` vs `exact/` |
36+
| Bounded | InvariantOnly | structural invariants on impl |
37+
| Composed (TTL) | DualRun + deadlines | `LruOccupancyModel` + TTL layer |
38+
39+
Cross-model tests prove `reference/` agrees with `exact/` on the same traces. Impl dual-run proves `exact/` agrees with real caches.
40+
41+
## Required sections (every policy spec)
42+
43+
Use [template.md](template.md) as the skeleton:
44+
45+
1. **Maturity banner**`stub`, `reference`, and/or `tla`
46+
2. **State variables** — abstract state at rest between operations
47+
3. **Init** — empty cache at capacity `C`
48+
4. **Per-`Op` transitions** — match the harness [`Op<K>`](../../../tests/abstract_models/mod.rs) alphabet
49+
5. **Tie-breaks** — deterministic victim and rank when multiple keys qualify
50+
6. **Observables**`resident`, `peek_victim`, `recency_rank` (if applicable), `hit` classification
51+
7. **API mapping** — how each `Op` maps to cache traits (`peek` must not promote on LRU, etc.)
52+
53+
Shared harness `Op` table: [_includes/harness-op-mapping.md](_includes/harness-op-mapping.md). Trait semantics: [trait hierarchy](../../design/trait-hierarchy.md).
54+
55+
## Spec-change checklist
56+
57+
When a spec changes, update in order:
58+
59+
1. Spec doc under `policies/<tier>/`
60+
2. `tests/abstract_models/reference/<policy>.rs` (if reference model exists)
61+
3. Cross-model test expectations (if behavior changed)
62+
4. `tests/abstract_models/exact/<policy>.rs` if the exact model was wrong
63+
5. `formal/<policy>/` TLA+ module and `tlc.md` alignment notes (if applicable)
64+
6. Row in [matrix.md](matrix.md)
65+
66+
## TLA+ (optional manual check)
67+
68+
FIFO and LRU include TLA+ pilots under [formal/](formal/README.md). **Read first:** [tla-guide.md](tla-guide.md). TLC is **not** run in CI.
69+
70+
```bash
71+
./scripts/run-tlc.sh fifo # or ./scripts/run-fifo-tlc.sh
72+
./scripts/run-tlc.sh lru # or ./scripts/run-lru-tlc.sh
73+
```
74+
75+
Success: no `SemanticOK` violation on the bundled config. Runbooks: [formal/fifo/tlc.md](formal/fifo/tlc.md), [formal/lru/tlc.md](formal/lru/tlc.md).
76+
77+
**TLC vs Rust:** TLC proves `SemanticOK` on reachable states for a finite instance; proptest runs long sequential traces on `u8` keys. They are complementary.
78+
79+
## Related documentation
80+
81+
- [Policy matrix](matrix.md) — canonical index
82+
- [Policy specs by tier](policies/README.md)
83+
- [Spec template](template.md) — new policy skeleton
84+
- [TLA+ guide](tla-guide.md) — contributor guide
85+
- [Abstract models README](../../../tests/abstract_models/README.md)
86+
- [Policy semantic testing](../static-analysis.md)
87+
- [Testing strategy](../testing.md)
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
# Harness `Op` mapping (shared)
2+
3+
Standard mapping from harness [`Op<K>`](../../../../tests/abstract_models/mod.rs) to cache traits. Copy into policy specs; adjust side effects per policy.
4+
5+
| `Op` | Cache API | Typical side effects |
6+
|------|-----------|----------------------|
7+
| `Insert(k)` | `insert(k, v)` | Evict if full; may promote on re-insert (policy-specific) |
8+
| `Get(k)` | `get(k)` | Promote on hit (recency / frequency policies) |
9+
| `Peek(k)` | `peek(k)` | **No promotion** on LRU-family policies |
10+
| `GetMut(k)` | `get_mut(k)` | Promote on hit where adapter models it |
11+
| `Touch(k)` | `touch(k)` | Promote on hit |
12+
| `Remove(k)` | `remove(k)` | Remove key; ordering side effects policy-specific |
13+
| `EvictOne` | `evict_one()` | Evict victim per policy rule |
14+
15+
Align with [trait hierarchy](../../../design/trait-hierarchy.md): `Peek` must not change `recency_rank` on LRU-family policies.
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# Spec maturity levels (shared)
2+
3+
| Level | Meaning |
4+
|-------|---------|
5+
| `stub` | Transcribed from executable oracle; independent reference pending |
6+
| `reference` | Independent naive `reference/` model exists |
7+
| `tla` | Machine-readable TLA+ module under [`formal/`](../formal/README.md) + TLC runbook |
8+
9+
Canonical per-policy status: [matrix.md](../matrix.md).
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# Formal (TLA+) specs
2+
3+
Optional machine-readable specs for finite-state model checking with TLC. **Manual only** — not run in CI.
4+
5+
## Layout
6+
7+
Each policy with TLA+ support lives in its own subdirectory:
8+
9+
```text
10+
formal/
11+
├── fifo/
12+
│ ├── Fifo.tla # MODULE name must match filename
13+
│ ├── fifo.cfg # TLC constants and INVARIANT
14+
│ └── tlc.md # Runbook and alignment log
15+
└── lru/
16+
├── Lru.tla
17+
├── lru.cfg
18+
└── tlc.md
19+
```
20+
21+
Human operational specs remain under [`policies/`](../policies/README.md). Contributor guide: [tla-guide.md](../tla-guide.md).
22+
23+
## Run TLC
24+
25+
```bash
26+
./scripts/run-tlc.sh fifo # or: ./scripts/run-fifo-tlc.sh
27+
./scripts/run-tlc.sh lru # or: ./scripts/run-lru-tlc.sh
28+
```
29+
30+
Success: no `SemanticOK` violation on the bundled config. See per-policy `tlc.md` runbooks for alignment checklists.

0 commit comments

Comments
 (0)