Commit abd8b73

Phase 2: pager — checksummed pages, atomic meta swap, free-list, cache
Fixed 4 KiB page frames with a self-describing CRC32C header (type + self-id) detecting truncation, bit-rot, and misdirected writes; a non-panicking decoder returning typed Corruption errors.

Durable atomic commit via double-buffered meta slots (data → fsync → inactive-slot meta → fsync → promote), with reopen adopting the valid meta of the highest committed txn. SQLite-style free-list trunk chain for page reuse, a byte-budgeted LRU page cache (dirty pages pinned), and validate() proving meta + free-list integrity.

Tests (PLAN §3.6/§6 exit): seeded model-based alloc/free/read/write property test across reopens, multi-trunk reuse, injected-fault crash test around the meta swap, and decoder/meta-recovery robustness tests.

common: FaultInjectingBackend::reset_counters and an IoBackend impl for Arc<B> to support the crash test.

In-house CRC32C and seeded tests keep the dependency graph empty (DECISIONS.md D3, D4); CHANGELOG/DECISIONS updated.
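
The commit ordering and the reopen rule described in this message can be illustrated with a small in-memory sketch. It is not the pager's code: `Meta`, `MemFile`, `commit`, and `reopen` are hypothetical names, a `valid` flag stands in for a passing CRC32C check, and a `tear_meta` flag simulates a crash during the meta write.

```rust
/// Hypothetical meta record; `valid` stands in for a passing checksum.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Meta {
    txn: u64,
    root: u64,
    valid: bool,
}

/// Hypothetical in-memory "file": two meta slots plus a log of I/O steps.
#[derive(Default)]
struct MemFile {
    slots: [Option<Meta>; 2],
    active: usize,
    log: Vec<&'static str>,
}

impl MemFile {
    /// Commit order: data -> fsync -> inactive-slot meta -> fsync -> promote.
    /// `tear_meta` simulates a crash that leaves the new meta unreadable.
    fn commit(&mut self, root: u64, tear_meta: bool) {
        self.log.push("write dirty pages");
        self.log.push("fsync");
        let inactive = 1 - self.active;
        let txn = self.slots[self.active].map_or(1, |m| m.txn + 1);
        self.slots[inactive] = Some(Meta { txn, root, valid: !tear_meta });
        self.log.push("write meta to inactive slot");
        if tear_meta {
            return; // crashed before the second fsync: never promoted
        }
        self.log.push("fsync");
        self.active = inactive;
        self.log.push("promote");
    }
}

/// Reopen rule: adopt the valid slot with the highest txn id.
fn reopen(slots: &[Option<Meta>; 2]) -> Option<(usize, Meta)> {
    slots
        .iter()
        .enumerate()
        .filter_map(|(i, slot)| match slot {
            Some(m) if m.valid => Some((i, *m)),
            _ => None,
        })
        .max_by_key(|(_, m)| m.txn)
}
```

Because the new meta always goes to the inactive slot, a torn write can only damage the older of the two records, so `reopen` still finds the last whole commit.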
1 parent 2865a7f commit abd8b73

14 files changed

Lines changed: 1700 additions & 5 deletions

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -8,6 +8,31 @@ under a category (`Added` / `Changed` / `Fixed` / `Removed` / `Security`).
 
 ## [Unreleased]
 
+### Phase 2 — Pager (paged, checksummed, atomically-committing storage)
+
+#### Added
+- `pager`: fixed 4 KiB page frames with a self-describing 16-byte header
+  (CRC32C, page type, self-id) that detects truncation, bit-rot, and misdirected
+  writes; a decoder that returns typed `Corruption` errors and never panics on
+  hostile input.
+- In-house software CRC32C (Castagnoli), zero-dependency (`DECISIONS.md` D3).
+- Double-buffered meta pages (slots 0/1) with an atomic commit: dirty pages →
+  fsync → new meta to the inactive slot → fsync → promote. Reopen adopts the
+  valid meta slot with the highest committed txn id.
+- SQLite-style free-list trunk chain (`alloc`/`free`) reusing freed pages before
+  extending the file; `validate()` walks meta + free-list proving range, no
+  cycles/duplicates, and length agreement.
+- Byte-budgeted LRU page cache (dirty pages pinned until commit).
+- Exit-criteria tests: a seeded model-based alloc/free/read/write property test
+  across commits/reopens (`DECISIONS.md` D4), a multi-trunk reuse test, an
+  injected-fault crash test around the meta swap (reopen lands on the last whole
+  commit, never a torn one), and decoder/meta-recovery robustness tests.
+
+#### Changed
+- `crates/common/src/io.rs`: `FaultInjectingBackend::reset_counters` (target a
+  fault at a specific later operation) and an `IoBackend` impl for `Arc<B>` (a
+  test can hold a handle to arm faults while the `Pager` owns the backend).
+
 ### Phase 1 — Foundations & scaffolding
 
 #### Added
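
The three free-list properties that the changelog attributes to `validate()` (range, no cycles or duplicates, length agreement) can be sketched over an in-memory trunk map. Everything here is an assumption made for illustration: the `Trunk` shape, `0` as the chain terminator, page ids 0 and 1 being reserved for the meta slots, and the free count including trunk pages as well as leaves.

```rust
use std::collections::{HashMap, HashSet};

/// ASSUMPTION: ids 0 and 1 are the meta slots, so free pages start at 2.
const FIRST_DATA_PAGE: u64 = 2;

/// Hypothetical in-memory view of one free-list trunk page.
struct Trunk {
    next: u64,        // next trunk page id; 0 ends the chain (assumption)
    leaves: Vec<u64>, // free page ids recorded in this trunk
}

#[derive(Debug, PartialEq)]
enum FreeListError {
    OutOfRange(u64),
    Duplicate(u64),
    MissingTrunk(u64),
    LengthMismatch { expected: u64, found: u64 },
}

/// Range check plus duplicate check for a single page id.
fn visit(seen: &mut HashSet<u64>, id: u64, page_count: u64) -> Result<(), FreeListError> {
    if id < FIRST_DATA_PAGE || id >= page_count {
        return Err(FreeListError::OutOfRange(id));
    }
    if !seen.insert(id) {
        return Err(FreeListError::Duplicate(id));
    }
    Ok(())
}

/// Walk the trunk chain from `head`, counting trunks and leaves alike.
fn validate_free_list(
    head: u64,
    trunks: &HashMap<u64, Trunk>,
    page_count: u64,
    expected_len: u64,
) -> Result<(), FreeListError> {
    let mut seen = HashSet::new();
    let mut cur = head;
    while cur != 0 {
        // Revisiting a trunk reports Duplicate, so a cyclic chain terminates.
        visit(&mut seen, cur, page_count)?;
        let trunk = trunks.get(&cur).ok_or(FreeListError::MissingTrunk(cur))?;
        for &leaf in &trunk.leaves {
            visit(&mut seen, leaf, page_count)?;
        }
        cur = trunk.next;
    }
    let found = seen.len() as u64;
    if found != expected_len {
        return Err(FreeListError::LengthMismatch { expected: expected_len, found });
    }
    Ok(())
}
```

A single `seen` set covers both cycle and duplicate detection: a cycle is just a trunk id that shows up twice.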

DECISIONS.md

Lines changed: 35 additions & 0 deletions
@@ -5,6 +5,41 @@ Per `PLAN.md` §1 rule 6, every resolution of an ambiguity or deviation from
 
 ---
 
+## D4 — Seeded, model-based property tests instead of `proptest`
+
+**Phase:** 2 · **Status:** accepted
+
+`PLAN.md` §3.6 calls for randomized, reproducible-from-a-seed property tests.
+The obvious tool is the `proptest` crate, but it (and its transitive
+dependencies) would be the first third-party code to enter the dependency graph,
+and the CI gate `cargo deny` (licenses/advisories) is **CI-only** — not
+installed locally — so a new dependency's license/advisory status cannot be
+vetted before pushing.
+
+**Decision:** write property tests in-house against the `common::SeededRng`
+(SplitMix64) host service already built in Phase 1. A test fixes a seed, drives a
+randomized op sequence (alloc/free/write/commit/reopen) against the pager while a
+simple in-memory model (`HashMap<page, tag>`) tracks expected contents, and
+asserts the two agree plus `validate()` passes. Seeds are listed explicitly so a
+failure is reproducible. This keeps the dependency graph empty of unvetted crates
+while satisfying §3.6. Revisit if shrinking (minimal counterexamples) becomes
+worth a dependency.
+
+## D3 — In-house software CRC32C (Castagnoli), no dependency
+
+**Phase:** 2 · **Status:** accepted
+
+`ARCHITECTURE.md` specifies a CRC32C per-page checksum. Crates such as `crc32c`
+or `crc` would pull in third-party code that, per D4's reasoning, cannot be
+license/advisory-vetted locally (the `cargo deny` gate is CI-only).
+
+**Decision:** implement CRC32C (Castagnoli polynomial `0x82F63B78`) in `pager`
+as a small, table-driven software routine (`crc::crc32c`), with the lookup table
+built by a `const fn` at compile time. Correctness is pinned by the standard
+check vector (`crc32c(b"123456789") == 0xE3069283`). No hardware-intrinsic
+(SSE4.2) path for now — portability and a zero-dependency graph over peak
+throughput; revisit if checksum cost shows up in profiling.
+
 ## D2 — Public crate is `otf-dbms`; internal crates keep functional names
 
 **Phase:** 1 · **Status:** accepted
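
The test shape that D4 describes (fixed seed, random op sequence, in-memory model, agreement check) can be sketched without the pager. Here `SplitMix64` stands in for `common::SeededRng`, whose real API is not shown in this diff, and `ToyStore` is a hypothetical system under test that reuses freed ids before growing.

```rust
use std::collections::HashMap;

/// SplitMix64 generator; a stand-in for `common::SeededRng` (API assumed).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical system under test: reuses freed ids before growing.
#[derive(Default)]
struct ToyStore {
    slots: Vec<Option<u64>>,
    free_ids: Vec<usize>,
}

impl ToyStore {
    fn alloc(&mut self, tag: u64) -> usize {
        match self.free_ids.pop() {
            Some(id) => {
                self.slots[id] = Some(tag);
                id
            }
            None => {
                self.slots.push(Some(tag));
                self.slots.len() - 1
            }
        }
    }

    fn release(&mut self, id: usize) {
        self.slots[id] = None;
        self.free_ids.push(id);
    }

    fn read(&self, id: usize) -> Option<u64> {
        self.slots[id]
    }
}

/// Drive `steps` random ops from `seed`; Err(step) marks the first divergence.
fn run_model_test(seed: u64, steps: usize) -> Result<(), usize> {
    let mut rng = SplitMix64(seed);
    let mut store = ToyStore::default();
    let mut model: HashMap<usize, u64> = HashMap::new();
    for step in 0..steps {
        let r = rng.next_u64();
        if model.is_empty() || r % 3 != 0 {
            let tag = rng.next_u64();
            let id = store.alloc(tag);
            if model.insert(id, tag).is_some() {
                return Err(step); // a live id was handed out twice
            }
        } else {
            // Sort keys first: HashMap order is not stable across runs.
            let mut live: Vec<usize> = model.keys().copied().collect();
            live.sort_unstable();
            let id = live[(r / 3) as usize % live.len()];
            model.remove(&id);
            store.release(id);
        }
        if model.iter().any(|(&id, &tag)| store.read(id) != Some(tag)) {
            return Err(step);
        }
    }
    Ok(())
}
```

Returning the failing step alongside an explicit seed gives a reproducible failure, though without the shrinking that D4 notes a `proptest`-style tool would add.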

crates/common/src/io.rs

Lines changed: 34 additions & 0 deletions
@@ -259,6 +259,15 @@ impl<B: IoBackend> FaultInjectingBackend<B> {
         *self.armed.lock().unwrap_or_else(PoisonError::into_inner) = None;
     }
 
+    /// Reset all per-operation occurrence counters to zero, so a subsequent
+    /// [`arm`](Self::arm) targets operations relative to this point. Useful for
+    /// faulting a specific operation within a later phase of a workload.
+    pub fn reset_counters(&self) {
+        self.reads.store(0, Ordering::Relaxed);
+        self.writes.store(0, Ordering::Relaxed);
+        self.syncs.store(0, Ordering::Relaxed);
+    }
+
     /// Consume the wrapper and return the inner backend.
     pub fn into_inner(self) -> B {
         self.inner
@@ -309,6 +318,31 @@ impl<B: IoBackend> IoBackend for FaultInjectingBackend<B> {
     }
 }
 
+/// Sharing an [`IoBackend`] behind an [`Arc`](std::sync::Arc) keeps it an
+/// `IoBackend`. This lets a test retain a handle (e.g. to arm faults) while a
+/// `Pager` owns another clone of the same backend.
+impl<B: IoBackend + ?Sized> IoBackend for std::sync::Arc<B> {
+    fn read_at(&self, offset: u64, buf: &mut [u8]) -> IoResult<()> {
+        (**self).read_at(offset, buf)
+    }
+
+    fn write_at(&self, offset: u64, data: &[u8]) -> IoResult<()> {
+        (**self).write_at(offset, data)
+    }
+
+    fn sync(&self) -> IoResult<()> {
+        (**self).sync()
+    }
+
+    fn len(&self) -> IoResult<u64> {
+        (**self).len()
+    }
+
+    fn truncate(&self, len: u64) -> IoResult<()> {
+        (**self).truncate(len)
+    }
+}
+
 /// A real-file [`IoBackend`] using positional reads/writes (`pread`/`pwrite`),
 /// so concurrent reads need no shared cursor.
 #[cfg(unix)]
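
How the two `io.rs` additions combine in a crash test can be shown with toy types. This is a sketch under assumptions, not the crate's API: `Backend` is a one-method stand-in for `IoBackend`, `Faulty` is a simplified fault injector whose `arm` takes only a write index, and `Owner` plays the role of a `Pager` that takes its backend by value.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Toy stand-in for `IoBackend` with a single operation.
trait Backend {
    fn write(&self, data: &[u8]) -> Result<(), String>;
}

/// Forwarding impl: an `Arc<B>` is itself a `Backend`.
impl<B: Backend + ?Sized> Backend for Arc<B> {
    fn write(&self, data: &[u8]) -> Result<(), String> {
        (**self).write(data)
    }
}

/// Toy fault injector: fails the Nth write counted since the last reset.
#[derive(Default)]
struct Faulty {
    writes: AtomicU64,
    fail_on: Mutex<Option<u64>>,
    stored: Mutex<Vec<u8>>,
}

impl Faulty {
    fn arm(&self, nth_write: u64) {
        *self.fail_on.lock().unwrap() = Some(nth_write);
    }

    fn reset_counters(&self) {
        self.writes.store(0, Ordering::Relaxed);
    }
}

impl Backend for Faulty {
    fn write(&self, data: &[u8]) -> Result<(), String> {
        let n = self.writes.fetch_add(1, Ordering::Relaxed) + 1;
        if *self.fail_on.lock().unwrap() == Some(n) {
            return Err(format!("injected fault on write {}", n));
        }
        self.stored.lock().unwrap().extend_from_slice(data);
        Ok(())
    }
}

/// Toy owner that takes its backend by value, as a `Pager` would.
struct Owner<B: Backend> {
    backend: B,
}

impl<B: Backend> Owner<B> {
    fn put(&self, data: &[u8]) -> Result<(), String> {
        self.backend.write(data)
    }
}
```

A test keeps one `Arc<Faulty>` and hands a clone to `Owner`. After a warm-up phase it calls `reset_counters()` and then `arm(2)`, so the fault lands on the second write of the next phase, whatever the warm-up wrote.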

crates/pager/Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -13,5 +13,8 @@ description = "Fixed-size paged file I/O: checksums, page cache, meta pages, fre
 common.workspace = true
 thiserror.workspace = true
 
+[dev-dependencies]
+common.workspace = true
+
 [lints]
 workspace = true

crates/pager/src/cache.rs

Lines changed: 149 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,149 @@
1+
//! A byte-budgeted page cache with LRU eviction.
2+
//!
3+
//! Entries are reference-counted ([`Arc`]) page frames. Dirty frames are pinned
4+
//! until the next commit flushes them, so eviction only reclaims clean entries;
5+
//! if every entry is dirty the cache may temporarily exceed its budget.
6+
//!
7+
//! Eviction currently scans for the least-recently-used clean entry (O(n)); the
8+
//! working sets here are small and this is not yet on a measured hot path. An
9+
//! O(1) intrusive LRU is a Phase 11 optimization candidate.
10+
11+
use std::collections::HashMap;
12+
use std::sync::Arc;
13+
14+
use crate::page::{Frame, PAGE_SIZE};
15+
16+
struct Entry {
17+
frame: Arc<Frame>,
18+
dirty: bool,
19+
tick: u64,
20+
}
21+
22+
pub(crate) struct PageCache {
23+
entries: HashMap<u64, Entry>,
24+
budget_bytes: usize,
25+
tick: u64,
26+
}
27+
28+
impl PageCache {
29+
pub(crate) fn new(budget_bytes: usize) -> Self {
30+
PageCache {
31+
entries: HashMap::new(),
32+
budget_bytes,
33+
tick: 0,
34+
}
35+
}
36+
37+
fn bump(&mut self) -> u64 {
38+
self.tick = self.tick.wrapping_add(1);
39+
self.tick
40+
}
41+
42+
/// Fetch a cached frame, marking it most-recently-used.
43+
pub(crate) fn get(&mut self, id: u64) -> Option<Arc<Frame>> {
44+
let tick = self.bump();
45+
let entry = self.entries.get_mut(&id)?;
46+
entry.tick = tick;
47+
Some(Arc::clone(&entry.frame))
48+
}
49+
50+
/// Insert or replace a frame. A `dirty` insert pins the entry until the
51+
/// next commit; re-inserting never clears an existing dirty flag.
52+
pub(crate) fn insert(&mut self, id: u64, frame: Arc<Frame>, dirty: bool) {
53+
let tick = self.bump();
54+
match self.entries.get_mut(&id) {
55+
Some(entry) => {
56+
entry.frame = frame;
57+
entry.tick = tick;
58+
entry.dirty = entry.dirty || dirty;
59+
}
60+
None => {
61+
self.entries.insert(id, Entry { frame, dirty, tick });
62+
}
63+
}
64+
self.evict_to_budget();
65+
}
66+
67+
fn evict_to_budget(&mut self) {
68+
while self.entries.len().saturating_mul(PAGE_SIZE) > self.budget_bytes {
69+
let victim = self
70+
.entries
71+
.iter()
72+
.filter(|(_, e)| !e.dirty)
73+
.min_by_key(|(_, e)| e.tick)
74+
.map(|(id, _)| *id);
75+
match victim {
76+
Some(id) => {
77+
self.entries.remove(&id);
78+
}
79+
None => break, // everything resident is dirty; keep it
80+
}
81+
}
82+
}
83+
84+
/// All currently-dirty frames, for the commit flush. Does not clear flags.
85+
pub(crate) fn dirty_pages(&self) -> Vec<(u64, Arc<Frame>)> {
86+
self.entries
87+
.iter()
88+
.filter(|(_, e)| e.dirty)
89+
.map(|(id, e)| (*id, Arc::clone(&e.frame)))
90+
.collect()
91+
}
92+
93+
/// Mark every entry clean (after a successful commit flush).
94+
pub(crate) fn clear_dirty(&mut self) {
95+
for entry in self.entries.values_mut() {
96+
entry.dirty = false;
97+
}
98+
}
99+
100+
#[cfg(test)]
101+
pub(crate) fn len(&self) -> usize {
102+
self.entries.len()
103+
}
104+
105+
#[cfg(test)]
106+
pub(crate) fn contains(&self, id: u64) -> bool {
107+
self.entries.contains_key(&id)
108+
}
109+
}
110+
111+
#[cfg(test)]
112+
mod tests {
113+
use super::*;
114+
use crate::page;
115+
116+
fn frame(id: u64) -> Arc<Frame> {
117+
let mut f = page::zeroed();
118+
page::finalize(&mut f, page::PageType::Data, crate::PageId::new(id));
119+
Arc::from(f)
120+
}
121+
122+
#[test]
123+
fn evicts_clean_lru_first() {
124+
// Budget for two pages.
125+
let mut cache = PageCache::new(2 * PAGE_SIZE);
126+
cache.insert(2, frame(2), false);
127+
cache.insert(3, frame(3), false);
128+
let _ = cache.get(2); // touch 2 → 3 is now LRU
129+
cache.insert(4, frame(4), false); // over budget → evict 3
130+
assert!(cache.contains(2));
131+
assert!(cache.contains(4));
132+
assert!(!cache.contains(3));
133+
assert_eq!(cache.len(), 2);
134+
}
135+
136+
#[test]
137+
fn dirty_pages_are_pinned_over_budget() {
138+
let mut cache = PageCache::new(PAGE_SIZE); // budget for one page
139+
cache.insert(2, frame(2), true);
140+
cache.insert(3, frame(3), true);
141+
// Both dirty → neither evicted even though over budget.
142+
assert_eq!(cache.len(), 2);
143+
assert_eq!(cache.dirty_pages().len(), 2);
144+
cache.clear_dirty();
145+
// Now clean; next insert can evict down to budget.
146+
cache.insert(4, frame(4), false);
147+
assert_eq!(cache.len(), 1);
148+
}
149+
}

crates/pager/src/crc.rs

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+//! CRC32C (Castagnoli) checksum — software, table-driven.
+//!
+//! Implemented in-house rather than pulled as a dependency: it is a small,
+//! well-understood, non-security-sensitive integrity check (used to detect page
+//! corruption, not to resist tampering). See `DECISIONS.md` D3.
+
+/// Reflected Castagnoli polynomial (0x1EDC6F41 reflected).
+const POLY: u32 = 0x82F6_3B78;
+
+const fn build_table() -> [u32; 256] {
+    let mut table = [0u32; 256];
+    let mut i = 0usize;
+    while i < 256 {
+        let mut crc = i as u32;
+        let mut bit = 0;
+        while bit < 8 {
+            if crc & 1 == 1 {
+                crc = (crc >> 1) ^ POLY;
+            } else {
+                crc >>= 1;
+            }
+            bit += 1;
+        }
+        table[i] = crc;
+        i += 1;
+    }
+    table
+}
+
+static TABLE: [u32; 256] = build_table();
+
+/// Compute the CRC32C of `data`.
+///
+/// # Examples
+///
+/// ```
+/// // Standard CRC32C check value for the ASCII string "123456789".
+/// assert_eq!(pager::crc32c(b"123456789"), 0xE306_9283);
+/// ```
+pub fn crc32c(data: &[u8]) -> u32 {
+    let mut crc = 0xFFFF_FFFFu32;
+    for &byte in data {
+        let idx = ((crc ^ byte as u32) & 0xFF) as usize;
+        crc = (crc >> 8) ^ TABLE[idx];
+    }
+    crc ^ 0xFFFF_FFFF
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn known_vectors() {
+        assert_eq!(crc32c(b""), 0x0000_0000);
+        assert_eq!(crc32c(b"123456789"), 0xE306_9283);
+    }
+
+    #[test]
+    fn detects_single_bit_flips() {
+        let a = crc32c(b"the quick brown fox");
+        let b = crc32c(b"the quick brown fox.");
+        assert_ne!(a, b);
+    }
+}
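
The `detects_single_bit_flips` test in this file compares two inputs of different lengths rather than flipping a bit. The sketch below checks actual single-bit flips against a sealed page and shows how a checksummed, self-identifying header can separate truncation, bit-rot, and misdirected writes. The header layout is an assumption made for the example (the real 16-byte layout lives in `page.rs`, which is not part of this excerpt), and `seal`, `check`, and the `Corruption` variants are hypothetical names.

```rust
const PAGE_SIZE: usize = 4096;

/// Bitwise CRC32C (reflected polynomial 0x82F63B78); same result as a
/// table-driven routine, just slower and shorter.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 == 1 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    crc ^ 0xFFFF_FFFF
}

/// Hypothetical typed errors; the crate's real `Corruption` type may differ.
#[derive(Debug, PartialEq)]
enum Corruption {
    Truncated { len: usize },
    BadChecksum { stored: u32, computed: u32 },
    WrongPage { expected: u64, found: u64 },
}

/// ASSUMED layout (little-endian): [0..4) CRC32C of bytes [4..PAGE_SIZE),
/// [4] page type, [5..8) reserved, [8..16) self-id.
fn seal(page: &mut [u8; PAGE_SIZE], page_type: u8, id: u64) {
    page[4] = page_type;
    page[8..16].copy_from_slice(&id.to_le_bytes());
    let crc = crc32c(&page[4..]);
    page[0..4].copy_from_slice(&crc.to_le_bytes());
}

/// Validate a page read for `expected_id`; returns the page type. Never panics.
fn check(bytes: &[u8], expected_id: u64) -> Result<u8, Corruption> {
    if bytes.len() != PAGE_SIZE {
        return Err(Corruption::Truncated { len: bytes.len() });
    }
    let stored = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let computed = crc32c(&bytes[4..]);
    if stored != computed {
        return Err(Corruption::BadChecksum { stored, computed });
    }
    let mut id = [0u8; 8];
    id.copy_from_slice(&bytes[8..16]);
    let found = u64::from_le_bytes(id);
    if found != expected_id {
        // Intact page, wrong location: a misdirected write.
        return Err(Corruption::WrongPage { expected: expected_id, found });
    }
    Ok(bytes[4])
}
```

Checking the length before indexing is what keeps `check` from panicking on short input. Putting the self-id under the checksum means a page written intact to the wrong offset still fails, with `WrongPage` rather than `BadChecksum`.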
