The block format provides layered protection against silent data corruption, whether from media degradation, controller firmware bugs, or bit flips in the storage path. Each block stores an xxHash32 checksum computed over the data payload at write time. On every read, the checksum is recomputed and compared against the stored value; any mismatch causes the read to fail immediately rather than return corrupt data. The block size field is stored twice, once in the header and once in the footer, so a single-bit corruption in either copy is detected when the two are cross-validated during backward cursor traversal. The footer magic number (0x42445442, "BTDB") acts as a sentinel: random corruption is unlikely to produce this exact 32-bit value, so its absence reliably identifies torn writes and partial flushes.

During recovery, permissive validation uses this structure to walk forward through WAL blocks, accepting blocks whose footer magic is present and whose header and footer sizes agree, and truncating at the first inconsistency. Forward cursor operations that encounter a checksum failure can call `block_manager_cursor_skip_corrupt()`, which distinguishes partial writes (footer magic absent, block extent known from the size field) from genuine corruption (footer magic present but data checksum fails), advancing past the former and rejecting the latter.

The combination of per-block checksums, redundant size fields, and magic sentinels means that a single-point corruption, whether it hits the data, the metadata, or the framing, is detected with high probability before it can propagate to the application layer. SSTables inherit this protection directly, since their klog and vlog files are block manager files. WAL files add another layer: entries are deserialized with bounds checking on every varint and field offset, so a corrupt WAL entry that passes the block checksum (for example, valid bytes rearranged by a controller bug) still fails deserialization rather than silently loading garbage into the memtable.
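The per-block checks described above (footer magic, header/footer size agreement, payload checksum) can be illustrated with a minimal sketch. It is not the block manager's actual code. The layout `[size:4][checksum:4][payload][size:4][magic:4]`, the little-endian encoding, and the names `block_encode`, `block_validate`, and `block_status_t` are assumptions made for illustration. To stay self-contained, the sketch uses a 32-bit FNV-1a hash as a stand-in for xxHash32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Footer magic value quoted in the text above. */
#define FOOTER_MAGIC 0x42445442u
/* Assumed layout: [size:4][checksum:4][payload][size:4][magic:4], little-endian. */
#define HEADER_SIZE 8u
#define FOOTER_SIZE 8u

typedef enum {
    BLOCK_OK,
    BLOCK_TRUNCATED,     /* not enough bytes for the extent the header claims */
    BLOCK_BAD_MAGIC,     /* footer sentinel missing: likely a torn or partial write */
    BLOCK_SIZE_MISMATCH, /* header and footer size copies disagree */
    BLOCK_BAD_CHECKSUM   /* framing intact, payload corrupt */
} block_status_t;

/* Stand-in checksum (32-bit FNV-1a). The real format uses xxHash32. */
static uint32_t checksum32(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static uint32_t load_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_u32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Writes one framed block into out; returns bytes written, or 0 if it does not fit. */
size_t block_encode(uint8_t *out, size_t cap, const uint8_t *payload, uint32_t len)
{
    if (cap < HEADER_SIZE + FOOTER_SIZE || len > cap - HEADER_SIZE - FOOTER_SIZE)
        return 0;
    store_u32_le(out, len);
    store_u32_le(out + 4, checksum32(payload, len));
    memcpy(out + HEADER_SIZE, payload, len);
    store_u32_le(out + HEADER_SIZE + len, len);
    store_u32_le(out + HEADER_SIZE + len + 4, FOOTER_MAGIC);
    return (size_t)HEADER_SIZE + len + FOOTER_SIZE;
}

/* Validates the block starting at buf, given avail readable bytes. */
block_status_t block_validate(const uint8_t *buf, size_t avail)
{
    if (avail < HEADER_SIZE + FOOTER_SIZE)
        return BLOCK_TRUNCATED;

    uint32_t size = load_u32_le(buf);
    if (size > avail - HEADER_SIZE - FOOTER_SIZE)
        return BLOCK_TRUNCATED;

    const uint8_t *footer = buf + HEADER_SIZE + size;
    if (load_u32_le(footer + 4) != FOOTER_MAGIC)
        return BLOCK_BAD_MAGIC;
    if (load_u32_le(footer) != size)
        return BLOCK_SIZE_MISMATCH;
    if (checksum32(buf + HEADER_SIZE, size) != load_u32_le(buf + 4))
        return BLOCK_BAD_CHECKSUM;
    return BLOCK_OK;
}
```

In this sketch the framing is checked before the checksum, so a caller can tell a missing footer sentinel (`BLOCK_BAD_MAGIC`) apart from a payload that fails its checksum inside intact framing (`BLOCK_BAD_CHECKSUM`). This mirrors the partial-write versus genuine-corruption distinction drawn in the text. A recovery loop built on it would stop at the first status other than `BLOCK_OK`.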