This document is the authoritative reference for how OpenNANDLab is structured internally. Read it before contributing code.
OpenNANDLab follows three principles:
Correctness before performance. The simulator must produce outputs that match theory (BCH can correct exactly t errors, WAF ≥ 1.0, RBER grows monotonically with P/E count). Optimizations come second.
Layered abstraction. Each layer communicates with its neighbours through a well-defined interface. The FTL does not know whether ECC is BCH or LDPC. The analytics engine does not know which GC policy was used. Swapping algorithms requires only a config change.
Measurable results. Every subsystem exposes telemetry. If a benchmark cannot produce a number, it is not a benchmark.
╔═══════════════════════════════════════════════════════════╗
║ Host Interface Layer ║
║ CLI (Click) │ Python API │ Streamlit Dashboard ║
╚══════════════════════════╤════════════════════════════════╝
│ LogicalRequest(op, lba, size, data)
▼
╔══════════════════════════════════════════════════════════╗
║ Workload Engine ║
║ Sequential │ Random │ Mixed │ Trace Replay ║
╚══════════════════════════╤═══════════════════════════════╝
│ stream of LBA requests
▼
╔══════════════════════════════════════════════════════════╗
║ Flash Translation Layer (FTL) ║
║ ┌─────────────┐ ┌──────────────┐ ┌────────────────┐ ║
║ │ L2P Map │ │ Write Buffer │ │ Free-Block │ ║
║ │ (flat array)│ │ (64 pages) │ │ Pool (deque) │ ║
║ └─────────────┘ └──────────────┘ └────────────────┘ ║
║ ┌───────────────────────────────────────────────────┐ ║
║ │ Garbage Collector │ ║
║ │ Greedy │ Cost-Benefit │ Age-Threshold │ ║
║ └───────────────────────────────────────────────────┘ ║
║ ┌───────────────────────────────────────────────────┐ ║
║ │ Wear Leveling Engine (min-heap) │ ║
║ │ Dynamic │ Static │ Hybrid │ ║
║ └───────────────────────────────────────────────────┘ ║
╚══════════════════════════╤═══════════════════════════════╝
│ PhysicalPage(block_id, page_id, data)
▼
╔══════════════════════════════════════════════════════════╗
║ ECC + Compression Layer ║
║ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ ║
║ │ Compressor │ │ ECCHandler │ │ Scrambler │ ║
║ │ LZ4 / Zstd │ │ BCH or LDPC │ │ (optional) │ ║
║ └──────────────┘ └────────────────┘ └─────────────┘ ║
╚══════════════════════════╤═══════════════════════════════╝
│ codeword bytes
▼
╔══════════════════════════════════════════════════════════╗
║ NAND Device Model ║
║ Channel → Die → Plane → Block → Page ║
║ ┌────────────────┐ ┌─────────────────────────────────┐ ║
║ │ Timing Model │ │ Reliability Model │ ║
║ │ tR/tPROG/tBERS│ │ RBER(P/E), Retention, Disturb │ ║
║ └────────────────┘ └─────────────────────────────────┘ ║
╚══════════════════════════╤═══════════════════════════════╝
│ telemetry events
▼
╔══════════════════════════════════════════════════════════╗
║ Analytics Engine ║
║ WAF │ IOPS │ Latency Percentiles │ ECC Rate │ Lifetime ║
║ Wear Heatmap │ RBER Curve │ GC Timeline │ HTML Report ║
╚══════════════════════════════════════════════════════════╝
NANDDevice
└── channels: list[NANDChannel]
└── dies: list[NANDDie]
└── planes: list[NANDPlane]
└── blocks: list[NANDBlock]
└── pages: list[NANDPage]
NANDPage fields:
state: PageState—FREE | VALID | INVALIDdata: bytes— raw codeword bytes (post-ECC)write_count: int— number of times this page has been written
NANDBlock fields:
pe_count: int— erase countis_bad: boolrber: float— computed fromreliability.rber_model(pe_count)
NANDDevice.read_page(block_id, page_id) -> bytes — returns raw codeword or raises NANDReadError.
NANDDevice.write_page(block_id, page_id, data: bytes) -> None — writes codeword; increments write counter.
NANDDevice.erase_block(block_id) -> None — sets all pages to FREE; increments pe_count; recomputes RBER.
RBER model (Weibull variant):
def rber_model(pe_count: int, cfg: NANDConfig) -> float:
"""
RBER increases from rber_floor toward rber_ceil as erase count grows.
λ controls the characteristic lifetime (50% of rber_ceil at pe_count=λ).
"""
fraction = 1.0 - math.exp(-pe_count / cfg.rber_lambda)
return cfg.rber_floor + (cfg.rber_ceil - cfg.rber_floor) * fractionBit-error injection (used by simulator, not ECC):
def inject_errors(data: bytes, rber: float, rng: random.Random) -> bytes:
"""Flip each bit independently with probability rber."""
...Retention loss (optional, P1):
def retention_rber(base_rber: float, hours_since_write: float) -> float:
return base_rber * math.log1p(hours_since_write / 24.0)Invariants (enforced by Hypothesis):
- Every mapped LPN points to exactly one physical page.
- No physical page is pointed to by more than one LPN.
- After GC, free pages ≥ gc_trigger threshold.
- WAF ≥ 1.0 at all times.
Core data structures:
class PageFTL:
l2p: array.array # logical → physical page index; -1 = unmapped
p2l: array.array # physical → logical page index; -1 = free/invalid
page_state: bytearray # 2 bits per physical page: FREE=0, VALID=1, INVALID=2
free_pool: deque[int] # deque of free physical page indices
write_buffer: WriteBuffer # in-memory buffer before NAND commit
_host_writes: int # for WAF numerator
_nand_writes: int # for WAF denominatorWrite path:
FTL.write(lbn, data)
→ compress(data) if enabled
→ write_buffer.add(lbn, compressed)
→ if buffer full:
flush_buffer()
→ for each (lbn, payload) in buffer:
old_ppn = l2p[lbn]
new_ppn = free_pool.popleft()
ecc_data = ecc_handler.encode(payload)
nand.write_page(ppn_to_block(new_ppn), ppn_to_page(new_ppn), ecc_data)
l2p[lbn] = new_ppn
p2l[new_ppn] = lbn
if old_ppn != -1:
page_state[old_ppn] = INVALID
p2l[old_ppn] = -1
page_state[new_ppn] = VALID
nand_writes += 1
if free_pool size < gc_trigger:
gc.run()
Read path:
FTL.read(lbn) -> bytes
→ if lbn in write_buffer: return write_buffer[lbn]
→ ppn = l2p[lbn]
→ if ppn == -1: raise UnmappedLBNError
→ raw = nand.read_page(ppn_to_block(ppn), ppn_to_page(ppn))
→ decoded = ecc_handler.decode(raw) # raises UncorrectableECCError if needed
→ return decompress(decoded) if compression flag set
GreedyGC:
class GreedyGC:
def select_victim(self, ftl: PageFTL) -> int:
"""Return block_id with the most INVALID pages. O(B) scan, called infrequently."""
...
def run(self, ftl: PageFTL, nand: NANDDevice) -> GCStats:
block_id = self.select_victim(ftl)
# Copy all VALID pages in block to free pages
for page_id in range(pages_per_block):
ppn = block_to_ppn(block_id, page_id)
if page_state[ppn] == VALID:
lbn = p2l[ppn]
payload = nand.read_page(block_id, page_id)
ftl.write_to_free_page(lbn, payload) # internal, bypasses buffer
nand_writes += 1 # counted in WAF
nand.erase_block(block_id) # pe_count += 1
free_pool.extend(block_pages(block_id))
return GCStats(pages_moved=valid_count, erases=1)CostBenefitGC (P1): score = (1 - utilization) * age / (2 * utilization * age_weight + 1e-9)
import heapq
class WearLevelingEngine:
"""
Tracks per-block erase counts in a min-heap.
Triggers wear leveling when the range (max - min) exceeds threshold.
"""
def __init__(self, num_blocks: int, threshold: int):
self._heap: list[tuple[int, int]] = [(0, i) for i in range(num_blocks)]
heapq.heapify(self._heap)
self._counts = [0] * num_blocks
self.threshold = threshold
def record_erase(self, block_id: int) -> None:
self._counts[block_id] += 1
heapq.heappush(self._heap, (self._counts[block_id], block_id))
def least_worn_block(self) -> int:
"""O(1) peek of the min-heap."""
return self._heap[0][1]
def should_level(self) -> bool:
max_count = max(self._counts)
min_count = self._counts[self.least_worn_block()]
return (max_count - min_count) > self.thresholdThe BCH implementation must cover all four steps:
| Step | Algorithm | Output |
|---|---|---|
| Encode | Generator polynomial multiplication | Codeword with appended parity |
| Syndrome computation | Evaluate codeword at primitive roots | Syndrome vector |
| Error-locator polynomial | Berlekamp–Massey | σ(x) |
| Error location | Chien search | Error positions |
| Error value (non-binary) | Forney's algorithm | Error magnitudes |
Note on Forney: Forney's algorithm is required for m > 1 (GF(2^m) with non-binary symbols). Without it, the decoder returns incorrect data silently when m > 1 and t > 0. Binary BCH (m=1) does not need Forney.
Parity-check matrix generation: Progressive Edge-Growth (PEG) algorithm. The matrix is cached to disk as an .npz file keyed by (n, d_v, d_c) to avoid regeneration overhead.
Soft-decision belief propagation:
def decode(self, llr: np.ndarray, max_iter: int = 50) -> np.ndarray:
"""
llr: log-likelihood ratios from Gaussian Vth model.
llr[i] = log(P(bit=0|channel) / P(bit=1|channel))
Returns decoded bits (hard decision after convergence).
"""Gaussian threshold-voltage model (for LLR generation):
def compute_llr(raw_bits: bytes, rber: float, sigma: float = 1.0) -> np.ndarray:
"""
Model each cell's threshold voltage as Gaussian(μ=0 or 1, σ).
Returns LLR vector for the BP decoder.
"""All telemetry is captured via an event bus pattern: subsystems emit events; the analytics engine subscribes:
@dataclass
class WriteEvent:
timestamp_ns: int
lbn: int
ppn: int
is_gc: bool
@dataclass
class EraseEvent:
timestamp_ns: int
block_id: int
pe_count: int
@dataclass
class ECCEvent:
timestamp_ns: int
errors_corrected: int
uncorrectable: boolDerived metrics:
| Metric | Formula |
|---|---|
| WAF | nand_writes / host_writes |
| IOPS | host_ops / elapsed_seconds |
| P99 latency | 99th percentile of latency_samples |
| ECC correction rate | ecc_corrections / total_reads |
| UBER | uncorrectable_reads / total_reads |
| Estimated lifetime | max_pe_cycles / current_pe_rate_per_day |
Problem (identified by Deepseek): In v1.1.0, NANDController.write_page ends with a comment # Perform error correction coding / # ... (comment only). ECC encoding is never called. The physical write is never called. Every "write" in v1.1.0 is a no-op.
Fix:
def write_page(self, logical_block: int, page: int, data: bytes) -> None:
# 1. Validate inputs
if not self._initialized:
raise RuntimeError("Controller not initialized")
# 2. Check bad block
physical_block = self._bbm.get_replacement(logical_block)
if physical_block is None:
raise BadBlockError(f"No replacement for block {logical_block}")
# 3. Compress
payload = data
compressed = False
if self.compression_enabled:
c = self._compressor.compress(data)
if len(c) < len(data):
payload, compressed = c, True
# 4. ECC encode ← THIS WAS MISSING
codeword = self._ecc_handler.encode(payload)
# 5. Optional scramble ← THIS WAS MISSING
if self.data_scrambling:
codeword = self._scramble_data(codeword, physical_block, page)
# 6. Physical write ← THIS WAS MISSING
self._nand_interface.write_page(physical_block, page, codeword)
# 7. Update cache with original data
if self.cache_enabled:
self._cache.put((physical_block, page), data)
# 8. Telemetry
self._metrics.record_write(logical_block, physical_block, compressed)
self._wear_leveler.record_write(physical_block)| Decision | Choice | Rationale |
|---|---|---|
| FTL default | Page-level mapping | Simplest correct semantics; easiest to verify |
| Config | Pydantic BaseModel |
Type-safe, validated at startup, generates JSON schema |
| Heap for WL | heapq (min-heap) |
O(log n) insert, O(1) min lookup — far better than linear scan |
| LRU cache | OrderedDict |
Python stdlib, O(1) get/put/evict, no extra dependency |
| GC default | Greedy | Easiest to verify correct; cost-benefit is a config switch |
| GUI | Streamlit | Headless-CI-friendly, no Tkinter install pain, produces HTML reports |
| Parity matrix storage | .npz cache |
PEG matrix generation is expensive; cache it after first run |
| Test framework | pytest + Hypothesis | Property-based tests catch corner cases unit tests miss |
The v2.0 simulator is single-threaded by design. The event loop is synchronous; parallel_access.py exists for future work. Do not add locking to core data structures until the parallel model is designed.
| Operation | Target | Notes |
|---|---|---|
| FTL write (no GC) | < 5 µs | Dominated by array indexing |
| BCH encode (4096 B, t=4) | < 500 µs | GF polynomial arithmetic |
| LZ4 compress (4096 B) | < 10 µs | C extension via lz4 package |
| GC cycle (1 block, 256 pages) | < 50 ms | Acceptable tail latency spike |
| 1M simulated writes | < 60 s | End-to-end benchmark target |
For algorithm references, see docs/references.md. For configuration, see docs/configuration.md. For contributing, see CONTRIBUTING.md.