How data lives in memory while a database is open. For how it's laid out on disk, see file-format.md.
The canonical code is src/sql/db/table.rs.
Database
├── db_name: "users.sqlrite"
├── source_path: Some("/path/to/users.sqlrite")
├── pager: Some(Pager)
└── tables: HashMap<String, Table>
"users" -> Table
"notes" -> Table
...
Table
├── tb_name: "users"
├── columns: Vec<Column> ordered, schema
├── rows: Rc<RefCell<HashMap<String, Row>>>
│ keyed by column name
│ value = one column's full data
├── indexes: HashMap<String, String> (placeholder; not used yet)
├── last_rowid: i64
└── primary_key: "id" | "-1" "-1" = table has no explicit PK
Column
├── column_name: "name"
├── datatype: DataType::Text
├── is_pk: false
├── not_null: true
├── is_unique: true
├── is_indexed: true
└── index: Index::Text(BTreeMap<String, i64>)
Row (stored per column inside Table.rows)
├── Integer(BTreeMap<rowid: i64, value: i32>)
├── Text(BTreeMap<rowid: i64, value: String>)
├── Real(BTreeMap<rowid: i64, value: f32>)
├── Bool(BTreeMap<rowid: i64, value: bool>)
└── None
The surprising thing is that Table.rows is not keyed by rowid. It's keyed by column name, and each value is a Row enum holding a BTreeMap<rowid, value> for that one column.
In other words, to read "what's at row 5?" you have to ask each column's BTreeMap for key 5:
users.rows["id"] = Row::Integer({5 => 5})
users.rows["name"] = Row::Text({5 => "alice"})
users.rows["age"] = Row::Integer({5 => 30})
All columns share the same keyset. Insertion pads missing columns with NULL-ish values (see below), so every rowid has an entry in every column's map.
Historical rather than principled. The first pass of the codebase in 2020 used this shape, probably because it made per-column index maintenance straightforward. Phase 3c will replace it with row-oriented cells as part of moving to real B-Tree storage.
- Projecting a column (e.g.,
SELECT name FROM users) is one BTreeMap scan. Efficient. - Reassembling a row (
SELECT *) requires N map lookups for N columns. Each lookup is O(log R), so row reconstruction is O(C log R). Not terrible for small N but not ideal. - Deleting a row means removing the rowid from every column's BTreeMap and every index's BTreeMap. Handled by
Table::delete_row.
Every row has a rowid: i64. It's the implicit primary key when no explicit INTEGER PRIMARY KEY is declared, and it's the alias for the PK when one is. Table.last_rowid tracks the most recent assignment and is bumped on each insert.
When the user omits an INTEGER PRIMARY KEY column in an INSERT, insert_row auto-assigns last_rowid + 1. When the user supplies an explicit value, that value becomes the new rowid (and last_rowid advances to it, if it's larger).
Non-integer PRIMARY KEY columns are unusual; the code handles them but auto-assign only kicks in for the integer case.
Each Column carries an Index:
enum Index {
Integer(BTreeMap<i32, i64>), // value -> rowid
Text(BTreeMap<String, i64>), // value -> rowid
None, // not indexed
}Indexes exist only on columns marked PRIMARY KEY or UNIQUE, and only for Integer and Text types today. Real and Bool columns get Index::None even if declared unique, and UNIQUE enforcement falls back to a linear scan.
The index is maintained inline with every insert, update, and delete:
insert_rowinserts into the column's index after successfully inserting into the Row map.set_valueremoves the old index entry (aretain-based scan, since we look up by rowid) then inserts the new one.delete_rowstrips every index entry pointing at the rowid being deleted.
Indexes are also used on write paths for UNIQUE-constraint checks — validate_unique_constraint (called before every INSERT) does O(log R) contains_key lookups.
They are not yet used on read paths. Today's SELECT always does a full table scan via Table::rowids. A planner that can turn WHERE id = 5 into an index probe is Phase 3+ work.
Storage for NULL is inconsistent by type, which is a known wart:
- Text columns can store NULL by encoding the literal string
"Null"in the BTreeMap. Reads special-case this back toValue::NullinRow::get. Consequence: a user who actually inserts the string'Null'into a Text column will read backNULL. Acceptable for now, will be cleaned up in Phase 3c. - Integer / Real / Bool columns can't store NULL. If a user omits such a column in INSERT,
insert_rowreturns aType mismatcherror. This is stricter than SQL (which allows NULL by default unlessNOT NULLis declared) but safer than the old behavior of panicking on"Null".parse::<i32>().
A proper NULL-bitmap mechanism is on the Phase 3c to-do list.
When the executor evaluates an expression, it works with Value — a runtime enum with wider variants:
pub enum Value {
Integer(i64),
Text(String),
Real(f64),
Bool(bool),
Null,
}Conversion:
Row::get(rowid) → Option<Value>widens storagei32toValue::Integer(i64),f32toValue::Real(f64).Table::set_value(col, rowid, Value)narrows back withascasts. The declared column type enforces what's allowed (e.g., aValue::Textinto an Integer column is a type error, not a silent corruption).
This split keeps the executor type-agnostic — it just uses Value arithmetic — while storage stays compact.
executor::execute_insert(inline in theStatement::Insertarm): delegate toInsertQuery::newto get(table_name, columns, rows).- Look up the table via
db.get_table_mut. - For each row of values:
a. Check every value's column exists on the table.
b. Call
Table::validate_unique_constraint— uses the column indexes for O(log R) lookups. c. CallTable::insert_row— auto-assigns the PK if missing, then writes every column's BTreeMap + every index's BTreeMap in lockstep.
executor::execute_update: resolve assignment targets to column names, verify they exist.- Two passes to avoid borrow-checker fights:
- Read pass (
&db): walk every rowid, evaluate WHERE, for matching rows evaluate the assignment RHS expressions, collect planned writes. - Write pass (
&mut db): for each(rowid, [(col, value)]), callTable::set_value.
- Read pass (
set_valueenforces the column's declared type and UNIQUE constraint before mutation. It refreshes the index (old entry removed, new entry inserted).
executor::execute_delete: same two-pass pattern.- Read pass collects matching rowids.
- Write pass calls
Table::delete_row(rowid)for each.
CreateQuery::newproduces aParsedColumnlist with type + constraints.Table::newbuilds theTablestruct: allocates an emptyBTreeMapper column inrows, buildsColumnstructs including emptyIndexes on UNIQUE/PK columns.- Top-level dispatcher inserts the table into
db.tables.
executor::execute_selectlooks up the table.- Resolves projection to a concrete ordered column list.
- Walks every rowid from
Table::rowids(grabbed from the first column's BTreeMap, since all columns share the same keyset). - For each rowid:
- If a WHERE clause exists, evaluate
eval_predicateagainst the row's data. - If matched, keep the rowid for later.
- If a WHERE clause exists, evaluate
- Sort the matched rowids if ORDER BY is present.
- Truncate to LIMIT.
- Render as a prettytable string, column by column via
Table::get_value.
Full table scan, every time. Index-backed lookups are Phase 3+.
Changed (on-disk). Rows are no longer stored as one opaque bincode blob per table. Each row is serialized as a cell (length-prefixed, kind-tagged, null-bitmap + typed value blocks) and packed into TableLeaf pages behind a slot directory. Cells that don't fit spill into an overflow chain. The schema catalog is itself a table — sqlrite_master — stored in the same cell format. File format version is now 2. See file-format.md for the byte-level details.
Didn't change (in-memory). The Database / Table / Column / Row / Index structures described in the sections above are still the runtime representation. On save_database, Table::extract_row turns each in-memory row into a Cell; on open_database, each Cell is decoded and its values are handed to Table::restore_row, which populates the per-column BTreeMaps and rebuilds the indexes.
Phase 3d put a real B-Tree on top of the leaf layer. The on-disk file now has InteriorNode pages above TableLeaf pages; find_leftmost_leaf during open descends the tree, then the existing sibling-chain walk delivers cells in rowid order. Interior cells use the same cell_length | kind_tag | body prefix as local/overflow cells, so binary search for a rowid works uniformly across all page kinds.
Write path: save rebuilds the tree bottom-up from the in-memory sorted rows. No in-place splits or merges — the whole tree is emitted fresh on every commit. That's the educational and engineering trade-off to stay compatible with the "Table lives in memory, save flushes a full snapshot" model. Deterministic build (tables sorted by name, rows by rowid) keeps the Pager's diff commit effective: unchanged tables produce byte-identical pages and aren't rewritten.
Read path is still eager-load: load_table_rows walks every leaf and populates Table's in-memory maps up front. A cursor abstraction that streams rows through the pager without materializing the whole table is deferred to Phase 5 (the library-API split), where the cost of the bigger refactor also buys us the public Connection/Statement/Rows API.
Phase 3e added secondary indexes. Every UNIQUE / PRIMARY KEY column auto-creates a SecondaryIndex at CREATE TABLE time (named sqlrite_autoindex_<table>_<col>); CREATE [UNIQUE] INDEX name ON t (col) adds explicit ones. Column is now a pure schema descriptor — the per-column BTreeMap moved to Table::secondary_indexes, where it's shared infrastructure between auto and explicit indexes.
On disk. Every index persists as its own cell-based B-Tree using a new KIND_INDEX cell carrying (value, original_rowid). sqlrite_master grew a type column distinguishing 'table' rows from 'index' rows (file format v3).
Executor. The WHERE col = literal (and literal = col) shape now probes the matching index for an O(log N) lookup instead of scanning every row. Other predicate shapes still fall back to the full-scan path.
See roadmap.md for the next phase (WAL and file locks in Phase 4, library + WASM in Phase 5). The cursor / lazy-load refactor is parked explicitly in Phase 5 — it touches the executor, every Table accessor, and needs cursor state threaded through, so it pairs naturally with the public Connection / Statement / Rows API.