|
| 1 | +# Auto-ROWID: Automatic ULID Primary Key |
| 2 | + |
| 3 | +**Version:** 1.6.0 |
| 4 | +**Status:** ✅ Production-Ready |
| 5 | +**Last Updated:** July 2025 |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Overview |
| 10 | + |
| 11 | +SharpCoreDB automatically injects a hidden `_rowid` column as the primary key when a table is created **without an explicit `PRIMARY KEY`** definition. This follows the [SQLite rowid pattern](https://www.sqlite.org/rowidtable.html) but uses **ULID** (Universally Unique Lexicographically Sortable Identifier) instead of a monotonic integer. |
| 12 | + |
| 13 | +### Why ULID? |
| 14 | + |
| 15 | +| Property | ULID | Integer Auto-Increment | |
| 16 | +|----------|------|----------------------| |
| 17 | +| **Globally Unique** | ✅ Timestamp + random | ❌ Requires counter coordination | |
| 18 | +| **Lexicographically Sortable** | ✅ Time-ordered | ✅ Monotonic | |
| 19 | +| **Conflict-Free** | ✅ No coordination needed | ❌ Conflicts in distributed scenarios | |
| 20 | +| **B-Tree Friendly** | ✅ Compact, sortable | ✅ Sequential | |
| 21 | +| **External Dependencies** | ✅ None (built-in) | ✅ None | |
| 22 | + |
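The "timestamp + random" layout in the table can be made concrete with a short sketch. It follows the public ULID layout (a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford Base32 characters). The class `UlidSketch` and its methods are hypothetical names used only for illustration; this is not SharpCoreDB's `Ulid.NewUlid()` implementation, which is not shown in this document.

```csharp
using System;
using System.Security.Cryptography;

public static class UlidSketch
{
    // Crockford Base32 alphabet: 0-9 and A-Z without I, L, O, U.
    private const string Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // Encodes a 48-bit Unix-millisecond timestamp and 80 random bits as 26 characters.
    public static string Encode(long unixMillis, byte[] random)
    {
        if (unixMillis < 0 || unixMillis >= (1L << 48))
            throw new ArgumentOutOfRangeException(nameof(unixMillis));
        if (random is null || random.Length != 10)
            throw new ArgumentException("Expected exactly 10 random bytes.", nameof(random));

        var chars = new char[26];

        // Timestamp: 10 characters, most significant first, so strings sort by time.
        long ts = unixMillis;
        for (int i = 9; i >= 0; i--)
        {
            chars[i] = Alphabet[(int)(ts & 31)];
            ts >>= 5;
        }

        // Randomness: 80 bits read as a big-endian bit stream, 5 bits per character.
        int buffer = 0, bits = 0, pos = 10;
        foreach (byte b in random)
        {
            buffer = (buffer << 8) | b;
            bits += 8;
            while (bits >= 5)
            {
                bits -= 5;
                chars[pos++] = Alphabet[(buffer >> bits) & 31];
            }
            buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
        }

        return new string(chars);
    }

    // Convenience wrapper: current UTC time plus cryptographically random bytes.
    public static string NewUlid() =>
        Encode(DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(), RandomNumberGenerator.GetBytes(10));
}
```

Because the timestamp occupies the leading characters, an ordinal string comparison orders identifiers by creation time at millisecond granularity. In this sketch, the order of two identifiers generated within the same millisecond depends only on their random parts.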
| 23 | +--- |
| 24 | + |
| 25 | +## How It Works |
| 26 | + |
| 27 | +### Table Creation |
| 28 | + |
| 29 | +When you create a table without a `PRIMARY KEY`: |
| 30 | + |
| 31 | +```sql |
| 32 | +CREATE TABLE logs ( |
| 33 | + message TEXT, |
| 34 | + level INTEGER, |
| 35 | + timestamp DATETIME |
| 36 | +) |
| 37 | +``` |
| 38 | + |
| 39 | +SharpCoreDB automatically injects a hidden `_rowid` column: |
| 40 | + |
| 41 | +``` |
| 42 | +Internal schema: _rowid (ULID, AUTO, PRIMARY KEY, NOT NULL), message (TEXT), level (INTEGER), timestamp (DATETIME) |
| 43 | +``` |
| 44 | + |
| 45 | +When you create a table **with** an explicit `PRIMARY KEY`, no `_rowid` is injected: |
| 46 | + |
| 47 | +```sql |
| 48 | +CREATE TABLE users ( |
| 49 | + id INTEGER PRIMARY KEY AUTO, |
| 50 | + name TEXT, |
| 51 | + email TEXT |
| 52 | +) |
| 53 | +-- No _rowid column is added; 'id' is the primary key |
| 54 | +``` |
| 55 | + |
| 56 | +### Querying |
| 57 | + |
| 58 | +#### `SELECT *` — `_rowid` is Hidden |
| 59 | + |
| 60 | +```sql |
| 61 | +SELECT * FROM logs |
| 62 | +``` |
| 63 | + |
| 64 | +Returns: |
| 65 | + |
| 66 | +| message | level | timestamp | |
| 67 | +|---------|-------|-----------| |
| 68 | +| "Server started" | 1 | 2025-07-01 12:00:00 | |
| 69 | +| "Request received" | 2 | 2025-07-01 12:00:01 | |
| 70 | + |
| 71 | +The `_rowid` column is **not included** in `SELECT *` results. |
| 72 | + |
| 73 | +#### Explicit `SELECT _rowid` — `_rowid` is Visible |
| 74 | + |
| 75 | +```sql |
| 76 | +SELECT _rowid, message, level FROM logs |
| 77 | +``` |
| 78 | + |
| 79 | +Returns: |
| 80 | + |
| 81 | +| _rowid | message | level | |
| 82 | +|--------|---------|-------| |
| 83 | +| 01J5ABCDEF0001GHJKMN000001 | "Server started" | 1 | |
| 84 | +| 01J5ABCDEF0001GHJKMN000002 | "Request received" | 2 | |
| 85 | + |
| 86 | +You can also use `_rowid` in `WHERE` clauses: |
| 87 | + |
| 88 | +```sql |
| 89 | +SELECT * FROM logs WHERE _rowid = '01J5ABCDEF0001GHJKMN000001' |
| 90 | +``` |
| 91 | + |
| 92 | +### INSERT Behavior |
| 93 | + |
| 94 | +When inserting into a table with an internal `_rowid`, you do **not** need to specify it: |
| 95 | + |
| 96 | +```sql |
| 97 | +-- Both of these work correctly: |
| 98 | +INSERT INTO logs VALUES ('Error occurred', 3, '2025-07-01 12:00:02') |
| 99 | +INSERT INTO logs (message, level, timestamp) VALUES ('Warning', 2, '2025-07-01 12:00:03') |
| 100 | +``` |
| 101 | + |
| 102 | +The `_rowid` is automatically generated using `Ulid.NewUlid()`. |
| 103 | + |
| 104 | +### DELETE and UPDATE |
| 105 | + |
| 106 | +The `_rowid` serves as the internal primary key for DELETE and UPDATE operations. Once the target rows are identified, each one is located in storage through the B-Tree index on `_rowid` in O(log n) rather than by a full storage scan: |
| 107 | + |
| 108 | +```sql |
| 109 | +-- Rows matching the predicate are located in storage via the _rowid B-Tree index |
| 110 | +DELETE FROM logs WHERE level = 3 |
| 111 | + |
| 112 | +-- Direct _rowid lookup |
| 113 | +DELETE FROM logs WHERE _rowid = '01J5ABCDEF0001GHJKMN000001' |
| 114 | +``` |
| 115 | + |
| 116 | +--- |
| 117 | + |
| 118 | +## Architecture |
| 119 | + |
| 120 | +### Storage Impact |
| 121 | + |
| 122 | +- **Column**: `_rowid` is stored as the first column in the table schema |
| 123 | +- **Type**: `DataType.Ulid` — 26 characters, Crockford Base32 encoded |
| 124 | +- **Storage overhead**: ~31 bytes per row (1 null flag + 4 length prefix + 26 chars) |
| 125 | +- **Index**: Automatic B-Tree index + hash index (same as explicit PKs) |
| 126 | + |
| 127 | +### Property: `HasInternalRowId` |
| 128 | + |
| 129 | +The `Table.HasInternalRowId` property (persisted in metadata) indicates whether a table has an auto-generated `_rowid`. This property controls: |
| 130 | + |
| 131 | +1. **SELECT behavior**: `_rowid` is stripped from `Select()` results, available via `SelectIncludingRowId()` |
| 132 | +2. **INSERT behavior**: SQL parser skips `_rowid` in user column mapping |
| 133 | +3. **Metadata**: `ColumnInfo.IsHidden = true` for `_rowid` columns |
| 134 | +4. **Persistence**: Saved and restored across database reopens |
| 135 | + |
| 136 | +### Metadata Schema |
| 137 | + |
| 138 | +The `HasInternalRowId` field is included in the table metadata JSON: |
| 139 | + |
| 140 | +```json |
| 141 | +{ |
| 142 | + "Name": "logs", |
| 143 | + "Columns": ["_rowid", "message", "level", "timestamp"], |
| 144 | + "ColumnTypes": [9, 2, 0, 6], |
| 145 | + "PrimaryKeyIndex": 0, |
| 146 | + "HasInternalRowId": true, |
| 147 | + "IsAuto": [true, false, false, false], |
| 148 | + ... |
| 149 | +} |
| 150 | +``` |
| 151 | + |
| 152 | +### Backward Compatibility |
| 153 | + |
| 154 | +- **Existing databases**: Tables created before this feature have `HasInternalRowId = false` (the default). No behavior change. |
| 155 | +- **Existing tables with explicit PKs**: Unaffected. `HasInternalRowId` is only `true` for tables created without a PK. |
| 156 | +- **Metadata format**: New field with default `false` — old versions can safely ignore it. |
| 157 | + |
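The compatibility rule for the metadata format can be illustrated with a minimal sketch. It assumes the metadata is read with `System.Text.Json`, which this document does not confirm, and models only two fields. `TableMetadataSketch` and `MetadataCompat` are hypothetical names.

```csharp
using System.Text.Json;

// Hypothetical subset of the table metadata; only two fields are modeled.
public sealed class TableMetadataSketch
{
    public string Name { get; set; } = "";

    // Stays false when the JSON has no "HasInternalRowId" field.
    public bool HasInternalRowId { get; set; }
}

public static class MetadataCompat
{
    // Fields not modeled on the class are ignored by the default serializer settings.
    public static TableMetadataSketch Load(string json) =>
        JsonSerializer.Deserialize<TableMetadataSketch>(json)
        ?? throw new JsonException("Metadata JSON was null.");
}
```

Under these assumptions, metadata written before the feature existed loads with `HasInternalRowId == false`, and a reader that does not model the field skips it.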
| 158 | +--- |
| 159 | + |
| 160 | +## API Reference |
| 161 | + |
| 162 | +### Table Properties |
| 163 | + |
| 164 | +```csharp |
| 165 | +/// <summary> |
| 166 | +/// Gets or sets a value indicating whether this table has an auto-generated internal _rowid column. |
| 167 | +/// </summary> |
| 168 | +public bool HasInternalRowId { get; set; } |
| 169 | +``` |
| 170 | + |
| 171 | +### Select Methods |
| 172 | + |
| 173 | +```csharp |
| 174 | +// Standard: strips _rowid from results (default behavior) |
| 175 | +List<Dictionary<string, object>> Select(string? where, string? orderBy, bool asc, bool noEncrypt); |
| 176 | + |
| 177 | +// Raw: includes _rowid in results (for explicit _rowid queries) |
| 178 | +List<Dictionary<string, object>> SelectIncludingRowId(string? where, string? orderBy, bool asc, bool noEncrypt); |
| 179 | +``` |
| 180 | + |
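The difference between the two methods comes down to one dictionary removal per row. The sketch below shows that step in isolation. `RowIdProjection.StripRowId` is a hypothetical helper, not part of the SharpCoreDB API, and it assumes rows are mutable dictionaries as in the signatures above.

```csharp
using System.Collections.Generic;

public static class RowIdProjection
{
    public const string InternalRowIdColumnName = "_rowid";

    // Removes the hidden column in place; tables without an internal _rowid pass through untouched.
    public static List<Dictionary<string, object>> StripRowId(
        List<Dictionary<string, object>> rows, bool hasInternalRowId)
    {
        if (!hasInternalRowId) return rows;
        foreach (var row in rows)
            row.Remove(InternalRowIdColumnName);
        return rows;
    }
}
```

Checking the flag first means a table that has no internal `_rowid` never pays for the removal.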
| 181 | +### Metadata Discovery |
| 182 | + |
| 183 | +```csharp |
| 184 | +// Default: returns only user-visible columns (excludes _rowid) |
| 185 | +// Follows the SQLite PRAGMA table_info pattern. |
| 186 | +IReadOnlyList<ColumnInfo> GetColumns(string tableName); |
| 187 | + |
| 188 | +// Full: returns ALL columns including hidden _rowid (with IsHidden = true) |
| 189 | +// Use this when you need to inspect the complete internal schema. |
| 190 | +IReadOnlyList<ColumnInfo> GetColumnsIncludingHidden(string tableName); |
| 191 | +``` |
| 192 | + |
| 193 | +### ColumnInfo |
| 194 | + |
| 195 | +```csharp |
| 196 | +/// <summary> |
| 197 | +/// Whether this column is a hidden internal column (e.g., auto-generated _rowid). |
| 198 | +/// </summary> |
| 199 | +public bool IsHidden { get; init; } |
| 200 | +``` |
| 201 | + |
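As a rough illustration of how the two discovery methods relate, the visible list can be thought of as the full list filtered on `IsHidden`. `ColumnInfoSketch` is a hypothetical stand-in that models only a name and the hidden flag. The real `ColumnInfo` type has more members, and the actual implementation may differ.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ColumnInfo; only Name and IsHidden are modeled here.
public sealed record ColumnInfoSketch(string Name, bool IsHidden = false);

public static class ColumnDiscovery
{
    // Keeps user-visible columns in their original order.
    public static IReadOnlyList<ColumnInfoSketch> VisibleColumns(IEnumerable<ColumnInfoSketch> all) =>
        all.Where(c => !c.IsHidden).ToList();
}
```

For the `logs` table, filtering `_rowid`, `message`, `level`, `timestamp` this way leaves the three user-defined columns.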
| 202 | +### Constants |
| 203 | + |
| 204 | +```csharp |
| 205 | +/// <summary> |
| 206 | +/// The name of the auto-generated internal row identifier column. |
| 207 | +/// </summary> |
| 208 | +public const string InternalRowIdColumnName = "_rowid"; |
| 209 | +``` |
| 210 | + |
| 211 | +--- |
| 212 | + |
| 213 | +## Performance Characteristics |
| 214 | + |
| 215 | +| Operation | Without _rowid (old) | With _rowid (new) | |
| 216 | +|-----------|---------------------|-------------------| |
| 217 | +| **DELETE (columnar, no PK)** | Full storage scan O(n) | B-Tree lookup O(log n) ✅ | |
| 218 | +| **UPDATE (columnar, no PK)** | Full scan for position | B-Tree lookup O(log n) ✅ | |
| 219 | +| **INSERT** | Same | +1 ULID generation (~100ns) | |
| 220 | +| **`SELECT *`** | Same | +1 `dict.Remove` per row (~5ns) | |
| 221 | +| **Storage** | Same | +31 bytes per row | |
| 222 | + |
| 223 | +The DELETE/UPDATE performance improvement far outweighs the minimal INSERT and SELECT overhead. |
| 224 | + |
| 225 | +--- |
| 226 | + |
| 227 | +## Comparison with SQLite |
| 228 | + |
| 229 | +| Feature | SQLite `rowid` | SharpCoreDB `_rowid` | |
| 230 | +|---------|---------------|---------------------| |
| 231 | +| **Type** | 64-bit integer | ULID (26-char string) | |
| 232 | +| **Visibility** | Hidden in `SELECT *` | Hidden in `SELECT *` ✅ | |
| 233 | +| **Explicit query** | `SELECT rowid, ...` | `SELECT _rowid, ...` ✅ | |
| 234 | +| **Auto-generated** | Yes (monotonic) | Yes (timestamp + random) ✅ | |
| 235 | +| **Distributed-safe** | No | Yes ✅ | |
| 236 | +| **Tables with explicit PK** | `INTEGER PRIMARY KEY` becomes an alias for `rowid` | No `_rowid` injected ✅ | |
| 237 | + |
| 238 | +--- |
| 239 | + |
| 240 | +## FAQ |
| 241 | + |
| 242 | +**Q: Does `_rowid` affect my existing tables?** |
| 243 | +A: No. Only tables created **after** this feature without an explicit PRIMARY KEY get a `_rowid`. Existing tables are unaffected. |
| 244 | + |
| 245 | +**Q: Can I use `_rowid` in WHERE/ORDER BY?** |
| 246 | +A: Yes. The `_rowid` is a real column that can be queried explicitly. It's only hidden from `SELECT *`. |
| 247 | + |
| 248 | +**Q: What's the storage overhead?** |
| 249 | +A: ~31 bytes per row. For a table with 1 million rows, that is about 31 MB (roughly 29.6 MiB), a small price for efficient DELETE/UPDATE operations. |
| 250 | + |
| 251 | +**Q: Can I disable auto-`_rowid`?** |
| 252 | +A: Yes — simply define an explicit PRIMARY KEY on your table, and no `_rowid` will be injected. |
| 253 | + |
| 254 | +**Q: Is `_rowid` persisted across database restarts?** |
| 255 | +A: Yes. The `HasInternalRowId` flag and the `_rowid` column are fully persisted in metadata and data files. |
| 256 | + |
| 257 | +**Q: Does `_rowid` work with `ExecuteBatchSQL` and `BulkInsertAsync`?** |
| 258 | +A: Yes. All insert paths (SQL parsing, batch SQL, prepared statements, direct `InsertBatch`, optimized `BulkInsertAsync`, and `InsertBatchFromBuffer`) correctly skip the internal `_rowid` column during value mapping and auto-generate it during row validation. |
| 259 | + |
| 260 | +**Q: How does `GetColumns()` behave with `_rowid`?** |
| 261 | +A: `GetColumns()` (the `IMetadataProvider` interface method) follows the SQLite `PRAGMA table_info` pattern and **excludes** hidden `_rowid` columns. Use `GetColumnsIncludingHidden()` on the `Database` class to see all columns with `IsHidden = true` on internal ones. |
| 262 | + |
| 263 | +--- |
| 264 | + |
| 265 | +## Implementation Details |
| 266 | + |
| 267 | +### Insert Path Coverage |
| 268 | + |
| 269 | +The `_rowid` auto-generation is handled consistently across **all** insert paths: |
| 270 | + |
| 271 | +| Insert Path | Skip _rowid | Auto-Generate | Location | |
| 272 | +|------------|-------------|---------------|----------| |
| 273 | +| `ExecuteSQL("INSERT ...")` | ✅ SqlParser.DML.cs | Via `Table.Insert()` | `ExecuteInsert()` | |
| 274 | +| `ExecuteBatchSQL(...)` | ✅ Database.Batch.cs | Via `Table.InsertBatch()` | `ParseInsertStatement()` / `GetOrCreatePreparedInsert()` | |
| 275 | +| `BulkInsertAsync(...)` (< 5K rows) | N/A (dict API) | Via `Table.InsertBatch()` | `ValidateAndSerializeBatchOutsideLock()` | |
| 276 | +| `BulkInsertAsync(...)` (≥ 5K rows) | N/A (dict API) | Via `InsertBatchFromBuffer()` → `InsertBatch()` | `ValidateAndSerializeBatchOutsideLock()` | |
| 277 | +| `InsertBatch(rows)` direct | N/A (dict API) | ✅ Auto-generates when key missing or null | `ValidateAndSerializeBatchOutsideLock()` | |
| 278 | +| `InsertBatchFromBuffer(...)` | N/A (binary API) | ✅ Decoder produces null → auto-gen triggers | `ValidateAndSerializeBatchOutsideLock()` | |
| 279 | + |
| 280 | +### Internal Operations |
| 281 | + |
| 282 | +DELETE and UPDATE use `SelectInternal()` (instead of public `Select()`) to preserve the `_rowid` column in intermediate results. This ensures the B-Tree PK index lookup works correctly when locating storage positions for row mutation. |
| 283 | + |
| 284 | +### Schema Discovery |
| 285 | + |
| 286 | +``` |
| 287 | +┌──────────────────────────────────────────────────────────────┐ |
| 288 | +│  GetColumns("logs")                                          │ |
| 289 | +│  → [message (TEXT), level (INTEGER), timestamp (DATETIME)]   │ |
| 290 | +│  _rowid is EXCLUDED (SQLite PRAGMA pattern)                  │ |
| 291 | +├──────────────────────────────────────────────────────────────┤ |
| 292 | +│  GetColumnsIncludingHidden("logs")                           │ |
| 293 | +│  → [_rowid (ULID, hidden), message (TEXT),                   │ |
| 294 | +│     level (INTEGER), timestamp (DATETIME)]                   │ |
| 295 | +│  _rowid INCLUDED with IsHidden = true                        │ |
| 296 | +└──────────────────────────────────────────────────────────────┘ |
| 297 | +``` |
| 298 | + |
| 299 | +### Auto-Generation Guard |
| 300 | + |
| 301 | +The `ValidateAndSerializeBatchOutsideLock()` method (used by all batch paths) handles three scenarios for auto-generated columns: |
| 302 | + |
| 303 | +1. **Key missing**: `!row.TryGetValue("_rowid", ...)` → auto-generate |
| 304 | +2. **Key present with null/DBNull**: Common when rows pass through `StreamingRowEncoder` → `BinaryRowDecoder` → auto-generate |
| 305 | +3. **Key present with valid value**: Use as-is (e.g., explicit `_rowid` in INSERT) |
| 306 | + |
| 307 | +This defensive approach prevents "Column '_rowid' cannot be NULL" errors across all insert paths. |
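
The three scenarios can be sketched as a single check. `RowIdGuard.EnsureRowId` is a hypothetical helper written for illustration and is not the body of `ValidateAndSerializeBatchOutsideLock()`. The generator is passed in as a delegate so the sketch does not depend on any particular ULID implementation.

```csharp
using System;
using System.Collections.Generic;

public static class RowIdGuard
{
    // Returns true when a new identifier was generated for the row.
    public static bool EnsureRowId(Dictionary<string, object> row, Func<string> newId)
    {
        bool hasValue = row.TryGetValue("_rowid", out var value)
                        && value is not null
                        && value is not DBNull;
        if (hasValue) return false;   // scenario 3: keep the explicit value
        row["_rowid"] = newId();      // scenarios 1 and 2: missing, null, or DBNull
        return true;
    }
}
```

Treating a missing key, `null`, and `DBNull` the same way is what lets rows from the dictionary API and rows decoded from a binary buffer share one validation step.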