Commit c09a162

Author: MPCoreDeveloper (committed)
Commit message: new auto _rowid like sql lite only we use a ULID
Parent: 06593cb

19 files changed (+830 / -25 lines)

docs/CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -7,6 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **Auto-ROWID**: Tables created without an explicit `PRIMARY KEY` now receive a hidden `_rowid` column (ULID type, auto-generated). Follows the SQLite rowid pattern: invisible in `SELECT *`, visible when explicitly queried via `SELECT _rowid, ...`. See [`docs/features/AUTO_ROWID.md`](features/AUTO_ROWID.md) for full documentation.
+- `Table.HasInternalRowId` property (persisted in metadata) to track tables with auto-generated `_rowid`.
+- `Table.SelectIncludingRowId()` method for queries that explicitly request `_rowid`.
+- `Database.GetColumnsIncludingHidden()` for schema discovery including hidden columns (with `IsHidden` flag).
+- `ColumnInfo.IsHidden` property for metadata-driven schema tools.
+- `PersistenceConstants.InternalRowIdColumnName` constant (`"_rowid"`).
+- 9 dedicated tests for the Auto-ROWID feature in `AutoRowIdTests.cs`.
+
 ### Fixed
 - Unified `IS NULL` / `IS NOT NULL` behavior across runtime scan, join-helper, and compiled predicate paths.
 - Added parser support for scalar function expressions in SELECT columns (including `COALESCE(...)`) and parenthesized subquery expressions.

docs/FEATURE_MATRIX.md

Lines changed: 1 addition & 0 deletions
@@ -142,6 +142,7 @@
 | AES-256-GCM Encryption | ✅ Complete | 1.0.0 | SharpCoreDB | At-rest encryption |
 | Compression (LZ4, Brotli) | ✅ Complete | 1.0.0 | SharpCoreDB | Automatic |
 | Metadata Compression (Brotli) | ✅ Complete | 1.6.0 | SharpCoreDB | 60-80% reduction |
+| Auto-ROWID (ULID) | ✅ Complete | 1.6.0 | SharpCoreDB | Hidden PK for tables without explicit PK ([docs](features/AUTO_ROWID.md)) |
 | **Indexing** |
 | B-tree Index | ✅ Complete | 1.0.0 | SharpCoreDB | Range queries |
 | Hash Index | ✅ Complete | 1.0.0 | SharpCoreDB | Equality lookups |

docs/features/AUTO_ROWID.md

Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
# Auto-ROWID: Automatic ULID Primary Key

**Version:** 1.6.0
**Status:** ✅ Production-Ready
**Last Updated:** July 2025

---

## Overview

SharpCoreDB automatically injects a hidden `_rowid` column as the primary key when a table is created **without an explicit `PRIMARY KEY`** definition. This follows the [SQLite rowid pattern](https://www.sqlite.org/rowidtable.html) but uses a **ULID** (Universally Unique Lexicographically Sortable Identifier) instead of a 64-bit integer.

### Why ULID?

| Property | ULID | Integer Auto-Increment |
|----------|------|------------------------|
| **Globally Unique** | ✅ 48-bit timestamp + 80 random bits | ❌ Requires counter coordination |
| **Lexicographically Sortable** | ✅ Time-ordered (millisecond precision) | ✅ Monotonic |
| **Conflict-Free** | ✅ No coordination needed | ❌ Conflicts in distributed scenarios |
| **B-Tree Friendly** | ✅ Sortable, though a 26-character key is larger than an integer | ✅ Sequential, compact |
| **External Dependencies** | ✅ None (built-in) | ✅ None |
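
To make the "timestamp + random" rows concrete, here is a minimal sketch of the ULID layout from the public ULID specification. The layout is a 48-bit Unix-millisecond timestamp followed by 80 random bits, encoded as 26 Crockford Base32 characters. This is not SharpCoreDB's `Ulid.NewUlid()`. The helper `NewUlidSketch` is hypothetical, and it does not implement monotonic ordering within a single millisecond.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper illustrating the ULID format; not SharpCoreDB's Ulid type.
static string NewUlidSketch(DateTimeOffset time)
{
    // Crockford Base32 alphabet: digits plus letters, excluding I, L, O and U.
    const string Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    Span<byte> bytes = stackalloc byte[16];

    // Bytes 0..5: 48-bit Unix-millisecond timestamp, big-endian so it sorts first.
    long ms = time.ToUnixTimeMilliseconds();
    for (int i = 5; i >= 0; i--)
    {
        bytes[i] = (byte)(ms & 0xFF);
        ms >>= 8;
    }

    // Bytes 6..15: 80 bits of cryptographically strong randomness.
    RandomNumberGenerator.Fill(bytes.Slice(6));

    // 26 characters x 5 bits = 130 bits, so the 128-bit value is read as if
    // prefixed with two zero bits; the first 10 characters cover the timestamp.
    var chars = new char[26];
    for (int c = 0; c < 26; c++)
    {
        int value = 0;
        for (int b = 0; b < 5; b++)
        {
            int p = c * 5 + b - 2; // bit position within the 128-bit value (MSB first)
            int bit = p < 0 ? 0 : (bytes[p >> 3] >> (7 - (p & 7))) & 1;
            value = (value << 1) | bit;
        }
        chars[c] = Alphabet[value];
    }
    return new string(chars);
}

string first = NewUlidSketch(DateTimeOffset.FromUnixTimeMilliseconds(1_720_000_000_000));
string second = NewUlidSketch(DateTimeOffset.FromUnixTimeMilliseconds(1_720_000_000_001));
Console.WriteLine(first.Length);                            // 26
Console.WriteLine(string.CompareOrdinal(first, second) < 0); // True: later millisecond sorts later
```

The timestamp sits in the most significant bits, and the alphabet is in ascending ASCII order. As a result, plain ordinal string comparison orders ULIDs by creation time, which is the property the table relies on.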

---

## How It Works

### Table Creation

When you create a table without a `PRIMARY KEY`:

```sql
CREATE TABLE logs (
    message TEXT,
    level INTEGER,
    timestamp DATETIME
)
```

SharpCoreDB automatically injects a hidden `_rowid` column:

```
Internal schema: _rowid (ULID, AUTO, PRIMARY KEY, NOT NULL), message (TEXT), level (INTEGER), timestamp (DATETIME)
```

When you create a table **with** an explicit `PRIMARY KEY`, no `_rowid` is injected:

```sql
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTO,
    name TEXT,
    email TEXT
)
-- No _rowid column is added; 'id' is the primary key
```
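
The injection rule fits in a few lines. The sketch below is hypothetical in several ways. It models columns as `(Name, Type, IsPrimaryKey)` tuples rather than SharpCoreDB's real schema types. `InjectRowIdIfNeeded` is an illustrative name, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the CREATE TABLE rule: prepend a ULID _rowid primary key
// only when no user column is declared as PRIMARY KEY.
static List<(string Name, string Type, bool IsPrimaryKey)> InjectRowIdIfNeeded(
    IReadOnlyList<(string Name, string Type, bool IsPrimaryKey)> columns,
    out bool hasInternalRowId)
{
    hasInternalRowId = !columns.Any(c => c.IsPrimaryKey);

    var result = new List<(string Name, string Type, bool IsPrimaryKey)>();
    if (hasInternalRowId)
    {
        // Mirrors PersistenceConstants.InternalRowIdColumnName; stored as the first column.
        result.Add(("_rowid", "ULID", true));
    }
    result.AddRange(columns);
    return result;
}

var logs = InjectRowIdIfNeeded(
    new[] { ("message", "TEXT", false), ("level", "INTEGER", false), ("timestamp", "DATETIME", false) },
    out bool logsHasRowId);
var users = InjectRowIdIfNeeded(
    new[] { ("id", "INTEGER", true), ("name", "TEXT", false), ("email", "TEXT", false) },
    out bool usersHasRowId);

Console.WriteLine(string.Join(", ", logs.Select(c => c.Name))); // _rowid, message, level, timestamp
Console.WriteLine($"{logsHasRowId} {usersHasRowId}");            // True False
```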

### Querying

#### `SELECT *`: `_rowid` is Hidden

```sql
SELECT * FROM logs
```

Returns:

| message | level | timestamp |
|---------|-------|-----------|
| "Server started" | 1 | 2025-07-01 12:00:00 |
| "Request received" | 2 | 2025-07-01 12:00:01 |

The `_rowid` column is **not included** in `SELECT *` results.
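
Hiding the column comes down to one dictionary removal per row, which matches the `dict.Remove` cost listed under Performance Characteristics. This is a minimal sketch. It assumes rows are materialized as `Dictionary<string, object>`, as the `Select()` signature in the API reference suggests. `StripHiddenRowId` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical post-processing step for SELECT *: drop the hidden column from
// every row when the table has an auto-generated _rowid.
static List<Dictionary<string, object>> StripHiddenRowId(
    List<Dictionary<string, object>> inputRows, bool hasInternalRowId)
{
    if (!hasInternalRowId)
        return inputRows; // tables with an explicit PK have no hidden column

    foreach (var r in inputRows)
        r.Remove("_rowid"); // one hash-table removal per row
    return inputRows;
}

var rows = new List<Dictionary<string, object>>
{
    new Dictionary<string, object> { ["_rowid"] = "01J5ABCDEF0001GHJKMN000001", ["message"] = "Server started", ["level"] = 1 },
    new Dictionary<string, object> { ["_rowid"] = "01J5ABCDEF0001GHJKMN000002", ["message"] = "Request received", ["level"] = 2 },
};

foreach (var row in StripHiddenRowId(rows, hasInternalRowId: true))
{
    bool stillHasRowId = row.ContainsKey("_rowid");
    Console.WriteLine($"{row.Count} columns, _rowid present: {stillHasRowId}"); // 2 columns, _rowid present: False
}
```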

#### Explicit `SELECT _rowid`: `_rowid` is Visible

```sql
SELECT _rowid, message, level FROM logs
```

Returns (ULID values are illustrative):

| _rowid | message | level |
|--------|---------|-------|
| 01J5ABCDEF0001GHJKMN000001 | "Server started" | 1 |
| 01J5ABCDEF0001GHJKMN000002 | "Request received" | 2 |

You can also use `_rowid` in `WHERE` clauses:

```sql
SELECT * FROM logs WHERE _rowid = '01J5ABCDEF0001GHJKMN000001'
```

### INSERT Behavior

When inserting into a table with an internal `_rowid`, you do **not** need to specify it:

```sql
-- Both of these work correctly:
INSERT INTO logs VALUES ('Error occurred', 3, '2025-07-01 12:00:02')
INSERT INTO logs (message, level, timestamp) VALUES ('Warning', 2, '2025-07-01 12:00:03')
```

Positional `VALUES` lists map onto the user-visible columns only, so the first value goes to `message`, not to `_rowid`. The `_rowid` itself is generated automatically with `Ulid.NewUlid()`.
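
The parser skips `_rowid` when mapping user columns, so a positional `VALUES` list lines up with the visible columns. Below is a hypothetical sketch of that mapping. `MapPositionalValues` is not a SharpCoreDB API, and the real parser works on parsed SQL rather than plain lists.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: map positional INSERT values onto the table's columns,
// skipping the hidden _rowid so callers supply only their own columns.
static Dictionary<string, object> MapPositionalValues(
    IReadOnlyList<string> tableColumns, IReadOnlyList<object> values, bool hasInternalRowId)
{
    var userColumns = new List<string>();
    foreach (var column in tableColumns)
    {
        if (hasInternalRowId && column == "_rowid")
            continue; // hidden column: never consumes a VALUES slot
        userColumns.Add(column);
    }

    if (values.Count != userColumns.Count)
        throw new ArgumentException($"Expected {userColumns.Count} values but got {values.Count}.");

    var row = new Dictionary<string, object>();
    for (int i = 0; i < userColumns.Count; i++)
        row[userColumns[i]] = values[i];
    return row; // no _rowid key: it is generated later during row validation
}

var mapped = MapPositionalValues(
    new[] { "_rowid", "message", "level", "timestamp" },
    new object[] { "Error occurred", 3, "2025-07-01 12:00:02" },
    hasInternalRowId: true);

Console.WriteLine(mapped["message"]);           // Error occurred
Console.WriteLine(mapped.ContainsKey("_rowid")); // False
```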

### DELETE and UPDATE

Internally, DELETE and UPDATE use `_rowid` as the primary key to locate each affected row in storage. The B-Tree index on `_rowid` makes that per-row lookup O(log n) instead of requiring a full storage scan:

```sql
-- The filter on level still evaluates rows, but each match is then
-- located in storage through the _rowid B-Tree index
DELETE FROM logs WHERE level = 3

-- Also efficient: direct _rowid lookup
DELETE FROM logs WHERE _rowid = '01J5ABCDEF0001GHJKMN000001'
```
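
A `_rowid` index can turn row location into a tree lookup. In the sketch below, rows are filtered by the predicate while they still carry `_rowid` (compare `SelectInternal()` under Implementation Details), and each match is then located through the index. The sketch makes several assumptions. `SortedDictionary<string, long>` stands in for the `_rowid` B-Tree: it is a balanced binary search tree, not a B-Tree, but it gives the same O(log n) key lookup. The storage offsets are made up, and `LocateForMutation` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the WHERE predicate picks matching rows (which still carry
// _rowid), and the _rowid index maps each match to its storage position.
static List<long> LocateForMutation(
    IEnumerable<Dictionary<string, object>> rowsWithRowId,
    Func<Dictionary<string, object>, bool> where,
    SortedDictionary<string, long> rowIdIndex)
{
    var positions = new List<long>();
    foreach (var row in rowsWithRowId)
    {
        if (where(row) && rowIdIndex.TryGetValue((string)row["_rowid"], out long position))
            positions.Add(position); // O(log n) lookup per matching row
    }
    return positions;
}

var index = new SortedDictionary<string, long>(StringComparer.Ordinal)
{
    ["01J5ABCDEF0001GHJKMN000001"] = 0,  // illustrative storage offsets
    ["01J5ABCDEF0001GHJKMN000002"] = 64,
};
var rows = new List<Dictionary<string, object>>
{
    new Dictionary<string, object> { ["_rowid"] = "01J5ABCDEF0001GHJKMN000001", ["level"] = 1 },
    new Dictionary<string, object> { ["_rowid"] = "01J5ABCDEF0001GHJKMN000002", ["level"] = 3 },
};

// Equivalent of: DELETE FROM logs WHERE level = 3
var toDelete = LocateForMutation(rows, r => (int)r["level"] == 3, index);
Console.WriteLine(string.Join(", ", toDelete)); // 64
```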

---

## Architecture

### Storage Impact

- **Column**: `_rowid` is stored as the first column in the table schema
- **Type**: `DataType.Ulid`, 26 characters, Crockford Base32 encoded
- **Storage overhead**: ~31 bytes per row (1-byte null flag + 4-byte length prefix + 26 ASCII characters)
- **Index**: Automatic B-Tree index + hash index (same as explicit PKs)
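
The ~31-byte figure follows from the three parts listed in the storage overhead bullet. The sketch below writes a `_rowid` value in that shape with `BinaryWriter`. The flag value and the little-endian `Int32` length are assumptions for illustration, not SharpCoreDB's actual serializer. `EncodeRowId` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical encoding matching the described layout:
// [1-byte null flag][4-byte length prefix][26 ASCII characters] = 31 bytes.
static byte[] EncodeRowId(string ulid)
{
    using var buffer = new MemoryStream();
    using var writer = new BinaryWriter(buffer, Encoding.ASCII);

    writer.Write((byte)(ulid is null ? 0 : 1)); // null flag (assumed: 1 = value present)
    if (ulid is not null)
    {
        byte[] chars = Encoding.ASCII.GetBytes(ulid);
        writer.Write(chars.Length); // Int32 length prefix, 4 bytes little-endian
        writer.Write(chars);        // Crockford Base32 is pure ASCII: 1 byte per char
    }
    writer.Flush();
    return buffer.ToArray();
}

int perRow = EncodeRowId("01J5ABCDEF0001GHJKMN000001").Length;
long perMillionRows = perRow * 1_000_000L;
Console.WriteLine(perRow);         // 31
Console.WriteLine(perMillionRows); // 31000000 bytes, about 29.6 MiB
```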

### Property: `HasInternalRowId`

The `Table.HasInternalRowId` property (persisted in metadata) indicates whether a table has an auto-generated `_rowid`. This property controls:

1. **SELECT behavior**: `_rowid` is stripped from `Select()` results but remains available via `SelectIncludingRowId()`
2. **INSERT behavior**: the SQL parser skips `_rowid` in user column mapping
3. **Metadata**: `ColumnInfo.IsHidden = true` for `_rowid` columns
4. **Persistence**: Saved and restored across database reopens

### Metadata Schema

The `HasInternalRowId` field is included in the table metadata JSON:

```json
{
  "Name": "logs",
  "Columns": ["_rowid", "message", "level", "timestamp"],
  "ColumnTypes": [9, 2, 0, 6],
  "PrimaryKeyIndex": 0,
  "HasInternalRowId": true,
  "IsAuto": [true, false, false, false],
  ...
}
```

### Backward Compatibility

- **Existing databases**: Tables created before this feature have `HasInternalRowId = false` (the default). No behavior change.
- **Existing tables with explicit PKs**: Unaffected. `HasInternalRowId` is only `true` for tables created without a PK.
- **Metadata format**: New field with default `false`; old versions can safely ignore it.
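
Reading the flag defensively is what makes the "default `false`" rule work for metadata written before this feature existed. The minimal sketch below uses `System.Text.Json`. The helper `ReadHasInternalRowId` is hypothetical, and SharpCoreDB's own metadata loader may be structured differently.

```csharp
using System;
using System.Text.Json;

// Hypothetical reader: a missing HasInternalRowId field (older metadata)
// is treated as false, so pre-existing tables keep their previous behavior.
static bool ReadHasInternalRowId(string tableMetadataJson)
{
    using JsonDocument doc = JsonDocument.Parse(tableMetadataJson);
    return doc.RootElement.TryGetProperty("HasInternalRowId", out JsonElement value)
        && value.ValueKind == JsonValueKind.True;
}

string newMetadata = "{ \"Name\": \"logs\", \"PrimaryKeyIndex\": 0, \"HasInternalRowId\": true }";
string oldMetadata = "{ \"Name\": \"users\", \"PrimaryKeyIndex\": 0 }";

Console.WriteLine(ReadHasInternalRowId(newMetadata)); // True
Console.WriteLine(ReadHasInternalRowId(oldMetadata)); // False
```

When metadata is bound to a typed object with `JsonSerializer.Deserialize` instead, a `bool` property simply keeps its default of `false` if the field is absent. By default, `System.Text.Json` also skips JSON properties that the target type does not declare. That default behavior is consistent with an older reader ignoring the new field.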

---

## API Reference

### Table Properties

```csharp
/// <summary>
/// Gets whether this table has an auto-generated internal _rowid column.
/// </summary>
public bool HasInternalRowId { get; set; }
```

### Select Methods

```csharp
// Standard: strips _rowid from results (default behavior)
List<Dictionary<string, object>> Select(string? where, string? orderBy, bool asc, bool noEncrypt);

// Raw: includes _rowid in results (for explicit _rowid queries)
List<Dictionary<string, object>> SelectIncludingRowId(string? where, string? orderBy, bool asc, bool noEncrypt);
```

### Metadata Discovery

```csharp
// Default: returns only user-visible columns (excludes _rowid)
// Follows the SQLite PRAGMA table_info pattern.
IReadOnlyList<ColumnInfo> GetColumns(string tableName);

// Full: returns ALL columns including hidden _rowid (with IsHidden = true)
// Use this when you need to inspect the complete internal schema.
IReadOnlyList<ColumnInfo> GetColumnsIncludingHidden(string tableName);
```
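
The relationship between the two calls can be sketched as a filter on `IsHidden`. This is one plausible shape, not the library's implementation. The tuple stands in for `ColumnInfo`, which has more properties, and `VisibleColumns` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: derive the GetColumns() view from the full column list
// by dropping columns flagged as hidden.
static List<(string Name, string DataType, bool IsHidden)> VisibleColumns(
    IEnumerable<(string Name, string DataType, bool IsHidden)> columns)
    => columns.Where(c => !c.IsHidden).ToList();

var allColumns = new List<(string Name, string DataType, bool IsHidden)>
{
    ("_rowid", "ULID", true),
    ("message", "TEXT", false),
    ("level", "INTEGER", false),
    ("timestamp", "DATETIME", false),
};

Console.WriteLine(string.Join(", ", VisibleColumns(allColumns).Select(c => c.Name))); // message, level, timestamp
```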

### ColumnInfo

```csharp
/// <summary>
/// Whether this column is a hidden internal column (e.g., auto-generated _rowid).
/// </summary>
public bool IsHidden { get; init; }
```

### Constants

```csharp
/// <summary>
/// The name of the auto-generated internal row identifier column.
/// </summary>
public const string InternalRowIdColumnName = "_rowid";
```

---

## Performance Characteristics

| Operation | Without _rowid (old) | With _rowid (new) |
|-----------|----------------------|-------------------|
| **DELETE (columnar, no PK)** | Full storage scan O(n) | B-Tree lookup O(log n) ✅ |
| **UPDATE (columnar, no PK)** | Full scan for position | B-Tree lookup O(log n) ✅ |
| **INSERT** | Same | +1 ULID generation (~100ns) |
| **`SELECT *`** | Same | +1 dict.Remove per row (~5ns) |
| **Storage** | Same | +31 bytes per row |

The lookup costs refer to locating each affected row's storage position. A `WHERE` on a non-indexed column still has to evaluate the rows. For tables that see DELETE or UPDATE traffic, this per-row O(log n) lookup typically outweighs the small INSERT and SELECT overhead. For append-only tables, the feature mainly adds the storage and generation cost.

---

## Comparison with SQLite

| Feature | SQLite `rowid` | SharpCoreDB `_rowid` |
|---------|----------------|----------------------|
| **Type** | 64-bit integer | ULID (26-char string) |
| **Visibility** | Hidden in `SELECT *` | Hidden in `SELECT *` |
| **Explicit query** | `SELECT rowid, ...` | `SELECT _rowid, ...` |
| **Auto-generated** | Yes (usually one more than the largest rowid in use) | Yes (timestamp + random) ✅ |
| **Distributed-safe** | No | Yes ✅ |
| **Tables with explicit PK** | An `INTEGER PRIMARY KEY` becomes an alias for rowid; other PKs keep a separate rowid | No `_rowid` injected ✅ |

---

## FAQ

**Q: Does `_rowid` affect my existing tables?**
A: No. Only tables created **after** this feature was introduced, and without an explicit PRIMARY KEY, get a `_rowid`. Existing tables are unaffected.

**Q: Can I use `_rowid` in WHERE/ORDER BY?**
A: Yes. The `_rowid` is a real column that can be queried explicitly. It is only hidden from `SELECT *`.

**Q: What's the storage overhead?**
A: ~31 bytes per row. For a table with 1 million rows, that is about 31 MB (roughly 29.6 MiB), a small price for efficient DELETE/UPDATE operations.

**Q: Can I disable auto-`_rowid`?**
A: Yes. Define an explicit PRIMARY KEY on your table, and no `_rowid` will be injected.

**Q: Is `_rowid` persisted across database restarts?**
A: Yes. The `HasInternalRowId` flag and the `_rowid` column are fully persisted in metadata and data files.

**Q: Does `_rowid` work with `ExecuteBatchSQL` and `BulkInsertAsync`?**
A: Yes. All insert paths (SQL parsing, batch SQL, prepared statements, direct `InsertBatch`, optimized `BulkInsertAsync`, and `InsertBatchFromBuffer`) correctly skip the internal `_rowid` column during value mapping and auto-generate it during row validation.

**Q: How does `GetColumns()` behave with `_rowid`?**
A: `GetColumns()` (the `IMetadataProvider` interface method) follows the SQLite `PRAGMA table_info` pattern and **excludes** hidden `_rowid` columns. Use `GetColumnsIncludingHidden()` on the `Database` class to see all columns, with `IsHidden = true` on internal ones.

---

## Implementation Details

### Insert Path Coverage

The `_rowid` auto-generation is handled consistently across **all** insert paths:

| Insert Path | Skips `_rowid` | Auto-Generate | Location |
|-------------|----------------|---------------|----------|
| `ExecuteSQL("INSERT ...")` | ✅ SqlParser.DML.cs | Via `Table.Insert()` | `ExecuteInsert()` |
| `ExecuteBatchSQL(...)` | ✅ Database.Batch.cs | Via `Table.InsertBatch()` | `ParseInsertStatement()` / `GetOrCreatePreparedInsert()` |
| `BulkInsertAsync(...)` (< 5K rows) | N/A (dict API) | Via `Table.InsertBatch()` | `ValidateAndSerializeBatchOutsideLock()` |
| `BulkInsertAsync(...)` (≥ 5K rows) | N/A (dict API) | Via `InsertBatchFromBuffer()` → `InsertBatch()` | `ValidateAndSerializeBatchOutsideLock()` |
| `InsertBatch(rows)` direct | N/A (dict API) | ✅ Auto-generates when key missing or null | `ValidateAndSerializeBatchOutsideLock()` |
| `InsertBatchFromBuffer(...)` | N/A (binary API) | ✅ Decoder produces null → auto-gen triggers | `ValidateAndSerializeBatchOutsideLock()` |

### Internal Operations

DELETE and UPDATE use `SelectInternal()` (instead of public `Select()`) to preserve the `_rowid` column in intermediate results. This ensures the B-Tree PK index lookup works correctly when locating storage positions for row mutation.

### Schema Discovery

```
┌──────────────────────────────────────────────────┐
│ GetColumns("logs")                               │
│   → [message (TEXT), level (INTEGER),            │
│      timestamp (DATETIME)]                       │
│   _rowid is EXCLUDED (SQLite PRAGMA pattern)     │
├──────────────────────────────────────────────────┤
│ GetColumnsIncludingHidden("logs")                │
│   → [_rowid (ULID, hidden), message (TEXT),      │
│      level (INTEGER), timestamp (DATETIME)]      │
│   _rowid INCLUDED with IsHidden = true           │
└──────────────────────────────────────────────────┘
```

### Auto-Generation Guard

The `ValidateAndSerializeBatchOutsideLock()` method (used by all batch paths) handles three scenarios for auto-generated columns:

1. **Key missing**: `!row.TryGetValue("_rowid", ...)` → auto-generate
2. **Key present with null/DBNull** (common when rows pass through `StreamingRowEncoder` → `BinaryRowDecoder`) → auto-generate
3. **Key present with valid value**: Use as-is (e.g., explicit `_rowid` in INSERT)

This defensive approach prevents "Column '_rowid' cannot be NULL" errors across all insert paths.
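
The three cases map onto a short guard. In this sketch, the `newRowId` delegate stands in for `Ulid.NewUlid()`, and `EnsureRowId` is a hypothetical name. The real logic lives inside `ValidateAndSerializeBatchOutsideLock()` and may differ in detail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical guard covering the three documented cases.
static void EnsureRowId(Dictionary<string, object> row, Func<string> newRowId)
{
    bool present = row.TryGetValue("_rowid", out object existing);

    // Case 1: key missing. Case 2: key present but null or DBNull
    // (e.g. after a StreamingRowEncoder -> BinaryRowDecoder round trip).
    if (!present || existing is null || existing is DBNull)
        row["_rowid"] = newRowId();

    // Case 3: an explicit, non-null _rowid is kept as-is.
}

int generated = 0;
Func<string> fakeUlid = () => $"GENERATED-{++generated}";

var missing = new Dictionary<string, object> { ["message"] = "a" };
var nullValue = new Dictionary<string, object> { ["_rowid"] = DBNull.Value, ["message"] = "b" };
var explicitValue = new Dictionary<string, object> { ["_rowid"] = "01J5ABCDEF0001GHJKMN000001", ["message"] = "c" };

EnsureRowId(missing, fakeUlid);
EnsureRowId(nullValue, fakeUlid);
EnsureRowId(explicitValue, fakeUlid);

Console.WriteLine(missing["_rowid"]);       // GENERATED-1
Console.WriteLine(nullValue["_rowid"]);     // GENERATED-2
Console.WriteLine(explicitValue["_rowid"]); // 01J5ABCDEF0001GHJKMN000001
```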

src/SharpCoreDB/Constants/PersistenceConstants.cs

Lines changed: 8 additions & 0 deletions
@@ -21,4 +21,12 @@ public static class PersistenceConstants
 
 /// <summary>The key for tables in metadata.</summary>
 public const string TablesKey = "tables";
+
+/// <summary>
+/// The name of the auto-generated internal row identifier column.
+/// Injected as primary key when a table is created without an explicit PRIMARY KEY.
+/// Uses ULID type for globally unique, lexicographically sortable identifiers.
+/// Hidden from SELECT * but queryable via explicit column reference.
+/// </summary>
+public const string InternalRowIdColumnName = "_rowid";
 }

src/SharpCoreDB/DataStructures/ColumnInfo.cs

Lines changed: 7 additions & 0 deletions
@@ -36,4 +36,11 @@ public sealed record ColumnInfo
 /// ✅ COLLATE Phase 1: Exposed via metadata discovery for ADO.NET/EF Core providers.
 /// </summary>
 public string? Collation { get; init; }
+
+/// <summary>
+/// Whether this column is a hidden internal column (e.g., auto-generated <c>_rowid</c>).
+/// Hidden columns are not returned by <c>SELECT *</c> but can be explicitly queried.
+/// ✅ AUTO-ROWID: Marks the internal ULID primary key as hidden.
+/// </summary>
+public bool IsHidden { get; init; }
 }
