@@ -148,7 +148,6 @@ Records are stored in a **self-describing binary format**. This means type infor
148148│ │
149149│ ... (repeat for all columns) │
150150└──────────────────────────────────────────────────┘
151- ```
152151
153152#### Type Markers
154153
@@ -592,7 +591,7 @@ Free Space Map Layout:
592591├─────────────────────────────────────────────┤
593592│ L2 Extent Map (Variable) │
594593│ ├─ Each extent: [StartPage: 8B][Count: 8B] │
595- │ └─ Optimized for large allocations │
594+ │ └─ Optimized for large allocations │
596595└─────────────────────────────────────────────┘
597596```
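
The L2 extent entry above is a fixed 16-byte pair, `[StartPage: 8B][Count: 8B]`. A minimal sketch of encoding and decoding one entry follows. The `Extent` type and its member names are hypothetical, and little-endian byte order is an assumption; the layout above does not state the endianness.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical type for illustration; not taken from the SharpCoreDB source.
public readonly record struct Extent(ulong StartPage, ulong Count)
{
    // 8 bytes for StartPage + 8 bytes for Count, as in the layout above.
    public const int EncodedSize = 16;

    // Writes the entry into the first 16 bytes of destination.
    // Little-endian is an assumption, not confirmed by the document.
    public void WriteTo(Span<byte> destination)
    {
        BinaryPrimitives.WriteUInt64LittleEndian(destination, StartPage);
        BinaryPrimitives.WriteUInt64LittleEndian(destination.Slice(8), Count);
    }

    // Reads an entry back from the first 16 bytes of source.
    public static Extent ReadFrom(ReadOnlySpan<byte> source)
        => new(BinaryPrimitives.ReadUInt64LittleEndian(source),
               BinaryPrimitives.ReadUInt64LittleEndian(source.Slice(8)));
}
```

Because every entry has the same size, the n-th extent in a packed map starts at byte offset `n * Extent.EncodedSize`.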
598597
@@ -679,7 +678,7 @@ internal sealed class BlockRegistry
679678for (int i = 0 ; i < 1000 ; i ++ )
680679{
681680 registry .SetBlock (names [i ], entries [i ]);
682- registry .Flush (); // ← Flushes to disk EVERY time!
681+ registry .Flush (); // Flushes to disk EVERY time!
683682 }
684683// Result: 1000 disk writes
685684
@@ -932,8 +931,8 @@ Page Layout (4KB = 4096 bytes):
932931Offset 0-3: [ColumnCount: 4]
933932Offset 4-20: [Column 1 metadata + value]
934933Offset 21-60: [Column 2 metadata + value]
935- Offset 61-3200: [Column 3: Short string]
936- Offset 3201-4090: [Column 4: Long string (890 bytes) ]
934+ Offset 61-3200: [Column 3 metadata + value (large string) ]
935+ Offset 3201-4090: [Column 4: Short string]
937936Offset 4091-4095: [unused: 5 bytes]
938937 ↑ NO SPLITTING NEEDED
939938 Record fits entirely (4091 bytes < 4096)
@@ -1107,102 +1106,64 @@ var metrics = blockRegistry.GetMetrics();
11071106
11081107### Q2: How big can strings be?
11091108
1110- ** A:** Theoretically up to 2 GB (int32 limit per string). Practically:
1111- - Small strings (< 1 KB): Very fast
1112- - Medium strings (1-100 MB): Still efficient
1113- - Large strings (> 100 MB): Will fragment disk, consider BLOB storage
1109+ ** A:** A string is limited by the ** page size** , because the whole record must fit in a single page:
11141110
1115- ### Q3: How do I know where a record ends?
1111+ ** Default (4KB page):**
1112+ - Page capacity: 4096 bytes total
1113+ - Page header overhead: 40 bytes
1114+ - ** Available for data: 4056 bytes**
1115+ - Minus serialization overhead for column metadata
1116+ - ** Practical limit: ~ 4000-4050 bytes per complete record** (all columns combined!)
11161117
1117- ** A :** Via Block Registry! Each record is stored as a block:
1118+ ** Example breakdown :**
11181119``` csharp
1119- BlockEntry entry = registry [" Users_Row_001" ];
1120- ulong startOffset = entry .Offset ;
1121- ulong endOffset = entry .Offset + entry .Length ;
1120+ // 4KB page (4096 bytes)
1121+ Page structure :
1122+ ├─ Header : 40 bytes
1123+ └─ Data : 4056 bytes
1124+
1125+ Record with multiple columns :
1126+ ├─ ColumnCount (4 bytes )
1127+ ├─ Column 1 metadata + value
1128+ ├─ Column 2 metadata + value
1129+ ├─ Column 3 metadata + value (large string )
1130+ └─ Must ALL fit in 4056 bytes !
1131+
1132+ If total > 4056 bytes → ERROR !
11221133```
11231134
1124- ### Q4: Can strings be NULL?
1125-
1126- ** A:** Yes, via type marker 0:
1127- ``` csharp
1128- case null :
1129- buffer [offset ++ ] = 0 ; // Type: Null
1130- // No value follows
1131- ```
1135+ ** For larger strings:**
1136+ - ✅ Increase page size: Use 8KB, 16KB, or 32KB pages
1137+ - ✅ Use BLOB storage: For data > page size
1138+ - ✅ Normalize schema: Split into multiple records
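
The capacity arithmetic above (page size minus the 40-byte header) can be sketched for the other page sizes. `PageMath` is a hypothetical helper, and the sketch assumes the header stays at 40 bytes for 8KB, 16KB, and 32KB pages, which the document does not confirm.

```csharp
using System;

// Hypothetical helper for illustration; not part of the SharpCoreDB API.
public static class PageMath
{
    // Header overhead quoted in the breakdown above.
    // Assumed to be the same for every page size.
    public const int PageHeaderBytes = 40;

    // Upper bound on the bytes available for one serialized record,
    // before column metadata overhead is subtracted.
    public static int MaxRecordBytes(int pageSizeBytes)
    {
        if (pageSizeBytes <= PageHeaderBytes)
            throw new ArgumentOutOfRangeException(nameof(pageSizeBytes));
        return pageSizeBytes - PageHeaderBytes;
    }
}
```

Under that assumption, `PageMath.MaxRecordBytes(4096)` gives 4056, matching the figure above, and `PageMath.MaxRecordBytes(8192)` gives 8152.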
11321139
1133- ### Q5: What about Unicode?
1140+ ** What Happens If You Try to Store Too Much? **
11341141
1135- ** A:** UTF-8 encoding, automatic length adjustment:
11361142``` csharp
1137- " Café" → 5 bytes (C - a - f - [2 - byte é])
1138- " 日本" → 6 bytes (2 chars × 3 bytes each )
1139- " 🚀" → 4 bytes (1 code point × 4 bytes )
1140- ```
1141-
1142- ### Q6: Can I modify strings directly without rewriting the record?
1143-
1144- ** A:** No, SharpCoreDB works immutably:
1145- 1 . Load record (deserialize)
1146- 2 . Modify in memory
1147- 3 . Serialize & write new block
1148- 4 . Update registry
1149- 5 . Mark old block as free (WAL handles recovery)
1150-
1151- ### Q7: What about compression?
1143+ // Example: Trying to store record > page size
11521144
1153- ** A:** Not currently implemented. Reserved in header for future use.
1154- Current focus: Zero-allocation serialization is faster than compression overhead.
1145+ var row = new Dictionary <string , object >
1146+ {
1147+ [" UserId" ] = 1 ,
1148+ [" LargeText" ] = new string ('X' , 4100 ), // 4100 bytes!
1149+ };
11551150
1156- ### Q8: How is free space distributed?
1151+ try
1152+ {
1153+ db .InsertRecord (row );
1154+ }
1155+ catch (InvalidOperationException ex )
1156+ {
1157+ // Exception message:
1158+ // "Record too large (4158 bytes) for page size (4096 bytes)"
1159+ // Serialized size is 4158, but max is 4056!
1160+
1161+ Console .WriteLine (ex .Message );
1162+ }
11571163
1158- ** A:** Non-contiguous! Records can be scattered throughout the file:
1159- ```
1160- File layout:
1161- [Block1: 4KB] [Block2: 8KB] [Free: 2KB] [Block3: 4KB] [Free: 1KB] [Block4: 2KB]
1164+ // Code that causes this:
1165+ // if (recordData.Length > MAX_RECORD_SIZE) // MAX_RECORD_SIZE ≈ 4056
1166+ // return Error("Record too large for page");
11621167```
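
A caller can reject obviously oversized rows before attempting the insert. The sketch below is a rough pre-check under stated assumptions: `RecordSizeGuard` is hypothetical, it counts only the UTF-8 bytes of string values, and it ignores column metadata, so it gives a lower bound on the serialized size rather than the exact figure the engine computes.

```csharp
using System;
using System.Text;

// Hypothetical pre-check for illustration; not part of the SharpCoreDB API.
public static class RecordSizeGuard
{
    // Header overhead quoted earlier in this answer.
    public const int PageHeaderBytes = 40;

    // Sums the UTF-8 byte counts of the string values only.
    // The real serialized record is larger (type markers, lengths, names).
    public static int MinStringPayloadBytes(params string[] values)
    {
        int total = 0;
        foreach (string value in values)
            total += Encoding.UTF8.GetByteCount(value);
        return total;
    }

    // True when the strings alone already exceed the page's data area,
    // so the record cannot fit regardless of metadata overhead.
    public static bool CannotFit(int pageSizeBytes, params string[] values)
        => MinStringPayloadBytes(values) > pageSizeBytes - PageHeaderBytes;
}
```

The byte count, not the character count, is what matters: `new string('X', 2100)` encodes to 2100 bytes and passes this check on a 4KB page, while `new string('é', 2100)` encodes to 4200 bytes and does not.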
1163- No fragmentation warning needed - FSM manages this transparently.
1164-
1165- ### Q9: Can I store an entire table in one "block"?
1166-
1167- ** A:** No, each row is a separate block. Advantages:
1168- - Finer-grained locking
1169- - Better cache-locality
1170- - Flexible sizing
1171-
1172- ### Q10: How do transactions work?
1173-
1174- ** A:** Managed via WAL (Write-Ahead Log):
1175- 1 . Begin transaction
1176- 2 . Writes go to WAL first
1177- 3 . On commit, registry updated
1178- 4 . On crash, WAL replayed
1179-
1180- ---
1181-
1182- ## 📚 Related Documentation
1183-
1184- - ` FILE_FORMAT_DESIGN.md ` - Low-level binary format details
1185- - ` SCHEMA_PERSISTENCE_TECHNICAL_DETAILS.md ` - Schema storage
1186- - ` CODING_STANDARDS_CSHARP14.md ` - Code style guide
1187- - Phase 3 completion reports - Performance benchmarks
1188-
1189- ---
1190-
1191- ## 🎓 Summary
1192-
1193- | Aspect | Answer |
1194- | --------| --------|
1195- | ** Fixed-length strings?** | ❌ No! Variable-length with 4-byte length prefix |
1196- | ** Max string size?** | 2 GB (int32 limit) |
1197- | ** Free space needed?** | ❌ No! Automatic exponential file growth |
1198- | ** Record boundaries?** | Via Block Registry (O(1) lookup) |
1199- | ** Column boundaries?** | Self-describing binary format (no fixed positions) |
1200- | ** Unicode support?** | ✅ Full UTF-8 support |
1201- | ** Performance?** | 3x faster than JSON, zero-allocation serialization |
1202-
1203- ---
12041168
1205- ** Last Updated:** January 2025
1206- ** Phase:** 3.3 (Serialization & Storage Optimization)
1207- ** Status:** Complete
12081169