Commit 7c241b1

author
MPCoreDeveloper
committed
docs: Update FAQ with accurate error handling and limits
1 parent 4d18074 commit 7c241b1

1 file changed

+51
-90
lines changed

docs/serialization/SERIALIZATION_AND_STORAGE_GUIDE.md

Lines changed: 51 additions & 90 deletions
@@ -148,7 +148,6 @@ Records are stored in a **self-describing binary format**. This means type infor
148148
│ │
149149
│ ... (repeat for all columns) │
150150
└──────────────────────────────────────────────────┘
151-
```
152151
153152
#### Type Markers
154153
@@ -592,7 +591,7 @@ Free Space Map Layout:
592591
├─────────────────────────────────────────────┤
593592
│ L2 Extent Map (Variable) │
594593
│ ├─ Each extent: [StartPage: 8B][Count: 8B] │
595-
│ └─ Optimized for large allocations
594+
│ └─ Optimized for large allocations │
596595
└─────────────────────────────────────────────┘
597596
```
598597

@@ -679,7 +678,7 @@ internal sealed class BlockRegistry
679678
for (int i = 0; i < 1000; i++)
680679
{
681680
registry.SetBlock(names[i], entries[i]);
682-
registry.Flush(); // Flushes to disk EVERY time!
681+
registry.Flush(); // Flushes to disk EVERY time!
683682
}
684683
// Result: 1000 disk writes
685684
@@ -932,8 +931,8 @@ Page Layout (4KB = 4096 bytes):
932931
Offset 0-3: [ColumnCount: 4]
933932
Offset 4-20: [Column 1 metadata + value]
934933
Offset 21-60: [Column 2 metadata + value]
935-
Offset 61-3200: [Column 3: Short string]
936-
Offset 3201-4090: [Column 4: Long string (890 bytes)]
934+
Offset 61-3200: [Column 3 metadata + value (large string)]
935+
Offset 3201-4090: [Column 4: Short string]
937936
Offset 4091-4095: [unused: 5 bytes]
938937
↑ NO SPLITTING NEEDED
939938
Record fits entirely (4091 bytes < 4096)
@@ -1107,102 +1106,64 @@ var metrics = blockRegistry.GetMetrics();
11071106

11081107
### Q2: How big can strings be?
11091108

1110-
**A:** Theoretically up to 2 GB (int32 limit per string). Practically:
1111-
- Small strings (< 1 KB): Very fast
1112-
- Medium strings (1-100 MB): Still efficient
1113-
- Large strings (> 100 MB): Will fragment disk, consider BLOB storage
1109+
**A:** String size is limited by the **page size**, not by the 2 GB `int32` length limit:
11141110

1115-
### Q3: How do I know where a record ends?
1111+
**Default (4KB page):**
1112+
- Page capacity: 4096 bytes total
1113+
- Page header overhead: 40 bytes
1114+
- **Available for data: 4056 bytes**
1115+
- Minus serialization overhead for column metadata
1116+
- **Practical limit: ~4000-4050 bytes per complete record** (all columns combined!)
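
The arithmetic above can be restated as a small helper. This is only a sketch: `PageBudget` and its members are hypothetical names, not part of SharpCoreDB, and it assumes the 40-byte page header stays the same for larger page sizes.

```csharp
using System;

// Hypothetical helper, not a SharpCoreDB API: it only restates the page arithmetic.
public static class PageBudget
{
    // Assumption: the 40-byte page header is the same for every page size.
    public const int PageHeaderBytes = 40;

    // Bytes left for one serialized record on a page of the given size.
    public static int DataCapacity(int pageSizeBytes)
    {
        if (pageSizeBytes <= PageHeaderBytes)
            throw new ArgumentOutOfRangeException(nameof(pageSizeBytes));
        return pageSizeBytes - PageHeaderBytes;
    }

    // True when a record of the given serialized size fits on a single page.
    public static bool Fits(int serializedRecordBytes, int pageSizeBytes) =>
        serializedRecordBytes <= DataCapacity(pageSizeBytes);
}
```

For the default page, `PageBudget.DataCapacity(4096)` gives the 4056 bytes quoted above. The column metadata still has to come out of that budget.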
11161117

1117-
**A:** Via Block Registry! Each record is stored as a block:
1118+
**Example breakdown:**
11181119
```csharp
1119-
BlockEntry entry = registry["Users_Row_001"];
1120-
ulong startOffset = entry.Offset;
1121-
ulong endOffset = entry.Offset + entry.Length;
1120+
// 4KB page (4096 bytes)
1121+
Page structure:
1122+
├─ Header: 40 bytes
1123+
└─ Data: 4056 bytes
1124+
1125+
Record with multiple columns:
1126+
├─ ColumnCount (4 bytes)
1127+
├─ Column 1 metadata + value
1128+
├─ Column 2 metadata + value
1129+
├─ Column 3 metadata + value (large string)
1130+
└─ Must ALL fit in 4056 bytes!
1131+
1132+
If total > 4056 bytes → ERROR!
11221133
```
11231134

1124-
### Q4: Can strings be NULL?
1125-
1126-
**A:** Yes, via type marker 0:
1127-
```csharp
1128-
case null:
1129-
buffer[offset++] = 0; // Type: Null
1130-
// No value follows
1131-
```
1135+
**For larger strings:**
1136+
- ✅ Increase page size: Use 8KB, 16KB, or 32KB pages
1137+
- ✅ Use BLOB storage: For data > page size
1138+
- ✅ Normalize schema: Split into multiple records
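
One possible way to approach the "split into multiple records" option is to cut a long text value into chunks that each stay under a UTF-8 byte budget, then store one chunk per record. The sketch below is an illustration only: `TextChunker` is a hypothetical helper, not a SharpCoreDB API, and how the chunks are keyed and reassembled is left to the schema.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not a SharpCoreDB API.
public static class TextChunker
{
    // Splits text into pieces whose UTF-8 encoding is at most maxBytesPerChunk bytes.
    public static List<string> SplitByUtf8Bytes(string text, int maxBytesPerChunk)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (maxBytesPerChunk < 4)
            throw new ArgumentOutOfRangeException(nameof(maxBytesPerChunk),
                "Budget must hold at least one 4-byte code point.");

        var chunks = new List<string>();
        int start = 0; // index where the current chunk begins
        int bytes = 0; // UTF-8 bytes accumulated in the current chunk
        int i = 0;
        while (i < text.Length)
        {
            // Treat a surrogate pair as one unit so it is never split across chunks.
            bool isPair = char.IsHighSurrogate(text[i])
                          && i + 1 < text.Length
                          && char.IsLowSurrogate(text[i + 1]);
            int units = isPair ? 2 : 1;
            int size = Encoding.UTF8.GetByteCount(text.AsSpan(i, units));

            if (bytes + size > maxBytesPerChunk)
            {
                // Close the current chunk and start a new one at this character.
                chunks.Add(text.Substring(start, i - start));
                start = i;
                bytes = 0;
            }
            bytes += size;
            i += units;
        }
        if (i > start) chunks.Add(text.Substring(start));
        return chunks;
    }
}
```

The budget passed in should be well below the page's data capacity, since each chunk record also carries the column count, column metadata, and whatever key columns tie the chunks together.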
11321139

1133-
### Q5: What about Unicode?
1140+
**What Happens If You Try to Store Too Much?**
11341141

1135-
**A:** UTF-8 encoding, automatic length adjustment:
11361142
```csharp
1137-
"Café" → 5 bytes (C-a-f-[2-byte é])
1138-
"日本" → 6 bytes (2 chars × 3 bytes each)
1139-
"🚀" → 4 bytes (1 char × 4 bytes)
1140-
```
1141-
1142-
### Q6: Can I modify strings directly without rewriting the record?
1143-
1144-
**A:** No, SharpCoreDB works immutably:
1145-
1. Load record (deserialize)
1146-
2. Modify in memory
1147-
3. Serialize & write new block
1148-
4. Update registry
1149-
5. Mark old block as free (WAL handles recovery)
1150-
1151-
### Q7: What about compression?
1143+
// Example: Trying to store record > page size
11521144
1153-
**A:** Not currently implemented. Reserved in header for future use.
1154-
Current focus: Zero-allocation serialization is faster than compression overhead.
1145+
var row = new Dictionary<string, object>
1146+
{
1147+
["UserId"] = 1,
1148+
["LargeText"] = new string('X', 4100), // 4100 bytes!
1149+
};
11551150

1156-
### Q8: How is free space distributed?
1151+
try
1152+
{
1153+
db.InsertRecord(row);
1154+
}
1155+
catch (InvalidOperationException ex)
1156+
{
1157+
// Exception message:
1158+
// "Record too large (4158 bytes) for page size (4096 bytes)"
1159+
// Serialized size is 4158, but max is 4056!
1160+
1161+
Console.WriteLine(ex.Message);
1162+
}
11571163

1158-
**A:** Non-contiguous! Records can be scattered throughout the file:
1159-
```
1160-
File layout:
1161-
[Block1: 4KB] [Block2: 8KB] [Free: 2KB] [Block3: 4KB] [Free: 1KB] [Block4: 2KB]
1164+
// Code that causes this:
1165+
// if (recordData.Length > MAX_RECORD_SIZE) // MAX_RECORD_SIZE ≈ 4056
1166+
// return Error("Record too large for page");
11621167
```
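
To avoid relying on the exception, a caller could estimate the serialized size before inserting. The following is a rough sketch under stated assumptions: `RecordSizeEstimator` is hypothetical, and the per-column layout it assumes (4-byte length-prefixed column name, 1-byte type marker, 4-byte length prefix for strings) is a guess at the format, so its result will not necessarily match the size the engine reports.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical pre-flight estimator, not a SharpCoreDB API.
public static class RecordSizeEstimator
{
    // Estimates the serialized size of a row under the ASSUMED layout described above.
    public static int Estimate(IReadOnlyDictionary<string, object> row)
    {
        int total = 4; // ColumnCount (4 bytes)
        foreach (var column in row)
        {
            total += 4 + Encoding.UTF8.GetByteCount(column.Key); // assumed: length-prefixed name
            total += 1;                                          // type marker
            switch (column.Value)
            {
                case null: break;            // null: no value follows the marker
                case int _: total += 4; break;
                case long _: total += 8; break;
                case string s:
                    total += 4 + Encoding.UTF8.GetByteCount(s); // length prefix + UTF-8 bytes
                    break;
                default:
                    throw new NotSupportedException(
                        $"No size rule for {column.Value.GetType().Name} in this sketch.");
            }
        }
        return total;
    }

    // True when the estimate stays within the given budget (e.g. 4056 for a 4KB page).
    public static bool LikelyFits(IReadOnlyDictionary<string, object> row, int maxRecordBytes) =>
        Estimate(row) <= maxRecordBytes;
}
```

Calling `RecordSizeEstimator.LikelyFits(row, 4056)` before `db.InsertRecord(row)` would flag the row from the example above as too large. The `try`/`catch` should stay in place because the estimate is only approximate.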
1163-
No fragmentation warning needed - FSM manages this transparently.
1164-
1165-
### Q9: Can I store an entire table in one "block"?
1166-
1167-
**A:** No, each row is a separate block. Advantages:
1168-
- Finer-grained locking
1169-
- Better cache-locality
1170-
- Flexible sizing
1171-
1172-
### Q10: How do transactions work?
1173-
1174-
**A:** Managed via WAL (Write-Ahead Log):
1175-
1. Begin transaction
1176-
2. Writes go to WAL first
1177-
3. On commit, registry updated
1178-
4. On crash, WAL replayed
1179-
1180-
---
1181-
1182-
## 📚 Related Documentation
1183-
1184-
- `FILE_FORMAT_DESIGN.md` - Low-level binary format details
1185-
- `SCHEMA_PERSISTENCE_TECHNICAL_DETAILS.md` - Schema storage
1186-
- `CODING_STANDARDS_CSHARP14.md` - Code style guide
1187-
- Phase 3 completion reports - Performance benchmarks
1188-
1189-
---
1190-
1191-
## 🎓 Summary
1192-
1193-
| Aspect | Answer |
1194-
|--------|--------|
1195-
| **Fixed-length strings?** | ❌ No! Variable-length with 4-byte length prefix |
1196-
| **Max string size?** | 2 GB (int32 limit) |
1197-
| **Free space needed?** | ❌ No! Automatic exponential file growth |
1198-
| **Record boundaries?** | Via Block Registry (O(1) lookup) |
1199-
| **Column boundaries?** | Self-describing binary format (no fixed positions) |
1200-
| **Unicode support?** | ✅ Full UTF-8 support |
1201-
| **Performance?** | 3x faster than JSON, zero-allocation serialization |
1202-
1203-
---
12041168

1205-
**Last Updated:** January 2025
1206-
**Phase:** 3.3 (Serialization & Storage Optimization)
1207-
**Status:** Complete
12081169
