Commit bb67b24

author
MPCoreDeveloper
committed
docs: Add LOB overflow handling as Phase 5 proposal
1 parent 18f2f32 commit bb67b24

1 file changed, +202 -1 lines changed

docs/serialization/SERIALIZATION_AND_STORAGE_GUIDE.md

Lines changed: 202 additions & 1 deletion
@@ -1221,7 +1221,7 @@ Changing page size would require:
 var row = new Dictionary<string, object>
 {
     ["UserId"] = 1,
-    ["Biography"] = new string('X', 100_000), // 100KB!
+    ["Biography"] = new string('X', 100000), // 100KB!
 };
 
 // ✅ DO: Split into manageable pieces
@@ -1264,3 +1264,204 @@ var userWithRef = new Dictionary<string, object>

````````

This is the description of what the code block changes:
Add comprehensive LOB (Large Object) storage proposal as a future enhancement, explaining how it would work and why it's needed

This is the code block that represents the suggested code change:

````````markdown
---
1277+
1278+
## 🚀 Future Enhancement: LOB (Large Object) Storage
1279+
1280+
### The Vision: Automatic Overflow Handling
1281+
1282+
Instead of throwing an error, SharpCoreDB could automatically redirect large columns to external storage:
1283+
```csharp
// FUTURE FEATURE (not yet implemented)
var row = new Dictionary<string, object>
{
    ["UserId"] = 1,
    ["Name"] = "John Doe",
    ["Biography"] = new string('X', 1_000_000), // 1MB - would overflow!
};

// What COULD happen:
// 1. Serialize record: Biography > threshold (e.g., 2KB)
// 2. Automatically create LOB reference: "LOB_12345.dat"
// 3. Store huge string in external file
// 4. Store pointer in record: ["Biography"] = "LOB_12345.dat"
// 5. On read: Automatically dereference pointer, fetch from disk

// Result: ✅ Works! No error, transparent to developer
```
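
The five steps in the comments above can be sketched end to end with an in-memory stand-in for the external store. Everything here is hypothetical: `LobThresholdBytes`, the `lob:` reference prefix, `WriteRow`, and `ReadRow` are illustrative names, not SharpCoreDB APIs, and the dictionary stands in for files on disk.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Assumption: threshold taken from the "e.g., 2KB" in the steps above.
const int LobThresholdBytes = 2048;
// Assumption: a string prefix marks a stored value as a LOB pointer.
const string RefPrefix = "lob:";

// Stand-in for external storage: LOB id -> raw bytes.
var lobStore = new Dictionary<string, byte[]>();

// Write path (steps 1-4): move oversized strings out and keep a pointer.
Dictionary<string, object> WriteRow(Dictionary<string, object> row)
{
    var stored = new Dictionary<string, object>();
    foreach (var (column, value) in row)
    {
        if (value is string s && Encoding.UTF8.GetByteCount(s) > LobThresholdBytes)
        {
            var lobId = Guid.NewGuid().ToString("N"); // 32 hex characters
            lobStore[lobId] = Encoding.UTF8.GetBytes(s);
            stored[column] = RefPrefix + lobId;
        }
        else
        {
            stored[column] = value;
        }
    }
    return stored;
}

// Read path (step 5): replace pointers with the data they refer to.
Dictionary<string, object> ReadRow(Dictionary<string, object> stored)
{
    var row = new Dictionary<string, object>();
    foreach (var (column, value) in stored)
    {
        if (value is string s
            && s.StartsWith(RefPrefix, StringComparison.Ordinal)
            && lobStore.TryGetValue(s.Substring(RefPrefix.Length), out var bytes))
        {
            row[column] = Encoding.UTF8.GetString(bytes);
        }
        else
        {
            row[column] = value;
        }
    }
    return row;
}

var original = new Dictionary<string, object>
{
    ["UserId"] = 1,
    ["Name"] = "John Doe",
    ["Biography"] = new string('X', 1_000_000),
};
var onPage = WriteRow(original);
var roundTrip = ReadRow(onPage);

Console.WriteLine(((string)onPage["Biography"]).Length);    // 36: "lob:" + 32 hex chars
Console.WriteLine(((string)roundTrip["Biography"]).Length); // 1000000
```

A string prefix is ambiguous, since a user value could itself start with `lob:`. That is one reason the proposal introduces a dedicated type marker for references instead.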

### How It Would Work (Architecture)

```
Current (v1):
┌──────────────────────────────────────┐
│ Record (all data in page)            │
├──────────────────────────────────────┤
│ [UserId: 4][Name: 20][Biography: ???]│ ← Doesn't fit!
└──────────────────────────────────────┘

Future (LOB Overflow):
┌──────────────────────────────────────┐
│ Record (in page)                     │
├──────────────────────────────────────┤
│ [UserId: 4][Name: 20][BioRef: 32]    │ ← Pointer to LOB
└──────────────────────────────────────┘

[External Storage]
┌─────────────────────────┐
│ LOB_12345.dat (1MB)     │
│ [Biography data: full]  │
└─────────────────────────┘
```
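
The diagram shows a 32-byte `BioRef`, and the proposal names LOB files `LOB_<hash>.dat`. One way the two could line up, offered purely as an assumption, is to use a SHA-256 digest of the content (exactly 32 bytes) as the in-record reference and derive the file name from it. `MakeLobReference` and `LobFileName` are hypothetical helpers; the proposal does not say which hash, if any, is intended.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Assumption: the in-record reference is the SHA-256 digest of the LOB content.
static byte[] MakeLobReference(byte[] content) => SHA256.HashData(content);

// Assumption: the file name is LOB_<hash>.dat with the digest hex-encoded.
static string LobFileName(byte[] reference) =>
    $"LOB_{Convert.ToHexString(reference)}.dat";

byte[] biography = Encoding.UTF8.GetBytes(new string('X', 1_000_000));
byte[] bioRef = MakeLobReference(biography);

Console.WriteLine(bioRef.Length);              // 32
Console.WriteLine(LobFileName(bioRef).Length); // 72: "LOB_" + 64 hex chars + ".dat"
```

With content-derived names, two records holding the same value would share one file, which makes reference counting mandatory before any delete.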

### Implementation Requirements

This would require:

1. **LOB Storage Layer**
   - Separate file or directory for large objects
   - Naming scheme: `LOB_<hash>.dat`
   - Reference counting (clean up when the owning record is deleted)

2. **Automatic Threshold Detection**
   ```csharp
   // Configuration:
   DatabaseConfig.LobThresholdBytes = 2048; // Columns > 2KB → LOB
   ```

3. **Transparent Dereferencing**
   - On read: Automatically fetch LOB data
   - On write: Check if value exceeds threshold
   - On delete: Clean up orphaned LOBs

4. **Garbage Collection**
   - Track which LOBs are referenced
   - Periodically clean up orphaned files
   - Similar to VACUUM in PostgreSQL

5. **Serialization Changes**
   ```csharp
   // Type markers would need:
   LobReference = 9, // [LOB_ID: string pointer]

   // On serialization:
   if (strBytes.Length > LOB_THRESHOLD)
   {
       // Create LOB file
       var lobId = CreateLobFile(strBytes);
       // Store reference instead
       WriteValue(buffer, lobId, BinaryTypeMarker.LobReference);
   }
   // Otherwise: write the value inline, as today
   ```
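
Items 1 and 4 both depend on knowing which LOBs are still referenced. A minimal in-memory sketch of that bookkeeping follows; `AddReference`, `ReleaseReference`, and `CollectOrphans` are hypothetical names, and a real collector would persist the counts and delete the backing files rather than just returning ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical bookkeeping: LOB id -> number of records pointing at it.
var refCounts = new Dictionary<string, int>();

// Called when a record that points at the LOB is written.
void AddReference(string lobId) =>
    refCounts[lobId] = refCounts.GetValueOrDefault(lobId) + 1;

// Called when such a record is deleted or its column is overwritten.
void ReleaseReference(string lobId)
{
    if (refCounts.TryGetValue(lobId, out var count) && count > 0)
        refCounts[lobId] = count - 1;
}

// Sweep: return ids whose count dropped to zero and forget them.
List<string> CollectOrphans()
{
    var orphans = refCounts.Where(kv => kv.Value == 0).Select(kv => kv.Key).ToList();
    foreach (var id in orphans) refCounts.Remove(id);
    return orphans;
}

AddReference("LOB_12345");
AddReference("LOB_12345");     // two records share this LOB
AddReference("LOB_67890");
ReleaseReference("LOB_67890"); // its only owner was deleted
ReleaseReference("LOB_12345"); // one owner remains

var swept = CollectOrphans();
Console.WriteLine(string.Join(",", swept)); // LOB_67890
```

Separating release from collection mirrors the "periodically clean up" wording: deletes stay cheap, and file removal can run in the background.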

### Comparison with Current Workarounds

| Approach | Pros | Cons |
|----------|------|------|
| **Error (current)** | Simple architecture | User must handle large data |
| **Increase page size** | No code changes needed | Wastes space for small records |
| **External file refs** | Works today | Manual management |
| **LOB Overflow (future)** | ✅ Transparent, automatic | Complex implementation |

### Why This Isn't Trivial

```csharp
// Challenges:

// 1. Reference counting
//    - Track which LOBs are in use
//    - Handle cascade deletes
//    - Update when records are modified

// 2. Transaction consistency
//    - The LOB file and the record are written in two separate steps
//    - Need to handle crashes between the two writes
//    - Requires WAL entries for LOB operations

// 3. Performance implications
//    - Reading a record now requires potential I/O for LOB
//    - Cache LOB data? How much memory?
//    - Compression? Encryption?

// 4. Backward compatibility
//    - Old records have no LOBs
//    - New records might have LOBs
//    - Format version must change
```
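
For the crash window in challenge 2, one ordering that limits the damage, sketched here as an assumption and not as the planned design, is to make the LOB file durable before any record refers to it. A crash between the two writes then leaves an orphaned file that a sweep can remove, never a record pointing at missing data. `WriteLobDurably` and `FindOrphans` are hypothetical helpers, and this does not replace the WAL entries mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Write to a temp file, flush to disk, then rename to the final name.
static void WriteLobDurably(string directory, string lobId, byte[] data)
{
    string finalPath = Path.Combine(directory, lobId + ".dat");
    string tempPath = finalPath + ".tmp";

    using (var stream = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
    {
        stream.Write(data, 0, data.Length);
        stream.Flush(flushToDisk: true); // ask the OS to persist the bytes
    }
    File.Move(tempPath, finalPath, overwrite: true); // publish under the final name
}

// Startup sweep: list LOB files that no record references.
static string[] FindOrphans(string directory, ISet<string> referencedIds) =>
    Directory.GetFiles(directory, "*.dat")
        .Select(path => Path.GetFileNameWithoutExtension(path))
        .Where(id => !referencedIds.Contains(id))
        .ToArray();

string dir = Path.Combine(Path.GetTempPath(), "lob-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);

WriteLobDurably(dir, "LOB_1", new byte[] { 1, 2, 3 });
WriteLobDurably(dir, "LOB_2", new byte[] { 4, 5, 6 });

// Simulate a crash after LOB_2 was written but before its record was.
var referenced = new HashSet<string> { "LOB_1" };
string[] orphans = FindOrphans(dir, referenced);
Console.WriteLine(string.Join(",", orphans)); // LOB_2

Directory.Delete(dir, recursive: true);
```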

### Proposal: Phase 5 Feature

This would be a good fit for **Phase 5**, once the current performance work is complete:

**Goals:**
- ✅ Support arbitrary-sized strings transparently
- ✅ Maintain page size constraints
- ✅ Zero API changes for users
- ✅ Automatic compression of LOB data

**Tasks:**
1. Design LOB file format
2. Implement LOB storage layer
3. Update BinaryRowSerializer with threshold logic
4. Add reference tracking to Block Registry
5. Implement garbage collection
6. Add tests for edge cases (crashes during LOB creation, etc.)

### Temporary Workaround (Today)

Until LOB support is added, use this pattern:

```csharp
// Create a "LOB Reference" table
var lobTable = db.CreateTable("LOBData");

// Store large data separately
var largeContent = new string('X', 1_000_000);
var lobEntry = new Dictionary<string, object>
{
    ["LobId"] = Guid.NewGuid().ToString(),
    ["Owner"] = "Users",
    ["OwnerKey"] = 42,
    ["Data"] = largeContent,
};
lobTable.Insert(lobEntry);

// Store reference in main record
var userRecord = new Dictionary<string, object>
{
    ["UserId"] = 42,
    ["Name"] = "John Doe",
    ["BiographyLobId"] = lobEntry["LobId"], // Reference only
};
usersTable.Insert(userRecord);

// On read:
var user = usersTable.FindById(42);
var lobId = user["BiographyLobId"];
var biography = lobTable.FindByLobId(lobId)["Data"];
```
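
One caveat with the pattern above: a row in `LOBData` is presumably subject to the same page-size limit as any other row, so a 1 MB `Data` value may hit the same error. A hedged variant is to split the value into chunks that each fit in a page and store one row per chunk. The sketch below covers only the split and reassembly. `ChunkBytes` and the row shape (`LobId`, `Sequence`, `Data`) are assumptions, as is storing `byte[]` values, and the insert and query calls are omitted because they depend on the SharpCoreDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Assumption: small enough that one chunk row fits in a page.
const int ChunkBytes = 1024;

// Split on UTF-8 bytes, not characters, so multi-byte sequences
// survive even when a chunk boundary falls inside one.
static List<Dictionary<string, object>> SplitIntoChunkRows(
    string lobId, string content, int chunkBytes)
{
    byte[] bytes = Encoding.UTF8.GetBytes(content);
    var rows = new List<Dictionary<string, object>>();
    for (int offset = 0, sequence = 0; offset < bytes.Length; offset += chunkBytes, sequence++)
    {
        int length = Math.Min(chunkBytes, bytes.Length - offset);
        rows.Add(new Dictionary<string, object>
        {
            ["LobId"] = lobId,
            ["Sequence"] = sequence, // restores order on read
            ["Data"] = bytes.AsSpan(offset, length).ToArray(),
        });
    }
    return rows;
}

// Concatenate all chunks in sequence order, then decode once.
static string Reassemble(IEnumerable<Dictionary<string, object>> chunkRows)
{
    byte[] all = chunkRows
        .OrderBy(row => (int)row["Sequence"])
        .SelectMany(row => (byte[])row["Data"])
        .ToArray();
    return Encoding.UTF8.GetString(all);
}

string largeContent = new string('X', 1_000_000);
var chunkRows = SplitIntoChunkRows(Guid.NewGuid().ToString(), largeContent, ChunkBytes);

Console.WriteLine(chunkRows.Count);                       // 977
Console.WriteLine(Reassemble(chunkRows) == largeContent); // True
```

Each chunk row would be inserted into `LOBData` in place of the single `lobEntry`, and the read path would fetch all rows with the matching `LobId` before calling `Reassemble`.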

---
