Commit 5685e7d

author
MPCoreDeveloper
committed
docs: Add record sizing and page boundaries section
1 parent c6c04f1 commit 5685e7d

1 file changed

+230
-4
lines changed

docs/serialization/SERIALIZATION_AND_STORAGE_GUIDE.md

Lines changed: 230 additions & 4 deletions
@@ -13,8 +13,9 @@ This document describes in detail how SharpCoreDB serializes, stores, and manage
1313
5. [Free Space Management](#free-space-management)
1414
6. [Block Registry](#block-registry)
1515
7. [Record & Column Boundaries](#record--column-boundaries)
16-
8. [Performance Considerations](#performance-considerations)
17-
9. [FAQ](#faq)
16+
8. [Record Sizing & Page Boundaries](#record-sizing--page-boundaries)
17+
9. [Performance Considerations](#performance-considerations)
18+
10. [FAQ](#faq)
1819

1920
---
2021

@@ -692,8 +693,6 @@ Step 5: Register in Block Registry
692693
**Columns don't have fixed boundaries!** They are self-describing:
693694

694695
```
695-
Record layout (no fixed column offsets):
696-
697696
Record in memory:
698697
┌──────────────────────────────────────┐
699698
│ [ColumnCount: 4] │ ← Always at offset 0
@@ -749,6 +748,233 @@ public static Dictionary<string, object> Deserialize(ReadOnlySpan<byte> data)
749748

750749
---
751750

751+
## 📄 Record Sizing & Page Boundaries
752+
753+
### Critical Constraint: Records Must Fit in a Single Page
754+
755+
**Important:** A record CANNOT be split across multiple pages.
756+
757+
#### Why?
758+
759+
```csharp
760+
// Records are atomic units stored in blocks
761+
BlockEntry entry = new BlockEntry
762+
{
763+
BlockName = "Users_Row_001",
764+
Offset = 1048576, // Start of page 256
765+
Length = 3950, // Entire record size (< 4096)
766+
Checksum = [...],
767+
// ...
768+
};
769+
770+
// The Block Registry stores:
771+
// - Start offset (byte position)
772+
// - Total length (entire record size)
773+
// - This makes lookups O(1) and atomic
774+
```
775+
776+
#### What Happens If a Record Would Exceed Page Size?
777+
778+
```csharp
779+
// Example: 4KB page size (default)
780+
781+
var row = new Dictionary<string, object>
782+
{
783+
["UserId"] = 1,
784+
["Biography"] = new string('X', 4100), // 4100 bytes!
785+
};
786+
787+
// Serialization:
788+
// ColumnCount (4) + "UserId" metadata (20) + 4 bytes (int32)
789+
// + "Biography" metadata (30) + 4100 bytes (string data)
790+
// ≈ 4 + 20 + 4 + 30 + 4100 = 4158 bytes
791+
//
792+
// Result: 4158 > 4096 (page size)
793+
// ❌ ERROR! Record too large for page!
794+
```
795+
796+
#### Solution 1: Increase Page Size
797+
798+
```csharp
799+
// Create database with larger pages
800+
var options = new DatabaseOptions
801+
{
802+
PageSize = 8192, // 8 KB pages → can hold bigger records
803+
CreateImmediately = true,
804+
};
805+
806+
var provider = SingleFileStorageProvider.Open("mydb.scdb", options);
807+
808+
// Now record of 4158 bytes fits in 8192-byte page ✅
809+
```
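
To pick a page size up front, one option is to take the largest record you expect and round up to the next page size that can hold it. The sketch below is a hypothetical helper, not part of SharpCoreDB's API, and it assumes page sizes are powers of two obtained by doubling the 4096-byte default (the guide itself only shows 4096 and 8192).

```csharp
using System;

// Hypothetical helper for illustration only; SharpCoreDB does not ship this class.
public static class PageSizing
{
    // Returns the smallest page size >= minPageSize (doubling each step)
    // that can hold a serialized record of recordSize bytes.
    // Assumption: valid page sizes are minPageSize * 2^k.
    public static int SmallestFittingPageSize(int recordSize, int minPageSize = 4096)
    {
        if (recordSize < 0)
            throw new ArgumentOutOfRangeException(nameof(recordSize));
        if (minPageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(minPageSize));

        long pageSize = minPageSize;
        while (pageSize < recordSize)
        {
            pageSize *= 2; // try the next larger page size
        }

        // Throws OverflowException if the result no longer fits in an int.
        return checked((int)pageSize);
    }
}
```

For the 4158-byte record from the earlier example, `PageSizing.SmallestFittingPageSize(4158)` yields 8192, matching the `PageSize = 8192` setting above; a 3950-byte record stays at 4096.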
810+
811+
#### Solution 2: Use BLOB Storage for Large Strings
812+
813+
```csharp
814+
// Don't store huge strings as columns
815+
// Instead, use a reference/ID
816+
817+
var row = new Dictionary<string, object>
818+
{
819+
["UserId"] = 1,
820+
["Name"] = "John Doe",
821+
["BioFileId"] = "bio_12345", // Reference to external BLOB
822+
};
823+
824+
// Then separately store large file:
825+
var largeFile = new byte[10_000_000]; // 10 MB
826+
blobStorage.WriteLargeBlob("bio_12345", largeFile);
827+
```
828+
829+
### How Pages Are Allocated
830+
831+
SharpCoreDB allocates pages as **complete units**. You cannot split data across page boundaries:
832+
833+
```
834+
File Layout (4KB page size):
835+
836+
Page 0 (0-4095): [Header: 512 bytes][unused: 3584 bytes]
837+
Page 1 (4096-8191): [Block Registry data: 2000 bytes][unused: 2096]
838+
Page 2 (8192-12287): [FSM data: 1500 bytes][unused: 2596]
839+
Page 3 (12288-16383): [Users_Row_001: 50 bytes][unused: 4046] ← Wasted space!
840+
Page 4 (16384-20479): [Users_Row_002: 100 bytes][unused: 3996] ← Wasted space!
841+
...
842+
843+
Even though Row_001 is only 50 bytes, it occupies an entire 4096-byte page.
844+
```
845+
846+
**Why?** Because the Block Registry tracks:
847+
```csharp
848+
// Block boundaries are PAGE-aligned
849+
public ulong Offset; // Always a multiple of PageSize (4096)
850+
public ulong Length; // Actual data size (can be < PageSize)
851+
852+
// Example:
853+
// Offset = 12288 (Page 3 start, multiple of 4096)
854+
// Length = 50 (actual record bytes)
855+
```
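
The relationship between a block's byte offset, its page number, and the unused tail of its page is plain integer arithmetic. A minimal sketch follows; the `PageMath` class and its method names are hypothetical and only illustrate the calculation, they are not SharpCoreDB types.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class PageMath
{
    // True when the offset sits exactly on a page boundary.
    public static bool IsPageAligned(ulong offset, ulong pageSize) =>
        offset % pageSize == 0;

    // Zero-based index of the page that contains the offset.
    public static ulong PageIndex(ulong offset, ulong pageSize) =>
        offset / pageSize;

    // Bytes left unused in a page that holds a single record of the given length.
    public static ulong UnusedBytes(ulong length, ulong pageSize)
    {
        if (length > pageSize)
            throw new ArgumentOutOfRangeException(nameof(length), "Record does not fit in one page.");
        return pageSize - length;
    }
}
```

With a 4096-byte page, offset 12288 is page-aligned and maps to page 3, offset 1048576 maps to page 256, and a 50-byte record leaves 4046 bytes of its page unused.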
856+
857+
### String Splitting: The Reality
858+
859+
If you have a long string that would exceed the page:
860+
861+
```csharp
862+
// Strings are never split, neither before nor during serialization.
863+
// The entire record (including all strings) is serialized to binary
864+
byte[] binary = Serialize(row); // ← Complete binary in memory
865+
int recordSize = binary.Length;
866+
867+
// Check if record fits in a page
868+
if (recordSize > PageSize)
869+
{
870+
throw new InvalidOperationException(
871+
$"Record too large ({recordSize} bytes) for page size ({PageSize} bytes)");
872+
}
873+
874+
// If it fits, allocate ONE page and write entire record
875+
ulong pageOffset = FSM.AllocatePages(1); // ← Allocates 1 full page
876+
provider.WriteBytes(pageOffset, binary); // ← Write entire record at once
877+
```
878+
879+
### Example: Long String at End of Page
880+
881+
**Scenario:** You have a string that's close to the page boundary
882+
883+
```
884+
Page Layout (4KB = 4096 bytes):
885+
886+
Offset 0-3: [ColumnCount: 4]
887+
Offset 4-20: [Column 1 metadata + value]
888+
Offset 21-60: [Column 2 metadata + value]
889+
Offset 61-3200:   [Column 3: String (3140 bytes)]
890+
Offset 3201-4090: [Column 4: Long string (890 bytes)]
891+
Offset 4091-4095: [unused: 5 bytes]
892+
↑ NO SPLITTING NEEDED
893+
Record fits entirely (4091 bytes < 4096)
894+
```
895+
896+
**What if the record were 4097 bytes?**
897+
```
898+
❌ ERROR! Record doesn't fit in page.
899+
Must increase PageSize or reduce record size.
900+
```
901+
902+
### The Key Insight: No Padding, No Splitting
903+
904+
```csharp
905+
// 1. Records are serialized completely in memory
906+
byte[] recordBinary = Serialize(row);
907+
// recordBinary could be 50 bytes or 3000 bytes
908+
909+
// 2. FSM allocates ONE page (regardless of record size)
910+
ulong pageStart = FSM.AllocatePages(1);
911+
// pageStart = multiple of PageSize (e.g., 4096, 8192, 12288, ...)
912+
913+
// 3. Write record to that page
914+
provider.WriteBytes(pageStart, recordBinary);
915+
// Writes 50 bytes OR 3000 bytes
916+
// NO PADDING to reach 4096 bytes
917+
// NO SPLITTING across pages
918+
919+
// 4. Block Registry tracks exact length
920+
registry[recordName] = new BlockEntry
921+
{
922+
Offset = pageStart,
923+
Length = recordBinary.Length, // ← EXACT size, not padded
924+
};
925+
```
926+
927+
### Performance Implication
928+
929+
```
930+
// With variable-length records:
931+
Page 1: 50-byte record → 4046 bytes wasted space per page
932+
Page 2: 100-byte record → 3996 bytes wasted space per page
933+
Page 3: 3000-byte record → 1096 bytes wasted space per page
934+
Page 4: 30-byte record → 4066 bytes wasted space per page
935+
```
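
To put a number on this overhead, the sketch below totals used and wasted bytes under the one-record-per-page policy described in this section. `PageUtilization.Measure` is a hypothetical helper written for this illustration, not a SharpCoreDB API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class PageUtilization
{
    // Assumes every record occupies exactly one page, as described in this section.
    public static (long UsedBytes, long WastedBytes, double Utilization) Measure(
        IReadOnlyCollection<int> recordSizes, int pageSize = 4096)
    {
        long used = 0;
        foreach (int size in recordSizes)
        {
            if (size < 0 || size > pageSize)
                throw new ArgumentOutOfRangeException(
                    nameof(recordSizes), $"Record of {size} bytes does not fit in a {pageSize}-byte page.");
            used += size;
        }

        long allocated = (long)recordSizes.Count * pageSize;
        long wasted = allocated - used;
        double utilization = allocated == 0 ? 0.0 : (double)used / allocated;
        return (used, wasted, utilization);
    }
}
```

For the four records above, `PageUtilization.Measure(new[] { 50, 100, 3000, 30 })` reports 3180 bytes used and 13204 bytes wasted out of 16384 allocated, which is roughly 19% utilization.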
936+
937+
**This is normal and acceptable because:**
938+
1. ✅ FSM tracks free pages, so freed pages can be reused (sub-page reuse for small records is a possible future optimization)
939+
2. ✅ Compression is not needed (data is already compact binary, without JSON overhead)
940+
3. ✅ Simpler architecture (no split-record complexity)
941+
4. ✅ Atomic writes (record written once, completely)
942+
943+
### How FSM Handles Wasted Space
944+
945+
```csharp
946+
// FSM doesn't care about wasted space within a page
947+
// It tracks FREE PAGES, not free bytes
948+
949+
FSM State:
950+
├─ Page 0: Allocated (Header)
951+
├─ Page 1: Allocated (Registry)
952+
├─ Page 2: Allocated (FSM)
953+
├─ Page 3: Allocated (50-byte record) ← Still counts as ALLOCATED
954+
├─ Page 4: Allocated (100-byte record) ← Still counts as ALLOCATED
955+
├─ Page 5: FREE ← Can reuse this
956+
└─ ...
957+
958+
// When inserting a small record (30 bytes):
959+
// Option 1: Reuse Page 3 (already allocated, has room)
960+
// Option 2: Allocate new Page 5
961+
962+
// SharpCoreDB behavior:
963+
// - Phase 1: Always allocate new pages (simpler)
964+
// - Phase 3: Could implement "sub-page allocation" (future optimization)
965+
```
966+
967+
### Summary: Page Boundaries & Strings
968+
969+
| Situation | What Happens | Result |
970+
|-----------|--------------|--------|
971+
| Small record (≤ page size) | Allocates 1 page, writes record, registers block | ✅ Works |
972+
| Large record (> page size) | Throws `InvalidOperationException` at the size check, after serialization and before page allocation | ❌ Error |
973+
| String at page end | String included in serialized record (no split) | ✅ Stays together |
974+
| Multiple pages needed | Not supported; use a larger page size or external BLOB storage | ⚠️ Design limit |
975+
976+
---
977+
752978
## ⚡ Performance Considerations
753979

754980
### Zero Allocation Principles
