@@ -13,8 +13,9 @@ This document describes in detail how SharpCoreDB serializes, stores, and manage
 5. [Free Space Management](#free-space-management)
 6. [Block Registry](#block-registry)
 7. [Record & Column Boundaries](#record--column-boundaries)
-8. [Performance Considerations](#performance-considerations)
-9. [FAQ](#faq)
+8. [Record Sizing & Page Boundaries](#record-sizing--page-boundaries)
+9. [Performance Considerations](#performance-considerations)
+10. [FAQ](#faq)

 ---

@@ -692,8 +693,6 @@ Step 5: Register in Block Registry
692693** Columns don't have fixed boundaries!** They are self-describing:
693694
694695```
695- Record layout (no fixed column offsets):
696-
697696Record in memory:
698697┌──────────────────────────────────────┐
699698│ [ColumnCount: 4] │ ← Always at offset 0
@@ -749,6 +748,233 @@ public static Dictionary<string, object> Deserialize(ReadOnlySpan<byte> data)

---

## 📄 Record Sizing & Page Boundaries

### Critical Constraint: Records Must Fit in a Single Page

**Important:** A record CANNOT be split across multiple pages.

#### Why?

```csharp
// Records are atomic units stored in blocks
BlockEntry entry = new BlockEntry
{
    BlockName = "Users_Row_001",
    Offset = 1048576,   // Start of page 256
    Length = 3950,      // Entire record size (< 4096)
    Checksum = [...],
    // ...
};

// The Block Registry stores:
// - Start offset (byte position)
// - Total length (entire record size)
// - This makes lookups O(1) and atomic
```
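
To make the "one lookup, one contiguous read" idea concrete, here is a minimal in-memory sketch. It is not SharpCoreDB's implementation: the `file` array and the `registry` dictionary are stand-ins invented for illustration, and a 4096-byte page is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// In-memory stand-ins (assumptions): the real registry and file layout differ.
byte[] file = new byte[3 * 4096];
var registry = new Dictionary<string, (int Offset, int Length)>();

// Write one record at the start of page 2 and register its offset and length.
byte[] record = Encoding.UTF8.GetBytes("hello record");
int offset = 2 * 4096;
record.CopyTo(file, offset);
registry["Users_Row_001"] = (offset, record.Length);

// Lookup: one dictionary probe, then one contiguous read of the whole record.
var (start, length) = registry["Users_Row_001"];
ReadOnlySpan<byte> slice = file.AsSpan(start, length);
Console.WriteLine(Encoding.UTF8.GetString(slice)); // hello record
```

Because the record is a single contiguous range, the reader never has to stitch fragments together from several pages.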

#### What Happens If a Record Would Exceed Page Size?

```csharp
// Example: 4KB page size (default)

var row = new Dictionary<string, object>
{
    ["UserId"] = 1,
    ["Biography"] = new string('X', 4100), // 4100 bytes!
};

// Serialization:
// ColumnCount (4) + "UserId" metadata (20) + 4 bytes (int32)
// + "Biography" metadata (30) + 4100 bytes (string data)
// ≈ 4 + 20 + 4 + 30 + 4100 = 4158 bytes
//
// Result: 4158 > 4096 (page size)
// ❌ ERROR! Record too large for page!
```
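
The estimate above can be reproduced before attempting an insert. The sketch below is only an approximation under stated assumptions: the metadata overheads (4, 20, 30 bytes) are the rough figures from the example, not exact format constants, and strings are assumed to be stored as UTF-8.

```csharp
using System;
using System.Text;

const int PageSize = 4096;

// Rough overheads taken from the estimate above (assumed, not exact).
const int ColumnCountBytes = 4;
const int UserIdMetadataBytes = 20;
const int BiographyMetadataBytes = 30;

string biography = new string('X', 4100);

// Byte count, not character count, is what takes up space
// (assuming UTF-8 storage).
int biographyBytes = Encoding.UTF8.GetByteCount(biography);

int estimated = ColumnCountBytes
              + UserIdMetadataBytes + sizeof(int)
              + BiographyMetadataBytes + biographyBytes;

Console.WriteLine($"Estimated record size: {estimated} bytes"); // 4158
Console.WriteLine(estimated > PageSize ? "Too large for one page" : "Fits in one page");
```

Note that 4100 characters equal 4100 bytes only for ASCII text. Under UTF-8, a character such as `é` takes 2 bytes, so the same check on non-ASCII text can fail at a much shorter string length.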

#### Solution 1: Increase Page Size

```csharp
// Create database with larger pages
var options = new DatabaseOptions
{
    PageSize = 8192, // 8 KB pages → can hold bigger records
    CreateImmediately = true,
};

var provider = SingleFileStorageProvider.Open("mydb.scdb", options);

// Now the 4158-byte record fits in an 8192-byte page ✅
```
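
If the largest expected record size is known up front, a page size can be derived from it. The helper below is hypothetical (`SuggestPageSize` is not a SharpCoreDB API), and it assumes page sizes are obtained by doubling from 4096; check which values `PageSize` actually accepts before relying on it.

```csharp
using System;

// Hypothetical helper: doubles from minPageSize until the record fits.
static int SuggestPageSize(int recordSize, int minPageSize = 4096)
{
    if (recordSize < 0) throw new ArgumentOutOfRangeException(nameof(recordSize));
    if (minPageSize <= 0) throw new ArgumentOutOfRangeException(nameof(minPageSize));

    long pageSize = minPageSize; // long avoids overflow while doubling
    while (pageSize < recordSize) pageSize *= 2;

    if (pageSize > int.MaxValue)
        throw new ArgumentOutOfRangeException(nameof(recordSize), "No int-sized page can hold this record.");
    return (int)pageSize;
}

Console.WriteLine(SuggestPageSize(3950));  // 4096
Console.WriteLine(SuggestPageSize(4158));  // 8192
Console.WriteLine(SuggestPageSize(20000)); // 32768
```

Larger pages also enlarge the unused tail behind every small record, so this trades space for the ability to store bigger rows.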

#### Solution 2: Use BLOB Storage for Large Strings

```csharp
// Don't store huge strings as columns
// Instead, use a reference/ID

var row = new Dictionary<string, object>
{
    ["UserId"] = 1,
    ["Name"] = "John Doe",
    ["BioFileId"] = "bio_12345", // Reference to external BLOB
};

// Then separately store large file:
var largeFile = new byte[10_000_000]; // 10 MB
blobStorage.WriteLargeBlob("bio_12345", largeFile);
```
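
One way to apply this consistently is to decide per value whether it stays inline or moves out of the row. The following is a sketch under assumptions: `blobStore` is an in-memory dictionary standing in for external BLOB storage, `BuildUserRow` is a hypothetical helper, and the 1024-byte threshold is an arbitrary illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

const int InlineLimitBytes = 1024; // assumed threshold, tune per schema

// Stand-in for external BLOB storage; NOT SharpCoreDB's blob API.
var blobStore = new Dictionary<string, byte[]>();

// Hypothetical helper: keep short text inline, move long text out of the row.
Dictionary<string, object> BuildUserRow(int userId, string name, string biography)
{
    var row = new Dictionary<string, object>
    {
        ["UserId"] = userId,
        ["Name"] = name,
    };

    byte[] bioBytes = Encoding.UTF8.GetBytes(biography);
    if (bioBytes.Length <= InlineLimitBytes)
    {
        row["Biography"] = biography;   // small enough to live in the record
    }
    else
    {
        string blobId = $"bio_{userId}";
        blobStore[blobId] = bioBytes;   // stored outside the record
        row["BioFileId"] = blobId;      // the row keeps only the reference
    }
    return row;
}

var small = BuildUserRow(1, "John Doe", "Short bio");
var large = BuildUserRow(2, "Jane Doe", new string('X', 4100));

Console.WriteLine(small.ContainsKey("Biography")); // True
Console.WriteLine(large["BioFileId"]);             // bio_2
Console.WriteLine(blobStore["bio_2"].Length);      // 4100
```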

### How Pages Are Allocated

SharpCoreDB allocates pages as **complete units**. A record always starts at a page boundary and is never split across two pages:

```
File Layout (4KB page size):

Page 0 (0-4095):       [Header: 512 bytes][unused: 3584 bytes]
Page 1 (4096-8191):    [Block Registry data: 2000 bytes][unused: 2096]
Page 2 (8192-12287):   [FSM data: 1500 bytes][unused: 2596]
Page 3 (12288-16383):  [Users_Row_001: 50 bytes][unused: 4046]  ← Wasted space!
Page 4 (16384-20479):  [Users_Row_002: 100 bytes][unused: 3996] ← Wasted space!
...

Even though Row_001 is only 50 bytes, it occupies an entire 4096-byte page.
```

**Why?** Because the Block Registry tracks:

```csharp
// Block boundaries are PAGE-aligned
public ulong Offset; // Always a multiple of PageSize (4096)
public ulong Length; // Actual data size (can be < PageSize)

// Example:
// Offset = 12288 (Page 3 start, multiple of 4096)
// Length = 50 (actual record bytes)
```
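
The page index, page range, and unused bytes in the layout above follow directly from `Offset` and `Length`. A small sketch of that arithmetic, where `DescribeBlock` is a hypothetical helper and not part of SharpCoreDB:

```csharp
using System;

const ulong PageSize = 4096;

// Hypothetical helper: derive page index, last byte of the page, and unused bytes.
static (ulong PageIndex, ulong PageEnd, ulong Unused) DescribeBlock(
    ulong offset, ulong length, ulong pageSize)
{
    if (offset % pageSize != 0)
        throw new ArgumentException("Offset must be page-aligned.", nameof(offset));
    if (length > pageSize)
        throw new ArgumentException("Block does not fit in one page.", nameof(length));

    return (offset / pageSize, offset + pageSize - 1, pageSize - length);
}

var (page, end, unused) = DescribeBlock(12288, 50, PageSize);
Console.WriteLine($"Page {page} (12288-{end}): {unused} bytes unused");
// Page 3 (12288-16383): 4046 bytes unused
```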

### String Splitting: The Reality

A long string is never split on its own. The whole record is serialized first, and only then checked against the page size:

```csharp
// Strings are NOT split before serialization.
// The entire record (including all strings) is serialized to binary first.
byte[] binary = Serialize(row); // ← Complete binary in memory
int recordSize = binary.Length;

// Check if record fits in a page
if (recordSize > PageSize)
{
    throw new InvalidOperationException(
        $"Record too large ({recordSize} bytes) for page size ({PageSize} bytes)");
}

// If it fits, allocate ONE page and write entire record
ulong pageOffset = FSM.AllocatePages(1); // ← Allocates 1 full page
provider.WriteBytes(pageOffset, binary); // ← Write entire record at once
```

### Example: Long String at End of Page

**Scenario:** You have a string that ends close to the page boundary.

```
Page Layout (4KB = 4096 bytes):

Offset 0-3:        [ColumnCount: 4]
Offset 4-20:       [Column 1 metadata + value]
Offset 21-60:      [Column 2 metadata + value]
Offset 61-3200:    [Column 3: Short string]
Offset 3201-4090:  [Column 4: Long string (890 bytes)]
Offset 4091-4095:  [unused: 5 bytes]
                   ↑ NO SPLITTING NEEDED
                   Record fits entirely (4091 bytes < 4096)
```

**What if the record were 4097 bytes?**

```
❌ ERROR! Record doesn't fit in page.
Must increase PageSize or reduce record size.
```

### The Key Insight: No Padding, No Splitting

```csharp
// 1. Records are serialized completely in memory
byte[] recordBinary = Serialize(row);
// recordBinary could be 50 bytes or 3000 bytes

// 2. FSM allocates ONE page (regardless of record size)
ulong pageStart = FSM.AllocatePages(1);
// pageStart = multiple of PageSize (e.g., 4096, 8192, 12288, ...)

// 3. Write record to that page
provider.WriteBytes(pageStart, recordBinary);
// Writes 50 bytes OR 3000 bytes
// NO PADDING to reach 4096 bytes
// NO SPLITTING across pages

// 4. Block Registry tracks exact length
registry[recordName] = new BlockEntry
{
    Offset = pageStart,
    Length = (ulong)recordBinary.Length, // ← EXACT size, not padded (cast: Length is ulong)
};
```

### Performance Implication

```
With variable-length records:

Page 1: 50-byte record    → 4046 bytes wasted in that page
Page 2: 100-byte record   → 3996 bytes wasted in that page
Page 3: 3000-byte record  → 1096 bytes wasted in that page
Page 4: 30-byte record    → 4066 bytes wasted in that page
```

**This is normal and acceptable because:**
1. ✅ FSM tracks free pages, so freed pages can be reused (reusing leftover space inside a partially filled page is a possible future optimization)
2. ✅ No compression needed (data is already compact binary, without JSON overhead)
3. ✅ Simpler architecture (no split-record complexity)
4. ✅ Atomic writes (record written once, completely)
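
To put a number on the trade-off, the overall space utilization for the four example pages above can be computed directly. This is plain arithmetic over the figures already given, assuming one record per 4096-byte page:

```csharp
using System;
using System.Linq;

const int PageSize = 4096;
int[] recordSizes = { 50, 100, 3000, 30 }; // one record per page, as listed above

int used = recordSizes.Sum();
int allocated = recordSizes.Length * PageSize;
int wasted = allocated - used;
double utilization = (double)used / allocated;

Console.WriteLine($"Used:        {used} bytes");      // 3180
Console.WriteLine($"Allocated:   {allocated} bytes"); // 16384
Console.WriteLine($"Wasted:      {wasted} bytes");    // 13204
Console.WriteLine($"Utilization: {utilization:P1}");  // roughly 19%
```

For workloads dominated by very small records, this ratio is the number to watch when choosing a page size.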

### How FSM Handles Wasted Space

```
FSM doesn't care about wasted space within a page.
It tracks FREE PAGES, not free bytes.

FSM State:
├─ Page 0: Allocated (Header)
├─ Page 1: Allocated (Registry)
├─ Page 2: Allocated (FSM)
├─ Page 3: Allocated (50-byte record)   ← Still counts as ALLOCATED
├─ Page 4: Allocated (100-byte record)  ← Still counts as ALLOCATED
├─ Page 5: FREE                         ← Can reuse this
└─ ...

When inserting a small record (30 bytes):
  Option 1: Reuse Page 3 (already allocated, has room)
  Option 2: Allocate new Page 5

SharpCoreDB behavior:
  - Phase 1: Always allocate new pages (simpler)
  - Phase 3: Could implement "sub-page allocation" (future optimization)
```
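
A page-granular free-space map of this kind can be sketched in a few lines. This is an illustrative model only, not the actual FSM: the `allocatedPages` list, the first-fit policy, and the `AllocatePage`/`FreePage` names are all assumptions.

```csharp
using System;
using System.Collections.Generic;

const ulong PageSize = 4096;

// true = allocated, false = free. Pages 0-4 in use, page 5 free, as in the diagram.
var allocatedPages = new List<bool> { true, true, true, true, true, false };

// Hypothetical allocator: returns the byte offset of the first free page and
// grows the file by one page when none is free. Free bytes inside an
// already allocated page are never considered.
ulong AllocatePage()
{
    int index = allocatedPages.IndexOf(false);
    if (index < 0)
    {
        allocatedPages.Add(true);
        index = allocatedPages.Count - 1;
    }
    else
    {
        allocatedPages[index] = true;
    }
    return (ulong)index * PageSize;
}

void FreePage(ulong offset) => allocatedPages[(int)(offset / PageSize)] = false;

ulong first = AllocatePage();  // takes free page 5
ulong second = AllocatePage(); // no free page left: appends page 6
FreePage(first);
ulong third = AllocatePage();  // page 5 again

Console.WriteLine(first);  // 20480
Console.WriteLine(second); // 24576
Console.WriteLine(third);  // 20480
```

Page 3, with its 4046 unused bytes, is never offered to the 30-byte record; only a sub-page allocator would be able to place it there.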

### Summary: Page Boundaries & Strings

| Situation | What Happens | Result |
|-----------|--------------|--------|
| Small record (≤ page size) | Allocates 1 page, writes record, registers block | ✅ Works |
| Large record (> page size) | Throws `InvalidOperationException` after serialization, before any page is allocated | ❌ Error |
| String at page end | String included in serialized record (no split) | ✅ Stays together |
| Multiple pages needed | Not supported; use a larger page size or external BLOB storage | ⚠️ Design limit |

---

## ⚡ Performance Considerations

### Zero Allocation Principles