Commit 4d18074

Author: MPCoreDeveloper
docs: CRITICAL FIX - Correct string size constraints based on page limits
1 parent 5685e7d commit 4d18074

2 files changed, +97 -63 lines changed

docs/serialization/SERIALIZATION_AND_STORAGE_GUIDE.md

Lines changed: 79 additions & 33 deletions
@@ -369,15 +369,24 @@ Example: "日本" (2 characters = 6 bytes UTF-8)
 └──────────────────┴────────────────────────────────────────────────────┘
 ```

-### 3. **Size Constraints in SharpCoreDB**
+### ⚠️ CRITICAL: Actual Size Constraints in SharpCoreDB
+
+**CORRECTION:** The actual constraint is NOT "2GB per string" but rather **"record must fit in one page"**.

 | Constraint | Limit | Why |
 |-----------|-------|-----|
-| **Max string length** | 2,147,483,647 bytes (2GB per string) | Limited by int32 length field |
-| **Max record size** | Limited by page size (4KB default, can be 8KB, 16KB) | Record must fit in one block |
-| **Max block size** | Theoretically unlimited (file size dependent) | Blocks can span multiple pages |
-| **Max column count** | 2,147,483,647 columns | Limited by int32 column count |
-| **File size** | Limited by filesystem | ext4: 16TB, NTFS: 8EB (technically) |
+| **Max record size** | ~4056 bytes (default 4KB page) | Record must fit in one page (4096 - 40 header bytes) |
+| **Max page size** | Configurable 4KB-64KB | Can be increased at database creation |
+| **Max column count** | 2,147,483,647 | Limited by int32 column count in serialization |
+| **Max file size** | Limited by filesystem | ext4: 16TB, NTFS: 8EB (technically) |
+| **Single string in record** | ~4000-8000 bytes practical | Dependent on page size and other columns |
+
+**WARNING:** If you have a record (including all columns) that exceeds the page size, you'll get an error:
+```csharp
+// This will fail if total serialized size > PageSize:
+if (recordData.Length > MAX_RECORD_SIZE) // MAX_RECORD_SIZE ≈ 4056 bytes
+    return Error("Record too large for page");
+```

 ### 4. **Unicode Support**

@@ -402,56 +411,95 @@ foreach (var str in testStrings)
 }
 ```

-### ⚠️ What About Record Size Limits?
+### ⚠️ What About Large Strings?

-**Records CANNOT be larger than a block** (page size).
+**You CANNOT store arbitrarily large strings in a single record.**

 ```csharp
-// Example: Default 4KB page size
+// Example: 4KB page size (DEFAULT_PAGE_SIZE = 4096)

 var row = new Dictionary<string, object>
 {
-    ["Name"] = new string('A', 3000), // ✅ Fits
-    ["Data"] = new string('B', 4000), // ❌ Might not fit!
+    ["UserId"] = 1,
+    ["Name"] = "John Doe",
+    ["Biography"] = new string('X', 4000), // 4000 bytes!
 };

-// Why?
-// Total record:
-// - ColumnCount (4) + NameLength (4) + "Name" (4) + TypeMarker (1)
-// + StringLength (4) + 3000 bytes = ~3021 bytes ✅
-// - NameLength (4) + "Data" (4) + TypeMarker (1)
-// + StringLength (4) + 4000 bytes = ~4013 bytes
-// Total: ~7034 bytes > 4096 bytes ❌ ERROR
+// Serialization:
+// - ColumnCount (4 bytes)
+// - Column 1: NameLen(4) + "UserId"(6) + Type(1) + Value(4) = 15 bytes
+// - Column 2: NameLen(4) + "Name"(4) + Type(1) + StrLen(4) + "John Doe"(8) = 21 bytes
+// - Column 3: NameLen(4) + "Biography"(9) + Type(1) + StrLen(4) + 4000 bytes = 4018 bytes
+// TOTAL: 4 + 15 + 21 + 4018 = 4058 bytes
+//
+// Result: 4058 > 4056 (MAX_PAGE_DATA_SIZE)
+// ❌ ERROR! Record too large for page!
 ```

-**Solution:** Increase page size
+**What are your options?**

+#### Option 1: Increase Page Size
 ```csharp
+// Create database with larger pages
 var options = new DatabaseOptions
 {
-    PageSize = 8192, // 8KB pages → supports larger records
+    PageSize = 8192, // 8 KB pages (8192 - 40 = 8152 bytes data)
     CreateImmediately = true,
 };

 var provider = SingleFileStorageProvider.Open("mydb.scdb", options);
-```

-### 5. **No Free Space Waste**
+// Now record of 4058 bytes fits in 8KB page ✅
+```

+#### Option 2: Use BLOB Storage for Large Data
 ```csharp
-// Example: Table with 1000 rows
+// Don't store huge strings as regular columns
+// Instead, use a reference/ID

-// Scenario 1: All short strings (100 bytes each)
-// File size: 1000 × (4 + 8 + 100) = ~112 KB
+var row = new Dictionary<string, object>
+{
+    ["UserId"] = 1,
+    ["Name"] = "John Doe",
+    ["BioFileId"] = "bio_12345", // Reference to external blob
+};

-// Scenario 2: All long strings (10 MB each)
-// File size: 1000 × (4 + 8 + 10,485,760) ≈ 10.5 GB
+// Then separately store large file:
+var largeFile = File.ReadAllBytes("large_biography.txt"); // 10 MB
+blobStorage.WriteLargeBlob("bio_12345", largeFile);

-// Scenario 3: Mixed strings
-// File size = sum of all actual record sizes (no padding)
+// On read:
+string bioFileId = (string)row["BioFileId"];
+byte[] largeBio = blobStorage.ReadLargeBlob(bioFileId);
 ```

-**No fixed overhead per record!** Only the bytes you use.
+#### Option 3: Normalize Your Schema
+```csharp
+// Split into multiple records instead of one large record
+
+// INSTEAD OF:
+var row = new Dictionary<string, object>
+{
+    ["UserId"] = 1,
+    ["Name"] = "John Doe",
+    ["Biography"] = new string('X', 10000), // ❌ Too large!
+};
+
+// DO THIS:
+var userRecord = new Dictionary<string, object>
+{
+    ["UserId"] = 1,
+    ["Name"] = "John Doe",
+};
+
+var bioRecord = new Dictionary<string, object>
+{
+    ["UserId"] = 1,
+    ["BioContent"] = "Lorem ipsum...", // Smaller chunks
+};
+
+// Store in separate table or with separate keys
+```

 ---

@@ -799,13 +847,11 @@ var row = new Dictionary<string, object>
 // Create database with larger pages
 var options = new DatabaseOptions
 {
-    PageSize = 8192, // 8 KB pages → can hold bigger records
+    PageSize = 8192, // 8 KB pages → supports larger records
     CreateImmediately = true,
 };

 var provider = SingleFileStorageProvider.Open("mydb.scdb", options);
-
-// Now record of 4158 bytes fits in 8192-byte page ✅
 ```

 #### Solution 2: Use BLOB Storage for Large Strings
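
Reviewer's sketch (not part of the commit): the size arithmetic in the added "What About Large Strings?" example can be checked mechanically. The snippet below is a minimal sketch that assumes only the layout described in the diff's comments (4-byte column count; per column a 4-byte name length, the UTF-8 name, a 1-byte type marker, then a 4-byte int or a 4-byte length plus UTF-8 bytes for a string) and a 40-byte page header. `RecordSizeEstimator` and its members are hypothetical names, not SharpCoreDB APIs, and the real serializer may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Same row as in the diff's example.
var row = new Dictionary<string, object>
{
    ["UserId"] = 1,
    ["Name"] = "John Doe",
    ["Biography"] = new string('X', 4000),
};

int size = RecordSizeEstimator.Estimate(row);
Console.WriteLine($"Estimated size: {size} bytes");                              // 4058
Console.WriteLine($"Fits in 4 KB page: {RecordSizeEstimator.Fits(size, 4096)}"); // False
Console.WriteLine($"Fits in 8 KB page: {RecordSizeEstimator.Fits(size, 8192)}"); // True

// Hypothetical helper: mirrors the byte counts listed in the diff's comments.
static class RecordSizeEstimator
{
    // Assumption taken from the diff: 4096 - 40 header bytes = 4056 usable bytes.
    const int PageHeaderBytes = 40;

    public static int Estimate(IReadOnlyDictionary<string, object> row)
    {
        int total = 4; // ColumnCount
        foreach (var (name, value) in row)
        {
            // NameLen (4) + UTF-8 column name + Type marker (1)
            total += 4 + Encoding.UTF8.GetByteCount(name) + 1;
            total += value switch
            {
                int => 4,                                        // 32-bit value
                string s => 4 + Encoding.UTF8.GetByteCount(s),   // StrLen (4) + UTF-8 payload
                _ => throw new NotSupportedException(
                    $"No size rule in this sketch for {value.GetType().Name}"),
            };
        }
        return total;
    }

    public static bool Fits(int recordBytes, int pageSize) =>
        recordBytes <= pageSize - PageHeaderBytes;
}
```

Counting UTF-8 bytes rather than characters matters here: a 4000-character string of non-ASCII text can take up to three or four times as many bytes as the ASCII case used in the example.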

docs/serialization/SERIALIZATION_FAQ.md

Lines changed: 18 additions & 30 deletions
@@ -41,42 +41,30 @@ Example: "John Doe" (8 characters, 8 bytes in UTF-8)

 ---

-### Q2: Do I need lots of free space in my data files?
+### Q2: How big can strings be?

-**A: No. In fact - variable-length strings **save** space!**
+**A:** Limited by the **page size**, not theoretically unlimited:

-Comparison:
+**Default (4KB page):**
+- Page data capacity: 4056 bytes (4096 - 40 header)
+- Minus serialization overhead for other columns
+- **Practical limit: 4000-4050 bytes per single string**

-```
-Fixed-length approach (WASTE):
-┌─────────────────────────────────────────┐
-│ Name (255 bytes, fixed) │
-│ ├─ "John" (4 bytes) │
-│ └─ Padding (251 bytes of zeros) ❌ │
-└─────────────────────────────────────────┘
-Total: 255 bytes per record
-
-SharpCoreDB variable-length (EFFICIENT):
-┌──────┬──────┐
-│ 04 │ John │
-├──────┴──────┤
-│ 4 + 4 = 8 bytes ✅
-└──────────────┘
-Total: 8 bytes per record
-
-Savings: 255 - 8 = 247 bytes per record!
-```
+**For larger strings:**
+- ✅ Increase page size: Use 8KB, 16KB, or 32KB pages
+- ✅ Use BLOB storage: For data > page size
+- ✅ Normalize schema: Split into multiple records

-**Real-world example:**
-```
-1,000,000 records with mostly short names:
+**Example:**
+```csharp
+// Default 4KB page:
+// ❌ Cannot fit 10MB string in one record!

-Fixed-length (255 bytes):
-├─ 1,000,000 × 255 = 255 MB per name field
+// Solution: Either increase page size
+var options = new DatabaseOptions { PageSize = 16384 }; // 16KB

-Variable-length (avg 20 bytes):
-├─ 1,000,000 × 20 = 20 MB per name field
-└─ Savings: 235 MB (92% reduction!) ✅
+// OR use BLOB storage
+blobStorage.WriteLargeBlob("doc_id", largeData);
 ```

 ---
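
Reviewer's sketch (not part of the commit): the "Normalize schema: Split into multiple records" option in the new Q2 answer leaves open how a long string would be cut into pieces. One possible approach, shown below as a sketch under stated assumptions, splits on whole Unicode scalar values so that no chunk ends in the middle of a multi-byte UTF-8 sequence. `Utf8Chunker` and the `ChunkIndex` column are hypothetical and do not come from SharpCoreDB; the byte budget per chunk is also an assumption and must leave room for the other columns and the per-column overhead.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// 10,000 ASCII characters with a 4000-byte budget per chunk -> 3 chunks.
List<string> chunks = Utf8Chunker.Split(new string('X', 10000), 4000);

// Each chunk becomes its own small record (column names are illustrative).
var bioRecords = new List<Dictionary<string, object>>();
for (int i = 0; i < chunks.Count; i++)
{
    bioRecords.Add(new Dictionary<string, object>
    {
        ["UserId"] = 1,
        ["ChunkIndex"] = i,          // hypothetical ordering column
        ["BioContent"] = chunks[i],
    });
}
Console.WriteLine($"Chunks: {bioRecords.Count}");

// Hypothetical helper: splits text so every chunk is at most maxBytes of UTF-8.
static class Utf8Chunker
{
    public static List<string> Split(string text, int maxBytes)
    {
        // A single Unicode scalar value needs at most 4 bytes in UTF-8.
        if (maxBytes < 4)
            throw new ArgumentOutOfRangeException(nameof(maxBytes));

        var result = new List<string>();
        var current = new StringBuilder();
        int currentBytes = 0;

        foreach (Rune rune in text.EnumerateRunes())
        {
            int runeBytes = rune.Utf8SequenceLength;
            if (currentBytes + runeBytes > maxBytes)
            {
                // Budget would be exceeded: close the current chunk first.
                result.Add(current.ToString());
                current.Clear();
                currentBytes = 0;
            }
            current.Append(rune.ToString());
            currentBytes += runeBytes;
        }

        if (current.Length > 0)
            result.Add(current.ToString());
        return result;
    }
}
```

Splitting on scalar values keeps each chunk valid UTF-8, but it can still separate a base character from a following combining mark. Reassembling the chunks in `ChunkIndex` order restores the original text either way.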
