Commit c6c04f1

author
MPCoreDeveloper
committed
docs: Translate all documentation to English
1 parent 4ac7ed8 commit c6c04f1

File tree

3 files changed

+91
-93
lines changed


docs/DOCUMENTATION_CROSS_REFERENCE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -157,7 +157,7 @@ Allocation Strategy:
157157

158158
### How FSM Works
159159

160-
The **Free Space Map (FSM)** behaves vrije pagina's. Dit is een 2-level bitmap:
160+
The **Free Space Map (FSM)** tracks free pages. This is a 2-level bitmap:
161161

162162
[Detailed explanation with code examples]
163163
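The hunk above only names the structure, so here is a minimal sketch of what a 2-level bitmap for free pages could look like. The layout is an assumption, not taken from the SharpCoreDB source: level 1 holds one bit per page (1 = free), and level 2 holds one summary bit per group of 64 pages (1 = the group has at least one free page). The names `SetFree` and `FindFreePage` are hypothetical.

```csharp
using System;
using System.Numerics;

// Assumed layout (illustrative only, not SharpCoreDB's actual FSM):
// level1: one bit per page, 1 = free; one 64-bit word per group of pages.
// level2: one bit per group, 1 = that group contains at least one free page.
const int PagesPerGroup = 64;
int pageCount = 256;
ulong[] level1 = new ulong[pageCount / PagesPerGroup];
ulong level2 = 0; // a single word covers up to 64 groups

void SetFree(int page, bool free)
{
    int group = page / PagesPerGroup;
    ulong bit = 1UL << (page % PagesPerGroup);
    if (free) level1[group] |= bit; else level1[group] &= ~bit;

    // Keep the summary bit in sync with the group's word.
    if (level1[group] != 0) level2 |= 1UL << group;
    else level2 &= ~(1UL << group);
}

int FindFreePage()
{
    if (level2 == 0) return -1; // no free page anywhere
    int group = BitOperations.TrailingZeroCount(level2);      // first group with free space
    int bit = BitOperations.TrailingZeroCount(level1[group]); // first free page in that group
    return group * PagesPerGroup + bit;
}

SetFree(70, true);
SetFree(200, true);
Console.WriteLine(FindFreePage()); // 70
SetFree(70, false);
Console.WriteLine(FindFreePage()); // 200
```

The point of the second level is that a lookup inspects one summary word and one page word instead of scanning every page bit.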

docs/serialization/SERIALIZATION_AND_STORAGE_GUIDE.md

Lines changed: 66 additions & 68 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,14 @@
11
# SharpCoreDB Serialization & Storage Format Guide
22

3-
> **Dutch Title:** SharpCoreDB Serialisatie en Opslag Format Gids
4-
5-
Dit document beschrijft in detail hoe SharpCoreDB records serialiseert, opslaat, en beheert in databestanden. Het antwoordt op alle vragen over string-constraints, free space management, en record/column boundaries.
3+
This document describes in detail how SharpCoreDB serializes, stores, and manages records in data files. It answers all questions about string constraints, free space management, and record/column boundaries.
64

75
---
86

9-
## 📋 Inhoudsopgave
7+
## 📋 Table of Contents
108

11-
1. [Overzicht](#overzicht)
9+
1. [Overview](#overview)
1210
2. [File Format (.scdb)](#file-format-scdb)
13-
3. [Record Serialisatie](#record-serialisatie)
11+
3. [Record Serialization](#record-serialization)
1412
4. [String Handling & Size Constraints](#string-handling--size-constraints)
1513
5. [Free Space Management](#free-space-management)
1614
6. [Block Registry](#block-registry)
@@ -20,19 +18,19 @@ Dit document beschrijft in detail hoe SharpCoreDB records serialiseert, opslaat,
2018

2119
---
2220

23-
## 🎯 Overzicht
21+
## 🎯 Overview
2422

25-
SharpCoreDB gebruikt een **single-file binary format** (`.scdb`) voor duurzame opslag. Het systeem is gebaseerd op deze principes:
23+
SharpCoreDB uses a **single-file binary format** (`.scdb`) for persistent storage. The system is based on these principles:
2624

2725
| Aspect | Details |
2826
|--------|---------|
29-
| **Format** | Binary (niet JSON, niet SQL) - 3x sneller dan JSON |
27+
| **Format** | Binary (not JSON, not SQL) - 3x faster than JSON |
3028
| **Layout** | Fixed header + variable regions (FSM, WAL, Registry, Tables) |
31-
| **Encoding** | UTF-8 voor strings; Little-Endian voor integers |
29+
| **Encoding** | UTF-8 for strings; Little-Endian for integers |
3230
| **String Storage** | Variable-length; prefixed with 4-byte length field |
33-
| **No Fixed-Length Requirement** | Strings kunnen willekeurig lang zijn (beperkt door beschikbare schijfruimte) |
31+
| **No Fixed-Length Requirement** | Strings can be arbitrarily long (limited by available disk space) |
3432
| **Encryption** | Optional AES-256-GCM |
35-
| **Compression** | Niet geïmplementeerd (reserved in header) |
33+
| **Compression** | Not implemented (reserved in header) |
3634

3735
---
3836

@@ -127,28 +125,28 @@ public struct ScdbFileHeader
127125

128126
---
129127

130-
## 🔄 Record Serialisatie
128+
## 🔄 Record Serialization
131129

132130
### Binary Format Specification
133131

134-
Records worden opgeslagen in een **self-describing binary format**. Dit betekent dat type-informatie **ingebedded** is in de data zelf.
132+
Records are stored in a **self-describing binary format**. This means type information is **embedded** in the data itself.
135133

136134
#### Record Layout
137135

138136
```
139-
┌──────────────────────────────────────────────┐
140-
│ Binary Record Format │
141-
├──────────────────────────────────────────────┤
137+
┌──────────────────────────────────────────────────
138+
│ Binary Record Format
139+
├──────────────────────────────────────────────────
142140
│ [ColumnCount: 4 bytes] ← int32, little-endian
143-
│ │
144-
│ For each column: │
145-
│ ├─ [NameLength: 4 bytes] ← int32 │
146-
│ ├─ [ColumnName: N bytes] ← UTF-8 string │
147-
│ ├─ [TypeMarker: 1 byte] ← Type indicator │
148-
│ └─ [Value: variable] ← Type-specific │
149-
│ │
150-
│ ... (repeat for all columns) │
151-
└──────────────────────────────────────────────┘
141+
142+
│ For each column:
143+
│ ├─ [NameLength: 4 bytes] ← int32
144+
│ ├─ [ColumnName: N bytes] ← UTF-8 string
145+
│ ├─ [TypeMarker: 1 byte] ← Type indicator
146+
│ └─ [Value: variable] ← Type-specific
147+
148+
│ ... (repeat for all columns)
149+
└──────────────────────────────────────────────────
152150
```
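As a rough illustration of the layout above, the following sketch writes a row in that order: column count, then name length, name, type marker, and value for each column. Only the Null marker value (0) appears elsewhere in this guide; the other marker values and the helper names `WriteInt32LE` and `SerializeRow` are hypothetical and may not match `BinaryRowSerializer`.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical marker values for illustration; the real BinaryTypeMarker
// enum may assign different numbers (only Null = 0 is stated in the guide).
const byte MarkerNull = 0;
const byte MarkerInt32 = 1;
const byte MarkerString = 2;

void WriteInt32LE(Stream s, int value)
{
    Span<byte> tmp = stackalloc byte[4];
    BinaryPrimitives.WriteInt32LittleEndian(tmp, value);
    s.Write(tmp);
}

byte[] SerializeRow(IReadOnlyList<KeyValuePair<string, object?>> columns)
{
    using var ms = new MemoryStream();
    WriteInt32LE(ms, columns.Count);               // [ColumnCount: 4 bytes]
    foreach (var (name, value) in columns)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        WriteInt32LE(ms, nameBytes.Length);        // [NameLength: 4 bytes]
        ms.Write(nameBytes);                       // [ColumnName: N bytes]
        switch (value)
        {
            case null:
                ms.WriteByte(MarkerNull);          // marker only, no value follows
                break;
            case int i:
                ms.WriteByte(MarkerInt32);
                WriteInt32LE(ms, i);
                break;
            case string str:
                byte[] strBytes = Encoding.UTF8.GetBytes(str);
                ms.WriteByte(MarkerString);
                WriteInt32LE(ms, strBytes.Length); // 4-byte length prefix, in bytes
                ms.Write(strBytes);
                break;
            default:
                throw new NotSupportedException($"Unsupported type: {value.GetType().Name}");
        }
    }
    return ms.ToArray();
}

var row = new List<KeyValuePair<string, object?>>
{
    new KeyValuePair<string, object?>("Id", 42),
    new KeyValuePair<string, object?>("Name", "John"),
};
byte[] bytes = SerializeRow(row);
Console.WriteLine(bytes.Length); // 32 = 4 + (4+2+1+4) + (4+4+1+4+4)
```

Because every name and string carries its own length, a reader can walk the record without any external schema, which is what "self-describing" means here.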
153151

154152
#### Type Markers
@@ -171,7 +169,7 @@ public enum BinaryTypeMarker : byte
171169

172170
#### Concrete Example
173171

174-
Stel je voor we hebben:
172+
Suppose we have:
175173

176174
```csharp
177175
var row = new Dictionary<string, object>
@@ -183,7 +181,7 @@ var row = new Dictionary<string, object>
183181
};
184182
```
185183

186-
Dit wordt geserialiseerd als:
184+
This is serialized as:
187185

188186
```
189187
Offset Size Value Explanation
@@ -340,16 +338,16 @@ public static class BinaryRowSerializer
340338

341339
## 🔤 String Handling & Size Constraints
342340

343-
### ❌ Misconception: "Je hebt veel vrije ruimte nodig"
341+
### ❌ Misconception: "You need lots of free space"
344342

345-
**Dit is NIET waar!** Hier is waarom:
343+
**This is NOT true!** Here's why:
346344

347-
#### 1. **Strings zijn variable-length**
348-
- Een record met 10 bytes strings hoeft maar 10 bytes schijfruimte
349-
- Een record met 10MB strings hoeft 10MB schijfruimte
350-
- **Geen vaste grootte per kolom** → geen verspilling
345+
#### 1. **Strings are variable-length**
346+
- A record with 10-byte strings needs only 10 bytes of disk space
347+
- A record with 10MB strings needs 10MB of disk space
348+
- **No fixed size per column** → no wasted space
351349

352-
#### 2. **Length-prefixing solve boundaries**
350+
#### 2. **Length-prefixing solves boundaries**
353351

354352
```
355353
String Layout:
@@ -405,7 +403,7 @@ foreach (var str in testStrings)
405403

406404
### ⚠️ What About Record Size Limits?
407405

408-
**Records kunnen NIET groter zijn dan een blok** (page size).
406+
**Records CANNOT be larger than a block** (page size).
409407

410408
```csharp
411409
// Example: Default 4KB page size
@@ -425,7 +423,7 @@ var row = new Dictionary<string, object>
425423
// Total: ~7034 bytes > 4096 bytes ❌ ERROR
426424
```
427425

428-
**Oplossing:** Verhoog page size
426+
**Solution:** Increase page size
429427

430428
```csharp
431429
var options = new DatabaseOptions
@@ -452,15 +450,15 @@ var provider = SingleFileStorageProvider.Open("mydb.scdb", options);
452450
// File size = sum of all actual record sizes (no padding)
453451
```
454452

455-
**Geen vaste overhead per record!** Alleen de bytes die je gebruikt.
453+
**No fixed overhead per record!** Only the bytes you use.
456454

457455
---
458456

459457
## 📊 Free Space Management
460458

461459
### How FSM Works
462460

463-
De **Free Space Map (FSM)** beheerd vrije pagina's. Dit is een 2-level bitmap:
461+
The **Free Space Map (FSM)** manages free pages. This is a 2-level bitmap:
464462

465463
```csharp
466464
internal sealed class FreeSpaceManager
@@ -657,9 +655,9 @@ registry.FlushAsync(); // ← Single batched flush!
657655

658656
```
659657
Step 1: User writes a row
660-
┌────────────────────────────────┐
661-
│ row = {Id: 42, Name: "John"} │
662-
└────────────────────────────────┘
658+
┌────────────────────────────────────
659+
│ row = {Id: 42, Name: "John"}
660+
└────────────────────────────────────
663661
664662
Step 2: Serialize to binary
665663
┌────────────────────────────────────────────────────┐
@@ -828,39 +826,39 @@ var metrics = blockRegistry.GetMetrics();
828826

829827
## ❓ FAQ
830828

831-
### Q1: Moet ik veel vrije ruimte reserveren?
829+
### Q1: Do I need to reserve lots of free space?
832830

833-
**A:** Nee! Vrije ruimte wordt automatisch beheerd via FSM. Bestanden groeien exponentieel:
834-
- Eerste groei: +10 MB
835-
- Volgende groeien: exponentieel (2x, 4x, ...)
831+
**A:** No! Free space is managed automatically via FSM. Files grow exponentially:
832+
- First growth: +10 MB
833+
- Subsequent growth: exponential (2x, 4x, ...)
836834
- No pre-allocation needed
837835

838-
### Q2: Hoe groot kunnen strings worden?
836+
### Q2: How big can strings be?
839837

840-
**A:** Theoretisch tot 2 GB (int32 limit per string). Praktisch:
838+
**A:** Theoretically up to 2 GB (int32 limit per string). Practically:
841839
- Small strings (< 1 KB): Very fast
842840
- Medium strings (1-100 MB): Still efficient
843841
- Large strings (> 100 MB): Will fragment disk, consider BLOB storage
844842

845-
### Q3: Hoe weet ik waar een record eindigt?
843+
### Q3: How do I know where a record ends?
846844

847-
**A:** Via Block Registry! Elk record is opgeslagen als een block:
845+
**A:** Via Block Registry! Each record is stored as a block:
848846
```csharp
849847
BlockEntry entry = registry["Users_Row_001"];
850848
ulong startOffset = entry.Offset;
851849
ulong endOffset = entry.Offset + entry.Length;
852850
```
853851

854-
### Q4: Kunnen strings NULL zijn?
852+
### Q4: Can strings be NULL?
855853

856-
**A:** Ja, via type marker 0:
854+
**A:** Yes, via type marker 0:
857855
```csharp
858856
case null:
859857
buffer[offset++] = 0; // Type: Null
860858
// No value follows
861859
```
862860

863-
### Q5: Wat gebeurt er met Unicode?
861+
### Q5: What about Unicode?
864862

865863
**A:** UTF-8 encoding, automatic length adjustment:
866864
```csharp
@@ -869,39 +867,39 @@ case null:
869867
"🚀" → 4 bytes (1 char × 4 bytes)
870868
```
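One detail worth checking when reading these numbers in C#: `string.Length` counts UTF-16 code units, not bytes or user-visible characters, so a length prefix measured in bytes should come from the encoder. A small sketch, assuming the prefix stores the UTF-8 byte count as the record layout suggests:

```csharp
using System;
using System.Text;

string rocket = "\U0001F680"; // the rocket emoji, written as an escape to avoid source-encoding issues

int utf16Units = rocket.Length;                     // 2: one surrogate pair in UTF-16
int utf8Bytes = Encoding.UTF8.GetByteCount(rocket); // 4: the value a byte-length prefix would hold

Console.WriteLine($"{utf16Units} UTF-16 units, {utf8Bytes} UTF-8 bytes");
// prints "2 UTF-16 units, 4 UTF-8 bytes"
```

Writing `rocket.Length` into the length field instead of the byte count would truncate any string containing non-ASCII text on read.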
871869

872-
### Q6: Kan ik strings direkt wijzigen zonder het record opnieuw te schrijven?
870+
### Q6: Can I modify strings directly without rewriting the record?
873871

874-
**A:** Nee, SharpCoreDB werkt immutable:
872+
**A:** No, SharpCoreDB works immutably:
875873
1. Load record (deserialize)
876874
2. Modify in memory
877875
3. Serialize & write new block
878876
4. Update registry
879877
5. Mark old block as free (WAL handles recovery)
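The five steps can be sketched with in-memory stand-ins. Everything here is hypothetical: `file`, `registry`, and `freeExtents` are simplified substitutes for the data file, the Block Registry, and the FSM, and the WAL is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins (not SharpCoreDB types).
var file = new List<byte>();                                    // the data file
var registry = new Dictionary<string, (int Offset, int Length)>(); // block name -> location
var freeExtents = new List<(int Offset, int Length)>();         // space released by old blocks

void WriteBlock(string key, string text)
{
    byte[] payload = Encoding.UTF8.GetBytes(text);
    int offset = file.Count;                  // step 3: the new block is appended, never patched in place
    file.AddRange(payload);

    bool hadOld = registry.TryGetValue(key, out var old);
    registry[key] = (offset, payload.Length); // step 4: registry now points at the new block
    if (hadOld) freeExtents.Add(old);         // step 5: the old block becomes reusable space
}

string ReadBlock(string key)
{
    var (offset, length) = registry[key];
    return Encoding.UTF8.GetString(file.GetRange(offset, length).ToArray());
}

WriteBlock("Users_Row_001", "John");
string value = ReadBlock("Users_Row_001");    // step 1: load
value += " Smith";                            // step 2: modify in memory
WriteBlock("Users_Row_001", value);           // steps 3-5

Console.WriteLine(ReadBlock("Users_Row_001")); // John Smith
Console.WriteLine(freeExtents.Count);          // 1
```

The old bytes stay in the file until the freed extent is reused, which is why the registry, not the file position, decides which version of a record is current.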
880878

881-
### Q7: Hoe werkt compression?
879+
### Q7: What about compression?
882880

883-
**A:** Momenteel niet geïmplementeerd. Reserved in header voor toekomstige use.
884-
Huidige focus: Zero-allocation serialization is sneller dan compression overhead.
881+
**A:** Not currently implemented. Reserved in header for future use.
882+
Current focus: Zero-allocation serialization is faster than compression overhead.
885883

886-
### Q8: Hoe is de free space distributed?
884+
### Q8: How is free space distributed?
887885

888-
**A:** Non-contiguous! Records kunnen overal in het bestand staan:
886+
**A:** Non-contiguous! Records can be scattered throughout the file:
889887
```
890888
File layout:
891889
[Block1: 4KB] [Block2: 8KB] [Free: 2KB] [Block3: 4KB] [Free: 1KB] [Block4: 2KB]
892890
```
893-
Geen fragmentatie-waarschuwing nodig - FSM beheert dit transparant.
891+
No fragmentation warning needed - FSM manages this transparently.
894892

895-
### Q9: Kan ik een hele tabel in één "block" opslaan?
893+
### Q9: Can I store an entire table in one "block"?
896894

897-
**A:** Nee, iedere rij is een apart block. Voordelen:
898-
- Fijnere granulariteit bij locking
899-
- Betere cache-locality
900-
- Flexibel grootten
895+
**A:** No, each row is a separate block. Advantages:
896+
- Finer-grained locking
897+
- Better cache-locality
898+
- Flexible sizing
901899

902-
### Q10: Hoe zit het met transacties?
900+
### Q10: How do transactions work?
903901

904-
**A:** Beheerd via WAL (Write-Ahead Log):
902+
**A:** Managed via WAL (Write-Ahead Log):
905903
1. Begin transaction
906904
2. Writes go to WAL first
907905
3. On commit, registry updated
