Commit 65aa1f3

pi committed
1 parent 4e0d354 commit 65aa1f3

98 files changed

Lines changed: 776 additions & 2525 deletions

_CONTENT/eng/data/acid.md

Lines changed: 33 additions & 6 deletions
@@ -1,9 +1,11 @@
 ---
 ---
+## acid
+
 atomicity
-[[consistency]]
+consistency
 [[isolation]]
-durability
+durability: WAL, replicas, fsync, power hardware

 tx
 group ops. to atomic units
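
The "tx: group ops. to atomic units" note can be illustrated with a small sketch. It uses Python's standard `sqlite3` module; the `accounts` table, the `transfer` helper, and the balance rule are assumptions made up for the example, not part of the notes.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit."""
    # `with conn` commits if the body succeeds and rolls back if it raises,
    # so the debit and the credit are applied together or not at all.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

Calling `transfer(conn, "a", "b", 30)` applies both updates; a call that overdraws the source raises inside the `with` block, and the already executed debit is rolled back with it.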
@@ -13,7 +15,32 @@ fk integrity
 data sync


-WAL
-replicas
-fsync
-power hardware
+
+## data flow
+db
+api
+msg passing
+
+push: pubsub, ws, sse, webhook
+pull: query, poll
+
+stream: events
+batch: cron
+
+req-resp
+q
+
+MPI: message passing interface
+no central coordinator
+nodes communicate directly
+
+
+## encoding
+backward comp: old data, new code
+
+breaking:
+deleting required fields
+changing field types
+
+keep unknown fields
+tags vs names: compact + rename later
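
A rough sketch of the encoding notes above ("keep unknown fields", "tags vs names"). The schemas, the field names, and the use of JSON as the wire format are all assumptions for illustration; real tag-based formats such as Protocol Buffers or Thrift use compact binary encodings instead.

```python
import json

# Hypothetical schemas mapping field tag -> field name.
# v2 renames tag 2 and adds tag 3; the tags themselves never change.
SCHEMA_V1 = {1: "id", 2: "name"}
SCHEMA_V2 = {1: "id", 2: "display_name", 3: "email"}

def encode(record, schema, unknown=None):
    """Serialize by tag, re-attaching any fields this schema did not understand."""
    by_name = {name: tag for tag, name in schema.items()}
    wire = {str(by_name[name]): value for name, value in record.items()}
    wire.update(unknown or {})
    return json.dumps(wire, sort_keys=True)

def decode(payload, schema):
    """Return (known fields keyed by name, unknown fields keyed by tag)."""
    known, unknown = {}, {}
    for tag, value in json.loads(payload).items():
        name = schema.get(int(tag))
        if name is None:
            unknown[tag] = value  # keep it so a later re-encode does not drop it
        else:
            known[name] = value
    return known, unknown
```

Because the wire carries tags, new code reading old data sees tag 2 as `display_name` with no migration, which is the "rename later" benefit. Old code that decodes a v2 payload and passes the returned `unknown` dict back into `encode` preserves the `email` field it never understood.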
File renamed without changes.

_CONTENT/eng/data/dataflow.md

Lines changed: 0 additions & 19 deletions
This file was deleted.

_CONTENT/eng/data/encoding.md

Lines changed: 0 additions & 28 deletions
This file was deleted.

_CONTENT/eng/data/indexing.md

Lines changed: 0 additions & 15 deletions
This file was deleted.

_CONTENT/eng/data/layout.md

Lines changed: 2 additions & 29 deletions
@@ -44,32 +44,5 @@ better compression and disk life
 less stable response times in higher percentiles


-## columnar vs wide column
-
-both lsm
-
-⏺ The key differences are in data layout and access patterns, not just the underlying LSM storage:
-
-Data Layout:
-- Wide-column (Cassandra): Row-oriented within LSM - stores row_key → {col1: val1, col2: val2}
-- Columnar (ClickHouse): Column-oriented within LSM - stores col1 → [val1, val2, val3...]
-
-LSM Implementation Details:
-- Cassandra: SSTables store rows, compaction merges by row key
-- ClickHouse: MergeTree parts store columns separately, merges entire column chunks
-- InfluxDB: TSM optimized for time-series compression patterns
-
-Query Optimization:
-- Wide-column: Efficient for SELECT * WHERE row_key = X
-- Columnar: Efficient for SELECT col1, col2 WHERE condition (column pruning)
-
-Compression:
-- Wide-column: Compresses mixed data types within rows
-- Columnar: Much better compression on homogeneous column data
-
-Example:
-Wide-column LSM: user123 → {name: "John", age: 30, city: "NYC"}
-Columnar LSM: name_col → ["John", "Jane", "Bob"]
-age_col → [30, 25, 40]
-
-Same LSM merge/compaction strategy, completely different data organization for different workloads.
+## columnar vs wide column
+both lsm

_CONTENT/eng/data/partitioning.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+---
+---
+key range
+hash
+hybrid (hashkey, sortkey)
+
+choose partition keys well
+hot keys: random prefix/suffix
+
+## rebalancing
+fixed: many more parts upfront
+dynamic: split large, merge small. good for key-range
+hybrid
+
+secondary indexes: local or global
+keep related data together to prevent scatter/gather
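
The "hash" and "hot keys: random prefix/suffix" entries above can be sketched as follows. The partition count, the bucket count, the `#` separator, and all function names are assumptions chosen for the example.

```python
import hashlib
import random

NUM_PARTITIONS = 8  # assumed fixed partition count
SALT_BUCKETS = 4    # assumed number of suffixes used to spread one hot key

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash partitioning: a stable digest of the key, reduced to a partition id."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def salted_key(key: str, buckets: int = SALT_BUCKETS, rng=random) -> str:
    """Write path for a hot key: append a random suffix so writes spread out."""
    return f"{key}#{rng.randrange(buckets)}"

def all_salted_keys(key: str, buckets: int = SALT_BUCKETS) -> list[str]:
    """Read path for a hot key: every suffix must be fetched and merged."""
    return [f"{key}#{i}" for i in range(buckets)]
```

The cost of salting shows up on the read side: a reader of a hot key has to query every key from `all_salted_keys` and combine the results, so it is usually worth applying only to the few keys known to be hot.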
Lines changed: 5 additions & 13 deletions
@@ -4,26 +4,18 @@ wal: low-level, version dependent
 logical: version free. CDC
 statement: compact but non-det.

-single leader
-eg. psql streaming rep
-
-multi leader
-multi-datacenter
-offline apps
-docs
-
+single leader: eg. psql streaming
+multi leader: multi-datacenter, offline apps, docs
 leaderless

 ## conflicts
-avoid
-route to same leader
-crdts
+avoid by routing to same leader or crdts
+
+or resolve by read repair, anti-entropy, or app logic

-or resolve by
 read repair: compare replica responses
 background anti-entropy: detect using hashes of data parts and vector clocks
 last-write-wins
-app-logic

 ## concerns
 replication lag
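
The "read repair: compare replica responses" and "last-write-wins" entries in the conflicts section can be combined into a minimal sketch. Replicas are modeled as plain dicts holding `(timestamp, value)` pairs; that representation and the function name are assumptions for illustration only.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, fix stale copies.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    responses = [(replica.get(key), replica) for replica in replicas]
    versions = [version for version, _ in responses if version is not None]
    if not versions:
        return None
    # Last-write-wins: the highest timestamp is treated as the truth.
    newest = max(versions, key=lambda version: version[0])
    for version, replica in responses:
        if version != newest:
            replica[key] = newest  # read repair: overwrite stale or missing copy
    return newest[1]
```

Last-write-wins is simple but lossy: of two concurrent writes only one survives, and which one depends on how well the clocks behind the timestamps agree.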
File renamed without changes.

_CONTENT/eng/design/designlists.md

Lines changed: 0 additions & 100 deletions
This file was deleted.
