Commit 3626c70

committed
up
1 parent bfcbb96 commit 3626c70

55 files changed

Lines changed: 1047 additions & 1249 deletions


_CONTENT/eng/data/acid.md

Lines changed: 0 additions & 21 deletions
This file was deleted.

_CONTENT/eng/data/data.md

Lines changed: 95 additions & 0 deletions
@@ -1,2 +1,97 @@
1 1
---
2 2
---
3+
## schema
4+
```
5+
encoding
6+
text
7+
binary
8+
9+
schema evolution
10+
avro
11+
protobuf
12+
13+
keep unknown fields
14+
tags vs names: compact + rename later
15+
16+
breaking
17+
deleting required fields
18+
changing field types
19+
20+
old code -> new data
21+
old data <- new code
22+
```
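
A minimal sketch of the "keep unknown fields" and "tags vs names" notes above, assuming a hypothetical wire format of `{tag: value}` pairs. `SCHEMA_V1`, `decode` and `encode` are illustrative names only, not Avro or protobuf APIs.

```python
# Hypothetical tag-based schema: the wire format carries numeric tags, not names,
# so a field can be renamed in code later without touching stored data.
SCHEMA_V1 = {1: "user_id", 2: "name"}


def decode(wire, schema):
    """Split a {tag: value} record into known fields and preserved unknown ones."""
    known, unknown = {}, {}
    for tag, value in wire.items():
        if tag in schema:
            known[schema[tag]] = value
        else:
            unknown[tag] = value  # written by newer code; keep it untouched
    return known, unknown


def encode(known, unknown, schema):
    """Re-encode known fields by tag and carry unknown fields through unchanged."""
    tag_of = {name: tag for tag, name in schema.items()}
    wire = {tag_of[name]: value for name, value in known.items()}
    wire.update(unknown)
    return wire


# Old code (v1) reads data written by newer code that added tag 3.
new_record = {1: 42, 2: "ada", 3: "ada@example.com"}
fields, extra = decode(new_record, SCHEMA_V1)
fields["name"] = "Ada"
round_tripped = encode(fields, extra, SCHEMA_V1)
```

Because the old reader passes tag 3 through instead of dropping it, a read-modify-write by old code does not destroy data written by new code.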
23+
24+
## integrity
25+
```
26+
acid: TX, FK, CHECK, WAL, ..
27+
28+
isolation problems
29+
reads
30+
dirty
31+
non-repeatable
32+
phantom
33+
read-skew
34+
35+
writes
36+
lost
37+
write skew (decision based on a stale read)
38+
39+
solutions
40+
1. read committed
41+
2. snapshot/repeatable read: MVCC, must for analytics, backups
42+
3. serializable: MVCC + SSI
43+
44+
SSI: serializable snapshot isolation
45+
predicate locks + dep. cycles
46+
```
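
As an illustration of the "lost" write problem listed above, here is a small sketch of one common guard: a version check on write (optimistic compare-and-set). `VersionedCell` is a hypothetical in-memory stand-in for a row with a version column, not a real database API.

```python
class VersionedCell:
    """In-memory stand-in for a row with a version column (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, new_value, expected_version):
        """Write only if nobody else has written since expected_version was read."""
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def increment(cell):
    """Read-modify-write with retry, so concurrent increments are not lost."""
    while True:
        value, version = cell.read()
        if cell.compare_and_set(value + 1, version):
            return


counter = VersionedCell(0)

# Two clients read the same state, then both try to write value + 1.
value_a, version_a = counter.read()
value_b, version_b = counter.read()
first_ok = counter.compare_and_set(value_a + 1, version_a)   # succeeds
second_ok = counter.compare_and_set(value_b + 1, version_b)  # rejected: stale version
if not second_ok:
    increment(counter)  # retry against the current value
```

Without the version check, both writes would store 1 and one increment would be silently lost.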
47+
48+
49+
## proc
50+
```
51+
batch
52+
atomic ops on seq. data
53+
delta lake: parquet + transaction log + metadata
54+
55+
stream
56+
immutable events
57+
58+
time
59+
event
60+
delivery
61+
processing
62+
63+
flow control
64+
backpressure
65+
circuit breaker
66+
67+
consumer lag
68+
checkpoint
69+
watermark
70+
grace period
71+
publish correction
72+
73+
windows: fixed, overlapping, sliding, session
74+
75+
log compaction
76+
joins
77+
probabilistic dsa
78+
```
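
The fixed-window, watermark and grace-period notes above can be sketched roughly as follows. The window size, the grace period and the choice of "largest event time seen so far" as the watermark are all assumptions made for this example.

```python
from collections import defaultdict

WINDOW = 60  # fixed window size in seconds (assumed)
GRACE = 30   # extra time a window stays open after the watermark passes its end


def window_start(event_time):
    """Assign an event to the fixed window containing its event time."""
    return event_time - (event_time % WINDOW)


def process(event_times):
    """Count events per fixed window; return (counts, late_events).

    event_times are event timestamps listed in delivery order.
    The watermark here is simply the largest event time seen so far.
    """
    counts = defaultdict(int)
    late = []
    watermark = 0
    for event_time in event_times:
        watermark = max(watermark, event_time)
        start = window_start(event_time)
        window_closed = start + WINDOW + GRACE <= watermark
        if window_closed:
            late.append(event_time)  # candidate for a published correction
        else:
            counts[start] += 1
    return dict(counts), late


# 50 is delivered out of order but within grace; 10 is delivered after its window closed.
counts, late = process([5, 65, 50, 130, 10])
```

Events that land in `late` are the ones that would need the "publish correction" path, since their window result has already been emitted.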
79+
80+
## comm
81+
```
82+
ipc
83+
db, services, messages
84+
85+
push: pubsub, ws, sse, webhook
86+
pull: query, poll
87+
q: decouple, buffer
88+
89+
MPI: message passing interface
90+
no central coordinator
91+
nodes communicate directly
92+
93+
delivery semantics
94+
exactly once
95+
producer retry + consumer dedup
96+
producer outbox + consumer ack in tx
97+
```
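
A rough sketch of "producer retry + consumer dedup" from the notes above: delivery is at-least-once, and the consumer makes the effect exactly-once by remembering message ids. `DedupConsumer`, `send_with_retry` and the `ack_lost_times` knob are hypothetical names for this example.

```python
class DedupConsumer:
    """Consumer that applies each message id at most once (hypothetical sketch)."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        """Apply the message unless its id was already processed; always ack."""
        if message_id not in self.seen_ids:
            self.balance += amount
            self.seen_ids.add(message_id)
        return True  # ack


def send_with_retry(consumer, message_id, amount, ack_lost_times):
    """Producer side: resend until an ack arrives (at-least-once delivery).

    ack_lost_times simulates acks dropped by the network.
    """
    attempts = 0
    while True:
        attempts += 1
        acked = consumer.handle(message_id, amount)
        if acked and attempts > ack_lost_times:
            return attempts


consumer = DedupConsumer()
attempts = send_with_retry(consumer, "msg-1", 100, ack_lost_times=2)
send_with_retry(consumer, "msg-2", 50, ack_lost_times=0)
```

In a real system the balance update and the seen-id record would need to be written in the same transaction, which is what the "consumer ack in tx" note points at.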

_CONTENT/eng/data/dataflow.md

Lines changed: 0 additions & 53 deletions
This file was deleted.

_CONTENT/eng/data/dist.md

Lines changed: 34 additions & 24 deletions
@@ -49,37 +49,47 @@ rebalancing is expensive
49 49
service discovery, request routing
50 50
```
51 51

52-
## consistency
53-
linearizable: single copy illusion, single leader + election consensus + sync replication
54-
55-
causal: vector clocks + dependency tracking
56-
57-
eventual: async replication + conflict resolution
58-
59-
## consensus
60-
raft: majority ack, term number fencing
61-
62-
## atomic commit
63-
2PC: ask all, commit if they all ack, like marriage, coordinator spof
64-
practical: 2pc + raft for coordinator failover
65 52

66 53
## time and order
67-
NTP, GPS
54+
```
55+
lamport clock
56+
single counter per process
57+
if A happens-before B then L(A) < L(B); the converse does not hold, so concurrency is not detectable
58+
59+
vector clocks
60+
list of counters per process
61+
can detect concurrency, detects conflicts
62+
63+
versions
64+
each replica tracks versions of replicated data objects
65+
66+
availability
67+
heartbeat pings with timeout
68+
adapt to network conditions
69+
lease with ttl
70+
gossip
71+
```
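
A short sketch of how vector clocks detect concurrency, assuming each clock is a plain list with one counter per process. The helper names `tick`, `merge` and `compare` are illustrative.

```python
def merge(a, b):
    """Element-wise max, applied when a process receives another's clock."""
    return [max(x, y) for x, y in zip(a, b)]


def tick(clock, process_index):
    """Advance the local counter for an event on process_index."""
    updated = list(clock)
    updated[process_index] += 1
    return updated


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Three processes, all starting at [0, 0, 0].
p0 = tick([0, 0, 0], 0)             # p0 writes: [1, 0, 0]
p1 = tick([0, 0, 0], 1)             # p1 writes independently: [0, 1, 0]
p2 = tick(merge([0, 0, 0], p0), 2)  # p2 receives p0's write, then writes: [1, 0, 1]
```

Here `compare(p0, p2)` reports a happens-before relation, while `compare(p0, p1)` reports concurrency, which is the case a replica would treat as a conflict.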
68 72

69-
lamport clock: single counter per process, can only tell if A happens-before B
70 73

71-
vector clocks: list of counters per process, can detect concurrency, detects conflicts
74+
## consistency
75+
```
76+
linearizable
77+
single copy illusion
78+
single leader + election consensus + (sync replication or raft quorum)
72 79
73-
version vector: each replica tracks versions of replicated data objects
80+
causal: vector clocks + dependency tracking
74 81
82+
eventual: async replication + conflict resolution
75 83
```
76-
lamport int
77-
vector []int
78-
versions map[object]version
84+
85+
## consensus
79 86
```
87+
raft: majority ack, term number fencing
80 88
81-
## availability
82-
heartbeat pings with timeout, adapt to network conditions
83-
lease with ttl
84-
gossip
89+
atomic commit
90+
2PC
91+
ask all, commit if they all ack
92+
like marriage, coordinator spof
85 93
94+
practical: 2pc + raft for coordinator failover
95+
```
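
The "ask all, commit if they all ack" rule of 2PC can be sketched as below. This is an in-process toy under stated assumptions: `Participant` and `two_phase_commit` are hypothetical, and there is no durable log, timeout or coordinator failover.

```python
class Participant:
    """Hypothetical participant: votes in phase 1, applies the outcome in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit only if every participant votes yes; otherwise abort everyone."""
    votes = [p.prepare() for p in participants]  # phase 1: ask all
    if all(votes):
        for p in participants:                   # phase 2: commit
            p.commit()
        return "committed"
    for p in participants:                       # phase 2: abort
        p.abort()
    return "aborted"


ok = two_phase_commit([Participant("orders"), Participant("payments")])
failed_group = [Participant("orders"), Participant("payments", can_commit=False)]
failed = two_phase_commit(failed_group)
```

If the coordinator stopped between the two phases, participants would stay in `prepared` with no way to decide on their own, which is the single point of failure the notes mention.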

_CONTENT/eng/data/ml.md

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
---
2+
---
3+
## tradeoffs
4+
```
5+
Bias-variance: simple vs complex
6+
accurate vs interpretable
7+
complex vs general: simpler models often generalize better
8+
```
9+
10+
## precision vs recall
11+
high precision: few false positives, but may miss some true positives
12+
13+
high recall: more true positives but also more false ones
14+
15+
F1-score: harmonic mean of both
16+
17+
e.g. spam detection: catch more spam without marking good emails as spam
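
A small worked example of the three metrics, using made-up counts for a hypothetical spam filter:

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from raw counts (0.0 when undefined)."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 80 spam caught, 20 good emails flagged, 20 spam missed.
precision, recall, f1 = precision_recall_f1(80, 20, 20)

# A more aggressive filter: 95 spam caught, 60 good emails flagged, 5 spam missed.
precision_aggr, recall_aggr, f1_aggr = precision_recall_f1(95, 60, 5)
```

The aggressive filter has higher recall but lower precision, which is the tradeoff described above.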
18+
19+
20+
## diffusion
21+
```
22+
training
23+
start with an image
24+
add random noise
25+
make the model predict how much noise is added and where
26+
27+
generating
28+
take a text prompt
29+
turn it into a text embedding
30+
start with random noise
31+
gradually remove noise, guided by the text embedding
32+
```
33+
34+
## RL
35+
An **agent** takes **actions** in an **environment** to maximize **reward**
36+
37+
Agent can use
38+
its internal **state**
39+
its decision **policy**
40+
its **model** of environment etc.
41+
42+
<https://lilianweng.github.io/posts/2018-02-19-rl-overview/>
43+
44+
45+
## tf
46+
47+
<https://jalammar.github.io/illustrated-transformer/>
48+
49+
## deep
50+
51+
```
52+
A network is layers of weights
53+
[nums] -> network -> num
54+
55+
For each training pass:
56+
Multiply input tensor with weights and add bias, layer by layer
57+
Compare prediction to correct answer using a loss function
58+
Find gradients using backprop
59+
Minimize loss via gradient descent
60+
61+
For a 3-layer network
62+
63+
1. Forward pass
64+
65+
Input → Layer1 → Layer2 → Layer3 → Output → Loss
66+
67+
x = input data
68+
l1 = activation_func(W1.x + b1)
69+
l2 = activation_func(W2.l1 + b2)
70+
l3 = ..
71+
output = W4.l3 + b4
72+
73+
loss = loss_func(output, correct_answer)
74+
75+
2. Backprop
76+
77+
Compute gradients using chain rule, then update weights
78+
79+
Loss → Layer3 → Layer2 → Layer1 → Input
80+
81+
d_loss/d3
82+
d_loss/d2 = d_loss/d3 . d3/d2
83+
d_loss/d1 = d_loss/d2 . d2/d1
84+
d_loss/input = d_loss/d1 . d1/input
85+
86+
W = W - learning_rate.d_loss/dW
87+
```
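
The forward pass, backprop and update steps above can be sketched end to end on a scalar network. The notes only name `activation_func` and `loss_func`; using tanh and squared error here is an assumption, as are the starting parameter values.

```python
import math


def forward(params, x):
    """Forward pass through a scalar network; returns activations for backprop."""
    w1, b1, w2, b2, w3, b3 = params
    l1 = math.tanh(w1 * x + b1)
    l2 = math.tanh(w2 * l1 + b2)
    output = w3 * l2 + b3
    return l1, l2, output


def loss_func(output, target):
    """Squared error (assumed loss for this sketch)."""
    return (output - target) ** 2


def backward(params, x, target):
    """Gradients of the loss w.r.t. each parameter, via the chain rule."""
    w1, b1, w2, b2, w3, b3 = params
    l1, l2, output = forward(params, x)
    d_output = 2 * (output - target)   # d_loss/d_output
    d_w3, d_b3 = d_output * l2, d_output
    d_l2 = d_output * w3               # d_loss/d_l2
    d_z2 = d_l2 * (1 - l2 ** 2)        # tanh'(z) = 1 - tanh(z)^2
    d_w2, d_b2 = d_z2 * l1, d_z2
    d_l1 = d_z2 * w2                   # d_loss/d_l1
    d_z1 = d_l1 * (1 - l1 ** 2)
    d_w1, d_b1 = d_z1 * x, d_z1
    return [d_w1, d_b1, d_w2, d_b2, d_w3, d_b3]


def step(params, x, target, learning_rate=0.1):
    """One gradient descent update: W = W - learning_rate * d_loss/dW."""
    grads = backward(params, x, target)
    return [p - learning_rate * g for p, g in zip(params, grads)]


params = [0.5, 0.0, -0.3, 0.1, 0.8, 0.0]
x, target = 1.0, 1.0
loss_before = loss_func(forward(params, x)[2], target)
params_after = step(params, x, target)
loss_after = loss_func(forward(params_after, x)[2], target)
```

Each `d_*` line multiplies the gradient flowing in from the layer after it by a local derivative, which is the chain-rule pattern in the notes; a single update with this small learning rate lowers the loss on the example input.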
File renamed without changes.
