Skip to content

Commit 24fd444

Browse files
committed
documentation updated
1 parent f83d4aa commit 24fd444

13 files changed

Lines changed: 206 additions & 161 deletions

README.md

Lines changed: 3 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -149,9 +149,9 @@ See [docs/DLOCKSS_PROTOCOL.md](docs/DLOCKSS_PROTOCOL.md) for protocol details.
149149
D-LOCKSS acts as a self-healing, sharded storage cluster using the IPFS/Libp2p stack.
150150
151151
### Key Components
152-
1. **Shard Manager:** Dynamically splits responsibilities based on peer count to maintain scalability.
152+
1. **Shard Manager:** Dynamically splits responsibilities based on peer count to maintain scalability. Delegates lifecycle decisions (split/merge/discovery) to a `lifecycleManager` and replication to a `replicationManager`.
153153
2. **Cluster Manager:** Manages embedded **IPFS Cluster** instances (one per shard) using **CRDTs** for state consensus; nodes in a shard sync and pin content assigned to that shard.
154-
3. **File Watcher:** Monitors the data directory to automatically ingest content.
154+
3. **File Watcher:** Monitors the data directory to automatically ingest content (via `handleWatcherEvent` / `handleNewDirectory`).
155155
4. **Storage Monitor:** Protects nodes from disk exhaustion by rejecting custodial requests when full.
156156
5. **BadBits Manager:** Enforces content blocking (e.g., DMCA) based on configured country codes.
157157
@@ -186,14 +186,6 @@ go build -o dlockss-monitor ./cmd/dlockss-monitor
186186
```
187187
Open http://localhost:8080. The monitor displays each node's **name** (if configured via `DLOCKSS_NODE_NAME`), falling back to the Peer ID. Names propagate via HEARTBEAT/JOIN messages and appear in the node table, charts, and shard modals. Client-side aliases (EDIT button) override server-side names. Each node has **one peer ID**: when `DLOCKSS_IPFS_CONFIG` is set (e.g. in testnet), D-LOCKSS uses the IPFS repo identity so the same ID appears in the monitor and in `node_x.ipfs.log`.
188188

189-
For geographic region display, optionally provide a GeoIP database:
190-
```bash
191-
./dlockss-monitor --geoip-db /path/to/GeoLite2-City.mmdb
192-
# or via environment variable:
193-
export DLOCKSS_MONITOR_GEOIP_DB=/path/to/GeoLite2-City.mmdb
194-
```
195-
Without a local database, the monitor falls back to the ip-api.com batch API with permanent caching.
196-
197189
The monitor bootstrap-subscribes to all shards up to depth 6 (127 shards) so it can see nodes even when started late. Set `DLOCKSS_MONITOR_BOOTSTRAP_SHARD_DEPTH` (0–12) to tune.
198190

199191
Alternatively use: https://dlockss-monitor.wmcloud.org.
@@ -207,7 +199,7 @@ go test ./... -v
207199
```
208200
209201
### Project Status
210-
* **Current Phase:** Production — active refactoring for code quality and operational robustness (see [Code Elegance Plan](docs/CODE_ELEGANCE_PLAN.md)).
202+
* **Current Phase:** Production — structural refactoring complete (see [Code Elegance Plan](docs/CODE_ELEGANCE_PLAN.md)). Config uses nested sub-structs (`Sharding`, `Replication`, `Files`, `Security`, `Orphan`). ShardManager delegates to `replicationManager` and `lifecycleManager`.
211203
212204
---
213205

docs/DLOCKSS_PROTOCOL.md

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -171,15 +171,18 @@ Nodes in any shard (not just root) periodically run discovery to join existing d
171171

172172
### 6.2 Replication & Repair
173173

174-
Replication is handled by the **Cluster Manager** and **LocalPinTracker** per shard:
174+
Replication is handled by the **Cluster Manager**, **LocalPinTracker**, and **replicationManager** (a delegate of ShardManager):
175175

176176
1. **Pinning**: When a responsible node ingests or accepts a file, it calls `ClusterManager.Pin(ctx, shardID, cid, ...)` on the shard's embedded cluster. Allocations are chosen deterministically from Peers() (shard mesh via ShardPeerProvider).
177177
2. **State Sync**: The CRDT (Merkle-DAG based) propagates pin/unpin to all peers in the shard via PubSub (`dlockss-shard-<id>`).
178178
3. **Local Pin Tracker**:
179179
* Each node runs a `LocalPinTracker` per shard that polls CRDT State() (and on TriggerSync).
180180
* For each pin in state, if this node is in **Allocations** (or Allocations is empty), it pins the ManifestCID locally via IPFS. The **onPinSynced** callback then: (a) registers the file with StorageManager, (b) announces PINNED, (c) resolves the PayloadCID from the manifest, (d) **pins the PayloadCID as its own root** so Kubo's reprovider (`pinned` strategy) re-announces it, and (e) provides both ManifestCID and PayloadCID to the DHT. On the ingesting node the payload is already a pin root from `ImportFile`; on replicas only the ManifestCID was pinned, so step (d) adds the missing pin entry — blocks are already local from the manifest's recursive pin so this returns quickly.
181181
* Pins no longer in state or no longer allocated are unpinned locally and onPinRemoved is called.
182-
4. **Repair**: Under-replicated files trigger ReplicationRequest on the shard topic; peers that have the file JoinShard(targetShard), Pin, TriggerSync. CRDT sync and LocalPinTracker then replicate to allocated peers.
182+
4. **Repair (replicationManager)**: The `replicationManager` (in `shard_replication.go`) periodically broadcasts `ReplicationRequest` messages for all pinned manifests (cooldown: 5 minutes per manifest). Receiving peers handle requests as follows:
183+
* **Already pinned**: Ensure cluster membership and trigger CRDT sync.
184+
* **Not pinned (auto-replication)**: Fetch the manifest via `PinRecursive` (timeout: 5 minutes), add to cluster via `ClusterPinIfAbsent`, and trigger sync. Concurrency is bounded by a semaphore (default: 5 concurrent auto-replications).
185+
* Legacy manifests (containing a `ts` field) are silently ignored.
183186

184187
### 6.2.1 Heartbeat-Driven Re-Pin and Re-Provide
185188

@@ -260,8 +263,8 @@ volumes:
260263

261264
### 7.2 Integrity & Authenticity
262265
* **Content Addressing**: CIDs guarantee content integrity.
263-
* **Signatures**: All `ResearchObjects` and protocol messages are signed by the sender's private key.
264-
* **Nonces**: Protocol messages include nonces to prevent replay attacks.
266+
* **Signatures**: All `ResearchObjects` and protocol messages are signed by the sender's private key. The `Signer` type (in `internal/signing/`) handles signing and verification, with `verifySignedMessage` decomposed into focused steps: field validation (`validateMessageFields`), timestamp checking (`checkTimestamp`), public key retrieval (`fetchPublicKey`), and cryptographic verification (`verifySignatureBytes`).
267+
* **Nonces**: Protocol messages include nonces (generated by the signing package's internal `newNonce` function) to prevent replay attacks. A `nonceStore` tracks recently seen nonces.
265268

266269
### 7.3 Liar Detection
267270
* Nodes verify that the actual file size matches the `TotalSize` claimed in the `ResearchObject` manifest.

docs/REPLICATION_PERFORMANCE.md

Lines changed: 93 additions & 107 deletions
Original file line numberDiff line numberDiff line change
@@ -1,136 +1,122 @@
11
# Replication Performance Analysis
22

3+
## Architecture Overview
4+
5+
Replication in D-LOCKSS is driven by two complementary mechanisms:
6+
7+
1. **CRDT Cluster Sync** — Each shard runs an embedded IPFS Cluster with CRDT consensus. When a file is pinned to a shard's cluster, the `LocalPinTracker` on every peer in that shard automatically syncs and pins the content locally.
8+
9+
2. **ReplicationRequest Protocol** — The `replicationManager` (extracted from `ShardManager`) periodically broadcasts `ReplicationRequest` messages for pinned manifests. Peers that don't yet have the file perform **auto-replication**: fetch via `PinRecursive` and add to the cluster.
10+
11+
## Key Constants and Defaults
12+
13+
| Parameter | Default | Env Variable | Location |
14+
|-----------|---------|-------------|----------|
15+
| Replication Check Interval | 1 minute | `DLOCKSS_CHECK_INTERVAL` | `config.Replication.CheckInterval` |
16+
| Root Shard Check Interval | 20 seconds | (hardcoded) | `rootReplicationCheckInterval` |
17+
| Request Cooldown Per Manifest | 5 minutes | (hardcoded) | `replicationRequestCooldownDuration` |
18+
| Max Requests Per Cycle | 50 | (hardcoded) | `maxReplicationRequestsPerCycle` |
19+
| Auto-Replication Enabled | true | `DLOCKSS_AUTO_REPLICATION_ENABLED` | `config.Replication.AutoReplicationEnabled` |
20+
| Auto-Replication Timeout | 5 minutes | `DLOCKSS_AUTO_REPLICATION_TIMEOUT` | `config.Replication.AutoReplicationTimeout` |
21+
| Max Concurrent Checks | 5 | `DLOCKSS_MAX_CONCURRENT_CHECKS` | `config.Replication.MaxConcurrentReplicationChecks` |
22+
| Pin Reannounce Interval | 2 minutes | `DLOCKSS_PIN_REANNOUNCE_INTERVAL` | `config.Replication.PinReannounceInterval` |
23+
| Min Replication | 5 | `DLOCKSS_MIN_REPLICATION` | `config.Replication.MinReplication` |
24+
| Max Replication | 10 | `DLOCKSS_MAX_REPLICATION` | `config.Replication.MaxReplication` |
25+
26+
## Convergence Timeline
27+
28+
For a newly ingested file to reach full replication across a shard:
29+
30+
1. **Ingest** (immediate): File pinned locally, `IngestMessage` broadcast to shard, cluster `Pin()` called.
31+
2. **CRDT Sync** (seconds): Cluster state propagates to peers via PubSub; `LocalPinTracker` detects new pin and starts `PinRecursive`.
32+
3. **First Replication Check** (up to 20s at root, 1m elsewhere): `replicationManager.runChecker()` sends `ReplicationRequest` for all pinned manifests.
33+
4. **Auto-Replication** (seconds to minutes): Peers receiving the request that don't have the file fetch it via `PinRecursive` (up to 5-minute timeout).
34+
5. **Cooldown** (5 minutes): After sending a request for a manifest, no new request is sent for that manifest for 5 minutes.
35+
36+
**Typical convergence**: Most files replicate within 1-2 minutes via CRDT sync alone. Files that fail the initial sync (large DAGs, slow block propagation) recover on the next replication cycle after the 5-minute cooldown.
37+
338
## Current Bottlenecks
439

5-
Based on code analysis, the following factors contribute to slow replication convergence:
6-
7-
### 1. **Replication Check Interval** (Default: 1 minute)
8-
- **Location**: `CheckInterval = 1*time.Minute`
9-
- **Impact**: Replication levels are only checked once per minute
10-
- **Effect**: Minimum delay of 1 minute before detecting under-replication
11-
12-
### 2. **Hysteresis Verification Delay** (Default: 30 seconds)
13-
- **Location**: `ReplicationVerificationDelay = 30*time.Second`
14-
- **Impact**: When under-replication is detected, system waits ~30 seconds before triggering replication requests
15-
- **Effect**: Adds 30+ seconds delay before NEED messages are broadcast
16-
- **Rationale**: Prevents false alarms from transient DHT issues
17-
18-
### 3. **Replication Check Cooldown** (Default: 15 seconds)
19-
- **Location**: `ReplicationCheckCooldown = 15*time.Second`
20-
- **Impact**: Prevents checking the same file more than once every 15 seconds
21-
- **Effect**: Limits how quickly replication can be re-checked after a change
22-
23-
### 4. **Replication Cache TTL** (Default: 5 minutes)
24-
- **Location**: `ReplicationCacheTTL = 5*time.Minute`
25-
- **Impact**: Cached replication counts prevent frequent DHT queries
26-
- **Effect**: Replication counts may be stale for up to 5 minutes
27-
- **Trade-off**: Reduces DHT load but slows convergence detection
28-
29-
### 5. **DHT Query Timeout** (Default: 2 minutes)
30-
- **Location**: `context.WithTimeout(ctx, 2*time.Minute)` in `checkReplication()`
31-
- **Impact**: DHT queries can take up to 2 minutes to timeout
32-
- **Effect**: Slow DHT queries delay replication checks
33-
34-
### 6. **DHT Max Sample Size** (Default: 50)
35-
- **Location**: `DHTMaxSampleSize = 50`
36-
- **Impact**: Limits how many providers are queried per DHT lookup
37-
- **Effect**: May underestimate replication count in large networks
38-
39-
### 7. **Worker Pool Limit** (Default: 10 concurrent checks)
40-
- **Location**: `MaxConcurrentReplicationChecks = 10`
41-
- **Impact**: Limits parallelism of replication checks
42-
- **Effect**: With many files, checks are serialized
43-
44-
### 8. **Missing Automatic Replication**
45-
- **Issue**: When a `ReplicationRequest` is received, nodes only check replication - they don't automatically fetch and pin missing files
46-
- **Impact**: Nodes must already have the file to replicate it
47-
- **Effect**: Replication requests don't trigger new replication, only verify existing state
48-
49-
## Total Minimum Delay
50-
51-
For a new file to reach target replication:
52-
1. **Initial check**: Up to 1 minute (CheckInterval)
53-
2. **Verification delay**: ~30 seconds (ReplicationVerificationDelay)
54-
3. **Replication request broadcast**: Immediate
55-
4. **Other nodes check**: Up to 1 minute (their CheckInterval)
56-
5. **Re-check after replication**: Up to 1 minute + 15 seconds cooldown
57-
58-
**Minimum time to convergence**: ~3-4 minutes in ideal conditions
59-
**With DHT delays**: Can be 5-10 minutes or more
40+
### 1. Request Cooldown (5 minutes)
41+
42+
Once a `ReplicationRequest` is sent for a manifest, `replicationRequestCooldownDuration` prevents resending for 5 minutes. If the first request fails (e.g., the receiving peer's `PinRecursive` times out), the file appears "stuck" until the cooldown expires.
43+
44+
**Mitigation**: The cooldown prevents flooding but causes visible delays for files that fail on the first attempt.
45+
46+
### 2. Auto-Replication Timeout (5 minutes)
47+
48+
`PinRecursive` for large files or over slow links may hit the `AutoReplicationTimeout`. The file remains unreplicated until the next replication cycle.
49+
50+
**Mitigation**: Heartbeat-driven re-pin gradually fills in missing blocks (see below).
51+
52+
### 3. Concurrent Replication Limit (5)
53+
54+
The `replicationManager.sem` channel limits concurrent auto-replications to `MaxConcurrentReplicationChecks` (default 5). When all slots are occupied, additional `ReplicationRequest` messages are silently dropped.
55+
56+
**Mitigation**: Increase `DLOCKSS_MAX_CONCURRENT_CHECKS` for nodes with sufficient bandwidth.
57+
58+
### 4. Max Requests Per Cycle (50)
59+
60+
At most 50 `ReplicationRequest` messages are sent per checker cycle. With thousands of files, not all manifests are requested in a single cycle.
61+
62+
**Mitigation**: Subsequent cycles pick up remaining manifests. The cooldown map ensures already-sent requests aren't duplicated.
63+
64+
## Heartbeat-Driven Gradual DAG Completion (Built-In)
65+
66+
Every heartbeat (~10s), each node picks **one** pinned manifest CID (round-robin) and:
67+
68+
1. **Re-pins the ManifestCID recursively** (`PinRecursive`, 2-minute timeout). Idempotent — returns instantly when the DAG is already complete locally, and incrementally fetches missing blocks otherwise.
69+
2. **Pins the PayloadCID as its own root** so Kubo's reprovider (`pinned` strategy) re-announces it.
70+
3. **Provides both CIDs to the DHT** (only if the re-pin succeeded).
71+
72+
A `CompareAndSwap` guard prevents concurrent re-provides from piling up.
73+
74+
**Impact**: Resource-constrained nodes (e.g., Raspberry Pis) that failed the initial `PinRecursive` gradually complete the DAG over successive heartbeats without manual intervention. DHT provider records (which expire after ~24h) are kept fresh.
6075

6176
## Optimization Options
6277

63-
### Option 1: Reduce Check Interval (Quick Win)
78+
### Reduce Check Interval (Quick Win)
6479
```bash
6580
export DLOCKSS_CHECK_INTERVAL=15s # Default: 1m
6681
```
67-
**Pros**: Faster detection of under-replication
68-
**Cons**: More DHT queries, higher CPU usage
69-
**Recommendation**: Use 15-30s for testnets
82+
Faster detection at non-root shards. Root shards already check every 20s.
7083

71-
### Option 3: Reduce Replication Cooldown (Quick Win)
84+
### Increase Concurrent Checks (Moderate Impact)
7285
```bash
73-
export DLOCKSS_REPLICATION_COOLDOWN=5s # Default: 15s
86+
export DLOCKSS_MAX_CONCURRENT_CHECKS=10 # Default: 5
7487
```
75-
**Pros**: Faster re-checking after replication changes
76-
**Cons**: More frequent checks of same files
77-
**Recommendation**: Use 5s for testnets
88+
More parallel auto-replications. Higher bandwidth usage.
7889

79-
### Option 5: Increase Worker Pool (Moderate Impact)
90+
### Increase Auto-Replication Timeout (Large Files)
8091
```bash
81-
export DLOCKSS_MAX_CONCURRENT_CHECKS=20 # Default: 10
92+
export DLOCKSS_AUTO_REPLICATION_TIMEOUT=10m # Default: 5m
8293
```
83-
**Pros**: More parallel replication checks
84-
**Cons**: Higher CPU/memory usage
85-
**Recommendation**: Use 20-30 for testnets with many files
86-
87-
### Option 8: Implement Automatic Replication (Major Feature)
88-
**Code Change Required**: Add logic to fetch and pin files when receiving ReplicationRequest
89-
**Pros**: Actually triggers replication, not just checks
90-
**Cons**: Requires IPFS content fetching, bandwidth usage
91-
**Recommendation**: High priority for production
94+
Allows more time for large DAG fetches. Ties up semaphore slots longer.
9295

9396
## Recommended Testnet Configuration
9497

95-
For faster convergence in testnets, use:
98+
For faster convergence in testnets:
9699

97100
```bash
98101
export DLOCKSS_CHECK_INTERVAL=15s
99-
export DLOCKSS_REPLICATION_VERIFICATION_DELAY=5s
100-
export DLOCKSS_REPLICATION_COOLDOWN=5s
101-
export DLOCKSS_REPLICATION_CACHE_TTL=30s
102-
export DLOCKSS_MAX_CONCURRENT_CHECKS=20
103-
export DLOCKSS_DHT_MAX_SAMPLE_SIZE=100
102+
export DLOCKSS_MAX_CONCURRENT_CHECKS=10
104103
```
105104

106-
This reduces minimum convergence time from ~3-4 minutes to ~30-60 seconds.
107-
108-
### Heartbeat-Driven Gradual DAG Completion (Built-In)
109-
110-
Every heartbeat (~10s), each node picks one pinned manifest (round-robin) and calls `PinRecursive` with a 2-minute timeout. This is idempotent: if the DAG is already fully local it returns instantly, otherwise it incrementally fetches the missing blocks. On success, the manifest and payload CIDs are re-provided to the DHT.
111-
112-
**Impact on resource-constrained nodes (Raspberry Pis):**
113-
- Initial `PinRecursive` during replication may time out or OOM before fetching the full DAG.
114-
- Instead of leaving the file permanently incomplete, subsequent heartbeats gradually fetch the remaining blocks.
115-
- After all blocks are local, Kubo's reprovider stops emitting "block not found locally, cannot provide" errors.
116-
- DHT provider records (which expire after ~24h) are kept fresh without relying solely on Kubo's reprovider.
117-
118-
No configuration needed — this runs automatically on every node.
119-
120105
## Production Considerations
121106

122-
For production networks:
123-
- Keep `ReplicationVerificationDelay` at 30s to prevent false alarms
124-
- Keep `ReplicationCacheTTL` at 5m to reduce DHT load
125-
- Keep `CheckInterval` at 1m for reasonable resource usage
126-
- Consider implementing Option 8 (automatic replication) for better convergence
107+
- Keep `CheckInterval` at 1m for reasonable resource usage (root shards already use 20s).
108+
- Keep `AutoReplicationTimeout` at 5m unless dealing with consistently large files.
109+
- The 5-minute request cooldown is a deliberate trade-off between convergence speed and network overhead; files that fail on the first attempt self-heal after the cooldown expires.
127110

128111
## Monitoring
129112

130-
Watch these metrics to understand replication performance:
131-
- `replicationChecks`: Number of checks performed
132-
- `dhtQueries`: Number of DHT queries
133-
- `dhtQueryTimeouts`: DHT query failures
134-
- `filesAtTargetReplication`: Files with adequate replication
135-
- `lowReplicationFiles`: Files needing replication
136-
- `avgReplicationLevel`: Average replication across all files
113+
The monitor's `replication snapshot` log line reports:
114+
- `total_manifests`: Number of known manifests
115+
- `total_at_target`: Files with replica count >= min(MinReplication, shard_peer_count)
116+
- `avg_replication`: Average replica count across all manifests
117+
118+
Node daemon logs to watch:
119+
- `"auto-replication: fetched and pinned"` — successful auto-replication
120+
- `"auto-replication: failed to fetch/pin"``PinRecursive` timeout or failure
121+
- `"auto-replication skipped, concurrency limit reached"` — semaphore full
122+
- `"ReplicationRequest sent"` — outbound request (debug level)

0 commit comments

Comments
 (0)