Commit 7edc965

committed
wip
1 parent 5d8aa5c commit 7edc965

1 file changed

Lines changed: 27 additions & 55 deletions

_posts/2025-10-13-Performance-Analysis-of-Parallel-Data-Replication-Between-Two-PostgreSQL-18-Instances-on-OVH.md

Original file line number | Diff line number | Diff line change
@@ -215,53 +215,44 @@ One hypothesis is that network saturation at degree 128 acts as a pacing mechani
215215

216216
## 5. Disk I/O and Network Analysis
217217

218-
### 5.1 FIO Disk Benchmarks vs PostgreSQL Performance
218+
### 5.1 Source Disk I/O Analysis
219219

220-
<img src="/img/2025-10-13_01/disk_fio_vs_postgresql_bandwidth.png" alt="Disk Bandwidth Comparison." width="900">
221-
222-
**Figure 13a: FIO Benchmark vs PostgreSQL Actual Performance (Bandwidth)** - PostgreSQL achieves 3,759 MB/s peak (100.5% of FIO's 3,741 MB/s), demonstrating it can saturate disk during bursts. However, average is only 170 MB/s (4.5% of peak), revealing highly bursty behavior with long idle periods.
223-
224-
<img src="/img/2025-10-13_01/disk_fio_vs_postgresql_iops.png" alt="Disk IOPS Comparison." width="900">
225-
226-
**Figure 13b: FIO Benchmark vs PostgreSQL Actual Performance (IOPS)** - Peak IOPS reaches 55,501 operations per second during intensive bursts, but average IOPS is only 207 operations per second, further confirming the bursty pattern with most time spent idle or at low activity levels.
220+
The source instance has 256GB RAM, and the lineitem table is ~77GB. An important detail explains the disk behavior across test runs:
227221

228-
**FIO Results:**
222+
**Degree 1 was the first test run** with no prior warm-up or cold run to pre-load the table into cache. During this first run:
229223

230-
- Sequential Write (128K): 3,741 MB/s, 29,900 IOPS
231-
- Random Read (4K): 313 MB/s, 80,200 IOPS
232-
- Random Write (4K): 88.2 MB/s, 22,600 IOPS
224+
**At degree 1 (first ~10 seconds)**: Heavy disk activity (500 MB/s, 100% utilization) loads the table into memory (shared_buffers + OS page cache).
233225

234-
**PostgreSQL Actual:**
226+
**At degree 1 (remaining ~860 seconds)**: Near-zero disk activity; the table is fully cached in RAM, no disk reads needed.
235227

236-
- Peak Write: 3,759 MB/s, 55,501 IOPS (matches FIO sequential)
237-
- Mean Write: 170 MB/s, 207 IOPS (only 4.5% of peak)
238-
- Mean Disk Utilization: 14.6% (disk idle 85% of the time)
228+
**At degrees 2-128**: Essentially zero disk activity; the entire table remains cached in memory from the initial degree 1 load.
239229

240-
### 5.2 Source Disk I/O Analysis
230+
**This explains why degree 2 is more than twice as fast as degree 1**: The degree 1 run includes the initial table-loading overhead (~10 seconds of intensive disk I/O), while degree 2 benefits from the already-cached table with no disk loading required. The speedup from degree 1 to 2 reflects both the doubling of parallelism AND the elimination of the initial cache-loading penalty.
241231

242-
This section examines source PostgreSQL disk I/O behavior to provide concrete evidence that the source is experiencing backpressure from the target bottleneck rather than being disk-limited.
232+
<img src="/img/2025-10-13_01/source_disk_utilization_timeseries.png" alt="Source Disk Utilization Time Series." width="900">
243233

244-
#### The Cache Effect and First-Run Impact
234+
**Figure 13: Source Disk Utilization Over Time** - Shows disk utilization across all test runs (vertical lines mark test boundaries for degrees 1, 2, 4, 8, 16, 32, 64, 128). At degree 1, utilization peaks at ~50% during the initial table load, then drops to near-zero. At higher degrees (2-128), utilization remains below 1% throughout, confirming the disk is idle and not limiting performance.
245235

246-
The source instance has 256GB RAM, and the lineitem table is ~77GB. An important detail explains the disk behavior across test runs:
236+
**Source disk I/O is NOT a bottleneck at any parallelism degree.** The source exhibits different behavior depending on parallelism:
247237

248-
**Degree 1 was the first test run** with no prior warm-up or cold run to pre-load the table into cache. During this first run:
238+
- **Degree 1**: Brief initial disk load (~10 seconds), then reads from RAM cache
239+
- **Degrees 2-128**: Table fully cached in memory (256GB available, 77GB table)
249240

250-
**At degree 1 (first ~10 seconds)**: Heavy disk activity (500 MB/s, 100% utilization) loads the table into memory (shared_buffers + OS page cache).
241+
The near-zero disk utilization (<1%) at high parallelism degrees confirms the disk is idle and not limiting performance. This indicates that source processes are constrained by downstream backpressure rather than local disk I/O capacity. Resolving target CPU lock contention would automatically improve source performance, as the source has substantial unused capacity waiting to be unlocked.
251242

252-
**At degree 1 (remaining ~860 seconds)**: Near-zero disk activity; the table is fully cached in RAM, no disk reads needed.
243+
### 5.2 Target Disk I/O Time Series
253244

254-
**At degrees 2-128**: Essentially zero disk activity; the entire table remains cached in memory from the initial degree 1 load.
245+
<img src="/img/2025-10-13_01/target_disk_write_throughput_timeseries.png" alt="Target Disk Write Throughput Time Series." width="900">
255246

256-
**This explains why degree 2 is more than twice as fast as degree 1**: The degree 1 run includes the initial table-loading overhead (~10 seconds of intensive disk I/O), while degree 2 benefits from the already-cached table with no disk loading required. The speedup from degree 1 to 2 reflects both the doubling of parallelism AND the elimination of the initial cache-loading penalty.
247+
**Figure 14: Target Disk Write Throughput Over Time** - Vertical lines mark test boundaries (degrees 1, 2, 4, 8, 16, 32, 64, 128). Throughput exhibits bursty behavior with spikes to 2000-3759 MB/s followed by drops to near zero. Sustained baseline varies from ~100 MB/s (low degrees) to ~300 MB/s (degree 128) but never sustains disk capacity.
257248

258-
<img src="/img/2025-10-13_01/source_disk_utilization_timeseries.png" alt="Source Disk Utilization Time Series." width="900">
249+
<img src="/img/2025-10-13_01/target_disk_utilization_timeseries.png" alt="Target Disk Utilization Time Series." width="900">
259250

260-
**Figure 14: Source Disk Utilization Over Time** - Shows disk utilization across all test runs (vertical lines mark test boundaries for degrees 1, 2, 4, 8, 16, 32, 64, 128). At degree 1, utilization peaks at ~50% during the initial table load, then drops to near-zero. At higher degrees (2-128), utilization remains below 1% throughout, confirming the disk is idle and not limiting performance.
251+
**Figure 15: Target Disk Utilization Over Time** - Mean utilization remains below 25% across all degrees. Spikes reach 70-90% during bursts but quickly return to low baseline. This strongly suggests disk I/O is not the bottleneck.
261252

262-
#### The Evidence Chain
253+
Disk utilization measures the percentage of time the disk is busy serving I/O requests.
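
As a rough illustration of that definition, tools such as `iostat` derive utilization from the kernel's cumulative "time spent doing I/Os" counter (in milliseconds) exposed in `/proc/diskstats`. The sketch below assumes two counter samples taken a known interval apart; the function name and the sample values are hypothetical and are not taken from the benchmark data.

```python
def disk_utilization_percent(busy_ms_start: int, busy_ms_end: int, interval_ms: int) -> float:
    """Share of the sampling interval during which the device was busy, in percent."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy_ms = busy_ms_end - busy_ms_start
    # Clamp to 100: counter granularity can push the ratio slightly above 1.
    return min(100.0, 100.0 * busy_ms / interval_ms)


# Hypothetical samples taken 1,000 ms apart: the device was busy for 243 ms.
example_util = disk_utilization_percent(10_000, 10_243, 1_000)
print(f"{example_util:.1f}% busy")
```

A device that is busy for 243 ms out of every second reports roughly 24% utilization, which is the order of magnitude of the mean target utilization discussed here.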
263254

264-
The source disk analysis reveals the interplay between caching and backpressure:
255+
The disk analysis reveals the interplay between caching and backpressure:
265256

266257
```
267258
Table Cached in RAM (256GB RAM, 77GB table)
@@ -277,51 +268,32 @@ FastTransfer Batches Block (waiting for target acknowledgment)
277268
Source Processes Sleep (0.11 cores/process, blocked in system calls)
278269
```
279270

280-
#### Conclusion
281-
282-
**Source disk I/O is NOT a bottleneck at any parallelism degree.** The source exhibits different behavior depending on parallelism:
283-
284-
- **Degree 1**: Brief initial disk load (~10 seconds), then reads from RAM cache
285-
- **Degrees 2-128**: Table fully cached in memory (256GB available, 77GB table)
286-
287-
The near-zero disk utilization (<1%) at high parallelism degrees confirms the disk is idle and not limiting performance. This indicates that source processes are constrained by downstream backpressure rather than local disk I/O capacity. Resolving target CPU lock contention would automatically improve source performance, as the source has substantial unused capacity waiting to be unlocked.
288-
289-
### 5.3 Target Disk I/O Time Series
290-
291-
<img src="/img/2025-10-13_01/target_disk_write_throughput_timeseries.png" alt="Target Disk Write Throughput Time Series." width="900">
292-
293-
**Figure 15: Target Disk Write Throughput Over Time** - Vertical lines mark test boundaries (degrees 1, 2, 4, 8, 16, 32, 64, 128). Throughput exhibits bursty behavior with spikes to 2000-3759 MB/s followed by drops to near zero. Sustained baseline varies from ~100 MB/s (low degrees) to ~300 MB/s (degree 128) but never sustains disk capacity.
294-
295-
<img src="/img/2025-10-13_01/target_disk_utilization_timeseries.png" alt="Target Disk Utilization Time Series." width="900">
296-
297-
**Figure 16: Target Disk Utilization Over Time** - Mean utilization remains below 25% across all degrees. Spikes reach 70-90% during bursts but quickly return to low baseline. This strongly suggests disk I/O is not the bottleneck.
298-
299-
### 5.4 Network Throughput Analysis
271+
### 5.3 Network Throughput Analysis
300272

301273
<img src="/img/2025-10-13_01/target_network_rx_timeseries.png" alt="Target Network RX Time Series." width="900">
302274

303-
**Figure 17: Target Network Ingress Over Time** - At degree 128, throughput plateaus at ~2,450 MB/s (98% of capacity) during active bursts, but averages only 1,088 MB/s (43.5%) due to alternating active/idle periods. At degrees 1-64, network remains well below capacity.
275+
**Figure 16: Target Network Ingress Over Time** - At degree 128, throughput plateaus at ~2,450 MB/s (98% of capacity) during active bursts, but averages only 1,088 MB/s (43.5%) due to alternating active/idle periods. At degrees 1-64, network remains well below capacity.
304276

305277
Network saturation occurs **only at degree 128** during active bursts. Therefore, the network does not explain the poor scaling from degree 1 through 64; target CPU lock contention remains the primary bottleneck.
306278

307-
### 5.5 Cross-Degree Scaling Analysis
279+
### 5.4 Cross-Degree Scaling Analysis
308280

309281
<img src="/img/2025-10-13_01/cross_degree_disk_write_mean.png" alt="Cross Degree Mean Disk Write." width="900">
310282

311-
**Figure 18: Mean Disk Write Throughput by Degree** - Scales from 90 MB/s (degree 1) to 1,099 MB/s (degree 128), only 12.3x improvement for 128x parallelism (9.6% efficiency).
283+
**Figure 17: Mean Disk Write Throughput by Degree** - Scales from 90 MB/s (degree 1) to 1,099 MB/s (degree 128), only 12.3x improvement for 128x parallelism (9.6% efficiency).
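
The efficiency figure in this caption is the observed speedup divided by the increase in parallelism. A minimal sketch of that arithmetic follows, using the rounded throughput values from the caption; the function name is hypothetical. With these rounded inputs the result is about 12.2x and 9.5%, close to the quoted 12.3x and 9.6%, which were presumably computed from unrounded means.

```python
def scaling_efficiency(base: float, scaled: float, parallelism_factor: float) -> tuple[float, float]:
    """Return (speedup, efficiency), where efficiency = speedup / parallelism_factor."""
    speedup = scaled / base
    return speedup, speedup / parallelism_factor


# Rounded mean write throughput from the caption: 90 MB/s at degree 1, 1,099 MB/s at degree 128.
speedup, efficiency = scaling_efficiency(90.0, 1099.0, 128.0)
print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.1%}")
```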
312284

313285
<img src="/img/2025-10-13_01/cross_degree_network_comparison.png" alt="Cross Degree Network Comparison." width="900">
314286

315-
**Figure 19: Network Throughput Comparison: Source TX vs Target RX** - At degree 128, source transmits 1,684 MB/s while target receives only 1,088 MB/s, creating a 596 MB/s (35%) deficit. This suggests the target cannot keep pace with source data production, likely due to CPU lock contention.
287+
**Figure 18: Network Throughput Comparison: Source TX vs Target RX** - At degree 128, source transmits 1,684 MB/s while target receives only 1,088 MB/s, creating a 596 MB/s (35%) deficit. This suggests the target cannot keep pace with source data production, likely due to CPU lock contention.
316288

317289
**Technical Note on TX/RX Discrepancy:** The apparent 35% violation of flow conservation is explained by TCP retransmissions. The source TX counter (measured via `sar -n DEV`) counts both original packets and retransmitted packets, while the target RX counter only counts successfully received unique packets. When the target is overloaded with CPU lock contention (83.9% system CPU at degree 64), it cannot drain receive buffers fast enough, causing packet drops that trigger TCP retransmissions. The 596 MB/s "deficit" is actually retransmitted data counted twice at the source but only once at the target, providing quantitative evidence of the target's inability to keep pace with source data production.
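
The deficit quoted above can be reproduced from the two mean throughput figures. This is only a sketch of the arithmetic, with a hypothetical function name; it does not measure retransmissions itself, and the percentage is taken relative to the source TX rate.

```python
def tx_rx_deficit(source_tx_mb_s: float, target_rx_mb_s: float) -> tuple[float, float]:
    """Return (deficit in MB/s, deficit as a fraction of source TX)."""
    deficit = source_tx_mb_s - target_rx_mb_s
    return deficit, deficit / source_tx_mb_s


# Mean throughput at degree 128, as reported for Source TX and Target RX.
deficit, share = tx_rx_deficit(1684.0, 1088.0)
print(f"deficit: {deficit:.0f} MB/s ({share:.0%} of source TX)")
```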
318290

319291
<img src="/img/2025-10-13_01/cross_degree_disk_utilization.png" alt="Cross Degree Disk Utilization." width="900">
320292

321293

322-
**Figure 20: Disk Utilization by Degree** - Mean utilization increases from 2.2% (degree 1) to only 24.3% (degree 128), remaining far below the 80% saturation threshold at all degrees. This strongly indicates disk I/O is not the bottleneck.
294+
**Figure 19: Disk Utilization by Degree** - Mean utilization increases from 2.2% (degree 1) to only 24.3% (degree 128), remaining far below the 80% saturation threshold at all degrees. This strongly indicates disk I/O is not the bottleneck.
323295

324-
### 5.6 I/O Analysis Conclusions
296+
### 5.5 I/O Analysis Conclusions
325297

326298
1. **Disk does not appear to be the bottleneck**: 24% average utilization at degree 128 with 76% idle capacity. PostgreSQL matches FIO peak (3,759 MB/s) but sustains only 170 MB/s average.
327299
