_posts/2025-10-13-Performance-Analysis-of-Parallel-Data-Replication-Between-Two-PostgreSQL-18-Instances-on-OVH.md
Lines changed: 8 additions & 28 deletions
@@ -233,6 +233,8 @@ The source instance has 256GB RAM, and the lineitem table is ~77GB. An important
 
 **Figure 13: Source Disk Utilization Over Time** - Shows disk utilization across all test runs (vertical lines mark test boundaries for degrees 1, 2, 4, 8, 16, 32, 64, 128). At degree 1, utilization peaks at ~50% during the initial table load, then drops to near-zero. At higher degrees (2-128), utilization remains below 1% throughout, confirming the disk is idle and not limiting performance.
 
+Disk utilization measures the percentage of time the disk is busy serving I/O requests.
+
 **Source disk I/O is NOT a bottleneck at any parallelism degree.** The source exhibits different behavior depending on parallelism:
 
 - **Degree 1**: Brief initial disk load (~10 seconds), then reads from RAM cache
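The definition of disk utilization added in the hunk above can be made concrete with a small calculation. This is a minimal sketch, assuming the metric is derived the way Linux tools such as `iostat` typically derive `%util`: from the growth of a cumulative "time spent doing I/O" counter (in milliseconds) over a sampling interval. The function name and the sample counter values are hypothetical and do not come from the post.

```python
def disk_utilization_pct(busy_ms_start: int, busy_ms_end: int, interval_ms: int) -> float:
    """Percentage of the sampling interval during which the disk was busy.

    busy_ms_start / busy_ms_end are two readings of a cumulative busy-time
    counter (milliseconds), taken interval_ms apart.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy_ms = busy_ms_end - busy_ms_start
    return 100.0 * busy_ms / interval_ms


# Hypothetical samples taken one second apart.
# Busy for 500 ms of a 1000 ms interval, comparable to the ~50% peak at degree 1:
print(disk_utilization_pct(120_000, 120_500, 1_000))  # 50.0
# Busy for 5 ms of a 1000 ms interval, comparable to the <1% seen at degrees 2-128:
print(disk_utilization_pct(120_500, 120_505, 1_000))  # 0.5
```

A utilization near 100% would mean the device almost always had at least one request in flight; the sub-1% values reported for degrees 2-128 correspond to only a few milliseconds of activity per second.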
@@ -250,24 +252,6 @@ The near-zero disk utilization (<1%) at high parallelism degrees confirms the di
 
 **Figure 15: Target Disk Utilization Over Time** - Mean utilization remains below 25% across all degrees. Spikes reach 70-90% during bursts but quickly return to low baseline. This strongly suggests disk I/O is not the bottleneck.
 
-Disk utilization measures the percentage of time the disk is busy serving I/O requests.
-
-The disk analysis reveals the interplay between caching and backpressure:
-
-```
-Table Cached in RAM (256GB RAM, 77GB table)
-↓
-Source Disk Becomes Idle (0.1 MB/s, 0.0% utilization at degrees 2-128)
-↓
-Meanwhile: Target Lock Contention (83.9% system CPU at degree 64)
-↓
-Target Can't Consume Data Fast Enough (1,088 MB/s vs 1,684 MB/s source TX)
-↓
-FastTransfer Batches Block (waiting for target acknowledgment)
-↓
-Source Processes Sleep (0.11 cores/process, blocked in system calls)
-```
-
 ### 5.3 Network Throughput Analysis
 
 <img src="/img/2025-10-13_01/target_network_rx_timeseries.png" alt="Target Network RX Time Series." width="900">
@@ -286,12 +270,7 @@ Network saturation occurs **only at degree 128** during active bursts. Therefore
 
 **Figure 18: Network Throughput Comparison: Source TX vs Target RX** - At degree 128, source transmits 1,684 MB/s while target receives only 1,088 MB/s, creating a 596 MB/s (35%) deficit. This suggests the target cannot keep pace with source data production, likely due to CPU lock contention.
 
-**Technical Note on TX/RX Discrepancy:** The apparent 35% violation of flow conservation is explained by TCP retransmissions. The source TX counter (measured via `sar -n DEV`) counts both original packets and retransmitted packets, while the target RX counter only counts successfully received unique packets. When the target is overloaded with CPU lock contention (83.9% system CPU at degree 64), it cannot drain receive buffers fast enough, causing packet drops that trigger TCP retransmissions. The 596 MB/s "deficit" is actually retransmitted data counted twice at the source but only once at the target, providing quantitative evidence of the target's inability to keep pace with source data production.
-
-<img src="/img/2025-10-13_01/cross_degree_disk_utilization.png" alt="Cross Degree Disk Utilization." width="900">
-
-
-**Figure 19: Disk Utilization by Degree** - Mean utilization increases from 2.2% (degree 1) to only 24.3% (degree 128), remaining far below the 80% saturation threshold at all degrees. This strongly indicates disk I/O is not the bottleneck.
+The apparent 35% violation of flow conservation is explained by TCP retransmissions. The source TX counter (measured via `sar -n DEV`) counts both original packets and retransmitted packets, while the target RX counter only counts successfully received unique packets. When the target is overloaded with CPU lock contention (83.9% system CPU at degree 64), it cannot drain receive buffers fast enough, causing packet drops that trigger TCP retransmissions. The 596 MB/s "deficit" is actually retransmitted data counted twice at the source but only once at the target, providing quantitative evidence of the target's inability to keep pace with source data production.
 
 ### 5.5 I/O Analysis Conclusions
 
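The TX/RX arithmetic quoted in the hunk above (1,684 MB/s sent, 1,088 MB/s received, a 596 MB/s or 35% deficit) can be checked with a short calculation. This is a minimal sketch using only the two throughput figures from the post; the function name is hypothetical, and the percentage is taken relative to source TX, which is the convention that reproduces the post's 35%.

```python
def tx_rx_deficit(source_tx_mb_s: float, target_rx_mb_s: float) -> tuple[float, float]:
    """Return (deficit in MB/s, deficit as a percentage of source TX)."""
    if source_tx_mb_s <= 0:
        raise ValueError("source_tx_mb_s must be positive")
    deficit = source_tx_mb_s - target_rx_mb_s
    share_of_tx = 100.0 * deficit / source_tx_mb_s
    return deficit, share_of_tx


# Degree-128 figures quoted in the post.
deficit, share = tx_rx_deficit(1684.0, 1088.0)
print(f"deficit: {deficit:.0f} MB/s ({share:.1f}% of source TX)")
# prints: deficit: 596 MB/s (35.4% of source TX)
```

Under the post's interpretation, that share would be the fraction of transmitted bytes that were retransmissions. Confirming it would require TCP-level counters in addition to the interface-level counters used here.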
@@ -325,6 +304,11 @@ The bottleneck appears to be **architectural**, not configurational:
 
 No configuration parameter appears able to eliminate these fundamental coordination requirements.
+While this analysis relied on system-level metrics, a follow-up study will use PostgreSQL's internal instrumentation to provide direct evidence of lock contention and wait events. This will validate the hypotheses presented in this analysis using database engine-level metrics.
+
+
 ## Appendix A: PostgreSQL Configuration
 
 Both PostgreSQL 18 instances were tuned for maximum bulk loading performance.
@@ -465,10 +449,6 @@ This represents the absolute minimum overhead possible. The fact that lock conte
 
 These PostgreSQL 18 enhancements provide measurable I/O efficiency improvements, but the fundamental architectural limitation of concurrent writes to a single table persists.
-While this analysis relied on system-level metrics, a follow-up study will use PostgreSQL's internal instrumentation to provide direct evidence of lock contention and wait events. This will validate the hypotheses presented in this analysis using database engine-level metrics.