- P50 (Median): < 100 ms
- P95 (95th Percentile): < 500 ms
Definition: Replication lag is the time for new data to appear on replica shards after being written to primary shards.
# Basic measurement (100 samples)
python3 scripts/measure_replication_lag.py
# Custom index and sample count (use 'direct' for true replication lag)
python3 scripts/measure_replication_lag.py \
--index phish-us \
--samples 200 \
--method direct
# Save results to file
python3 scripts/measure_replication_lag.py \
--output replication_lag_results.json
direct(default): Measures actual replication lag - time for data to be written to replica shards (synchronous replication). This is the true replication lag and should be < 100ms.refresh: Measures searchability lag - includes refresh time. This measures when data becomes searchable (includes refresh interval overhead, typically 200-400ms).shard_stats: Uses shard-level statistics (more accurate but slower)
Important: Elasticsearch uses synchronous replication by default. When you write to a primary shard, it waits for replica acknowledgment before returning success. So actual replication lag is just the write latency (typically 10-50ms), not the refresh time.
Why you might see higher values with refresh method: The refresh method includes the time for documents to become searchable, which includes the refresh interval (default 1s). This is "searchability lag", not "replication lag". Use --method direct to measure true replication lag.
Elasticsearch uses synchronous replication by default, which means:
- Write operations wait for replica acknowledgment
- Replication happens immediately after primary write
- No data loss risk
Configuration (already default):
{
"index": {
"number_of_replicas": 1
}
}
The transaction log (translog) controls durability and replication:
{
"index": {
"translog": {
"durability": "request", // Wait for translog sync (default: "request")
"sync_interval": "5s", // Sync translog every 5s (default: 5s)
"flush_threshold_size": "512mb" // Flush when translog reaches this size
}
}
}
For lower replication lag:
durability: "request"- Ensures immediate replication (default, good)- Smaller
sync_interval- More frequent syncs (but higher I/O) - Smaller
flush_threshold_size- More frequent flushes
Control how often indices are refreshed (made searchable):
{
"index": {
"refresh_interval": "1s" // Refresh every 1 second (default: 1s)
}
}
For lower replication lag:
- Smaller refresh interval (e.g.,
"500ms"or"1s") - Trade-off: More frequent refreshes = higher CPU usage
{
"settings": {
"index": {
"number_of_replicas": 1,
"refresh_interval": "1s",
"translog": {
"durability": "request",
"sync_interval": "5s"
}
}
}
}
{
"persistent": {
"cluster.routing.allocation.cluster_concurrent_rebalance": 2,
"cluster.routing.allocation.node_concurrent_recoveries": 2
}
}
- How it works: Primary waits for replica acknowledgment before returning success
- Benefit: Ensures data is replicated immediately
- Trade-off: Slightly higher write latency, but guarantees consistency
- How it works: Transaction log ensures writes are durable before acknowledgment
- Benefit: Prevents data loss and ensures replication consistency
- Configuration:
durability: "request"(default) ensures immediate replication
- How it works: Controls how often new data becomes searchable
- Benefit: Frequent refreshes (1s) ensure low replication lag
- Trade-off: More CPU usage, but acceptable for most use cases
- How it works: Replica shards on different nodes in same cluster
- Benefit: Low network latency between nodes
- Cloud deployment: Elastic Cloud nodes are in same region, minimizing latency
- How it works: Primary and replica shards on different nodes
- Benefit: Parallel replication across multiple nodes
- Configuration: 3-node cluster ensures primary and replicas are on different nodes
Elasticsearch uses synchronous replication, so replication completes during the write operation:
- P50: ~10-50 ms ✅ (well below 100ms target)
- P95: ~50-100 ms ✅ (well below 500ms target)
- P99: ~100-200 ms
This includes refresh time, which is separate from replication:
- P50: ~200-300 ms (includes refresh overhead)
- P95: ~300-500 ms
- P99: ~500-1000 ms
Note: Replication lag (data on replica shards) is different from searchability lag (when data becomes searchable). The target metrics refer to replication lag, not searchability lag.
# Check index settings
curl "http://localhost:9200/phish-us/_settings?pretty"
# Check cluster settings
curl "http://localhost:9200/_cluster/settings?pretty"
# Update refresh interval
curl -X PUT "http://localhost:9200/phish-*/_settings" \
-H "Content-Type: application/json" \
-d '{
"index": {
"refresh_interval": "1s"
}
}'
python3 scripts/measure_replication_lag.py --samples 200
If P95 > 500ms:
- Reduce refresh interval to
"500ms"(if CPU allows) - Check network latency between nodes
- Verify translog settings
- Check for cluster resource constraints
# Check index stats
curl "http://localhost:9200/phish-us/_stats?pretty"
# Check shard stats
curl "http://localhost:9200/phish-us/_stats/level=shards?pretty"
Run periodic measurements:
# Every 5 minutes
while true; do
python3 scripts/measure_replication_lag.py --samples 50 --output lag_$(date +%s).json
sleep 300
done
Possible Causes:
-
Network latency: Nodes too far apart
- Solution: Ensure nodes in same region/datacenter
-
High cluster load: Too many concurrent operations
- Solution: Increase cluster resources or reduce load
-
Slow disk I/O: Replica writes are slow
- Solution: Use faster storage (SSD), check disk health
-
Large documents: Big documents take longer to replicate
- Solution: Optimize document size, use compression
-
Too many replicas: More replicas = more replication work
- Solution: Use 1 replica for most cases (balance vs. redundancy)
# Check cluster health
curl "http://localhost:9200/_cluster/health?pretty"
# Check node stats
curl "http://localhost:9200/_nodes/stats?pretty"
# Check index refresh stats
curl "http://localhost:9200/phish-us/_stats/refresh?pretty"
| Design Choice | How It Supports Low Replication Lag |
|---|---|
| Synchronous Replication | Immediate replication, no delay |
| Translog Durability | Ensures replication consistency |
| 1s Refresh Interval | Frequent refreshes make data searchable quickly |
| 3-Node Cluster | Primary and replicas on different nodes, parallel replication |
| Regional Distribution | Nodes in same region minimize network latency |
| Hash-Based Sharding | Even distribution prevents hot spots that slow replication |
With proper configuration:
- P50: 50-80 ms (well below 100ms target) ✅
- P95: 200-400 ms (well below 500ms target) ✅
- P99: 500-1000 ms
These targets are achievable with default Elasticsearch settings on a properly configured cluster.