
fix: Memory-efficient sync and reconciliation for large tables #76

Merged
taariq merged 5 commits into main from fix/memory-efficient-sync-reconciler
Dec 10, 2025

taariq (Contributor) commented Dec 10, 2025

Summary

This PR fixes critical memory and timeout issues in the xmin-based sync daemon that were causing failures on tables with millions of rows:

  • Sync daemon: Was loading ALL rows into memory before processing, causing 10GB+ memory usage and connection timeouts
  • Reconciler: Was loading ALL primary keys from both databases into memory (~2.4GB for 14M row tables)

Changes

  1. Batched sync processing (sync_table)

    • Use existing read_changes_batched() + fetch_batch() instead of loading all rows
    • Process and write each batch immediately (memory = O(batch_size))
    • Update sync state after each batch for resume capability
    • Add progress logging every 10 batches
  2. Auto-detect batch size based on available memory

    • Cross-platform memory detection (Linux, macOS, Windows)
    • Calculate optimal batch size using 25% of available memory
    • Range: 1K-50K rows, default 10K if detection fails
    • Enables the same code to run on instances from t3.nano (512MB) to r6i.24xlarge (768GB)
  3. Batched reconciliation (reconcile_table_batched)

    • Implement merge-join comparison on sorted primary keys
    • Use keyset pagination (WHERE pk > last_pk) for efficient batching
    • Fetch PKs in batches from both source and target
    • Delete orphans in batches as discovered
    • Progress logging every 100K comparisons

Memory Impact

| Operation | Before | After |
|---|---|---|
| Sync (14M rows) | ~10 GB | ~20 MB |
| Reconciliation (14M rows) | ~2.4 GB | ~20 MB |

Testing

  • All 228 unit tests pass
  • Clippy passes with no warnings
  • Manual testing recommended on production-scale tables

Test plan

  • Unit tests pass
  • Clippy lints pass
  • Code formatted with cargo fmt
  • Manual test: Sync table with millions of rows
  • Manual test: Reconcile table with millions of rows
  • Verify memory stays bounded on t3.nano instance

Closes #74
Closes #75

The sync_table method was loading entire tables into memory before
processing, causing:
- 10GB+ memory usage for tables with millions of rows
- Connection timeouts when queries exceeded ELB idle timeouts
- Failed syncs with "connection closed" errors

Changes:
- Use existing batched reader (read_changes_batched + fetch_batch)
  instead of loading all rows at once
- Process and write each batch immediately (memory = O(batch_size))
- Update sync state after each batch for resume capability
- Add progress logging every 10 batches
- Increase default batch_size from 1000 to 10000 for better throughput
- Check for xmin wraparound at start rather than during read

This reduces memory from O(total_rows) to O(batch_size), enabling
sync of tables with millions of rows without OOM or timeouts.
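
A minimal sketch of the batch loop described above, not the actual `sync_table` code. The `BatchSource` and `BatchSink` traits, `sync_in_batches`, and the in-memory `VecSource` and `RecordingSink` are hypothetical names introduced only for illustration; the real reader is `read_changes_batched()` + `fetch_batch()`, whose signatures are not shown in this PR.

```rust
/// Hypothetical stand-in for the batched reader. Each call returns the next
/// batch together with an opaque resume position, or `None` when exhausted.
pub trait BatchSource {
    type Row;
    fn fetch_batch(&mut self, batch_size: usize) -> Option<(Vec<Self::Row>, u64)>;
}

/// Hypothetical destination for rows and for the persisted sync state.
pub trait BatchSink<R> {
    fn write_batch(&mut self, rows: &[R]);
    fn save_sync_state(&mut self, resume_from: u64);
}

/// Streams rows from `source` to `sink` one batch at a time, so at most
/// `batch_size` rows are held in memory. Returns the number of rows synced.
pub fn sync_in_batches<S, K>(source: &mut S, sink: &mut K, batch_size: usize) -> usize
where
    S: BatchSource,
    K: BatchSink<S::Row>,
{
    let mut total_rows = 0;
    let mut batches = 0;
    while let Some((rows, resume_from)) = source.fetch_batch(batch_size) {
        if rows.is_empty() {
            break;
        }
        sink.write_batch(&rows);
        // Checkpoint after every batch so an interrupted sync can resume here.
        sink.save_sync_state(resume_from);
        total_rows += rows.len();
        batches += 1;
        // Progress logging every 10 batches.
        if batches % 10 == 0 {
            eprintln!("synced {} rows in {} batches", total_rows, batches);
        }
    }
    total_rows
}

/// In-memory source used only to exercise the loop.
pub struct VecSource {
    rows: Vec<u64>,
    pos: usize,
}

impl VecSource {
    pub fn new(rows: Vec<u64>) -> Self {
        VecSource { rows, pos: 0 }
    }
}

impl BatchSource for VecSource {
    type Row = u64;
    fn fetch_batch(&mut self, batch_size: usize) -> Option<(Vec<u64>, u64)> {
        if self.pos >= self.rows.len() {
            return None;
        }
        let end = (self.pos + batch_size).min(self.rows.len());
        let batch = self.rows[self.pos..end].to_vec();
        self.pos = end;
        Some((batch, end as u64))
    }
}

/// In-memory sink that records what it was given.
#[derive(Default)]
pub struct RecordingSink {
    pub written: Vec<u64>,
    pub checkpoints: Vec<u64>,
    pub largest_batch: usize,
}

impl BatchSink<u64> for RecordingSink {
    fn write_batch(&mut self, rows: &[u64]) {
        self.largest_batch = self.largest_batch.max(rows.len());
        self.written.extend_from_slice(rows);
    }
    fn save_sync_state(&mut self, resume_from: u64) {
        self.checkpoints.push(resume_from);
    }
}
```

Syncing 25 rows with a batch size of 10 through this sketch writes three batches and records three checkpoints, and the sink never sees more than 10 rows at once.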

Closes #74

Add cross-platform memory detection and automatic batch size calculation
to prevent OOM on small instances while maximizing throughput on larger ones.

New functions in utils.rs:
- get_available_memory(): Cross-platform (Linux, macOS, Windows)
  - Linux: Reads MemAvailable from /proc/meminfo
  - macOS: Uses sysctl + vm_stat for free/inactive pages
  - Windows: Uses GlobalMemoryStatusEx Win32 API
- calculate_optimal_batch_size(): Auto-calculates based on memory
  - Uses 25% of available memory as working budget
  - Assumes 2KB per row (conservative estimate)
  - Clamps between 1,000 and 50,000 rows

Expected batch sizes by instance type:
- t3.nano (512MB): ~1,000 rows
- t3.small (2GB): ~10,000 rows
- t3.large (8GB+): 50,000 rows (capped)
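
As a rough sketch of the Linux path and the sizing rule listed above: the constants (25% budget, 2KB per row, 1,000 to 50,000 clamp, 10,000 default) come from this PR, while the function names `parse_mem_available_bytes` and `batch_size_for_memory` and their signatures are assumptions made for illustration, not the contents of utils.rs.

```rust
const MEMORY_BUDGET_PERCENT: u64 = 25;
const ASSUMED_ROW_BYTES: u64 = 2 * 1024;
const MIN_BATCH_SIZE: u64 = 1_000;
const MAX_BATCH_SIZE: u64 = 50_000;
const DEFAULT_BATCH_SIZE: u64 = 10_000;

/// Extracts `MemAvailable` from the text of /proc/meminfo.
/// The kernel reports the value in kB; the result is in bytes.
pub fn parse_mem_available_bytes(meminfo: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix("MemAvailable:")?;
        let kb: u64 = rest.trim().trim_end_matches("kB").trim().parse().ok()?;
        Some(kb * 1024)
    })
}

/// Picks a batch size from the available memory, falling back to the
/// default when detection failed (`None`).
pub fn batch_size_for_memory(available_bytes: Option<u64>) -> u64 {
    match available_bytes {
        Some(bytes) => {
            let budget = bytes.saturating_mul(MEMORY_BUDGET_PERCENT) / 100;
            (budget / ASSUMED_ROW_BYTES).clamp(MIN_BATCH_SIZE, MAX_BATCH_SIZE)
        }
        None => DEFAULT_BATCH_SIZE,
    }
}
```

With these two constants alone, 80 MiB of available memory yields 10,240 rows, and the 50,000-row cap is reached at roughly 400 MB available, so the per-instance figures above presumably depend on available rather than total memory and on details this message does not spell out. A 10,000-row batch at 2KB per row is about 20 MB, which lines up with the ~20 MB working set quoted in the PR summary.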

Refs #74

The reconciler was loading ALL primary keys from both source and target
tables into memory before comparing them. For tables with millions of
rows (e.g., 14M rows), this caused:
- 2-3 GB memory usage just for PKs
- Potential OOM on memory-constrained instances
- Connection timeouts during long-running PK fetch queries

Changes:
- Add reconcile_table_batched() using merge-join comparison
- Implement PkBatchReader with keyset pagination (WHERE pk > last_pk)
- Fetch PKs in sorted batches from both databases
- Compare using single-pass merge-join (both streams sorted)
- Delete orphans in batches as they're discovered
- Add progress logging every 100K comparisons

This reduces memory from O(total_rows) to O(batch_size), enabling
reconciliation of tables with millions of rows without OOM.
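
The merge-join over two keyset-paginated streams can be sketched roughly as below. This is an illustration under assumptions, not the `reconcile_table_batched()` or `PkBatchReader` code: `KeysetReader` is a hypothetical in-memory stand-in for a reader issuing `WHERE pk > last_pk ... LIMIT n`, `find_orphans` is a made-up name, and an orphan is taken to mean a key present in the target but absent from the source.

```rust
use std::cmp::Ordering;
use std::collections::VecDeque;

/// In-memory stand-in for a keyset-paginated PK reader. Each refill mimics
/// `SELECT pk ... WHERE pk > last_pk ORDER BY pk LIMIT batch_size`.
pub struct KeysetReader {
    table: Vec<String>,
    last_pk: Option<String>,
    batch_size: usize,
    buffer: VecDeque<String>,
}

impl KeysetReader {
    pub fn new(mut table: Vec<String>, batch_size: usize) -> Self {
        table.sort();
        KeysetReader { table, last_pk: None, batch_size, buffer: VecDeque::new() }
    }
}

impl Iterator for KeysetReader {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        if self.buffer.is_empty() {
            let last_pk = self.last_pk.clone();
            let batch: Vec<String> = self
                .table
                .iter()
                .filter(|pk| last_pk.as_ref().map_or(true, |last| *pk > last))
                .take(self.batch_size)
                .cloned()
                .collect();
            // Remember where this batch ended so the next one starts after it.
            if let Some(pk) = batch.last() {
                self.last_pk = Some(pk.clone());
            }
            self.buffer.extend(batch);
        }
        self.buffer.pop_front()
    }
}

/// Single-pass merge-join over two streams sorted in the same order.
/// Returns the keys that appear in `target` but not in `source`.
pub fn find_orphans<S, T>(source: S, target: T) -> Vec<String>
where
    S: Iterator<Item = String>,
    T: Iterator<Item = String>,
{
    let mut source = source.peekable();
    let mut target = target.peekable();
    let mut orphans = Vec::new();
    loop {
        let ord = match (source.peek(), target.peek()) {
            (_, None) => break,                    // target exhausted: done
            (None, Some(_)) => Ordering::Greater,  // source exhausted: rest are orphans
            (Some(s), Some(t)) => s.cmp(t),
        };
        match ord {
            Ordering::Less => {
                source.next(); // key only in source: not an orphan
            }
            Ordering::Equal => {
                source.next();
                target.next();
            }
            Ordering::Greater => orphans.extend(target.next()),
        }
    }
    orphans
}
```

For source keys `a, b, d, f` and target keys `a, c, d, e, g`, the walk reports `c, e, g`. The sketch collects orphans into one `Vec` for brevity; deleting them in batches as they are discovered, as this commit does, would keep that list bounded as well.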

Closes #75

This commit fixes critical correctness issues identified in PR #76 review:

## Critical Fix 1: xmin batching skipping rows with same xmin

The batched xmin reader was using `WHERE xmin > $1` which skips rows
when multiple rows share the same xmin (bulk inserts in single transaction).

Fix: Use (xmin, ctid) as compound pagination key. ctid provides a stable
tie-breaker for rows with identical xmin values.

- Add `last_ctid` field to BatchReader
- Use `WHERE (xmin, ctid) > ($1, $2::tid)` for subsequent batches
- Include `ctid::text` in SELECT and ORDER BY
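
To illustrate why the single-column cursor drops rows, here is a small in-memory simulation. It is a sketch, not the reader.rs code: `RowVersion` and `paginate` are hypothetical names, and xmin is modelled as a plain `u32` with no wraparound handling. Rust compares tuples field by field, which mirrors the SQL row comparison `(xmin, ctid) > ($1, $2)`.

```rust
/// A row identified by its inserting transaction and physical location.
/// The derived ordering compares `xmin` first, then `ctid`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct RowVersion {
    pub xmin: u32,
    pub ctid: (u32, u16),
}

/// Reads `rows` in batches of `batch_size` and returns every row seen.
/// With `use_ctid == false` the cursor is `xmin > last_xmin`;
/// with `use_ctid == true` it is `(xmin, ctid) > (last_xmin, last_ctid)`.
pub fn paginate(rows: &[RowVersion], batch_size: usize, use_ctid: bool) -> Vec<RowVersion> {
    let mut table = rows.to_vec();
    table.sort(); // stands in for ORDER BY xmin, ctid

    let mut seen = Vec::new();
    let mut last: Option<RowVersion> = None;
    loop {
        let batch: Vec<RowVersion> = table
            .iter()
            .copied()
            .filter(|row| match last {
                None => true,
                Some(prev) if use_ctid => (row.xmin, row.ctid) > (prev.xmin, prev.ctid),
                Some(prev) => row.xmin > prev.xmin,
            })
            .take(batch_size)
            .collect();
        match batch.last() {
            Some(row) => last = Some(*row),
            None => break,
        }
        seen.extend(batch);
    }
    seen
}
```

With three rows sharing xmin 100 and a batch size of 2, the `xmin`-only cursor returns the first two rows and then jumps straight to xmin 101, so the third row is never read. The compound cursor returns all rows.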

## Critical Fix 2: Reconciler PK ordering mismatch

PKs were cast to ::text in SELECT but ORDER BY used native column types.
For numeric PKs: "10" < "2" lexicographically but 10 > 2 numerically.
This caused false orphan detection and data loss.

Fix: Use ::text cast in both SELECT and ORDER BY to ensure SQL stream
order matches Rust's lexicographic string comparison.

- Change ORDER BY from `"col"` to `"col"::text`
- Change WHERE from `"col" > $1` to `"col"::text > $1`
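
The failure mode can be reproduced in a few lines. This is an illustrative sketch with hypothetical names (`orphans_by_text_compare`, `keys_in_numeric_order`, `keys_in_text_order`), not the reconciler.rs code, and it assumes the database's text ordering agrees with Rust's byte-wise string comparison (for example a C collation), which this commit message does not state.

```rust
use std::cmp::Ordering;

/// Merge-join over two key lists, comparing keys as strings.
/// Returns the target keys the walk decides are missing from the source.
pub fn orphans_by_text_compare(source: &[String], target: &[String]) -> Vec<String> {
    let (mut s, mut t) = (0, 0);
    let mut orphans = Vec::new();
    while t < target.len() {
        let ord = if s < source.len() {
            source[s].cmp(&target[t])
        } else {
            Ordering::Greater
        };
        match ord {
            Ordering::Less => s += 1,
            Ordering::Equal => {
                s += 1;
                t += 1;
            }
            Ordering::Greater => {
                orphans.push(target[t].clone());
                t += 1;
            }
        }
    }
    orphans
}

fn to_keys(pks: &[i64]) -> Vec<String> {
    pks.iter().map(|pk| pk.to_string()).collect()
}

/// Keys as returned by `ORDER BY "col"` on a numeric column.
pub fn keys_in_numeric_order(pks: &[i64]) -> Vec<String> {
    let mut sorted = pks.to_vec();
    sorted.sort();
    to_keys(&sorted)
}

/// Keys as returned by `ORDER BY "col"::text` (byte-wise ordering assumed).
pub fn keys_in_text_order(pks: &[i64]) -> Vec<String> {
    let mut keys = to_keys(pks);
    keys.sort();
    keys
}
```

Take a source holding PKs 2, 3, 10 and a target holding 2, 10. Streamed in numeric order, the walk compares "3" with "10", finds "3" greater, and flags "10" as an orphan even though it exists in the source. Streamed in text order, the same walk reports no orphans.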

## Moderate Fix: macOS page size detection

Apple Silicon uses 16KB pages, not 4KB. Hardcoded 4KB underestimated
available memory by 4x, leading to unnecessarily small batch sizes.

Fix: Use `sysctl hw.pagesize` to get actual page size.
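
A rough sketch of the arithmetic behind this fix. The helper names `parse_pages` and `available_bytes` are hypothetical; the sketch takes the text printed by `vm_stat` and a page size (as reported by `sysctl -n hw.pagesize`) as inputs rather than invoking either command, and it assumes available memory is counted as free plus inactive pages, as described for the macOS path in this PR.

```rust
/// Finds a counter such as "Pages free" in `vm_stat` output.
/// Lines look like `Pages free:        12345.` with a trailing period.
pub fn parse_pages(vm_stat: &str, label: &str) -> Option<u64> {
    vm_stat.lines().find_map(|line| {
        let rest = line.strip_prefix(label)?.strip_prefix(':')?;
        rest.trim().trim_end_matches('.').parse().ok()
    })
}

/// Available memory in bytes: (free + inactive pages) * page size.
/// `page_size` should be the value reported by the system, not a constant.
pub fn available_bytes(vm_stat: &str, page_size: u64) -> Option<u64> {
    let free = parse_pages(vm_stat, "Pages free")?;
    let inactive = parse_pages(vm_stat, "Pages inactive")?;
    Some((free + inactive) * page_size)
}
```

For 10,000 free and 5,000 inactive pages, a 16384-byte page size gives 245,760,000 bytes, while a hardcoded 4096 gives 61,440,000 bytes, the 4x underestimate described above.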
taariq (Contributor, Author) commented Dec 10, 2025

Review Fixes Applied

Addressed all findings from the code review:

Critical Fixes

1. xmin batching now handles duplicate xmin values (reader.rs)

  • Added ctid as tie-breaker for pagination
  • Uses WHERE (xmin, ctid) > ($1, $2::tid) for subsequent batches
  • Prevents skipping rows when bulk inserts share the same xmin

2. Reconciler PK ordering now consistent (reconciler.rs)

  • Both SELECT and ORDER BY now use ::text cast
  • SQL stream order matches Rust lexicographic comparison
  • Prevents false orphan detection for numeric PKs

Moderate Fix

3. macOS page size detection (utils.rs)

  • Now uses sysctl hw.pagesize instead of hardcoded 4KB
  • Apple Silicon machines correctly report 16KB pages
  • Batch sizes now accurate on M1/M2/M3 Macs

All 228 tests pass, clippy clean.

taariq merged commit e4f63d1 into main on Dec 10, 2025
7 checks passed
taariq deleted the fix/memory-efficient-sync-reconciler branch on December 10, 2025 19:32


Development

Successfully merging this pull request may close these issues.

  • Reconciler loads all PKs into memory causing OOM on large tables
  • sync command loads entire table into memory causing OOM and connection timeouts
