feat: Add batched processing for SQLite converter to prevent OOM #78

Merged
taariq merged 1 commit into main from fix/sqlite-batched-processing-77
Dec 11, 2025


taariq commented Dec 11, 2025

Summary

  • Adds memory-efficient batched processing for SQLite to PostgreSQL migration
  • Uses BatchedTableReader with rowid-based pagination to read tables in chunks
  • Automatically calculates optimal batch size based on available system memory
  • Prevents OOM errors when migrating large SQLite databases (7M+ rows)

Problem

The SQLite converter previously loaded all rows of a table into memory before inserting them into PostgreSQL. For large tables (e.g., 7.2M rows in a 2.3 GB file), this required 8-16 GB of RAM, making the tool unusable on typical machines.

Solution

The converter now processes rows in batches:

  1. Read a batch from SQLite (default batch size: auto-calculated from available memory)
  2. Convert the batch to JSONB format
  3. Insert the batch into PostgreSQL
  4. Repeat until all rows are processed

Memory usage stays constant regardless of table size, because only one batch is held in memory at a time.
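As a rough illustration of the loop above, here is a minimal, stdlib-only sketch of rowid-based (keyset) pagination. It is not the code from this PR: `FakeTable`, `BatchCursor`, `to_json`, and `migrate_in_batches` are hypothetical names, an in-memory vector stands in for SQLite, and a `Vec<String>` stands in for the PostgreSQL insert. The query shown in the comment is an assumption about what a reader like `BatchedTableReader` might issue.

```rust
// Hypothetical sketch: all names here are illustrative, not the PR's API.

/// A row as (rowid, payload).
type Row = (i64, String);

/// Stand-in for a SQLite table, kept sorted by rowid.
struct FakeTable {
    rows: Vec<Row>,
}

impl FakeTable {
    /// Emulates a keyset query such as (assumed, not taken from the PR):
    ///   SELECT rowid, * FROM t WHERE rowid > ?1 ORDER BY rowid LIMIT ?2
    /// `None` means "start from the beginning of the table".
    fn read_batch(&self, last_rowid: Option<i64>, limit: usize) -> Vec<Row> {
        self.rows
            .iter()
            .filter(|(id, _)| last_rowid.map_or(true, |last| *id > last))
            .take(limit)
            .cloned()
            .collect()
    }
}

/// Cursor that remembers the last rowid it returned.
struct BatchCursor<'a> {
    table: &'a FakeTable,
    last_rowid: Option<i64>,
    batch_size: usize,
}

impl<'a> BatchCursor<'a> {
    fn new(table: &'a FakeTable, batch_size: usize) -> Self {
        Self { table, last_rowid: None, batch_size }
    }

    /// Returns the next batch, or `None` once the table is exhausted.
    fn next_batch(&mut self) -> Option<Vec<Row>> {
        let batch = self.table.read_batch(self.last_rowid, self.batch_size);
        let last = batch.last()?; // empty batch: nothing left to read
        self.last_rowid = Some(last.0);
        Some(batch)
    }
}

/// Toy stand-in for the JSONB conversion step (not real JSON escaping).
fn to_json(row: &Row) -> String {
    format!("{{\"rowid\":{},\"value\":{:?}}}", row.0, row.1)
}

/// Read, convert, "insert", repeat. Returns (batch count, largest batch held).
fn migrate_in_batches(
    table: &FakeTable,
    batch_size: usize,
    sink: &mut Vec<String>,
) -> (usize, usize) {
    let mut cursor = BatchCursor::new(table, batch_size);
    let (mut batches, mut peak) = (0, 0);
    while let Some(batch) = cursor.next_batch() {
        peak = peak.max(batch.len());
        sink.extend(batch.iter().map(to_json)); // the sink stands in for PostgreSQL
        batches += 1;
    }
    (batches, peak)
}
```

Paginating on `rowid > last` rather than `OFFSET` means each batch starts where the previous one ended, so gaps in the rowid sequence are handled naturally and later batches do not have to skip over earlier rows.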

Changes

  • src/sqlite/reader.rs: Add BatchedTableReader struct and read_table_batch() function
  • src/sqlite/converter.rs: Add convert_table_batched() async function
  • src/commands/init.rs: Update init_sqlite_to_postgres() to use batched processing
  • Add comprehensive unit tests for batched reader

Testing

  • ✅ All unit tests pass (14 new tests for batched reader)
  • ✅ Clippy clean
  • ✅ Formatter clean

Fixes #77

The SQLite to PostgreSQL converter now processes rows in batches
instead of loading the entire table into memory. This enables
migration of large SQLite databases (7M+ rows) without running
out of memory.

Changes:
- Add BatchedTableReader struct for rowid-based pagination
- Add read_table_batch() function for memory-efficient reading
- Add convert_table_batched() async function that reads, converts,
  and inserts in batches
- Update init_sqlite_to_postgres() to use batched processing
- Use calculate_optimal_batch_size() for memory-based batch sizing
- Add comprehensive tests for batched reader functionality

Memory usage now stays constant regardless of table size. Batch size
automatically adjusts based on available system memory (25% of free
RAM, clamped between 1,000 and 50,000 rows).
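The sizing rule can be sketched as follows. This is an illustration only, not the body of `calculate_optimal_batch_size()`: the function name `sketch_batch_size`, its parameters, and the per-row size estimate used to turn a byte budget into a row count are all assumptions. Only the 25% fraction and the 1,000 to 50,000 row clamp come from the description above.

```rust
// Hypothetical sketch of the sizing rule; not the PR's implementation.

const MIN_BATCH_ROWS: u64 = 1_000; // lower clamp, from the commit message
const MAX_BATCH_ROWS: u64 = 50_000; // upper clamp, from the commit message

/// Budget 25% of free memory, divide by an assumed per-row size,
/// then clamp the result to the allowed row range.
fn sketch_batch_size(free_bytes: u64, est_row_bytes: u64) -> usize {
    let budget = free_bytes / 4; // 25% of free RAM
    let row = est_row_bytes.max(1); // guard against division by zero
    (budget / row).clamp(MIN_BATCH_ROWS, MAX_BATCH_ROWS) as usize
}
```

With an assumed 1 KiB per row, 8 GiB of free memory hits the 50,000-row ceiling, 64 MiB yields 16,384 rows, and 1 MiB falls back to the 1,000-row floor.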

Fixes #77
taariq force-pushed the fix/sqlite-batched-processing-77 branch from b74b840 to a9f466d on December 11, 2025 21:27
taariq merged commit 6ab5d12 into main on Dec 11, 2025
7 checks passed
taariq deleted the fix/sqlite-batched-processing-77 branch on December 11, 2025 21:37

Development

Successfully merging this pull request may close these issues.

SQLite converter loads all rows into memory, causing OOM on large tables