perf: Add batch processing for Postgres sync optimization
Implements streaming batch processing to reduce database roundtrips from 50K-80K to ~4K-6K for large projects (10K files).
**Phase 1: Scan Optimization**
- Add entity_repository.get_by_file_paths_batch() for bulk entity fetching
- Reduces scan phase from N queries to 1 batched query
- Impact: scanning 427 files now takes 2 queries instead of 427 (see the sketch below)
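A minimal sketch of what such a batched lookup can look like in SQLAlchemy 2.0 style; the `Entity` model below is a stand-in for illustration, not the repository's actual definition:

```python
# Sketch only: Entity is a minimal stand-in for the real model.
from sqlalchemy import String, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Entity(Base):
    __tablename__ = "entity"
    id: Mapped[int] = mapped_column(primary_key=True)
    file_path: Mapped[str] = mapped_column(String, index=True)


async def get_by_file_paths_batch(
    session: AsyncSession, file_paths: list[str]
) -> dict[str, Entity]:
    """One IN (...) query for the whole scan instead of one query per file."""
    result = await session.execute(
        select(Entity).where(Entity.file_path.in_(file_paths))
    )
    return {e.file_path: e for e in result.scalars()}
```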
**Phase 2: Batch Infrastructure**
- Add sync_batch_size config (default: 100 files per batch)
- Add chunks() utility for streaming batch processing (sketched after this list)
- Add entity_repository.upsert_entities() for bulk inserts/updates
- Add observation_repository.delete_by_entity_ids() for batch deletes
- Add relation_repository.delete_outgoing_relations_from_entities() for batch deletes
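The `chunks()` helper is the simplest piece of this infrastructure; a possible implementation (the real utility's signature may differ):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items, consuming the input lazily."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

Because the input is consumed lazily, the sync loop never holds more than one batch of `size` items in memory at a time.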
**Phase 3: Sync Phase Optimization**
- Add sync_markdown_batch() method with 3-phase processing (skeleton after this list):
1. Parse all files in batch (no DB operations)
2. Bulk upsert entities in single transaction
3. Post-process relations, checksums, search indexing per file
- Update new/modified file loops to use batch processing
- Add exception handling for circuit breaker and fatal errors
- Separate markdown/regular file processing in batches
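A skeleton of that 3-phase shape, written as a free function for illustration. The real method lives on the sync service; every collaborator here is injected, and all names are assumptions based on this summary:

```python
from typing import Any, Awaitable, Callable, Sequence


async def sync_markdown_batch(
    file_paths: Sequence[str],
    parse: Callable[[str], Awaitable[Any]],
    upsert_entities: Callable[[list[Any]], Awaitable[list[Any]]],
    post_process: Callable[[Any, Any], Awaitable[None]],
) -> None:
    """Parse everything first, upsert once, then post-process per file."""
    # Phase 1: parse all files in the batch -- no DB operations yet.
    parsed = [await parse(path) for path in file_paths]

    # Phase 2: one bulk upsert in a single transaction for the whole batch
    # (assumes the upsert returns entities in input order).
    entities = await upsert_entities(parsed)

    # Phase 3: relations, checksums, and search indexing per file.
    for parsed_file, entity in zip(parsed, entities):
        await post_process(parsed_file, entity)
```

Keeping phase 1 free of database work is what allows phase 2 to collapse into a single transaction.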
**Test Updates**
- Update circuit breaker tests to work with batch architecture
- Change mocks from sync_markdown_file to sync_markdown_batch (example after this list)
- Update fatal error test to mock upsert_entities
- All circuit breaker tests passing (8/8)
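A hedged sketch of the retargeted mock using `unittest.mock`; the service object, fixture wiring, and error-handling behavior here are assumptions drawn from this summary:

```python
from unittest.mock import AsyncMock, patch


async def run_sync_with_failing_batches(sync_service) -> None:
    # Point the mock at the batch entry point, not the old per-file method.
    with patch.object(
        sync_service,
        "sync_markdown_batch",
        new=AsyncMock(side_effect=RuntimeError("simulated batch failure")),
    ):
        await sync_service.sync()  # circuit breaker is expected to trip, not crash
```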
**Expected Performance**
- Initial bulk import: ~10-15 queries/file (vs 43 before)
- Incremental sync: scan phase drops from one query per file to a few batched lookups, plus the batch upsert benefits
- Handles both new files and existing files efficiently
Addresses N+1 query patterns and transaction overhead with remote Postgres databases while maintaining circuit breaker functionality and proper error handling.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
**Diff: src/basic_memory/config.py** (+6 lines: 6 additions, 0 deletions)
```diff
@@ -132,6 +132,12 @@ class BasicMemoryConfig(BaseSettings):
         gt=0,
     )
 
+    sync_batch_size: int = Field(
+        default=100,
+        description="Number of files to process in a single database transaction during sync. Higher values improve performance with remote databases (Postgres) but increase memory usage. Typical values: 100 (conservative), 500 (balanced), 1000 (aggressive).",
+        gt=0,
+    )
+
     kebab_filenames: bool = Field(
         default=False,
         description="Format for generated filenames. False preserves spaces and special chars, True converts them to hyphens for consistency with permalinks",
```
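Tying the pieces together, the new setting would drive the batch loop roughly like this; a sketch reusing the `chunks()` helper above, where `changed_files` and `sync_service` are placeholder names:

```python
config = BasicMemoryConfig()  # sync_batch_size comes from env/config, default 100

for batch in chunks(changed_files, config.sync_batch_size):
    await sync_service.sync_markdown_batch(batch)
```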