Skip to content

fix: Use filesystem timestamps for entity sync instead of database operation time (#138)#369

Merged
phernandez merged 1 commit intomainfrom
fix/sync-file-timestamps-138
Oct 16, 2025
Merged

fix: Use filesystem timestamps for entity sync instead of database operation time (#138)#369
phernandez merged 1 commit intomainfrom
fix/sync-file-timestamps-138

Conversation

@phernandez
Copy link
Copy Markdown
Member

Summary

Fixes issue #138 from basic-memory-cloud where modified times were showing as the database creation date in the cloud instead of actual file modification times.

Root Cause

The sync service was using datetime.now() (database operation time) instead of capturing filesystem timestamps when syncing files. This affected:

  1. Markdown files: Never captured file timestamps during sync
  2. Regular files: Only captured timestamps for NEW files, not when updating existing files
  3. Both file types: updated_at wasn't refreshed with the file's actual modification time when re-syncing modified files

Changes Made

src/basic_memory/sync/sync_service.py:

  • Added file timestamp capture using st_mtime and st_ctime for markdown files
  • Added file timestamp capture for regular files when syncing existing files
  • Updated entity created_at and updated_at fields with filesystem timestamps

tests/sync/test_sync_service.py:

  • Added comprehensive test test_sync_updates_timestamps_on_file_modification that validates:
    • Creating a file and initial sync
    • Modifying the file content
    • Re-syncing the modified file
    • Verifying updated_at matches file's modification time, not db operation time

Impact

This fix ensures that temporal ordering in search and recent_activity queries accurately reflects when files were actually modified on the filesystem, not when they were processed by the database.

Test Plan

  • All 36 sync tests pass (34 passed, 2 skipped)
  • New test specifically validates the fix for modified files
  • Existing test test_sync_preserves_timestamps continues to pass (validates new files)

🤖 Generated with Claude Code

…eration time (#138)

Modified sync_service.py to capture and use actual file modification times
(st_mtime) and creation times (st_ctime) from the filesystem instead of
using datetime.now() when syncing entities. This ensures that the updated_at
and created_at fields in the database reflect when files were actually
modified on disk, not when the database operation occurred.

Changes:
- sync_markdown_file: Capture file timestamps and update entity with them
- sync_regular_file: Capture file mtime when updating existing files
- Added comprehensive test to validate timestamp preservation on file modification

This fix ensures accurate temporal ordering in search and recent_activity
queries, resolving the issue where modified times showed as db creation
dates in the cloud.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Oct 16, 2025

Claude finished @phernandez's task —— View job


Code Review Complete ✅

I've completed a thorough review of PR #369 against the Basic Memory team checklist. Here are my findings:


✅ Code Quality & Standards

Passes all requirements:

  • Follows Basic Memory coding conventions per CLAUDE.md
  • Python 3.12+ type annotations present (datetime, Path, etc.)
  • Async patterns with SQLAlchemy 2.0 maintained
  • 100-character line length maintained
  • No commented-out code blocks
  • Clear variable naming (file_stats, created, modified)

✅ Testing & Documentation

Excellent test coverage:

  • New comprehensive test test_sync_updates_timestamps_on_file_modification (lines 648-730) validates:
    • Initial file creation and sync
    • File modification detection
    • Timestamp preservation from filesystem
    • Cross-platform handling (Windows tolerance)
  • Test directly addresses the root cause described in the PR
  • Existing test test_sync_preserves_timestamps continues to validate new file timestamps
  • All 36 sync tests passing confirms no regressions

Documentation:

  • Clear inline comments explaining the "why" (# This ensures temporal ordering in search...)
  • Commit message is detailed and explains the fix thoroughly
  • PR description clearly documents root cause and impact

✅ Basic Memory Architecture

Maintains architectural integrity:

  • Preserves local-first architecture (files remain source of truth)
  • Uses repository pattern correctly (entity_repository.update())
  • Maintains SQLite consistency with filesystem state
  • No changes to MCP tools or API layers (service-level fix only)

Implementation quality:

  • Markdown files (lines 550-597): Captures timestamps at start of sync, applies after checksum computation
  • Regular files - NEW (lines 623-626): Already captured timestamps correctly ✅
  • Regular files - EXISTING (lines 670-682): Now captures mtime when updating (this was the bug!)

✅ Security & Performance

No security concerns:

  • No hardcoded secrets or credentials
  • Uses standard library os.stat() for filesystem operations
  • Proper error handling maintained
  • No sensitive data exposure

Performance considerations:

  • Minimal overhead: 2 extra stat() calls per file sync (negligible)
  • file_service.file_stats() is efficient wrapper around Path.stat()
  • No additional I/O beyond what's already required
  • Timezone-aware timestamps using .astimezone() prevents DST issues

🎯 Specific Code Review Comments

src/basic_memory/sync/sync_service.py

Lines 550-553 - Excellent placement:

# Get file timestamps for tracking modification times
file_stats = self.file_service.file_stats(path)
created = datetime.fromtimestamp(file_stats.st_ctime).astimezone()
modified = datetime.fromtimestamp(file_stats.st_mtime).astimezone()

✅ Good: Captures timestamps early, before any file modifications
✅ Good: Uses .astimezone() for timezone-aware timestamps
✅ Good: Clear variable names

Lines 593-597 - Critical fix for markdown files:

# Update checksum and timestamps from file system
# This ensures temporal ordering in search and recent activity uses actual file modification times
await self.entity_repository.update(
    entity.id, {"checksum": final_checksum, "created_at": created, "updated_at": modified}
)

✅ Good: Updates both created_at and updated_at consistently
✅ Good: Clear comment explains the "why"
✅ Good: Applied after final checksum computation (correct ordering)

Lines 670-682 - Critical fix for regular files (was missing!):

# Get file timestamps for updating modification time
file_stats = self.file_service.file_stats(path)
modified = datetime.fromtimestamp(file_stats.st_mtime).astimezone()

# Update checksum and modification time from file system
updated = await self.entity_repository.update(
    entity.id, {"file_path": path, "checksum": checksum, "updated_at": modified}
)

✅ Good: Now captures filesystem mtime when updating existing files
✅ Good: Only updates modified time (not created_at) for existing files
✅ Good: Consistent pattern with markdown file handling

tests/sync/test_sync_service.py

Lines 648-730 - Comprehensive test:
✅ Good: Tests the exact scenario from issue #138
✅ Good: Validates timestamps match filesystem, not db operation time
✅ Good: Cross-platform support with Windows tolerance
✅ Good: Validates content was actually updated (not just timestamps)

tests/markdown/test_entity_parser_error_handling.py

Lines 1-2 - Minor cleanup (unrelated):

-import pytest
from textwrap import dedent
+import pytest

✅ Good: Import ordering fix (consistent with style guide)
ℹ️ Note: This appears to be an unrelated formatting fix (acceptable)


📋 Checklist Summary

Code Quality & Standards

  • ✅ Follows Basic Memory's coding conventions in CLAUDE.md
  • ✅ Python 3.12+ type annotations and async patterns
  • ✅ SQLAlchemy 2.0 best practices
  • ✅ FastAPI and Typer conventions followed (N/A - service layer only)
  • ✅ 100-character line length limit maintained
  • ✅ No commented-out code blocks

Testing & Documentation

  • ✅ Unit tests for new functions/methods
  • ✅ Integration tests for new MCP tools (N/A - no new tools)
  • ✅ Test coverage for edge cases
  • ✅ Documentation updated (inline comments, commit message)
  • ⚠️ CLAUDE.md updated if conventions change (N/A - no convention changes)

Basic Memory Architecture

  • ✅ MCP tools follow atomic, composable design (N/A - no MCP changes)
  • ✅ Database changes include Alembic migrations (N/A - no schema changes)
  • ✅ Preserves local-first architecture principles
  • ✅ Knowledge graph operations maintain consistency
  • ✅ Markdown file handling preserves integrity
  • ✅ AI-human collaboration patterns followed

Security & Performance

  • ✅ No hardcoded secrets or credentials
  • ✅ Input validation for MCP tools (N/A - no MCP changes)
  • ✅ Proper error handling and logging
  • ✅ Performance considerations addressed
  • ✅ No sensitive data in logs or commits

🎉 Final Assessment

APPROVED

This is an excellent fix that directly addresses issue #138. The implementation is:

  • Correct: Uses filesystem timestamps instead of datetime.now()
  • Complete: Handles both markdown and regular files, both new and existing
  • Well-tested: Comprehensive test validates the exact bug scenario
  • Well-documented: Clear comments and thorough PR description
  • Backward-compatible: No breaking changes, pure bug fix

The code quality is high, follows all project conventions, and the test coverage is thorough. No issues found.

Recommendation: Ready to merge 🚀

--- | Branch: fix/sync-file-timestamps-138

@phernandez phernandez merged commit b7497d7 into main Oct 16, 2025
16 checks passed
@phernandez phernandez deleted the fix/sync-file-timestamps-138 branch October 16, 2025 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant