
Fix/checkpoint lock reliability#44

Merged: beersoccer merged 4 commits into main from fix/checkpoint-lock-reliability on Apr 21, 2026
@beersoccer (Owner)

No description provided.

龚震宇 and others added 4 commits April 21, 2026 15:45
- Fix P0: AsyncCheckpointManager.load() now restores resume_conversation_cursor,
  resume_run_at, and resume_start_time fields (previously omitted, breaking
  async-mode resume after max_conversations_reached)

- Fix P1: Checkpoint save() now adds the new entry before deleting the old one,
  so a failed add never silently destroys the existing checkpoint; the temporary
  duplicate is handled safely by load(), which picks the newest entry

- Fix P1: Distributed lock acquire_lock() now performs read-after-write
  verification after persisting; a new _load_all_locks() method (limit=5) re-reads
  all active locks, the earliest acquired_at wins, and the loser deletes its own record

- Fix P2: forget_memories now cleans up expired distributed lock records via
  _clean_expired_locks(); result payload gains a locks_cleaned field
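
The add-first-then-delete ordering and the newest-entry rule in load() can be illustrated with a minimal sketch. This is not the PR's code: `InMemoryStore`, `save_checkpoint`, `load_checkpoint`, and the `saved_at` field are hypothetical names chosen for illustration, and the real backend API is assumed, not known.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for the backend that holds checkpoint records."""

    def __init__(self) -> None:
        self.records: Dict[int, dict] = {}
        self.fail_next_add = False  # hook to simulate a failed write
        self._next_id = 1

    def add(self, payload: dict) -> int:
        if self.fail_next_add:
            self.fail_next_add = False
            raise RuntimeError("simulated backend failure")
        record_id = self._next_id
        self._next_id += 1
        self.records[record_id] = dict(payload)
        return record_id

    def delete(self, record_id: int) -> None:
        self.records.pop(record_id, None)


def save_checkpoint(store: InMemoryStore, payload: dict) -> int:
    """Add the new checkpoint first, then delete the entries it supersedes."""
    superseded = list(store.records)  # snapshot the ids before writing
    new_id = store.add(payload)       # if this raises, nothing was deleted
    for record_id in superseded:
        store.delete(record_id)
    return new_id


def load_checkpoint(store: InMemoryStore) -> Optional[dict]:
    """Return the newest entry so a leftover duplicate is harmless."""
    if not store.records:
        return None
    return max(store.records.values(), key=lambda rec: rec["saved_at"])
```

With this ordering, a failed add leaves the previous checkpoint in place, and a crash between the add and the delete leaves two entries, of which load picks the newer one.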

Tests: 28 new unit tests covering all four fixes; 355 total unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
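
The read-after-write lock verification and the expired-lock cleanup described in this commit might look roughly like the sketch below. It assumes a shared store that can list lock records; `LockStore`, `LOCK_TTL_SECONDS`, the record fields other than `acquired_at`, and the function bodies are hypothetical, and only the idea (persist, re-read up to five active locks, earliest `acquired_at` wins, loser deletes itself) comes from the commit message.

```python
import uuid
from typing import List, Optional

LOCK_TTL_SECONDS = 60.0  # hypothetical lease length


class LockStore:
    """Hypothetical stand-in for the shared store that holds lock records."""

    def __init__(self) -> None:
        self.locks: List[dict] = []

    def persist(self, lock: dict) -> None:
        self.locks.append(lock)

    def delete(self, lock_id: str) -> None:
        self.locks = [rec for rec in self.locks if rec["lock_id"] != lock_id]

    def load_all(self, now: float, limit: int = 5) -> List[dict]:
        """Return up to `limit` unexpired locks, earliest acquired_at first."""
        active = [rec for rec in self.locks if rec["expires_at"] > now]
        return sorted(active, key=lambda rec: rec["acquired_at"])[:limit]


def acquire_lock(store: LockStore, owner: str, now: float) -> Optional[dict]:
    """Persist a lock record, then verify it won by re-reading the store."""
    lock = {
        "lock_id": uuid.uuid4().hex,
        "owner": owner,
        "acquired_at": now,
        "expires_at": now + LOCK_TTL_SECONDS,
    }
    store.persist(lock)
    active = store.load_all(now)  # read-after-write verification
    if active and active[0]["lock_id"] == lock["lock_id"]:
        return lock               # earliest acquired_at wins
    store.delete(lock["lock_id"])  # the loser removes its own record
    return None


def clean_expired_locks(store: LockStore, now: float) -> int:
    """Delete expired lock records and report how many were removed."""
    expired = [rec["lock_id"] for rec in store.locks if rec["expires_at"] <= now]
    for lock_id in expired:
        store.delete(lock_id)
    return len(expired)
```

The count returned by `clean_expired_locks` corresponds to the kind of value a `locks_cleaned` field could carry.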
…, and rerankers

89 parametrized tests verifying canonical provider name strings and critical
config fields across all mainstream mem0 providers. Catches provider name drift
during version upgrades before it reaches production.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove mistral from LLM_CONFIGS (not in mem0 LlmFactory registry)
- Add TestProviderNamesInMem0Registry: 28 parametrized tests that
  cross-check every provider name against mem0's live factory maps,
  failing immediately on mem0 upgrades that drop or rename a provider

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
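
The registry cross-check in this commit boils down to comparing configured provider names against the names a factory actually registers. A minimal sketch follows, using a hand-written set in place of mem0's live factory maps; `REGISTERED_LLM_PROVIDERS`, the contents of `LLM_CONFIGS`, and `unregistered_providers` are illustrative assumptions, not the project's test code.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a factory's provider map. A real suite would
# read the names from the library's factory instead of hard-coding them.
REGISTERED_LLM_PROVIDERS = {"openai", "anthropic", "ollama"}

# Config table under test. "mistral" mirrors the entry the commit removed
# because it was not in the factory registry.
LLM_CONFIGS: Dict[str, dict] = {
    "openai": {"model": "placeholder-model"},
    "anthropic": {"model": "placeholder-model"},
    "mistral": {"model": "placeholder-model"},
}


def unregistered_providers(
    configs: Dict[str, dict], registry: Iterable[str]
) -> List[str]:
    """Return configured provider names that the registry does not know."""
    known = set(registry)
    return sorted(name for name in configs if name not in known)


# A drift check fails as soon as this list is non-empty.
drifted = unregistered_providers(LLM_CONFIGS, REGISTERED_LLM_PROVIDERS)
```

In a pytest suite the same check could be expressed per provider with `@pytest.mark.parametrize`, so that each renamed or dropped provider shows up as its own failing test.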
…llution

In pytest-asyncio AUTO mode, CPython's C-level _get_running_loop() can
return the session's running loop even inside a freshly spawned thread,
because the C TSS value is inherited from the parent thread. This caused
run_until_complete() to fail with "Cannot run the event loop while another
loop is running" when the full test suite ran test_extraction_async.py and
test_bg_task_tracking.py before the async checkpoint tests.

Fix:
- Replace @pytest.mark.asyncio + @pytest.mark.forked (which caused a pytest-forked
  teardown ERROR on subsequent tests) with plain sync def tests
- Add _run_async() helper that spawns a dedicated thread and explicitly
  clears the inherited running-loop via asyncio.events._set_running_loop(None)
  before creating a fresh event loop, fully isolating from the test session

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@beersoccer beersoccer merged commit f55a5eb into main Apr 21, 2026
4 checks passed
@beersoccer beersoccer deleted the fix/checkpoint-lock-reliability branch April 21, 2026 11:19
