Commit 7cd377c
Brian Sam-Bodden
fix(checkpoint): make TTL operations best-effort to prevent write loss (#174)
* fix(checkpoint): make all TTL operations best-effort to prevent write loss
EXPIRE commands mixed into critical write pipelines caused complete
pipeline failure on Redis Enterprise proxy when TTL commands were routed
to different shards than the JSON module commands. This lost interrupt
writes silently, breaking human-in-the-loop workflows.
Changes:
- Remove EXPIRE from all put_writes pipelines (aio, sync, shallow)
- Use raise_on_error=False for pipeline execution with per-result
  inspection to preserve JSON.MERGE fallback logic
- Apply TTL individually per key after pipeline success, wrapped in
  try/except so failures only log warnings
- Wrap all remaining bare EXPIRE/PERSIST calls across the codebase
  (put, get_tuple, key_registry, _apply_ttl_to_keys) in try/except
- Simplify _apply_ttl_to_keys to use individual calls instead of
  building EXPIRE-only pipelines that share the same failure mode
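
A minimal sketch of the pattern described in the changes above, not the code from this commit: the critical writes go through a pipeline with no EXPIRE commands, and the TTL is applied per key afterwards so a TTL failure only logs a warning. The helper name `write_then_apply_ttl` and the `FlakyTTLClient` stand-in are hypothetical; the only client calls assumed are redis-py's `pipeline()`, `execute(raise_on_error=False)`, and `expire()`. The JSON.MERGE fallback mentioned above is omitted here.

```python
import logging
from typing import Any, Sequence

logger = logging.getLogger(__name__)


def write_then_apply_ttl(
    client: Any, commands: Sequence[tuple], keys: Sequence[str], ttl_seconds: int
) -> list:
    """Run critical writes in a pipeline, then apply TTL per key, best-effort."""
    pipe = client.pipeline(transaction=False)
    for name, *args in commands:
        getattr(pipe, name)(*args)  # queue writes only; no EXPIRE in the pipeline
    # Collect per-command results instead of raising on the first error.
    results = pipe.execute(raise_on_error=False)
    failures = [r for r in results if isinstance(r, Exception)]
    if failures:
        raise failures[0]  # a failed write is still fatal
    for key in keys:
        try:
            client.expire(key, ttl_seconds)
        except Exception:
            # TTL is best-effort: log with traceback and key name, keep the write.
            logger.warning("Failed to apply TTL to key %s", key, exc_info=True)
    return results


class _FakePipeline:
    """Hypothetical in-memory pipeline, used only to exercise the sketch."""

    def __init__(self, store: dict) -> None:
        self._store = store
        self._queued: list = []

    def set(self, key: str, value: str) -> None:
        self._queued.append((key, value))

    def execute(self, raise_on_error: bool = True) -> list:
        for key, value in self._queued:
            self._store[key] = value
        return [True] * len(self._queued)


class FlakyTTLClient:
    """Hypothetical client whose expire() always fails."""

    def __init__(self) -> None:
        self.store: dict = {}

    def pipeline(self, transaction: bool = True) -> _FakePipeline:
        return _FakePipeline(self.store)

    def expire(self, key: str, seconds: int) -> bool:
        raise ConnectionError("simulated TTL failure")


client = FlakyTTLClient()
results = write_then_apply_ttl(client, [("set", "write:1", "payload")], ["write:1"], 60)
```

With the stand-in client, the write lands in `client.store` even though every `expire()` call raises, which is the property the regression tests in this commit are described as guarding.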
* test(checkpoint): add HitL+TTL regression tests and update mocked TTL tests
- Add test_hitl_ttl_regression.py with 9 tests covering:
  - HitL+TTL integration guards on real Redis (5 tests)
  - Pipeline no longer contains EXPIRE commands
  - Writes survive simulated TTL failure
  - TTL failure logs warning via caplog
  - TTL applied separately after pipeline succeeds
- Update test_base_client_info_and_ttl.py to expect individual expire
  calls instead of pipeline-based TTL
- Update test_cluster_mode.py to remove cluster/non-cluster branching
  since both modes now use individual expire calls
* fix(checkpoint): address PR review comments on TTL warning diagnostics
- Add exc_info=True and key names to all TTL warning logs for
  diagnosability (aio.py, shallow.py, ashallow.py, key_registry.py)
- Remove redundant outer try/except around _key_registry.apply_ttl()
  in __init__.py since it's already best-effort internally
- Remove dead cluster_mode variable in base.py _apply_ttl_to_keys
- Remove unused imports in test_hitl_ttl_regression.py
- Remove xfail marker from test_sync_hitl_with_ttl (was masking a
  local environment issue, not a real bug)
* fix(serializer): support both old and new Interrupt field layouts
The Interrupt dataclass changed between langgraph versions:
- langgraph <=1.0.x: Interrupt(value, id)
- langgraph >=1.1.x: Interrupt(value, resumable, ns, when)
The serializer now introspects the installed Interrupt class at runtime
via dataclasses.fields() and serializes/deserializes whichever fields
are present. This ensures checkpoints written by either version can be
read by either version without data loss.
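
A rough illustration of field introspection with dataclasses.fields(), assuming two stand-in dataclasses that mirror the layouts listed above. `InterruptOld`, `InterruptNew`, `dump_dataclass`, and `load_dataclass` are hypothetical names, and the default values are placeholders. This sketch simply drops fields the installed class does not define, which is a simplification; the actual serializer may handle them differently.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any


def dump_dataclass(obj: Any) -> dict:
    """Serialize whichever fields the instance's dataclass actually defines."""
    return {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}


def load_dataclass(cls: type, data: dict) -> Any:
    """Rebuild cls from data, passing only fields that cls accepts in __init__."""
    known = {f.name for f in dataclasses.fields(cls) if f.init}
    return cls(**{k: v for k, v in data.items() if k in known})


@dataclass
class InterruptOld:  # stand-in for the (value, id) layout
    value: Any
    id: str = "placeholder-id"


@dataclass
class InterruptNew:  # stand-in for the (value, resumable, ns, when) layout
    value: Any
    resumable: bool = False
    ns: Any = None
    when: str = "during"


# A payload written with one layout can be loaded with the other.
payload = dump_dataclass(InterruptNew(value="approve?", resumable=True))
restored = load_dataclass(InterruptOld, payload)
```

Because both helpers consult the class at runtime, neither needs to know in advance which layout is installed.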
* fix(test): use state.next for sync interrupt detection
Sync get_state() in langgraph 1.0.x does not populate
task.interrupts (only aget_state does). Use state.next to
verify the graph is paused at the interrupt node instead.
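
The check described above might look roughly like this. `is_paused_at` is a hypothetical helper and `human_review` a made-up node name; a SimpleNamespace stands in for the state snapshot, on the assumption that `next` is a tuple of pending node names.

```python
from types import SimpleNamespace
from typing import Any


def is_paused_at(state: Any, node_name: str) -> bool:
    """Return True if the snapshot lists node_name among the nodes to run next."""
    pending = getattr(state, "next", ()) or ()
    return node_name in tuple(pending)


# Stand-in snapshot: paused before a hypothetical "human_review" node,
# with no task-level interrupt data populated.
paused_state = SimpleNamespace(next=("human_review",), tasks=())
finished_state = SimpleNamespace(next=(), tasks=())
```

This relies only on `next`, so it does not depend on whether task.interrupts is populated by the sync or async call.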
Also reorder the sync test before async tests to avoid any
potential event loop contamination.
1 parent 5832fe5 commit 7cd377c
11 files changed
Lines changed: 1113 additions & 383 deletions