Commit 2cee8eb
[Store] L2->L1 promotion-on-hit: reviewer feedback (orphan reaper, global cap, const, misconfig log)
Addresses reviewer feedback on PR kvcache-ai#2071.
Issue 1 -- orphaned PROCESSING MEMORY replica leak. The promotion task
reaper only dropped the source LOCAL_DISK refcnt and erased the task
entry; it never popped the staged PROCESSING MEMORY replica added by
PromotionAllocStart. That replica is not in shard->processing_keys, so
DiscardExpiredProcessingReplicas could not sweep it and the buffer
leaked until the object was removed or evicted. Fix: in the reaper,
when alloc_id != 0, call metadata.EraseReplicaByID(alloc_id) to pop
the staged replica and return its buffer to the allocator.
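A minimal sketch of the reaper fix, using toy stand-ins rather than the
real master_service types. ToyAllocator, ToyMetadata, ToyPromotionTask
and ReapExpiredTasks are hypothetical names invented for illustration;
only EraseReplicaByID and alloc_id come from this commit, and their real
signatures may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Toy DRAM allocator that only tracks used bytes (assumption, not the real one).
struct ToyAllocator {
    std::size_t used_bytes = 0;
    void Allocate(std::size_t n) { used_bytes += n; }
    void Free(std::size_t n) {
        assert(n <= used_bytes);
        used_bytes -= n;
    }
};

struct ToyReplica {
    std::uint64_t id;
    std::size_t size;
};

struct ToyMetadata {
    ToyAllocator* allocator = nullptr;
    int local_disk_refcnt = 0;
    std::vector<ToyReplica> replicas;

    // Pops the replica with the given id and returns its buffer to the allocator.
    bool EraseReplicaByID(std::uint64_t id) {
        for (auto it = replicas.begin(); it != replicas.end(); ++it) {
            if (it->id == id) {
                allocator->Free(it->size);
                replicas.erase(it);
                return true;
            }
        }
        return false;
    }
};

struct ToyPromotionTask {
    std::uint64_t alloc_id = 0;  // 0 means no staged replica was allocated
    Clock::time_point deadline;
};

// Sweeps expired tasks and returns how many were reaped.
std::size_t ReapExpiredTasks(
    std::unordered_map<std::string, ToyPromotionTask>& tasks,
    std::unordered_map<std::string, ToyMetadata>& objects,
    Clock::time_point now) {
    std::size_t reaped = 0;
    for (auto it = tasks.begin(); it != tasks.end();) {
        if (it->second.deadline > now) {
            ++it;
            continue;
        }
        auto obj = objects.find(it->first);
        if (obj != objects.end()) {
            // Pre-existing behaviour: drop the source LOCAL_DISK refcnt.
            --obj->second.local_disk_refcnt;
            // The fix: also pop the staged replica so its buffer is freed.
            if (it->second.alloc_id != 0) {
                obj->second.EraseReplicaByID(it->second.alloc_id);
            }
        }
        it = tasks.erase(it);
        ++reaped;
    }
    return reaped;
}
```

Without the alloc_id branch, the sketch reproduces the leak: the task
entry disappears but used_bytes stays elevated until the object itself
is removed.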
Issue 2 -- per-shard cap was wrong for skewed workloads. The old gate
was 'shard->size() * kNumShards >= limit', which is approximately right
for uniform workloads but ~1024x too eager on skewed workloads where
hot keys cluster in a few shards. Replace it with a cluster-wide
std::atomic<uint64_t> promotion_in_flight_ counter, incremented in
TryPushPromotionQueue after a successful emplace and decremented in
NotifyPromotionSuccess and in the reaper. It uses memory_order_relaxed
because the value is advisory; the per-shard mutex already serializes
inserts within a shard and the dedup gate prevents duplicate work.
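The difference between the two gates can be sketched as below. This is
an illustration under assumptions, not the code in master_service.cpp:
PromotionGate, TryAdmit, Release and OldGateRejects are hypothetical
names, and the shard count of 1024 used in the usage note is inferred
from the "~1024x" figure above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Old gate: extrapolates one shard's queue size to the whole store.
// Returns true when the admission would be rejected.
bool OldGateRejects(std::size_t shard_size, std::size_t num_shards,
                    std::size_t limit) {
    return shard_size * num_shards >= limit;
}

// New gate: one counter shared by all shards.
class PromotionGate {
   public:
    explicit PromotionGate(std::uint64_t limit) : limit_(limit) {}

    // Advisory check followed by an increment. Two racing callers may
    // both pass the check and overshoot the limit slightly, which is
    // acceptable for an advisory cap and is why relaxed ordering suffices.
    bool TryAdmit() {
        if (in_flight_.load(std::memory_order_relaxed) >= limit_) {
            return false;
        }
        in_flight_.fetch_add(1, std::memory_order_relaxed);
        return true;
    }

    // Called on promotion success and from the reaper.
    void Release() {
        const std::uint64_t prev =
            in_flight_.fetch_sub(1, std::memory_order_relaxed);
        assert(prev > 0);
        (void)prev;  // silence unused warning when NDEBUG is defined
    }

    std::uint64_t InFlight() const {
        return in_flight_.load(std::memory_order_relaxed);
    }

   private:
    const std::uint64_t limit_;
    std::atomic<std::uint64_t> in_flight_{0};
};
```

With limit=2048 and 1024 shards, OldGateRejects(2, 1024, 2048) is true
even though only two promotions are in flight. With limit=1,
OldGateRejects(0, 1024, 1) is false for an empty shard no matter how
many tasks other shards hold, whereas PromotionGate(1) rejects the
second TryAdmit() regardless of which shard it targets.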
Issue 3 -- const_cast smell. The promotion_tasks map held
const PromotionTask values for "generic safety", forcing a
const_cast<PromotionTask&> in PromotionAllocStart to set alloc_id
under the shard write lock. Drop the const; PromotionAllocStart now
sets task_it->second.alloc_id = new_id directly.
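A minimal sketch of the post-change shape, assuming a simplified task
type. ToyTask, TaskMap and SetAllocId are hypothetical names; the real
PromotionTask has more fields and the real update happens inside
PromotionAllocStart under the shard write lock. With a map of
const-qualified values, the same assignment would have required a
const_cast, and writing through a cast to an object that was created
const is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

struct ToyTask {
    std::uint64_t alloc_id = 0;  // 0 means no staged replica yet
};

// Mapped type is non-const, so entries can be updated in place.
using TaskMap = std::unordered_map<std::string, ToyTask>;

// Records the staged replica id on an existing task.
// Returns false if the key has no pending task.
bool SetAllocId(TaskMap& tasks, const std::string& key,
                std::uint64_t new_id) {
    assert(new_id != 0 && "0 is reserved for 'no staged replica'");
    auto task_it = tasks.find(key);
    if (task_it == tasks.end()) {
        return false;
    }
    task_it->second.alloc_id = new_id;  // direct write, no const_cast
    return true;
}
```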
Misconfig log -- emit LOG(WARNING) at startup when
config.promotion_on_hit=true but enable_offload=false. Promotion
requires offload to produce LOCAL_DISK replicas, so it is silently
disabled in that combination; the log makes the disablement
discoverable to operators.
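The check itself is small. The sketch below separates the decision from
the logging so it can be exercised without a logging library; ToyConfig,
PromotionMisconfigWarning and PromotionEffectivelyEnabled are
hypothetical names, and the message text is illustrative, not the
string the commit emits via LOG(WARNING).

```cpp
#include <cassert>
#include <string>

// Only the two flags named in this commit; the real config has more.
struct ToyConfig {
    bool promotion_on_hit = false;
    bool enable_offload = false;
};

// Returns a warning message for the misconfigured combination,
// or an empty string when there is nothing to report.
std::string PromotionMisconfigWarning(const ToyConfig& config) {
    if (config.promotion_on_hit && !config.enable_offload) {
        return "promotion_on_hit=true but enable_offload=false: "
               "promotion needs offload to produce LOCAL_DISK replicas "
               "and is disabled";
    }
    return std::string();
}

// Promotion only takes effect when both flags are set.
bool PromotionEffectivelyEnabled(const ToyConfig& config) {
    return config.promotion_on_hit && config.enable_offload;
}
```

At startup the caller would pass a non-empty result to its logger at
warning severity.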
Tests
-----
- New ReaperPopsStagedMemoryReplicaOnExpiry: regression for Issue 1.
  Uses QuerySegments(seg).first (used bytes) to observe that the
  staged PROCESSING MEMORY replica's buffer is freed back to the DRAM
  allocator after the reaper sweeps the expired task.
- New QueueLimitRejectsCrossShard: regression for Issue 2. With
  queue_limit=1, proves a second admission attempt on a *different
  shard* is rejected -- exactly the case the old per-shard cap
  admitted incorrectly.
- Updated comment on existing QueueLimitRejectsBeyondCap to reflect
  the cluster-wide counter semantics.
Verification
------------
- promotion_on_hit_test: 15/15 pass (5 consecutive clean runs)
- file_storage_promotion_test: 9/9 pass
- master_service_promotion_test_for_snapshot: 5/5 pass
- offload_on_evict_test: 9/9 pass
- code_format.sh --base upstream/main (clang-format-20 in container):
  3 files reformatted (master_service.h, master_service.cpp,
  promotion_on_hit_test.cpp); all others "Already formatted"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent e0490e9 commit 2cee8eb
3 files changed
Lines changed: 194 additions & 21 deletions