[Enhancement](ms) Add sharded LRU cache for tablet index metadata to reduce FDB IO#61666
[Enhancement](ms) Add sharded LRU cache for tablet index metadata to reduce FDB IO#61666wyxxxcat wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
4311778 to
6b47a05
Compare
|
run buildall |
3065073 to
e15e9bc
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
Code Review Summary
This PR adds a sharded LRU cache for meta_tablet_idx_key lookups in both MetaService and Recycler, and also refactors the LRU cache from be/src/util/ to common/cpp/ to share it with the Cloud module. The refactoring to separate metrics from the core cache is well-structured.
However, I found several issues ranging from correctness to performance concerns.
Critical Checkpoint Conclusions
1. Goal and Correctness: The goal is to cache immutable tablet index metadata to reduce FDB reads. The core caching logic works, but there are missing cache invalidation paths in the MetaService (see below), and the LOG(INFO) on every cache hit is a performance anti-pattern that will flood logs.
2. Minimal and Focused: The PR mixes two independent concerns: (a) LRU cache refactoring from BE to common/cpp with metrics extraction, and (b) adding KvCache for cloud. These should ideally be separate PRs, but this is a style concern.
3. Concurrency: The ShardedLRUCache handles concurrency via per-shard mutexes. The g_ms_cache_manager and g_recycler_cache_manager globals are initialized once and read concurrently thereafter (the cache methods are thread-safe). No concurrency issues found.
4. Lifecycle Management: Both g_ms_cache_manager and g_recycler_cache_manager are allocated with new but never deleted. This is an intentional process-lifetime pattern but creates issues: (a) If MetaServiceImpl is constructed multiple times (e.g., in tests), the old cache pointer leaks. (b) ASAN/LeakSanitizer will flag these.
5. Configuration: New configs (enable_tablet_index_cache, ms_tablet_index_cache_capacity, etc.) are CONF_mBool/CONF_mInt64 (mutable), but the cache is only created once at startup. Changing these configs at runtime has no effect - the cache won't be resized or recreated. Either make them non-mutable or implement runtime reconfiguration.
6. Incompatible Changes: The LRU cache insert() now requires an explicit CacheValueDeleter parameter where it previously defaulted to nullptr. All existing callers are updated. No incompatibility concern.
7. Parallel Code Paths: The MS cache lacks invalidation in repair_tablet_index (meta_service_txn.cpp:2150) and fix_tablet_db_id (meta_service.cpp:5719) which modify TabletIndexPB.db_id via txn->put() without invalidating the cache.
8. Test Coverage: Unit tests cover basic get/put, eviction, invalidation, clear, and concurrent access. Missing: TTL expiration test, test for the key encoding edge cases, integration test verifying cache correctness with actual MS operations.
9. Observability: Cache hit/miss metrics are not added for the cloud KvCache (only for BE LRU caches via BELRUCacheMetricsRecorder). The LOG(INFO) on every cache hit is not a substitute for proper metrics.
10. Performance: The LOG(INFO) on every cache hit is the most significant performance issue - this function is called on hot paths like commit_txn, and the whole point is to reduce overhead.
11. Missing blank line between init_recycler_cache() closing brace and namespace config { in recycler/util.cpp.
|
run buildall |
TPC-H: Total hot run time: 26844 ms |
TPC-DS: Total hot run time: 169470 ms |
|
run p0 |
d70b6c8 to
4c0db13
Compare
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 26581 ms |
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29461 ms |
TPC-DS: Total hot run time: 180413 ms |
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29454 ms |
TPC-DS: Total hot run time: 180364 ms |
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29474 ms |
TPC-DS: Total hot run time: 181857 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
cloud/src/common/kv_cache.h
Outdated
| _cache->release(handle); | ||
| _cache->erase(CacheKey(encoded_key)); |
There was a problem hiding this comment.
After releasing and then erasing here, is there a possibility of a race condition?
For example, multiple threads might obtain the same handle, all determine that the TTL has expired, and then attempt to erase simultaneously.
If, during this process, a thread executes a put operation, it could be erased unexpectedly. This doesn’t seem like a critical issue, but perhaps we can explore a better solution.
There was a problem hiding this comment.
The solution is to release the data and return false immediately upon detecting an expired TTL during a get op instead of deleting it. This continues until a subsequent put operation evicts the data
61ab109 to
913d227
Compare
|
run buildall |
11c72dc to
f56e0b0
Compare
|
run buildall |
Summary
This commit mainly does three things:
lru_cachefrombe/src/util/tocommon/cpp/so Cloud components can reuse the sameShardedLRUCacheimplementation.LRUCachePolicyinto a unified cache metrics recorder, including capacity, usage, hit ratio, lookup/hit/miss, and stampede metrics.MetaServiceandRecycler, introduce capacity and TTL configs, invalidate cache entries when tablet index records are removed, and add Cloud-side unit tests.What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary:
Cloud tablet index lookups in meta service and recycler repeatedly read from txn kv, while
the LRU cache implementation was only available under BE utility code and could not be reused
directly by cloud components. This change moves the shared LRU cache into common code, adds
a cloud KV cache wrapper for tablet index metadata with configurable capacity and TTL, wires
cache invalidation into recycler cleanup, and hooks BE LRU caches into unified cache metrics.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)