refactor: refactor instance managers and load balance policy.#53
refactor: refactor instance managers and load balance policy.#53magicheng0816 wants to merge 3 commits intojd-opensource:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the InstanceMgr by decomposing it into three specialized components: InstanceTopologyImpl, InstanceMetricsImpl, and InstanceKVCache. This improves modularity and clarifies the responsibilities within the scheduler's management layer. I have identified two significant issues: the current implementation relies on InstanceMgr directly accessing and locking private mutexes of its components, which violates encapsulation, and the use of -1 as a sentinel value for an unsigned uint64_t index is non-portable and error-prone.
| std::shared_lock<std::shared_mutex> topo_lock(topology_impl_->cluster_mutex_); | ||
| std::shared_lock<std::shared_mutex> metrics_lock( | ||
| metrics_impl_->metrics_mutex_); |
There was a problem hiding this comment.
This implementation directly accesses and locks private mutexes (cluster_mutex_, metrics_mutex_) from child components (topology_impl_, metrics_impl_). While this works because InstanceMgr is a friend class, it breaks encapsulation and creates a strong, hidden coupling between InstanceMgr and the internal implementation of its components.
This makes the code harder to maintain and reason about. Future changes to the locking strategy in InstanceTopologyImpl or InstanceMetricsImpl could break InstanceMgr in non-obvious ways.
A cleaner approach would be for InstanceMgr to coordinate the components through their public interfaces. For example, the mutexes could be owned by InstanceMgr and passed by reference to the components' constructors. This would make the ownership and locking strategy explicit and centralized in the InstanceMgr facade.
| const std::string& name, | ||
| const InstanceMetaInfo& info) { | ||
| uint64_t index = info.instance_index; | ||
| if (index == static_cast<uint64_t>(-1)) return; |
There was a problem hiding this comment.
Using -1 as a sentinel value for an unsigned type like uint64_t is problematic. The conversion of -1 to an unsigned type is implementation-defined before C++20. While it might work on systems using two's complement arithmetic, it's not portable and can lead to subtle bugs.
It's better to use std::numeric_limits<uint64_t>::max() as the sentinel value. This requires including the <limits> header.
I recommend changing the check here, and also updating the default value for instance_index in InstanceMetaInfo (in common/types.h) to std::numeric_limits<uint64_t>::max() for consistency and correctness. You will need to #include <limits> in this file.
| if (index == static_cast<uint64_t>(-1)) return; | |
| if (index == std::numeric_limits<uint64_t>::max()) return; |
Currently, the responsibilities of the instance manager are too heavy, with increasing logic being added, and there is no clear boundary between various parts of the logic. Therefore, the instance manager needs to be split up:
instance_topology, responsible for instance registration and status maintenance; 2. instance_metrics, responsible for collecting and reporting instance metrics; 3. scheduling strategy, which is gradually being migrated to loadbalance_policy, but not completely yet, and requires further refactoring in the future.