diff --git a/docs/features/global_cache_pooling.md b/docs/features/global_cache_pooling.md index 72314567393..aee31b36f0d 100644 --- a/docs/features/global_cache_pooling.md +++ b/docs/features/global_cache_pooling.md @@ -48,7 +48,8 @@ Ready-to-use example scripts are available in [examples/cache_storage/](../../.. |--------|----------|-------------| | `run.sh` | Multi-Instance | Two standalone instances sharing cache | | `run_03b_pd_storage.sh` | PD Disaggregation | P+D instances with global cache pooling | -| `run_ha.sh` | High Availability | etcd + multi-master leader election, verifies failover after killing the leader | +| `run_ha.sh` | High Availability (etcd) | etcd + multi-master leader election, verifies failover after killing the leader | +| `run_ha_redis.sh` | High Availability (redis) | single redis + multi-master leader election, verifies failover after killing the leader | ## Prerequisites @@ -287,14 +288,19 @@ curl -X POST "http://0.0.0.0:52700/v1/chat/completions" \ ### Scenario 3: High-Availability (HA) Deployment -A single master is a single point of failure; if it crashes, cluster operations pause. For production, use the **etcd + multi-master** mode: multiple `mooncake_master` instances perform leader election through etcd. When the leader fails, a standby is automatically re-elected, transparently to clients. +A single master is a single point of failure; if it crashes, cluster operations pause. For production, run multiple `mooncake_master` instances that perform leader election through a coordination backend. When the leader fails, a standby is automatically re-elected, transparently to clients. + +Two coordination backends are supported: + +- **etcd** (`run_ha.sh`): a 3-node etcd cluster does election and metadata storage. +- **redis** (`run_ha_redis.sh`): a single redis instance does lease-based election. Use this to avoid introducing etcd as an extra component. 
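The backend choice mostly shows up in how each `mooncake_master` is launched. As a minimal sketch, the helper below assembles the HA flags that the two example scripts pass; `build_master_args` is a hypothetical name introduced here for illustration, and the loop only prints the commands rather than starting any process.

```bash
#!/bin/bash
# Print the mooncake_master flags for one HA instance.
#   $1 = backend ("etcd" or "redis")
#   $2 = backend address (etcd endpoints string, or a redis:// connection string)
#   $3 = instance index (1..3), mapped to rpc port 808N as in the example scripts
# NOTE: build_master_args is an illustrative helper, not part of Mooncake or FastDeploy.
build_master_args() {
    local backend=$1 conn=$2 idx=$3
    local args="--enable_ha"
    case "${backend}" in
        # etcd is the default backend, so only the endpoints are passed.
        etcd)  args="${args} --etcd_endpoints ${conn}" ;;
        redis) args="${args} --ha_backend_type redis --ha_backend_connstring ${conn}" ;;
        *)     echo "unknown backend: ${backend}" >&2; return 1 ;;
    esac
    echo "${args} --cluster_id mooncake_cluster --rpc_address 127.0.0.1 --rpc_port 808${idx}"
}

# Dry run: print the three redis-backed launch commands instead of executing them.
for i in 1 2 3; do
    echo "mooncake_master $(build_master_args redis redis://127.0.0.1:6399 "${i}")"
done
```

Keeping the backend-specific flags in one place makes it harder for the three masters to drift apart; every instance must share the same `--cluster_id` and differ only in its RPC port.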
**Architecture:** ``` ┌──────────────────────────────────────┐ - │ etcd cluster (3 nodes) │ - │ leader election / metadata store │ + │ coordination backend (etcd / redis) │ + │ leader election (master_view) │ └───────────────────┬──────────────────┘ │ election (master_view) ┌─────────────────────┼─────────────────────┐ @@ -304,47 +310,30 @@ A single master is a single point of failure; if it crashes, cluster operations │ rpc:8081 │ │ rpc:8082 │ │ rpc:8083 │ │ (leader) │ │ (standby) │ │ (standby) │ └──────┬──────┘ └─────────────┘ └─────────────┘ - │ FastDeploy clients discover the current leader via etcd + │ FastDeploy clients discover the current leader via the backend ┌──────┴───────┐ ▼ ▼ server_0 server_1 ``` -#### Prerequisites - -**1. Install etcd** - -Download and extract etcd (v3.5.30 in this example), then add `etcd` / `etcdctl` to `PATH`: - -```bash -ETCD_VER=v3.5.30 -curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \ - -o etcd-${ETCD_VER}-linux-amd64.tar.gz -tar -xzf etcd-${ETCD_VER}-linux-amd64.tar.gz -export PATH=$PWD/etcd-${ETCD_VER}-linux-amd64:$PATH -etcd --version -``` +#### Build Mooncake from source -**2. Build Mooncake from source (with etcd support)** +HA mode requires Mooncake built with the matching backend enabled: -HA mode requires Mooncake built with etcd support (`-DSTORE_USE_ETCD=ON -DUSE_ETCD=ON`). Install dependencies first, then build: +- etcd: `-DSTORE_USE_ETCD=ON -DUSE_ETCD=ON` +- redis: `-DSTORE_USE_REDIS=ON -DUSE_REDIS=ON` (build dependency: `libhiredis-dev`) ```bash -# Download the source git clone https://github.com/kvcache-ai/Mooncake.git cd Mooncake - -# Install system & third-party dependencies bash dependencies.sh -# Build C++ components (including mooncake_master, with etcd enabled) mkdir -p build && cd build -cmake .. -DSTORE_USE_ETCD=ON -DUSE_ETCD=ON +cmake .. 
-DSTORE_USE_ETCD=ON -DUSE_ETCD=ON # add -DSTORE_USE_REDIS=ON -DUSE_REDIS=ON for redis make -j sudo make install cd .. -# Build and install the Python wheel ./scripts/build_wheel.sh pip install mooncake-wheel/dist/*.whl ``` @@ -358,9 +347,24 @@ export CU13_BUILD=1 pip install mooncake-wheel/dist/mooncake_transfer_engine_cuda13-*.whl ``` -#### HA Client Configuration +#### Option A: etcd backend (`run_ha.sh`) + +**1. Install etcd** + +Download and extract etcd (v3.5.30 in this example), then add `etcd` / `etcdctl` to `PATH`: + +```bash +ETCD_VER=v3.5.30 +curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \ + -o etcd-${ETCD_VER}-linux-amd64.tar.gz +tar -xzf etcd-${ETCD_VER}-linux-amd64.tar.gz +export PATH=$PWD/etcd-${ETCD_VER}-linux-amd64:$PATH +etcd --version +``` + +**2. Client configuration** (`ha_mooncake_config.json`) -In HA mode, both `metadata_server` and `master_server_addr` use the `etcd://` prefix pointing to the etcd cluster; clients discover the current leader through etcd (`ha_mooncake_config.json`): +Both `metadata_server` and `master_server_addr` use the `etcd://` prefix; clients discover the current leader through etcd: ```json { @@ -373,45 +377,77 @@ In HA mode, both `metadata_server` and `master_server_addr` use the `etcd://` pr } ``` -#### One-Command Launch & Failover Verification +**3. Run** + +```bash +cd examples/cache_storage +bash run_ha.sh +``` -A single self-contained script `examples/cache_storage/run_ha.sh` handles the whole flow — it starts the etcd cluster and the HA master cluster inline (each via a 3-iteration loop), so no separate `start_*.sh` is needed. +The script starts a 3-node etcd cluster (client ports 12379/22379/32379), 3 HA masters (rpc 8081/8082/8083), and 2 FastDeploy instances; the leader address is written to the etcd key `mooncake-store/mooncake_cluster/master_view`. 
-Run directly: +Inspect the current leader manually: + +```bash +etcdctl --endpoints=http://127.0.0.1:12379,http://127.0.0.1:22379,http://127.0.0.1:32379 \ + get "mooncake-store/mooncake_cluster/master_view" --print-value-only +``` + +#### Option B: redis backend (`run_ha_redis.sh`) + +**1. Client configuration** (`ha_redis_mooncake_config.json`) + +Both `metadata_server` and `master_server_addr` use the `redis://` prefix pointing to the single redis instance: + +```json +{ + "metadata_server": "redis://127.0.0.1:6399", + "global_segment_size": 1000000000, + "local_buffer_size": 134217728, + "protocol": "rdma", + "rdma_devices": "", + "master_server_addr": "redis://127.0.0.1:6399" +} +``` + +**2. Run** ```bash cd examples/cache_storage -bash run_ha.sh +bash run_ha_redis.sh ``` -What `run_ha.sh` does: +The script starts a single redis instance (port 6399), 3 HA masters (rpc 8081/8082/8083) launched with `--ha_backend_type redis --ha_backend_connstring redis://127.0.0.1:6399`, and 2 FastDeploy instances. The master_view is a redis HASH at `mooncake-store/{mooncake_cluster}/master_view`. + +Inspect the current leader manually: + +```bash +redis-cli -p 6399 hget "mooncake-store/{mooncake_cluster}/master_view" leader_address +``` + +#### What the HA scripts verify + +Both scripts run the same flow and verify failover: -1. **Start the etcd cluster**: a loop launches 3 etcd nodes (client ports 12379/22379/32379) forming a raft cluster, after a port check. -2. **Start 3 HA masters**: a loop launches 3 `mooncake_master` (rpc 8081/8082/8083, metrics 9091/9092/9093), each with `--enable_ha --etcd_endpoints ... --rpc_port ...`, electing one leader via etcd. The leader address is written to the etcd key `mooncake-store/mooncake_cluster/master_view`. -3. **Start 2 FastDeploy instances**, both joining the same cache pool with `--kvcache-storage-backend mooncake`. -4. 
**Verify pooling (before failover)**: warm up prompt **A** on `server_0`, then send the same prompt to `server_1`, which should hit the global cache. -5. **Kill the leader**: the script reads the current leader's `rpc_port` from etcd, `kill -9`s that process, triggering re-election. -6. **Verify pooling (after failover)**: once etcd's `master_view` is updated to the new leader, warm up a **brand-new** prompt **B** (never sent before) on `server_0`, then reuse it on `server_1`. Using a fresh prompt ensures the hit on `server_1` can only come from the new leader's global pool, rather than stale local cache from step 4. +1. Start the coordination backend (etcd cluster / single redis). +2. Start 3 HA masters; one is elected leader and published to `master_view`. +3. Start 2 FastDeploy instances, both joining the same cache pool with `--kvcache-storage-backend mooncake`. +4. **Before failover**: warm up prompt **A** on `server_0`, then send the same prompt to `server_1`, which should hit the global cache. +5. **Kill the leader**: read the current leader's `rpc_port` from the backend and `kill -9` it, triggering re-election. +6. **After failover**: once `master_view` updates to the new leader, warm up a **brand-new** prompt **B** on `server_0`, then reuse it on `server_1`. Using a fresh prompt ensures the hit can only come from the new leader's global pool, not stale local cache from step 4. -> Check the election state manually: -> -> ```bash -> # Current leader (rpc_address:rpc_port) -> etcdctl --endpoints=http://127.0.0.1:12379,http://127.0.0.1:22379,http://127.0.0.1:32379 \ -> get "mooncake-store/mooncake_cluster/master_view" --print-value-only -> ``` -> -> Per-master roles can be seen in `log_master_1` / `log_master_2` / `log_master_3` (`role=leader` / `role=standby`), and etcd logs in `log_etcd_1` / `log_etcd_2` / `log_etcd_3`. +Per-master roles can be seen in `log_master_1` / `log_master_2` / `log_master_3` (`role=leader` / `role=standby`). 
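Steps 5 and 6 depend on waiting until `master_view` names a different leader than the one that was killed. A minimal sketch of that wait follows, assuming the caller supplies a command that prints the current leader as `rpc_address:rpc_port`; `wait_for_new_leader` and the `fake_leader` stub are hypothetical names used only so the example runs without etcd or redis.

```bash
#!/bin/bash
# Poll until the published leader differs from the old one.
#   $1   = old leader address ("host:port")
#   $2   = timeout in seconds
#   $3.. = command that prints the current leader address (may print nothing)
# Prints the new leader and returns 0, or returns 1 on timeout.
wait_for_new_leader() {
    local old=$1 timeout=$2
    shift 2
    local start cur
    start=$(date +%s)
    while true; do
        # Strip whitespace so a trailing newline never affects the comparison.
        cur=$("$@" 2>/dev/null | tr -d '[:space:]')
        if [ -n "${cur}" ] && [ "${cur}" != "${old}" ]; then
            echo "${cur}"
            return 0
        fi
        if [ $(( $(date +%s) - start )) -ge "${timeout}" ]; then
            return 1
        fi
        sleep 1
    done
}

# Stand-in for the real backend query (assumption: illustration only).
fake_leader() { echo "127.0.0.1:8082"; }

new_leader=$(wait_for_new_leader "127.0.0.1:8081" 5 fake_leader) \
    && echo "new leader rpc_port=${new_leader##*:}"
```

In a real run the stub would be replaced by the `etcdctl ... get ... --print-value-only` or `redis-cli ... hget ... leader_address` query shown above. Comparing against the old address, rather than only checking for a non-empty value, avoids treating a not-yet-expired entry for the killed leader as a successful failover.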
#### Key HA Master Parameters | Parameter | Description | |-----------|-------------| | `--enable_ha` | Enable HA mode | +| `--ha_backend_type` | Coordination backend: `etcd` (default) or `redis` | | `--etcd_endpoints` | etcd endpoints, semicolon separated (when `ha_backend_type=etcd`) | +| `--ha_backend_connstring` | Backend connection string, e.g. `redis://127.0.0.1:6399` (when `ha_backend_type=redis`) | | `--rpc_address` / `--rpc_port` | This master's reachable RPC address and port (must be unique per instance) | | `--cluster_id` | Cluster identifier; masters in the same cluster must match | -| `--root_fs_dir` | Storage root directory in HA mode (unique per instance) | ## FastDeploy Parameters for Mooncake diff --git a/docs/zh/features/global_cache_pooling.md b/docs/zh/features/global_cache_pooling.md index 2680a5b393d..4fd15b663f6 100644 --- a/docs/zh/features/global_cache_pooling.md +++ b/docs/zh/features/global_cache_pooling.md @@ -48,7 +48,8 @@ |------|------|------| | `run.sh` | 多实例缓存共享 | 两个独立实例共享缓存 | | `run_03b_pd_storage.sh` | PD 分离 | P+D 实例配合全局缓存池 | -| `run_ha.sh` | 高可用(HA) | etcd + 多 Master 选主,杀掉 leader 后验证 failover | +| `run_ha.sh` | 高可用(etcd) | etcd + 多 Master 选主,杀掉 leader 后验证 failover | +| `run_ha_redis.sh` | 高可用(redis) | 单 redis + 多 Master 选主,杀掉 leader 后验证 failover | ## 环境要求 @@ -286,14 +287,19 @@ curl -X POST "http://0.0.0.0:52700/v1/chat/completions" \ ### 场景三:高可用(HA)部署 -单 Master 是单点,崩溃后集群操作会暂停。生产环境推荐使用 **etcd + 多 Master** 模式:多个 `mooncake_master` 通过 etcd 进行 leader 选举,leader 故障后由备节点自动重新选主,客户端无感切换。 +单 Master 是单点,崩溃后集群操作会暂停。生产环境建议运行多个 `mooncake_master`,通过协调后端进行 leader 选举;leader 故障后由备节点自动重新选主,客户端无感切换。 + +支持两种协调后端: + +- **etcd**(`run_ha.sh`):3 节点 etcd 集群负责选主与元数据存储。 +- **redis**(`run_ha_redis.sh`):单个 redis 实例做基于租约(lease)的选主。用它可以避免额外引入 etcd 组件。 **架构图:** ``` ┌──────────────────────────────────────┐ - │ etcd 集群 (3 节点) │ - │ leader 选举 / 元数据存储 │ + │ 协调后端 (etcd / redis) │ + │ leader 选举 (master_view) │ └───────────────────┬──────────────────┘ │ 选主 (master_view) 
┌─────────────────────┼─────────────────────┐ @@ -303,47 +309,30 @@ curl -X POST "http://0.0.0.0:52700/v1/chat/completions" \ │ rpc:8081 │ │ rpc:8082 │ │ rpc:8083 │ │ (leader) │ │ (standby) │ │ (standby) │ └──────┬──────┘ └─────────────┘ └─────────────┘ - │ FastDeploy 客户端通过 etcd 发现当前 leader + │ FastDeploy 客户端通过协调后端发现当前 leader ┌──────┴───────┐ ▼ ▼ server_0 server_1 ``` -#### 前置准备 - -**1. 安装 etcd** - -下载并解压 etcd(示例为 v3.5.30),将 `etcd` / `etcdctl` 加入 `PATH`: - -```bash -ETCD_VER=v3.5.30 -curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \ - -o etcd-${ETCD_VER}-linux-amd64.tar.gz -tar -xzf etcd-${ETCD_VER}-linux-amd64.tar.gz -export PATH=$PWD/etcd-${ETCD_VER}-linux-amd64:$PATH -etcd --version -``` +#### 源码编译 Mooncake -**2. 源码编译安装 Mooncake(支持 etcd)** +HA 模式需要 Mooncake 在编译时开启对应后端: -HA 模式需要 Mooncake 在编译时开启 etcd 支持(`-DSTORE_USE_ETCD=ON -DUSE_ETCD=ON`)。先安装依赖再编译: +- etcd:`-DSTORE_USE_ETCD=ON -DUSE_ETCD=ON` +- redis:`-DSTORE_USE_REDIS=ON -DUSE_REDIS=ON`(编译依赖:`libhiredis-dev`) ```bash -# 下载源码 git clone https://github.com/kvcache-ai/Mooncake.git cd Mooncake - -# 安装系统及第三方依赖 bash dependencies.sh -# 编译 C++ 组件(含 mooncake_master,开启 etcd) mkdir -p build && cd build -cmake .. -DSTORE_USE_ETCD=ON -DUSE_ETCD=ON +cmake .. -DSTORE_USE_ETCD=ON -DUSE_ETCD=ON # redis 后端追加 -DSTORE_USE_REDIS=ON -DUSE_REDIS=ON make -j sudo make install cd .. -# 编译并安装 Python wheel ./scripts/build_wheel.sh pip install mooncake-wheel/dist/*.whl ``` @@ -357,9 +346,24 @@ export CU13_BUILD=1 pip install mooncake-wheel/dist/mooncake_transfer_engine_cuda13-*.whl ``` -#### HA 客户端配置 +#### 方式 A:etcd 后端(`run_ha.sh`) + +**1. 
安装 etcd** + +下载并解压 etcd(示例为 v3.5.30),将 `etcd` / `etcdctl` 加入 `PATH`: + +```bash +ETCD_VER=v3.5.30 +curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \ + -o etcd-${ETCD_VER}-linux-amd64.tar.gz +tar -xzf etcd-${ETCD_VER}-linux-amd64.tar.gz +export PATH=$PWD/etcd-${ETCD_VER}-linux-amd64:$PATH +etcd --version +``` + +**2. 客户端配置**(`ha_mooncake_config.json`) -HA 模式下,`metadata_server` 与 `master_server_addr` 都使用 `etcd://` 前缀指向 etcd 集群,由客户端通过 etcd 发现当前 leader(`ha_mooncake_config.json`): +`metadata_server` 与 `master_server_addr` 都使用 `etcd://` 前缀,由客户端通过 etcd 发现当前 leader: ```json { @@ -372,45 +376,77 @@ HA 模式下,`metadata_server` 与 `master_server_addr` 都使用 `etcd://` } ``` -#### 一键启动与 failover 验证 +**3. 运行** + +```bash +cd examples/cache_storage +bash run_ha.sh +``` -单个自包含脚本 `examples/cache_storage/run_ha.sh` 负责整个流程 —— 它在脚本内部用循环分别拉起 etcd 集群和 HA master 集群,不再依赖单独的 `start_*.sh`。 +脚本会启动 3 节点 etcd 集群(client 端口 12379/22379/32379)、3 个 HA Master(rpc 8081/8082/8083)和 2 个 FastDeploy 实例;leader 地址写入 etcd 的 `mooncake-store/mooncake_cluster/master_view`。 -直接运行: +手动查看当前 leader: + +```bash +etcdctl --endpoints=http://127.0.0.1:12379,http://127.0.0.1:22379,http://127.0.0.1:32379 \ + get "mooncake-store/mooncake_cluster/master_view" --print-value-only +``` + +#### 方式 B:redis 后端(`run_ha_redis.sh`) + +**1. 客户端配置**(`ha_redis_mooncake_config.json`) + +`metadata_server` 与 `master_server_addr` 都使用 `redis://` 前缀指向单个 redis 实例: + +```json +{ + "metadata_server": "redis://127.0.0.1:6399", + "global_segment_size": 1000000000, + "local_buffer_size": 134217728, + "protocol": "rdma", + "rdma_devices": "", + "master_server_addr": "redis://127.0.0.1:6399" +} +``` + +**2. 
运行** ```bash cd examples/cache_storage -bash run_ha.sh +bash run_ha_redis.sh ``` -`run_ha.sh` 的执行流程: +脚本会启动单个 redis 实例(端口 6399)、3 个用 `--ha_backend_type redis --ha_backend_connstring redis://127.0.0.1:6399` 拉起的 HA Master(rpc 8081/8082/8083)和 2 个 FastDeploy 实例。master_view 是 redis 中的一个 HASH,key 为 `mooncake-store/{mooncake_cluster}/master_view`。 + +手动查看当前 leader: + +```bash +redis-cli -p 6399 hget "mooncake-store/{mooncake_cluster}/master_view" leader_address +``` + +#### HA 脚本验证的内容 + +两个脚本流程相同,均验证 failover: -1. **启动 etcd 集群**:端口检查后,用循环拉起 3 个 etcd 节点(client 端口 12379/22379/32379)组成 raft 集群。 -2. **启动 3 个 HA Master**:用循环拉起 3 个 `mooncake_master`(rpc 8081/8082/8083,metrics 9091/9092/9093),每个都带 `--enable_ha --etcd_endpoints ... --rpc_port ...`,通过 etcd 选出一个 leader。leader 地址写入 etcd 的 `mooncake-store/mooncake_cluster/master_view`。 -3. **启动 2 个 FastDeploy 实例**,均以 `--kvcache-storage-backend mooncake` 接入同一缓存池。 -4. **验证池化(failover 前)**:用 prompt **A** 在 `server_0` 预热,再向 `server_1` 发送相同 prompt,应命中全局缓存。 -5. **杀掉 leader**:脚本从 etcd 读取当前 leader 的 `rpc_port`,`kill -9` 对应进程,触发重新选主。 -6. **验证池化(failover 后)**:等待 etcd 中 `master_view` 更新为新 leader 后,用一条**全新的** prompt **B**(failover 前从未发过)在 `server_0` 预热,再在 `server_1` 复用。使用新 prompt 可确保 `server_1` 的命中只能来自新 leader 的全局池,而非步骤 4 残留的本地缓存。 +1. 启动协调后端(etcd 集群 / 单 redis)。 +2. 启动 3 个 HA Master,选出一个 leader 并写入 `master_view`。 +3. 启动 2 个 FastDeploy 实例,均以 `--kvcache-storage-backend mooncake` 接入同一缓存池。 +4. **failover 前**:用 prompt **A** 在 `server_0` 预热,再向 `server_1` 发送相同 prompt,应命中全局缓存。 +5. **杀掉 leader**:从后端读取当前 leader 的 `rpc_port`,`kill -9` 触发重新选主。 +6. 
**failover 后**:等待 `master_view` 更新为新 leader 后,用一条**全新的** prompt **B** 在 `server_0` 预热,再在 `server_1` 复用。使用新 prompt 可确保命中只能来自新 leader 的全局池,而非步骤 4 残留的本地缓存。 -> 单独验证选主状态: -> -> ```bash -> # 查看当前 leader(rpc_address:rpc_port) -> etcdctl --endpoints=http://127.0.0.1:12379,http://127.0.0.1:22379,http://127.0.0.1:32379 \ -> get "mooncake-store/mooncake_cluster/master_view" --print-value-only -> ``` -> -> 各 Master 角色可在 `log_master_1` / `log_master_2` / `log_master_3` 中查看(`role=leader` / `role=standby`),etcd 日志见 `log_etcd_1` / `log_etcd_2` / `log_etcd_3`。 +各 Master 角色可在 `log_master_1` / `log_master_2` / `log_master_3` 中查看(`role=leader` / `role=standby`)。 #### HA Master 关键参数 | 参数 | 说明 | |------|------| | `--enable_ha` | 开启 HA 模式 | +| `--ha_backend_type` | 协调后端:`etcd`(默认)或 `redis` | | `--etcd_endpoints` | etcd 端点,分号分隔(`ha_backend_type=etcd` 时) | +| `--ha_backend_connstring` | 后端连接串,如 `redis://127.0.0.1:6399`(`ha_backend_type=redis` 时) | | `--rpc_address` / `--rpc_port` | 该 Master 对外可达的 RPC 地址与端口(每个实例需唯一) | | `--cluster_id` | 集群标识,同一集群的 Master 需一致 | -| `--root_fs_dir` | HA 模式下的存储根目录(每个实例独立) | ## FastDeploy Mooncake 相关参数 diff --git a/examples/cache_storage/README.md b/examples/cache_storage/README.md index c1b4d05bcb8..6c269ed13fd 100644 --- a/examples/cache_storage/README.md +++ b/examples/cache_storage/README.md @@ -16,8 +16,9 @@ bash run.sh # PD disaggregation scenario bash run_03b_pd_storage.sh -# High-availability (etcd + multi-master + failover) scenario -bash run_ha.sh +# High-availability scenario (multi-master + failover) +bash run_ha.sh # etcd backend +bash run_ha_redis.sh # redis backend ``` ## Scripts @@ -26,11 +27,13 @@ bash run_ha.sh |--------|----------|-------------| | `run.sh` | Multi-Instance | Two standalone instances sharing cache | | `run_03b_pd_storage.sh` | PD Disaggregation | P+D instances with global cache pooling | -| `run_ha.sh` | High Availability | Self-contained: starts etcd + 3 masters with leader election, then kills the leader and re-verifies 
pooling with a fresh prompt after re-election | +| `run_ha.sh` | High Availability (etcd) | Self-contained: starts etcd + 3 masters with leader election, then kills the leader and re-verifies pooling with a fresh prompt after re-election | +| `run_ha_redis.sh` | High Availability (redis) | Same flow as `run_ha.sh`, but uses a single redis instead of etcd for leader election | ## Files - `mooncake_config.json` - Mooncake configuration file (single master) - `ha_mooncake_config.json` - Mooncake HA client config (etcd-based master discovery) +- `ha_redis_mooncake_config.json` - Mooncake HA client config (redis-based master discovery) - `utils.sh` - Utility functions for scripts - `stop.sh` - Stop all running services diff --git a/examples/cache_storage/ha_redis_mooncake_config.json b/examples/cache_storage/ha_redis_mooncake_config.json new file mode 100644 index 00000000000..662bfc6965f --- /dev/null +++ b/examples/cache_storage/ha_redis_mooncake_config.json @@ -0,0 +1,8 @@ +{ + "metadata_server": "redis://127.0.0.1:6399", + "global_segment_size": 1000000000, + "local_buffer_size": 134217728, + "protocol": "rdma", + "rdma_devices": "", + "master_server_addr": "redis://127.0.0.1:6399" +} diff --git a/examples/cache_storage/run_ha.sh b/examples/cache_storage/run_ha.sh index dd87dbb7128..ba7ea0a60de 100644 --- a/examples/cache_storage/run_ha.sh +++ b/examples/cache_storage/run_ha.sh @@ -65,17 +65,23 @@ wait_for_leader() { done } -# Kill the mooncake_master process that owns the given rpc_port (leader). +# Kill the mooncake_master process(es) that own the given rpc_port (leader). 
kill_master_by_rpc_port() { local rpc_port=$1 # match "--rpc_port 8081" or "--rpc_port=8081" on the full command line - local pid=$(pgrep -af mooncake_master | grep -E "rpc_port[ =]${rpc_port}([^0-9]|$)" | awk '{print $1}' | head -n1) - if [ -n "${pid}" ]; then - echo "kill leader master pid=${pid} (rpc_port=${rpc_port})" - kill -9 "${pid}" || true - else + local pids=$(pgrep -af mooncake_master | grep -E "rpc_port[ =]${rpc_port}([^0-9]|$)" | awk '{print $1}') + if [ -z "${pids}" ]; then echo "⚠️ no mooncake_master process found for rpc_port=${rpc_port}" + return fi + # also collect direct children by ppid, in case a child's cmdline didn't match. + local all_pids="${pids}" + for p in ${pids}; do + local kids=$(pgrep -P "${p}" 2>/dev/null) + [ -n "${kids}" ] && all_pids="${all_pids} ${kids}" + done + echo "kill leader master pids=$(echo ${all_pids} | tr '\n' ' ')(rpc_port=${rpc_port})" + kill -9 ${all_pids} 2>/dev/null || true } # Send a chat request to a FastDeploy server. @@ -135,17 +141,13 @@ check_ports "${master_ports[@]}" || { } for i in 1 2 3; do - rm -rf /tmp/mooncake_ha/master${i} - mkdir -p /tmp/mooncake_ha/master${i} mooncake_master \ --enable_ha \ --etcd_endpoints "${ETCD_ENDPOINTS_HA}" \ --cluster_id "${CLUSTER_ID}" \ --rpc_address "127.0.0.1" \ --rpc_port 808${i} \ - --metrics_port=909${i} \ - --root_fs_dir=/tmp/mooncake_ha/master${i} \ - --enable_offload=true > log_master_${i} 2>&1 & + --metrics_port=909${i} > log_master_${i} 2>&1 & done echo "waiting for leader election..." 
diff --git a/examples/cache_storage/run_ha_redis.sh b/examples/cache_storage/run_ha_redis.sh new file mode 100644 index 00000000000..0eb25e02202 --- /dev/null +++ b/examples/cache_storage/run_ha_redis.sh @@ -0,0 +1,260 @@ +#!/bin/bash +set -e +# ============================================================================= +# HA Global Cache Pooling test script — REDIS backend (single redis + multi-master + failover) +# Mirror of run_ha.sh, but replaces the 3-node etcd cluster with a SINGLE redis +# instance. The 3 mooncake_master use redis (lease-based leader election) instead +# of etcd raft. Motivation: redis avoids introducing etcd as an extra component. +# +# Flow (identical to run_ha.sh): +# 1. start a single redis instance +# 2. start 3 HA masters (one is elected leader via a redis lease) +# 3. start 2 FastDeploy instances sharing the global cache pool +# 4. verify pooling (before failover): warmup on server_0, reuse on server_1 +# 5. kill the leader master, wait for a standby to be re-elected +# 6. 
verify pooling (after failover) with a BRAND-NEW prompt +# ============================================================================= + +export PYTHONPATH="/workspace/mooncake-test/FastDeploy:$PYTHONPATH" +export MODEL_NAME="/workspace/models/Ernie-0.3B" +export MOONCAKE_CONFIG_PATH=./ha_redis_mooncake_config.json +export FD_DEBUG=1 + +unset http_proxy && unset https_proxy + +echo "begin" +source ./utils.sh + +# ---- topology --------------------------------------------------------------- +# redis: client port = 6399 (single instance) +# master node i: rpc port = 808${i}, metrics port = 909${i} +REDIS_PORT=6399 +REDIS_SERVER_BIN="$(command -v redis-server || echo /usr/local/redis/bin/redis-server)" +REDIS_CLI_BIN="$(command -v redis-cli || echo /usr/local/redis/bin/redis-cli)" +REDIS_CONN="redis://127.0.0.1:${REDIS_PORT}" # for mooncake_master + client config + +CLUSTER_ID="mooncake_cluster" +# redis master_view key uses a hash-tag {cluster_id} so all related keys land in +# the same Redis Cluster slot; it is a HASH with fields leader_address/view_version/owner_token. +MASTER_VIEW_KEY="mooncake-store/{${CLUSTER_ID}}/master_view" + +S0_PORT=52700 +S1_PORT=52800 + +# ---- helpers ---------------------------------------------------------------- + +# Query redis for the current leader's "rpc_address:rpc_port". +# The master_view is a redis HASH; the leader endpoint lives in field leader_address. +# redis-cli prints raw (unquoted) output when piped, so no extra unquoting needed. +get_leader_addr() { + "${REDIS_CLI_BIN}" -p "${REDIS_PORT}" hget "${MASTER_VIEW_KEY}" leader_address 2>/dev/null \ + | tr -d '[:space:]' +} + +# Wait until a leader is elected and published into redis. 
+wait_for_leader() { + local timeout=${1:-60} + local start_time=$(date +%s) + while true; do + local leader=$(get_leader_addr) + if [ -n "${leader}" ]; then + echo "${leader}" + return 0 + fi + if [ $(( $(date +%s) - start_time )) -ge ${timeout} ]; then + echo "" + return 1 + fi + sleep 1 + done +} + +# Kill the mooncake_master process(es) that own the given rpc_port (leader). +kill_master_by_rpc_port() { + local rpc_port=$1 + # match "--rpc_port 8081" or "--rpc_port=8081" on the full command line + local pids=$(pgrep -af mooncake_master | grep -E "rpc_port[ =]${rpc_port}([^0-9]|$)" | awk '{print $1}') + if [ -z "${pids}" ]; then + echo "⚠️ no mooncake_master process found for rpc_port=${rpc_port}" + return + fi + # also collect direct children by ppid, in case a child's cmdline didn't match. + local all_pids="${pids}" + for p in ${pids}; do + local kids=$(pgrep -P "${p}" 2>/dev/null) + [ -n "${kids}" ] && all_pids="${all_pids} ${kids}" + done + echo "kill leader master pids=$(echo ${all_pids} | tr '\n' ' ')(rpc_port=${rpc_port})" + kill -9 ${all_pids} 2>/dev/null || true + +} + +# Send a chat request to a FastDeploy server. +send_request() { + local port=$1 + local content=$2 + curl -s -X POST "http://0.0.0.0:${port}/v1/chat/completions" \ + -H "Content-Type: application/json" \ + -d "{ + \"messages\": [ + {\"role\": \"user\", \"content\": \"${content}\"} + ], + \"max_tokens\": 50, + \"stream\": false, + \"top_p\": 0 + }" + echo +} + +# ---- 1. start a single redis instance --------------------------------------- +echo "=== [1/6] start redis ===" +pkill -9 -f "redis-server .*:${REDIS_PORT}" || true +sleep 1 + +check_ports "${REDIS_PORT}" || { + echo "❌ redis port ${REDIS_PORT} is in use. Please release it." + exit 1 +} + +# disable persistence; this is a throwaway coordination store. 
+"${REDIS_SERVER_BIN}" --port "${REDIS_PORT}" --save "" --appendonly no \ + --daemonize no > log_redis 2>&1 & +sleep 2 +echo "=== redis health check ===" +"${REDIS_CLI_BIN}" -p "${REDIS_PORT}" ping + +# ---- 2. start 3 HA masters (redis backend) ---------------------------------- +echo "=== [2/6] start 3 HA mooncake_master (redis backend) ===" +pkill -9 -f mooncake_master || true +sleep 1 + +master_ports=(8081 8082 8083 9091 9092 9093) +check_ports "${master_ports[@]}" || { + echo "❌ Some master ports are in use. Please release them." + exit 1 +} + +for i in 1 2 3; do + # --ha_backend_type redis + --ha_backend_connstring redis://... + mooncake_master \ + --enable_ha \ + --ha_backend_type redis \ + --ha_backend_connstring "${REDIS_CONN}" \ + --cluster_id "${CLUSTER_ID}" \ + --rpc_address "127.0.0.1" \ + --rpc_port 808${i} \ + --metrics_port=909${i} > log_master_${i} 2>&1 & +done + +echo "waiting for leader election..." +LEADER_ADDR=$(wait_for_leader 60) || { + echo "❌ no leader elected within timeout" + exit 1 +} +echo "✅ current leader: ${LEADER_ADDR}" + +# ---- 3. start 2 FastDeploy instances ---------------------------------------- +echo "=== [3/6] start FastDeploy instances ===" + +# clean up any lingering FastDeploy services so the ports are free. +# the api_server runs under gunicorn; killing the gunicorn masters takes the +# workers down with them. +pkill -f "gunicorn: master" || true +sleep 2 + +rm -rf log_0 log_1 + +fd_ports=("$S0_PORT" "$S1_PORT") +check_ports "${fd_ports[@]}" || { + echo "❌ Some ports are in use. Please release them." 
+ exit 1 +} + +# Launch FD server 0 +export CUDA_VISIBLE_DEVICES=6 +export FD_LOG_DIR="log_0" +mkdir -p ${FD_LOG_DIR} +echo "server 0 port: ${S0_PORT}" + +nohup python -m fastdeploy.entrypoints.openai.api_server \ + --model ${MODEL_NAME} \ + --port ${S0_PORT} \ + --max-model-len 32768 \ + --max-num-seqs 32 \ + --kvcache-storage-backend mooncake \ + >${FD_LOG_DIR}/nohup 2>&1 & + +# Launch FD server 1 +export CUDA_VISIBLE_DEVICES=7 +export FD_LOG_DIR="log_1" +mkdir -p ${FD_LOG_DIR} +echo "server 1 port: ${S1_PORT}" + +nohup python -m fastdeploy.entrypoints.openai.api_server \ + --model ${MODEL_NAME} \ + --port ${S1_PORT} \ + --max-model-len 32768 \ + --max-num-seqs 32 \ + --kvcache-storage-backend mooncake \ + >${FD_LOG_DIR}/nohup 2>&1 & + +wait_for_health ${S0_PORT} +wait_for_health ${S1_PORT} +# ---- 4. verify pooling before failover (warmup on s0, reuse on s1) ---------- +# msg_a: warmed on server_0, then reused on server_1. +msg_a="深圳是中国经济实力最强的城市之一。近年来,深圳GDP持续稳步增长,2023年突破3.4万亿元人民币,2024年接近3.7万亿元。长期位居全国城市前列。深圳经济以第二产业和第三产业为主,高端制造业、电子信息产业和现代服务业发达,形成了以科技创新为核心的产业结构。依托华为、腾讯、大疆等龙头企业,深圳在数字经济、人工智能、新能源等领域具有显著优势。同时,深圳进出口总额常年位居全国城市第一,是中国对外开放和高质量发展的重要引擎。深圳持续推进创新驱动发展战略,不断加大研发投入,全社会研发投入占GDP比重长期保持较高水平。深圳拥有完善的创业生态体系,吸引了大量科技企业和创新人才。近年来,深圳积极布局半导体、生物医药、低空经济和智能网联汽车等战略性新兴产业,进一步增强经济增长动能。请总结深圳经济发展的核心优势。" + +echo "=== [4/6] verify pooling before failover ===" +echo ">>> warmup msg_a on server_0 (${S0_PORT})" +send_request ${S0_PORT} "${msg_a}" +sleep 5 +echo ">>> reuse msg_a on server_1 (${S1_PORT}), expect cache hit" +send_request ${S1_PORT} "${msg_a}" + +# ---- 5. kill the leader, wait for re-election ------------------------------- +echo "=== [5/6] kill leader and wait for failover ===" +OLD_LEADER_ADDR=$(get_leader_addr) +OLD_LEADER_PORT="${OLD_LEADER_ADDR##*:}" +echo "old leader: ${OLD_LEADER_ADDR} (rpc_port=${OLD_LEADER_PORT})" +kill_master_by_rpc_port "${OLD_LEADER_PORT}" + +echo "waiting for a new leader to be elected..." 
+NEW_LEADER_ADDR="" +start_time=$(date +%s) +while true; do + cur=$(get_leader_addr) + if [ -n "${cur}" ] && [ "${cur}" != "${OLD_LEADER_ADDR}" ]; then + NEW_LEADER_ADDR="${cur}" + break + fi + if [ $(( $(date +%s) - start_time )) -ge 60 ]; then + echo "❌ no new leader elected within timeout" + exit 1 + fi + sleep 1 +done +echo "✅ new leader: ${NEW_LEADER_ADDR} (was ${OLD_LEADER_ADDR})" + +# wait for the new leader to finish recovery and reach serving state +# (and for clients to reconnect) before sending requests; 5s was too short. +sleep 10 + +# ---- 6. verify pooling after failover with a BRAND-NEW prompt --------------- +# Use a different prompt (msg_b) never sent before the failover, so a hit on +# server_1 proves the cache was written/read through the NEW leader's global +# pool (not stale local cache from step 4). +msg_b="人工智能已经成为全球科技竞争的重要方向。近年来,大模型技术快速发展,在自然语言处理、代码生成、多模态理解以及智能代理等领域取得显著突破。越来越多企业开始将人工智能技术应用于客服、办公自动化、内容生成、金融风控和软件开发等场景。与此同时,人工智能的发展也带来了新的挑战,包括算力成本快速上升、训练数据质量参差不齐、模型幻觉问题以及隐私保护需求增强。各国政府正在制定相应监管框架,以平衡技术创新和风险控制之间的关系。未来几年,人工智能有望进一步推动生产力提升,并深刻影响教育、医疗、科研和工业制造等行业的发展模式。请列出人工智能当前面临的主要挑战。" + +echo "=== [6/6] verify pooling after failover (new prompt msg_b) ===" +echo ">>> warmup msg_b on server_0 (${S0_PORT})" +send_request ${S0_PORT} "${msg_b}" +sleep 5 +echo ">>> reuse msg_b on server_1 (${S1_PORT}), expect cache hit via new leader" +send_request ${S1_PORT} "${msg_b}" + +echo +echo "=== HA (redis) test completed ===" +echo "Check cache hit: grep -E 'storage_cache_token_num' log_*/cache_storage.log* " +echo "Master logs: log_master_1 / log_master_2 / log_master_3" +echo "Redis log: log_redis" +echo "Current leader: ${REDIS_CLI_BIN} -p ${REDIS_PORT} hget '${MASTER_VIEW_KEY}' leader_address"