136 changes: 86 additions & 50 deletions docs/features/global_cache_pooling.md
Expand Up @@ -48,7 +48,8 @@ Ready-to-use example scripts are available in [examples/cache_storage/](../../..
|--------|----------|-------------|
| `run.sh` | Multi-Instance | Two standalone instances sharing cache |
| `run_03b_pd_storage.sh` | PD Disaggregation | P+D instances with global cache pooling |
| `run_ha.sh` | High Availability | etcd + multi-master leader election, verifies failover after killing the leader |
| `run_ha.sh` | High Availability (etcd) | etcd + multi-master leader election, verifies failover after killing the leader |
| `run_ha_redis.sh` | High Availability (redis) | single redis + multi-master leader election, verifies failover after killing the leader |

## Prerequisites

Expand Down Expand Up @@ -287,14 +288,19 @@ curl -X POST "http://0.0.0.0:52700/v1/chat/completions" \

### Scenario 3: High-Availability (HA) Deployment

A single master is a single point of failure; if it crashes, cluster operations pause. For production, use the **etcd + multi-master** mode: multiple `mooncake_master` instances perform leader election through etcd. When the leader fails, a standby is automatically re-elected, transparently to clients.
A single master is a single point of failure; if it crashes, cluster operations pause. For production, run multiple `mooncake_master` instances that perform leader election through a coordination backend. When the leader fails, a standby is automatically elected as the new leader, transparently to clients.

Two coordination backends are supported:

- **etcd** (`run_ha.sh`): a 3-node etcd cluster handles leader election and metadata storage.
- **redis** (`run_ha_redis.sh`): a single redis instance handles lease-based leader election. Use this to avoid introducing etcd as an extra component.

**Architecture:**

```
┌──────────────────────────────────────┐
etcd cluster (3 nodes)
│ leader election / metadata store
coordination backend (etcd / redis)
│ leader election (master_view)
└───────────────────┬──────────────────┘
│ election (master_view)
┌─────────────────────┼─────────────────────┐
Expand All @@ -304,47 +310,30 @@ A single master is a single point of failure; if it crashes, cluster operations
│ rpc:8081 │ │ rpc:8082 │ │ rpc:8083 │
│ (leader) │ │ (standby) │ │ (standby) │
└──────┬──────┘ └─────────────┘ └─────────────┘
│ FastDeploy clients discover the current leader via etcd
│ FastDeploy clients discover the current leader via the backend
┌──────┴───────┐
▼ ▼
server_0 server_1
```

#### Prerequisites

**1. Install etcd**

Download and extract etcd (v3.5.30 in this example), then add `etcd` / `etcdctl` to `PATH`:

```bash
ETCD_VER=v3.5.30
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
-o etcd-${ETCD_VER}-linux-amd64.tar.gz
tar -xzf etcd-${ETCD_VER}-linux-amd64.tar.gz
export PATH=$PWD/etcd-${ETCD_VER}-linux-amd64:$PATH
etcd --version
```
#### Build Mooncake from source

**2. Build Mooncake from source (with etcd support)**
HA mode requires Mooncake built with the matching backend enabled:

HA mode requires Mooncake built with etcd support (`-DSTORE_USE_ETCD=ON -DUSE_ETCD=ON`). Install dependencies first, then build:
- etcd: `-DSTORE_USE_ETCD=ON -DUSE_ETCD=ON`
- redis: `-DSTORE_USE_REDIS=ON -DUSE_REDIS=ON` (build dependency: `libhiredis-dev`)

```bash
# Download the source
git clone https://github.com/kvcache-ai/Mooncake.git
cd Mooncake

# Install system & third-party dependencies
bash dependencies.sh

# Build C++ components (including mooncake_master, with etcd enabled)
mkdir -p build && cd build
cmake .. -DSTORE_USE_ETCD=ON -DUSE_ETCD=ON
cmake .. -DSTORE_USE_ETCD=ON -DUSE_ETCD=ON # add -DSTORE_USE_REDIS=ON -DUSE_REDIS=ON for redis
make -j
sudo make install
cd ..

# Build and install the Python wheel
./scripts/build_wheel.sh
pip install mooncake-wheel/dist/*.whl
```
Expand All @@ -358,9 +347,24 @@ export CU13_BUILD=1
pip install mooncake-wheel/dist/mooncake_transfer_engine_cuda13-*.whl
```

#### HA Client Configuration
#### Option A: etcd backend (`run_ha.sh`)

**1. Install etcd**

Download and extract etcd (v3.5.30 in this example), then add `etcd` / `etcdctl` to `PATH`:

```bash
ETCD_VER=v3.5.30
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
-o etcd-${ETCD_VER}-linux-amd64.tar.gz
tar -xzf etcd-${ETCD_VER}-linux-amd64.tar.gz
export PATH=$PWD/etcd-${ETCD_VER}-linux-amd64:$PATH
etcd --version
```

**2. Client configuration** (`ha_mooncake_config.json`)

In HA mode, both `metadata_server` and `master_server_addr` use the `etcd://` prefix pointing to the etcd cluster; clients discover the current leader through etcd (`ha_mooncake_config.json`):
Both `metadata_server` and `master_server_addr` use the `etcd://` prefix; clients discover the current leader through etcd:

```json
{
Expand All @@ -373,45 +377,77 @@ In HA mode, both `metadata_server` and `master_server_addr` use the `etcd://` pr
}
```

#### One-Command Launch & Failover Verification
**3. Run**

```bash
cd examples/cache_storage
bash run_ha.sh
```

A single self-contained script `examples/cache_storage/run_ha.sh` handles the whole flow — it starts the etcd cluster and the HA master cluster inline (each via a 3-iteration loop), so no separate `start_*.sh` is needed.
The script starts a 3-node etcd cluster (client ports 12379/22379/32379), 3 HA masters (rpc 8081/8082/8083), and 2 FastDeploy instances; the leader address is written to the etcd key `mooncake-store/mooncake_cluster/master_view`.

Run directly:
Inspect the current leader manually:

```bash
etcdctl --endpoints=http://127.0.0.1:12379,http://127.0.0.1:22379,http://127.0.0.1:32379 \
get "mooncake-store/mooncake_cluster/master_view" --print-value-only
```

#### Option B: redis backend (`run_ha_redis.sh`)

**1. Client configuration** (`ha_redis_mooncake_config.json`)

Both `metadata_server` and `master_server_addr` use the `redis://` prefix pointing to the single redis instance:

```json
{
"metadata_server": "redis://127.0.0.1:6399",
"global_segment_size": 1000000000,
"local_buffer_size": 134217728,
"protocol": "rdma",
"rdma_devices": "",
"master_server_addr": "redis://127.0.0.1:6399"
}
```
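
Because both fields must point at the same backend, a quick pre-flight check can catch a config that mixes `etcd://` and `redis://`. The sketch below is illustrative only: the helper names `config_scheme` and `same_backend` are hypothetical, and it relies on `sed` rather than a JSON parser, so it assumes a flat config with one field per line, as in the example above.

```bash
# Hypothetical helpers; not part of run_ha_redis.sh.

# Print the URL scheme (e.g. "redis") of string field $2 in JSON file $1.
# Assumes each field sits on its own line, as in the example config.
config_scheme() {
    sed -n "s|.*\"$2\"[[:space:]]*:[[:space:]]*\"\([a-z0-9]*\)://.*|\1|p" "$1"
}

# Succeed only if metadata_server and master_server_addr use the same scheme.
same_backend() {
    local meta master
    meta="$(config_scheme "$1" metadata_server)"
    master="$(config_scheme "$1" master_server_addr)"
    [ -n "$meta" ] && [ "$meta" = "$master" ]
}

# Example (assumes the config file is in the current directory):
#   same_backend ha_redis_mooncake_config.json || echo "backend prefixes differ" >&2
```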

**2. Run**

```bash
cd examples/cache_storage
bash run_ha.sh
bash run_ha_redis.sh
```

What `run_ha.sh` does:
The script starts a single redis instance (port 6399), 3 HA masters (rpc 8081/8082/8083) launched with `--ha_backend_type redis --ha_backend_connstring redis://127.0.0.1:6399`, and 2 FastDeploy instances. `master_view` is stored as a redis hash at `mooncake-store/{mooncake_cluster}/master_view`, with the current leader in its `leader_address` field.

Inspect the current leader manually:

```bash
redis-cli -p 6399 hget "mooncake-store/{mooncake_cluster}/master_view" leader_address
```

#### What the HA scripts verify

Both scripts run the same flow and verify failover:

1. **Start the etcd cluster**: a loop launches 3 etcd nodes (client ports 12379/22379/32379) forming a raft cluster, after a port check.
2. **Start 3 HA masters**: a loop launches 3 `mooncake_master` (rpc 8081/8082/8083, metrics 9091/9092/9093), each with `--enable_ha --etcd_endpoints ... --rpc_port ...`, electing one leader via etcd. The leader address is written to the etcd key `mooncake-store/mooncake_cluster/master_view`.
3. **Start 2 FastDeploy instances**, both joining the same cache pool with `--kvcache-storage-backend mooncake`.
4. **Verify pooling (before failover)**: warm up prompt **A** on `server_0`, then send the same prompt to `server_1`, which should hit the global cache.
5. **Kill the leader**: the script reads the current leader's `rpc_port` from etcd, `kill -9`s that process, triggering re-election.
6. **Verify pooling (after failover)**: once etcd's `master_view` is updated to the new leader, warm up a **brand-new** prompt **B** (never sent before) on `server_0`, then reuse it on `server_1`. Using a fresh prompt ensures the hit on `server_1` can only come from the new leader's global pool, rather than stale local cache from step 4.
1. Start the coordination backend (etcd cluster / single redis).
2. Start 3 HA masters; one is elected leader and published to `master_view`.
3. Start 2 FastDeploy instances, both joining the same cache pool with `--kvcache-storage-backend mooncake`.
4. **Before failover**: warm up prompt **A** on `server_0`, then send the same prompt to `server_1`, which should hit the global cache.
5. **Kill the leader**: read the current leader's `rpc_port` from the backend and `kill -9` that master process, triggering re-election.
6. **After failover**: once `master_view` updates to the new leader, warm up a **brand-new** prompt **B** on `server_0`, then reuse it on `server_1`. Using a fresh prompt ensures the hit can only come from the new leader's global pool, not stale local cache from step 4.
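
Steps 5 and 6 can be sketched roughly as follows. This is not the code from `run_ha.sh` or `run_ha_redis.sh`: the function names `leader_port` and `wait_for_new_leader` are hypothetical, and the sketch assumes `master_view` holds an `rpc_address:rpc_port` string. The command that reads the leader is passed in by the caller, so the same helper works with either `etcdctl` or `redis-cli`.

```bash
# Hypothetical helpers sketching steps 5 and 6; not taken from the example scripts.

# Extract the port from an "rpc_address:rpc_port" string (assumed master_view format).
leader_port() {
    echo "${1##*:}"
}

# Poll a caller-supplied command until it prints a non-empty leader that differs
# from the old one, or give up after roughly TIMEOUT_SECONDS.
# Usage: wait_for_new_leader OLD_LEADER TIMEOUT_SECONDS COMMAND [ARGS...]
wait_for_new_leader() {
    local old_leader="$1" timeout_s="$2" elapsed=0 current
    shift 2
    while [ "$elapsed" -lt "$timeout_s" ]; do
        # Tolerate a failing read while the backend view is being updated.
        current="$("$@" 2>/dev/null || true)"
        if [ -n "$current" ] && [ "$current" != "$old_leader" ]; then
            echo "$current"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example with the redis backend (key and field as shown earlier):
#   old="$(redis-cli -p 6399 hget "mooncake-store/{mooncake_cluster}/master_view" leader_address)"
#   ... kill the master listening on port "$(leader_port "$old")" ...
#   wait_for_new_leader "$old" 30 \
#       redis-cli -p 6399 hget "mooncake-store/{mooncake_cluster}/master_view" leader_address
```

Taking the reader command as an argument keeps the polling logic independent of the backend; the 30-second timeout in the example is an arbitrary choice, not a value from the scripts.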

> Check the election state manually:
>
> ```bash
> # Current leader (rpc_address:rpc_port)
> etcdctl --endpoints=http://127.0.0.1:12379,http://127.0.0.1:22379,http://127.0.0.1:32379 \
> get "mooncake-store/mooncake_cluster/master_view" --print-value-only
> ```
>
> Per-master roles can be seen in `log_master_1` / `log_master_2` / `log_master_3` (`role=leader` / `role=standby`), and etcd logs in `log_etcd_1` / `log_etcd_2` / `log_etcd_3`.
Per-master roles can be seen in `log_master_1` / `log_master_2` / `log_master_3` (`role=leader` / `role=standby`).

#### Key HA Master Parameters

| Parameter | Description |
|-----------|-------------|
| `--enable_ha` | Enable HA mode |
| `--ha_backend_type` | Coordination backend: `etcd` (default) or `redis` |
| `--etcd_endpoints` | etcd endpoints, semicolon-separated (when `ha_backend_type=etcd`) |
| `--ha_backend_connstring` | Backend connection string, e.g. `redis://127.0.0.1:6399` (when `ha_backend_type=redis`) |
| `--rpc_address` / `--rpc_port` | This master's reachable RPC address and port (must be unique per instance) |
| `--cluster_id` | Cluster identifier; masters in the same cluster must match |
| `--root_fs_dir` | Storage root directory in HA mode (unique per instance) |
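
As a rough sketch of how the parameters above combine, the loop below prints one `mooncake_master` command line per instance for the redis backend. Only flags from the table are used. The helper name `master_cmd`, the `--cluster_id` value `mooncake_cluster`, the `127.0.0.1` RPC address, and the `/tmp/mooncake_ha/master_N` directories are illustrative assumptions, not values read from `run_ha_redis.sh`.

```bash
# Hypothetical helper: print the mooncake_master command for instance $1 (1, 2 or 3).
# Flags come from the table above; the address, cluster id and paths are assumptions.
master_cmd() {
    local idx="$1"
    local rpc_port=$((8080 + $idx))                  # 8081/8082/8083, as in the example scripts
    local root_dir="/tmp/mooncake_ha/master_${idx}"  # assumed path; must be unique per instance
    echo "mooncake_master --enable_ha" \
        "--ha_backend_type redis" \
        "--ha_backend_connstring redis://127.0.0.1:6399" \
        "--rpc_address 127.0.0.1" \
        "--rpc_port ${rpc_port}" \
        "--cluster_id mooncake_cluster" \
        "--root_fs_dir ${root_dir}"
}

# Print the three command lines instead of running them, so they can be reviewed first.
for i in 1 2 3; do
    master_cmd "$i"
done
```

For the etcd backend, the `--ha_backend_type redis --ha_backend_connstring ...` pair would be replaced by `--etcd_endpoints` with the semicolon-separated endpoint list.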

## FastDeploy Parameters for Mooncake
