---
sidebar_position: 1
---

# Deployment Patterns

Arc Enterprise supports two clustering topologies, each optimized for a different operational environment. The choice comes down to **where the Parquet files live**, and that decision shapes the durability, cost, and operational model of your cluster.

## The Two Patterns

### Pattern A: Shared Object Storage

All nodes read from and write to the **same object store** (S3, MinIO, or Azure Blob). The bucket is the source of truth for Parquet files. Nodes are stateless from a data perspective: any reader can serve any query because every file is one API call away.

**Best for:**
- Cloud deployments (AWS, GCP, Azure)
- Teams that already operate object storage
- Workloads where elastic reader scaling matters more than query latency
- Kubernetes-native deployments with object storage

### Pattern B: Local Storage with Peer Replication

Each node has its own **local disks** (NVMe, SSD, or attached block storage). Parquet files are replicated peer-to-peer over the cluster protocol, verified via SHA-256, and kept on every node that needs them. A Raft-backed file manifest is the cluster-wide source of truth for which files exist.

**Best for:**
- Bare metal and virtual machine deployments
- Edge, on-premises, and air-gapped environments
- Defense, aerospace, industrial, and regulated workloads where shared object storage is not available
- Deployments that need the lowest possible query latency (reads come from local NVMe instead of over the network from an object store)

## Side-by-Side Comparison

| Aspect | Shared Object Storage | Local Storage + Peer Replication |
|--------|----------------------|----------------------------------|
| **Storage layout** | Single bucket, all nodes read/write | Per-node local disks, replicated peer-to-peer |
| **Source of truth** | The bucket itself | Raft-backed file manifest (FSM) |
| **Durability** | Provided by the object store's own replication (S3, MinIO, Azure) | Copies replicated across cluster nodes |
| **Query latency** | Network fetch from object store | Local disk I/O |
| **New-node bootstrap** | Immediate (no data transfer needed) | Startup catch-up pulls bytes from peers |
| **Compactor outputs** | Written once to bucket, visible to all | Compactor writes locally, Raft announces, peers pull |
| **Compactor failover** | Any healthy node can take over | Any healthy node can take over |
| **Best deployment** | Kubernetes, cloud-native | Bare metal, VMs, edge |
| **Cost model** | Object storage API calls + egress | Local disk capacity × nodes |
| **Network requirements** | Reliable path to object store | Reliable path between cluster nodes |

## Choosing a Pattern

Start here:

1. **Do you already run S3, MinIO, or Azure Blob in production?** → Pattern A (shared).
2. **Do your nodes have fast local disks, and do you want minimum query latency?** → Pattern B (local).
3. **Is shared object storage unavailable (edge, air-gap, defense)?** → Pattern B (local).
4. **Do you expect to scale readers elastically based on demand?** → Pattern A (shared).
5. **Do you need a single-digit-millisecond query path?** → Pattern B (local).

If question 3 applies, it overrides the others: without shared object storage, Pattern A is not an option.

Tiered storage is a separate decision from this one. A cluster can keep hot data on local disks and move cold data to shared object storage (for example, tiered storage to S3 Glacier), but all nodes still use the same storage model for their primary data (see Common Mistakes below). See [Tiered Storage](/arc-enterprise/tiered-storage).

## Pattern A: Shared Storage Setup

### Minimal 3-node cluster (1 writer, 1 reader, 1 compactor) on MinIO

```yaml
# docker-compose.yml
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin123
    ports: ["9001:9001"]

  arc-writer:
    image: basekick/arc:latest
    environment:
      ARC_LICENSE_KEY: "ARC-XXXX-XXXX-XXXX-XXXX"
      ARC_STORAGE_BACKEND: minio
      ARC_STORAGE_S3_BUCKET: arc-data
      ARC_STORAGE_S3_ENDPOINT: minio:9000
      ARC_STORAGE_S3_ACCESS_KEY: minioadmin
      ARC_STORAGE_S3_SECRET_KEY: minioadmin123
      ARC_STORAGE_S3_USE_SSL: "false"
      ARC_STORAGE_S3_PATH_STYLE: "true"
      ARC_CLUSTER_ENABLED: "true"
      ARC_CLUSTER_NODE_ID: writer-01
      ARC_CLUSTER_ROLE: writer
      ARC_CLUSTER_CLUSTER_NAME: production
      ARC_CLUSTER_RAFT_BOOTSTRAP: "true"
      ARC_CLUSTER_SHARED_SECRET: "your-cluster-secret"
      ARC_CLUSTER_REPLICATION_ENABLED: "false"  # not needed on shared storage
    ports: ["8001:8000"]

  arc-reader:
    image: basekick/arc:latest
    environment:
      ARC_LICENSE_KEY: "ARC-XXXX-XXXX-XXXX-XXXX"
      ARC_STORAGE_BACKEND: minio
      ARC_STORAGE_S3_BUCKET: arc-data
      ARC_STORAGE_S3_ENDPOINT: minio:9000
      ARC_STORAGE_S3_ACCESS_KEY: minioadmin
      ARC_STORAGE_S3_SECRET_KEY: minioadmin123
      ARC_STORAGE_S3_USE_SSL: "false"
      ARC_STORAGE_S3_PATH_STYLE: "true"
      ARC_CLUSTER_ENABLED: "true"
      ARC_CLUSTER_NODE_ID: reader-01
      ARC_CLUSTER_ROLE: reader
      ARC_CLUSTER_CLUSTER_NAME: production
      ARC_CLUSTER_SEEDS: arc-writer:9200
      ARC_CLUSTER_SHARED_SECRET: "your-cluster-secret"
    ports: ["8002:8000"]

  arc-compactor:
    image: basekick/arc:latest
    environment:
      ARC_LICENSE_KEY: "ARC-XXXX-XXXX-XXXX-XXXX"
      ARC_STORAGE_BACKEND: minio
      ARC_STORAGE_S3_BUCKET: arc-data
      ARC_STORAGE_S3_ENDPOINT: minio:9000
      ARC_STORAGE_S3_ACCESS_KEY: minioadmin
      ARC_STORAGE_S3_SECRET_KEY: minioadmin123
      ARC_STORAGE_S3_USE_SSL: "false"
      ARC_STORAGE_S3_PATH_STYLE: "true"
      ARC_CLUSTER_ENABLED: "true"
      ARC_CLUSTER_NODE_ID: compactor-01
      ARC_CLUSTER_ROLE: compactor
      ARC_CLUSTER_CLUSTER_NAME: production
      ARC_CLUSTER_SEEDS: arc-writer:9200
      ARC_CLUSTER_SHARED_SECRET: "your-cluster-secret"
      ARC_CLUSTER_FAILOVER_ENABLED: "true"
      ARC_COMPACTION_ENABLED: "true"
    ports: ["8003:8000"]
```

### Key points

- **All nodes point to the same bucket.** The writer flushes to the bucket; readers query directly from it; the compactor reads source files, writes compacted outputs back, and deletes the sources.
- **`ARC_CLUSTER_REPLICATION_ENABLED=false`** is the right choice on shared storage: no peer-to-peer file transfer is needed because every node already sees the bucket.
- **Exactly one compactor node.** Multiple compactors against a shared bucket produce duplicate outputs. Arc warns you via the cluster health check if it sees more than one.
- **Compactor failover** (`ARC_CLUSTER_FAILOVER_ENABLED=true`) lets the Raft leader automatically reassign the compactor lease to another healthy node if the current compactor dies. No restart is required.
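
The single-compactor rule and failover fit together as one lease that the Raft leader hands out. Below is a minimal Python sketch of that idea, assuming each node reports a heartbeat timestamp and a 15-second timeout; `NodeStatus`, `CompactorLease`, `reconcile_compactor_lease`, and the lowest-node-ID tie-break are hypothetical names and choices for illustration, not Arc's implementation. Because the comparison table says any healthy node can take over, the sketch treats every recently heartbeating node as eligible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStatus:
    node_id: str
    last_heartbeat: float  # seconds, as observed by the Raft leader


@dataclass
class CompactorLease:
    holder: Optional[str] = None  # at most one node compacts at a time


def reconcile_compactor_lease(lease: CompactorLease, nodes: List[NodeStatus],
                              now: float, timeout_s: float = 15.0) -> Optional[str]:
    """Leader-side check: keep a live holder, otherwise hand the lease to a healthy node."""
    healthy = sorted(n.node_id for n in nodes if now - n.last_heartbeat <= timeout_s)
    if lease.holder in healthy:
        return lease.holder  # current compactor is alive: never grant a second lease
    # Holder stopped heartbeating (or there was none): pick a healthy node, lowest ID first.
    lease.holder = healthy[0] if healthy else None
    return lease.holder
```

Keeping the live holder whenever it is healthy is what preserves the exactly-one-compactor invariant: the lease only moves after the previous holder has gone quiet.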

## Pattern B: Local Storage Setup

### Minimal 3-node cluster (1 writer, 1 reader, 1 compactor) on local disks

```yaml
# docker-compose.yml
services:
  arc-writer:
    image: basekick/arc:latest
    environment:
      ARC_LICENSE_KEY: "ARC-XXXX-XXXX-XXXX-XXXX"
      ARC_STORAGE_BACKEND: local
      ARC_STORAGE_LOCAL_PATH: /app/data
      ARC_CLUSTER_ENABLED: "true"
      ARC_CLUSTER_NODE_ID: writer-01
      ARC_CLUSTER_ROLE: writer
      ARC_CLUSTER_CLUSTER_NAME: production
      ARC_CLUSTER_RAFT_BOOTSTRAP: "true"
      ARC_CLUSTER_SHARED_SECRET: "your-cluster-secret"
      ARC_CLUSTER_REPLICATION_ENABLED: "true"  # CRITICAL for local storage
    volumes:
      - writer-data:/app/data
    ports: ["8001:8000"]

  arc-reader:
    image: basekick/arc:latest
    environment:
      ARC_LICENSE_KEY: "ARC-XXXX-XXXX-XXXX-XXXX"
      ARC_STORAGE_BACKEND: local
      ARC_STORAGE_LOCAL_PATH: /app/data
      ARC_CLUSTER_ENABLED: "true"
      ARC_CLUSTER_NODE_ID: reader-01
      ARC_CLUSTER_ROLE: reader
      ARC_CLUSTER_CLUSTER_NAME: production
      ARC_CLUSTER_SEEDS: arc-writer:9200
      ARC_CLUSTER_SHARED_SECRET: "your-cluster-secret"
      ARC_CLUSTER_REPLICATION_ENABLED: "true"
    volumes:
      - reader-data:/app/data
    ports: ["8002:8000"]

  arc-compactor:
    image: basekick/arc:latest
    environment:
      ARC_LICENSE_KEY: "ARC-XXXX-XXXX-XXXX-XXXX"
      ARC_STORAGE_BACKEND: local
      ARC_STORAGE_LOCAL_PATH: /app/data
      ARC_CLUSTER_ENABLED: "true"
      ARC_CLUSTER_NODE_ID: compactor-01
      ARC_CLUSTER_ROLE: compactor
      ARC_CLUSTER_CLUSTER_NAME: production
      ARC_CLUSTER_SEEDS: arc-writer:9200
      ARC_CLUSTER_SHARED_SECRET: "your-cluster-secret"
      ARC_CLUSTER_REPLICATION_ENABLED: "true"
      ARC_CLUSTER_FAILOVER_ENABLED: "true"
      ARC_COMPACTION_ENABLED: "true"
    volumes:
      - compactor-data:/app/data
    ports: ["8003:8000"]

volumes:
  writer-data:
  reader-data:
  compactor-data:
```

### How peer replication works

1. **Writer flushes a Parquet file locally.** The file hash (SHA-256) is computed and included in the flush.
2. **The writer registers the file in the Raft manifest** via a `CommandRegisterFile` entry. This commits cluster-wide: every node now knows the file exists and where to find it.
3. **Readers and compactors observe the FSM callback.** A background puller enqueues a byte-level pull from the origin peer (or any healthy peer that has a copy).
4. **The puller fetches over the cluster protocol**, streams bytes, verifies the SHA-256 against the manifest, and writes to local storage. Checksum mismatches trigger retries; failed pulls fall back to other peers.
5. **On node restart**, a startup catch-up walker reconciles the local manifest against the Raft FSM and pulls any files the node missed.
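
Steps 2 through 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Arc's puller: `ManifestEntry`, `pull_file`, `catch_up`, the `fetch(peer, path)` transport stub, and the two-attempts-per-peer retry budget are all hypothetical. It shows the two properties the steps rely on: bytes are accepted only when their SHA-256 matches the manifest, and an unreachable or misbehaving peer is skipped in favor of the next one.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ManifestEntry:
    """A registered file as the Raft manifest might record it (hypothetical fields)."""
    path: str
    sha256: str  # hex digest computed by the writer at flush time
    origin: str  # node ID that flushed the file


# Hypothetical transport stub: fetch(peer_id, path) returns bytes or raises OSError.
Fetch = Callable[[str, str], bytes]


def pull_file(entry: ManifestEntry, peers: Iterable[str], fetch: Fetch,
              attempts_per_peer: int = 2) -> bytes:
    """Try the origin first, then other peers; only return bytes whose SHA-256 matches."""
    order = [entry.origin] + [p for p in peers if p != entry.origin]
    for peer in order:
        for _ in range(attempts_per_peer):
            try:
                data = fetch(peer, entry.path)
            except OSError:
                break  # peer unreachable: fall back to the next peer
            if hashlib.sha256(data).hexdigest() == entry.sha256:
                return data
            # Checksum mismatch: loop around and retry this peer.
    raise RuntimeError(f"no peer served a verified copy of {entry.path}")


def catch_up(manifest: Iterable[ManifestEntry], local: Dict[str, bytes],
             peers: List[str], fetch: Fetch) -> List[str]:
    """Startup walker: pull every manifest file that is missing from local storage."""
    pulled = []
    for entry in manifest:
        if entry.path not in local:
            local[entry.path] = pull_file(entry, peers, fetch)
            pulled.append(entry.path)
    return pulled
```

In a production puller you would likely stream to a temporary file and rename it into place only after verification, so a crash mid-transfer never leaves a partial Parquet file visible to queries.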

### Key points

- **`ARC_CLUSTER_REPLICATION_ENABLED=true`** is required: it enables the file manifest and the peer puller.
- **Each node has its own volume.** No shared volume, no NFS, no clustered filesystem: replication is the primary data-plane mechanism.
- **A shared secret is mandatory.** Peer fetch requests are HMAC-authenticated with the shared secret, and Arc refuses to start replication without one.
- **The writer is the Raft leader by default.** `ARC_CLUSTER_RAFT_BOOTSTRAP=true` on the writer makes it bootstrap Raft; other nodes join via the seed. Non-leader nodes forward manifest commands to the leader transparently.
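
To make the HMAC requirement concrete, here is a minimal Python sketch of signing and checking a peer fetch request with the standard `hmac` module. The canonical message (node ID, file path, and timestamp joined by newlines), the 30-second freshness window, and the function names are assumptions for illustration; Arc's actual wire format is not described on this page.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_fetch(secret: bytes, node_id: str, path: str, timestamp: int) -> str:
    """Requesting peer: sign a hypothetical canonical form of the fetch request."""
    message = f"{node_id}\n{path}\n{timestamp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_fetch(secret: bytes, node_id: str, path: str, timestamp: int,
                 signature: str, now: Optional[float] = None,
                 max_skew_s: float = 30.0) -> bool:
    """Serving peer: reject stale or forged requests before streaming any bytes."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew_s:
        return False  # outside the freshness window; limits replay of old requests
    expected = sign_fetch(secret, node_id, path, timestamp)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```

Both sides derive the signature from the same shared secret, which is why a node without `ARC_CLUSTER_SHARED_SECRET` cannot take part in replication.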

### Compacted file distribution

Compaction on local storage follows the same path as ingest:

1. The compactor reads the source Parquet files from local storage, pulling any missing ones from peers.
2. It writes the compacted output to its own local disk.
3. It registers the new file in the Raft manifest and marks the source files as deleted.
4. Every other node sees the manifest change: readers pull the compacted bytes from the compactor and delete their local copies of the source files.
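
Step 4 is a small state transition on each node. The sketch below applies one committed manifest change to a node's local view. `LocalFiles` and `on_manifest_commit` are hypothetical names, and the command strings borrow the `RegisterFile` / `DeleteFile` names from the security notes rather than any documented API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class LocalFiles:
    """What one node holds on disk, plus files waiting for the puller (hypothetical)."""
    on_disk: Set[str] = field(default_factory=set)
    pending_pulls: List[str] = field(default_factory=list)


def on_manifest_commit(node: LocalFiles, command: str, path: str) -> None:
    """Apply one committed manifest change to this node's local view."""
    if command == "RegisterFile":
        # The compactor already has its own output on disk; other nodes queue a pull.
        if path not in node.on_disk and path not in node.pending_pulls:
            node.pending_pulls.append(path)
    elif command == "DeleteFile":
        # A compacted source is gone cluster-wide: drop the local copy and any pending pull.
        node.on_disk.discard(path)
        if path in node.pending_pulls:
            node.pending_pulls.remove(path)
    else:
        raise ValueError(f"unknown manifest command: {command}")
```

Because every node applies the same committed sequence, readers and the compactor converge on the same file set without talking to each other directly.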

## Security Notes

Both patterns share the same security posture:

- **Shared secret authentication** (`cluster.shared_secret`): required for peer discovery and, in Pattern B, for all peer file fetches. Arc refuses to boot if replication is enabled without a shared secret.
- **TLS encryption** (`cluster.tls_enabled`): optional but recommended. It encrypts the inter-node coordinator protocol, the Raft transport, and peer file transfers.
- **Role-based authorization on manifest mutations**: only nodes with `CanIngest` (writers) or `CanCompact` (compactors) can forward `RegisterFile` / `DeleteFile` commands to the leader. Requests from reader nodes are rejected.
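
The authorization rule reduces to a capability lookup on the leader. A minimal Python sketch, assuming a static role-to-capability table; `ROLE_CAPABILITIES`, `MUTATING_CAPABILITIES`, and `may_forward` are hypothetical names, and only the `CanIngest` / `CanCompact` capabilities named above are modeled.

```python
from typing import Dict, FrozenSet

# Capabilities per role, reduced to the two the manifest check cares about.
ROLE_CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "writer": frozenset({"CanIngest"}),
    "compactor": frozenset({"CanCompact"}),
    "reader": frozenset(),
}
MANIFEST_MUTATIONS = frozenset({"RegisterFile", "DeleteFile"})
MUTATING_CAPABILITIES = frozenset({"CanIngest", "CanCompact"})


def may_forward(role: str, command: str) -> bool:
    """Leader-side check before accepting a forwarded manifest mutation."""
    if command not in MANIFEST_MUTATIONS:
        raise ValueError(f"not a manifest mutation: {command}")
    caps = ROLE_CAPABILITIES.get(role, frozenset())  # unknown roles get no capabilities
    return bool(caps & MUTATING_CAPABILITIES)
```

Defaulting unknown roles to an empty capability set keeps the check fail-closed.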

See [Cluster Security](/arc-enterprise/security) for full details.

## Common Mistakes

- **Multiple compactor nodes on shared storage.** This produces duplicate compacted outputs and double-counted query results. Use exactly one `ARC_CLUSTER_ROLE=compactor` node and set `ARC_CLUSTER_FAILOVER_ENABLED=true` for automatic failover.
- **Mixing shared and local storage in the same cluster.** All nodes must agree on the storage model. Pick one per cluster.
- **Forgetting `ARC_CLUSTER_REPLICATION_ENABLED=true` on local storage.** Without it, readers query empty local directories.
- **Using a shared volume (NFS, EFS) as "local" storage.** Don't. The concurrent-write semantics of a shared POSIX filesystem aren't what Arc expects, and you lose the durability guarantees of both patterns. Use either shared object storage throughout or per-node local disks throughout.
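
Most of these mistakes are visible in the environment variables alone, so they can be caught before deployment. The Python sketch below is a hypothetical pre-flight lint, not an Arc tool: `NodeConfig` and `lint_cluster` are illustrative names, and it assumes any `ARC_STORAGE_BACKEND` other than `local` means shared object storage. The NFS/EFS mistake cannot be detected from these settings, so it is not checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeConfig:
    """The handful of settings the checks below look at (names mirror the env vars)."""
    node_id: str               # ARC_CLUSTER_NODE_ID
    role: str                  # ARC_CLUSTER_ROLE
    storage_backend: str       # ARC_STORAGE_BACKEND, e.g. "local" or "minio"
    replication_enabled: bool  # ARC_CLUSTER_REPLICATION_ENABLED
    shared_secret: str = ""    # ARC_CLUSTER_SHARED_SECRET


def lint_cluster(nodes: List[NodeConfig]) -> List[str]:
    """Return human-readable problems; an empty list means no known mistake was found."""
    problems: List[str] = []
    # Assumption: every non-"local" backend is shared object storage.
    models = {"local" if n.storage_backend == "local" else "shared" for n in nodes}
    if len(models) > 1:
        problems.append("mixed storage models: pick shared or local for the whole cluster")
    compactors = [n.node_id for n in nodes if n.role == "compactor"]
    if models == {"shared"} and len(compactors) > 1:
        problems.append(f"{len(compactors)} compactors on shared storage: use exactly one")
    for n in nodes:
        if n.storage_backend == "local" and not n.replication_enabled:
            problems.append(f"{n.node_id}: local storage without replication")
        if n.replication_enabled and not n.shared_secret:
            problems.append(f"{n.node_id}: replication enabled without a shared secret")
    return problems
```

Run against the Pattern A compose file above (three `minio` nodes, replication off, one compactor), the lint returns an empty list.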

## Next Steps

- [Clustering Configuration Reference](/arc-enterprise/clustering): full list of cluster config options
- [Tiered Storage](/arc-enterprise/tiered-storage): combine local hot storage with cold object storage
- [Cluster Security](/arc-enterprise/security): TLS and shared secret configuration