Commit 6b7abdf

Merge branch 'main' into docs-2026-03-26-architecture-rbd
2 parents b7fd7b3 + 8663f22 commit 6b7abdf

3 files changed

Lines changed: 348 additions & 0 deletions

docs/architecture/arbiter.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
---
title: Arbiter
---

# Arbiter

## About This Project

The external-arbiter-operator (Arbiter) works with Rook-provisioned Ceph
clusters to deploy external arbiters (monitors) that are not managed by Rook
but that participate in consensus.

The operator also monitors the remote cluster to verify its availability and
ensure that the tenant has sufficient permissions to handle the deployment of
Arbiter.

## Requirements and Setup

### Required Tools

The following tools are required on your development machine:

- `sed`
- `openssl`
- `make`
- `git`
- `golang`
- `lima` (or another method to provision Kubernetes locally, such as Minikube)
- `kubectl`
- `docker` (or any compatible container engine, such as Podman)
- `helm`

The remaining dependencies are provisioned via Go tools, including the
Kubebuilder toolset.

## Quick Start

The following walkthrough shows how to prepare the environment, run the
operator locally, and deploy an external monitor.

### Clone and Setup

```bash
# Clone the Rook repository (its example manifests under ./rook are used below)
git clone https://github.com/rook/rook.git

# Install dependencies
make deps

# Create OSD for Ceph
limactl disk create osd --size=8G

# Create VM instance
limactl create --name=k8s ./contrib/vm.yaml

# Start VM
limactl start k8s

# Use kubeconfig provided by VM
export KUBECONFIG="${HOME}/.lima/k8s/copied-from-guest/kubeconfig.yaml"
```

### Install Prerequisites

```bash
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.2/cert-manager.yaml

# Install Rook operator
kubectl apply -f ./rook/deploy/examples/crds.yaml
kubectl apply -f ./rook/deploy/examples/common.yaml
kubectl apply -f ./rook/deploy/examples/operator.yaml
kubectl apply -f ./rook/deploy/examples/csi-operator.yaml

# Create Ceph cluster
kubectl apply -f ./rook/deploy/examples/cluster-test.yaml

# (Optional) Install Ceph toolbox
kubectl apply -f ./rook/deploy/examples/toolbox.yaml
```

### Build and Install Operator

```bash
# Build image
limactl shell k8s sudo nerdctl --namespace k8s.io build \
  -t localhost:5000/cobaltcore-dev/external-arbiter-operator:latest \
  -f ./Dockerfile .

# Dry run operator install via Helm
helm install --dry-run --create-namespace --namespace arbiter-operator \
  --values ./contrib/charts/external-arbiter-operator/local.yaml \
  arbiter-operator ./contrib/charts/external-arbiter-operator

# Install operator via Helm chart
helm install --create-namespace --namespace arbiter-operator \
  --values ./contrib/charts/external-arbiter-operator/local.yaml \
  arbiter-operator ./contrib/charts/external-arbiter-operator
```

### Configure and Deploy Arbiter

```bash
# Create namespace, user, role, rolebinding, kubeconfig and secret for arbiter
./hack/configure-k8s-user.sh

# Create secret with remote cluster access configuration
kubectl apply -f ./contrib/k8s/examples/secret.yaml -n arbiter-operator

# Create remote cluster resource
kubectl apply -f ./contrib/k8s/examples/remote-cluster.yaml -n arbiter-operator

# Create remote arbiter resource
kubectl apply -f ./contrib/k8s/examples/remote-arbiter.yaml -n arbiter-operator

# Watch until Arbiter is ready
kubectl get remotearbiter -n arbiter-operator -w

# Check that Arbiter has joined quorum
kubectl exec deployment/rook-ceph-tools -n rook-ceph -it -- ceph mon dump
```

### Cleanup

```bash
# Remove Helm chart
helm uninstall --namespace arbiter-operator arbiter-operator

# Stop VM
limactl stop k8s

# Delete VM
limactl delete k8s
```

## Make Goals

Useful make commands for development:

```bash
# Build binary
make

# Prettify project, run linters, etc.
make pretty

# Run tests
make test

# Regenerate Kubernetes resources
make gen

# Copy CRD definitions to Helm chart
make helm
```

## Configuration

### Deployment Configuration

Deployment manifests are managed by Helm. The `values.yaml` file lists all
available configuration options.

### Resource Configuration

The following example resources are provided in `./contrib/k8s/examples/`:

- `secret.yaml` - Kubeconfig secret for arbiter installation
- `remote-cluster.yaml` - RemoteCluster resource definition
- `remote-arbiter.yaml` - RemoteArbiter resource definition

## How to Run

### Prerequisites

Before running the operator, ensure the following conditions are met:

1. A Ceph cluster operated by Rook is already up and running on the source
   Kubernetes cluster.
1. Resources (pods, services) from the target (arbiter) cluster are reachable
   from the source (operator/Rook) cluster and vice versa.

### Deployment Steps

1. Create a user on the target cluster.
1. Create the target namespace on the target cluster.
1. Grant the user permissions to manage deployments, secrets, configmaps, their
   statuses, and finalizers.
1. Provision the target user kubeconfig on the source cluster via secret.
1. Deploy the operator on the source cluster.
1. Create a RemoteCluster resource on the source cluster, referencing the target
   user kubeconfig secret.
1. Create a RemoteArbiter resource on the source cluster, referencing the
   RemoteCluster.
1. Watch until resources are ready.
1. Verify that the arbiter has joined the quorum by running `ceph mon dump`.

## See Also
[Arbiter Repository](https://github.com/cobaltcore-dev/external-arbiter-operator?tab=readme-ov-file)

docs/architecture/ceph.md

Lines changed: 128 additions & 0 deletions
@@ -135,6 +135,134 @@ capable of addressing diverse organizational storage requirements through a
single infrastructure platform. This convergence of capabilities, combined with
proven integration with major virtualization and cloud platforms, establishes
Ceph block devices as a viable solution for modern data center storage needs.

### The Ceph Storage Cluster

At its core, Ceph provides an infinitely scalable storage cluster based on
RADOS (Reliable Autonomic Distributed Object Store), a distributed storage
service that uses the intelligence in each node to secure data and provide it
to clients. A Ceph Storage Cluster consists of four daemon types: Ceph
Monitors, which maintain the master copy of the cluster map; Ceph OSD Daemons,
which check their own state and that of other OSDs; Ceph Managers, which serve
as endpoints for monitoring and orchestration; and Ceph Metadata Servers (MDS),
which manage file metadata when CephFS provides file services.

Storage cluster clients and Ceph OSD Daemons use the CRUSH (Controlled
Replication Under Scalable Hashing) algorithm to compute data location
information, avoiding the bottleneck of a central lookup table. This
algorithmic approach underpins Ceph's high-level features, including a native
interface to the storage cluster via librados and numerous service interfaces
built atop it.

### Data Storage and Organization

The Ceph Storage Cluster receives data from clients through various interfaces
(Ceph Block Device, Ceph Object Storage, CephFS, or custom implementations
using librados) and stores it as RADOS objects. Each object resides on an
Object Storage Device (OSD), with Ceph OSD Daemons controlling read, write, and
replication operations. The default BlueStore backend stores objects in a
monolithic, database-like fashion within a flat namespace, meaning objects lack
hierarchical directory structures. Each object has an identifier, binary data,
and name/value pair metadata; the semantics of the object data are left
entirely to the client.
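
As a rough illustration of the object model described above, the following Python sketch models an object as an identifier, opaque binary data, and name/value metadata held in a flat namespace. The `RadosObject` and `FlatStore` names are hypothetical and do not correspond to any Ceph or librados API.

```python
from dataclasses import dataclass, field


@dataclass
class RadosObject:
    """Hypothetical model of a stored object: ID, opaque bytes, metadata."""
    object_id: str
    data: bytes = b""
    metadata: dict[str, str] = field(default_factory=dict)


class FlatStore:
    """Flat namespace: object IDs map directly to objects, no directories."""

    def __init__(self) -> None:
        self._objects: dict[str, RadosObject] = {}

    def put(self, obj: RadosObject) -> None:
        self._objects[obj.object_id] = obj

    def get(self, object_id: str) -> RadosObject:
        return self._objects[object_id]


store = FlatStore()
# A "/" in the ID is only part of the name; it does not create a directory.
store.put(RadosObject("images/vm-disk.0001", b"\x00" * 4, {"owner": "tenant-a"}))
print(store.get("images/vm-disk.0001").metadata["owner"])
```

The store never interprets `data`; any structure inside it is a matter for the client, which mirrors the last point of the paragraph above.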

### Eliminating Centralization

Traditional architectures rely on centralized components (gateways, brokers, or
APIs) that act as single points of entry, creating single points of failure and
limiting performance. Ceph eliminates these centralized components, enabling
clients to interact directly with Ceph OSDs. OSDs create object replicas on
other nodes to ensure data safety and high availability, while a cluster of
monitors keeps the cluster map itself highly available. The CRUSH algorithm
replaces centralized lookup tables and distributes the placement work across
all OSD daemons and the clients that communicate with them, using intelligent
data replication to ensure resiliency suitable for hyper-scale storage.

### Cluster Map and High Availability

For proper functioning, Ceph clients and OSDs require current cluster topology
information. This information is stored in the Cluster Map, which is actually a
collection of five maps:

- Monitor Map - cluster fsid, monitor positions, names, addresses, and ports
- OSD Map - cluster fsid, pool lists, replica sizes, PG numbers, and OSD
  statuses
- PG Map - PG versions, timestamps, and placement group details
- CRUSH Map - storage devices, failure domain hierarchy, and traversal rules
- MDS Map - MDS map epoch, metadata storage pool, and metadata server
  information

Each map maintains a history of its operational state changes. Ceph Monitors
maintain the master copies, including cluster members, states, changes, and
overall health.

Ceph uses a cluster of monitors for reliability and fault tolerance. To
establish consensus about the cluster state, Ceph employs the Paxos algorithm,
which requires a majority of monitors to agree (one in single-monitor clusters,
two in three-monitor clusters, three in five-monitor clusters, and so forth).
The cluster therefore keeps a consistent view of its state even when a minority
of monitors falls behind due to latency or faults.
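
The majority rule above reduces to simple arithmetic. The following minimal sketch uses hypothetical helper names (`quorum_size`, `tolerated_failures`) that are not part of Ceph; it only reproduces the counts given in the text.

```python
def quorum_size(monitor_count: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitor_count < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitor_count // 2 + 1


def tolerated_failures(monitor_count: int) -> int:
    """Monitors that can fail while a majority can still be formed."""
    return monitor_count - quorum_size(monitor_count)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

A four-monitor cluster tolerates no more failures than a three-monitor one, which is why odd monitor counts are the usual choice.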

### Authentication and Security

The cephx authentication system authenticates users and daemons and protects
against man-in-the-middle attacks, though it does not provide transport
encryption or encryption at rest. Using shared secret keys, cephx enables
mutual authentication without either party revealing its key. The protocol
resembles Kerberos, but unlike Kerberos, each monitor can authenticate users
and distribute keys, so there is no single point of failure. The system issues
session keys encrypted with users' permanent secret keys, which clients use to
request services. Monitors provide tickets that authenticate clients against
the OSDs handling data; because monitors and OSDs share a secret, a ticket can
be used with any OSD or metadata server in the cluster. Tickets expire, so an
attacker cannot reuse credentials obtained earlier. This protects against
message forgery and alteration, as long as the secret keys remain secure before
expiration.
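
The role of a shared secret and an expiry time can be sketched in a few lines of Python. This is not the cephx wire protocol: it substitutes an HMAC tag for cephx's encrypted tickets and session keys, and the function names, ticket layout, and secret value are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import time


def issue_ticket(service_secret: bytes, client: str, ttl: float, now: float) -> dict:
    """Monitor side: issue a ticket that expires `ttl` seconds after `now`."""
    body = {"client": client, "expires": now + ttl}
    payload = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(service_secret, payload, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_ticket(service_secret: bytes, ticket: dict, now: float) -> bool:
    """OSD side: accept only untampered, unexpired tickets."""
    payload = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(service_secret, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        return False
    return now < ticket["body"]["expires"]


secret = b"shared-by-monitors-and-osds"  # illustrative value only
start = time.time()
ticket = issue_ticket(secret, "client.admin", ttl=60.0, now=start)
print(verify_ticket(secret, ticket, now=start + 1))    # within the lifetime
print(verify_ticket(secret, ticket, now=start + 120))  # after expiry
```

Because every OSD holding the shared secret can verify the tag locally, no OSD needs to call back to a monitor per request, and a captured ticket stops working once it expires.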

### Smart Daemons and Hyperscale

Ceph's architecture makes OSD Daemons and clients cluster-aware, unlike
centralized storage clusters, which require double dispatches that become a
bottleneck at petabyte-to-exabyte scale. Each Ceph OSD Daemon knows about the
other OSDs in the cluster, enabling direct interaction with other OSDs and
monitors. This awareness allows clients to interact directly with OSDs, and
because monitors and OSD daemons interact directly, OSDs can leverage the
aggregate CPU and RAM resources of the cluster.

This distributed intelligence provides several benefits:

- OSDs service clients directly, improving performance by avoiding the
  connection limits of a centralized interface.
- OSDs report membership and status (up or down), with neighboring OSDs
  detecting and reporting failures.
- Data scrubbing maintains consistency by comparing object metadata across
  replicas, with deeper scrubbing comparing data bit-for-bit against checksums
  to find bad drive sectors.
- Replication involves client-OSD collaboration: clients use CRUSH to determine
  object locations, map objects to pools and placement groups, and then write
  to primary OSDs, which replicate to secondary OSDs.
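
The difference between light and deep scrubbing can be illustrated with a small sketch. It assumes, purely for illustration, that the replica disagreeing with the most common value is the inconsistent one; the function names are hypothetical and Ceph's actual scrub logic is more involved.

```python
import hashlib
from collections import Counter


def inconsistent_replicas(fingerprints: dict[str, object]) -> list[str]:
    """Return the OSDs whose fingerprint differs from the most common one."""
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(osd for osd, fp in fingerprints.items() if fp != majority)


def light_scrub(replicas: dict[str, bytes]) -> list[str]:
    """Compare cheap metadata only (here: the object size)."""
    return inconsistent_replicas({osd: len(data) for osd, data in replicas.items()})


def deep_scrub(replicas: dict[str, bytes]) -> list[str]:
    """Compare a checksum computed over the full contents."""
    return inconsistent_replicas(
        {osd: hashlib.sha256(data).hexdigest() for osd, data in replicas.items()}
    )


replicas = {
    "osd.0": b"hello world",
    "osd.1": b"hello world",
    "osd.2": b"hello w0rld",  # same size, one corrupted byte
}
print(light_scrub(replicas))  # sizes agree, so nothing is reported
print(deep_scrub(replicas))   # the checksum exposes the corrupted replica
```

The light pass is cheap but misses corruption that leaves metadata intact, which is why the deeper, more expensive pass reads the data itself.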

### Dynamic Cluster Management

Pools are logical partitions for storing objects. Clients retrieve the cluster
map from a monitor and write RADOS objects to pools. CRUSH maps each RADOS
object to a placement group (PG) and dynamically maps each PG to OSDs. This
layer of indirection between OSDs and clients lets the cluster grow, shrink,
and redistribute data when the topology changes, for example by rebalancing
dynamically when new OSDs come online.
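
To show why computed placement lets a cluster rebalance with limited data movement, the sketch below uses rendezvous (highest-random-weight) hashing as a simplified stand-in for CRUSH. It ignores weights, failure domains, and the CRUSH hierarchy, and the `score` and `place_pg` helpers are hypothetical.

```python
import hashlib


def score(pg_id: str, osd: str) -> int:
    """Deterministic pseudo-random score for a (PG, OSD) pair."""
    digest = hashlib.sha256(f"{pg_id}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_pg(pg_id: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Pick the `replicas` highest-scoring OSDs; the first acts as primary."""
    return sorted(osds, key=lambda osd: score(pg_id, osd), reverse=True)[:replicas]


pgs = [f"1.{i:x}" for i in range(64)]
old_osds = ["osd.0", "osd.1", "osd.2", "osd.3"]
before = {pg: place_pg(pg, old_osds) for pg in pgs}
after = {pg: place_pg(pg, old_osds + ["osd.4"]) for pg in pgs}
moved = sum(1 for pg in pgs if before[pg] != after[pg])
print(f"{moved} of {len(pgs)} PGs changed their OSD set after adding osd.4")
```

Every client that knows the same OSD list computes the same answer without a lookup table, and a PG changes its OSD set only when the new OSD takes one of its slots.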

Clients compute object locations rather than querying for them, and need only
the object ID and the pool name. Ceph hashes the object ID, calculates the hash
modulo the number of PGs in the pool to obtain the PG ID, retrieves the pool ID
from the pool name, and prepends the pool ID to the PG ID (for example,
`4.58`). This computation is faster than a query session. CRUSH then enables
the client to compute the expected object location and contact the primary OSD
for storage or retrieval.
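
The steps in the paragraph above can be sketched as follows. The sketch uses `zlib.crc32` and a plain modulo as stand-ins for Ceph's own hash function and modulo variant, so the resulting PG IDs will not match a real cluster; the `pools` table and the `pg_id_for` helper are illustrative assumptions.

```python
import zlib


def pg_id_for(object_id: str, pool_name: str, pools: dict[str, dict]) -> str:
    """Compute '<pool id>.<PG in hex>' from the object ID and pool name."""
    pool = pools[pool_name]                        # pool name -> pool ID, pg_num
    object_hash = zlib.crc32(object_id.encode())   # stand-in for Ceph's hash
    pg = object_hash % pool["pg_num"]              # hash modulo number of PGs
    return f"{pool['id']}.{pg:x}"                  # prepend the pool ID


pools = {"liverpool": {"id": 4, "pg_num": 128}}    # illustrative pool table
print(pg_id_for("john", "liverpool", pools))
```

Only the pool table (part of the cluster map) and the object ID are needed, so any client can repeat the computation locally and arrive at the same PG.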

### Client Interfaces

Ceph provides three client types: Ceph Block Device (RBD) offers resizable,
thin-provisioned, snapshottable block devices striped across the cluster for
high performance; Ceph Object Storage (RGW) provides RESTful APIs compatible
with Amazon S3 and OpenStack Swift; and CephFS provides a POSIX-compliant
filesystem that can be mounted with the kernel client or through FUSE.
Applications can also access storage directly through librados, which provides
parallel access to the cluster and supports pool operations, snapshots,
copy-on-write cloning, object read/write operations, extended attributes,
key/value pairs, and object classes.

The architecture demonstrates how Ceph's distributed, intelligent design
eliminates traditional storage limitations, enabling massive scalability while
maintaining reliability and performance through algorithmic data placement,
autonomous daemon operations, and direct client-storage interactions.

## See Also
The architecture of the Ceph cluster is explained in [the Architecture

docs/architecture/chorus.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
---
title: Chorus
---

# Chorus

Chorus is data replication software designed for Object Storage systems,
supporting S3 and OpenStack Swift APIs. It enables zero-downtime migration
between storage systems, maintains synchronized backups for disaster recovery,
and verifies migration integrity through consistency checks.

Chorus operates through two main components: Chorus Proxy, an S3 proxy that
captures changes, and Chorus Worker, which processes replication tasks and
webhook events. Users configure storage credentials, designating one endpoint
as "main" while others become "followers." Requests are routed through Chorus's
S3 API to the main storage, and changes are replicated asynchronously to the
follower endpoints.
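
The main/follower flow can be pictured with a small in-memory sketch. It is not Chorus code: plain dictionaries stand in for storage endpoints, a `queue.Queue` stands in for the task queue, and the `ReplicatingProxy` class and its methods are hypothetical.

```python
import queue


class ReplicatingProxy:
    """Hypothetical stand-in: write to main at once, queue follower copies."""

    def __init__(self, main: dict, followers: dict[str, dict]) -> None:
        self.main = main
        self.followers = followers
        self.tasks: queue.Queue = queue.Queue()

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        """Proxy role: serve the request from main, then record tasks."""
        self.main[(bucket, key)] = body
        for name in self.followers:
            self.tasks.put((name, bucket, key))

    def run_worker_once(self) -> int:
        """Worker role: drain queued tasks, copying current data from main."""
        done = 0
        while not self.tasks.empty():
            name, bucket, key = self.tasks.get()
            self.followers[name][(bucket, key)] = self.main[(bucket, key)]
            done += 1
        return done


proxy = ReplicatingProxy(main={}, followers={"backup": {}})
proxy.put_object("photos", "cat.jpg", b"\xff\xd8")
print(("photos", "cat.jpg") in proxy.followers["backup"])  # not replicated yet
proxy.run_worker_once()
print(proxy.followers["backup"][("photos", "cat.jpg")] == proxy.main[("photos", "cat.jpg")])
```

The client's write completes as soon as main has the object; followers catch up when the worker runs, which is what makes the replication asynchronous.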

The system supports user-level and bucket-level replication policies, allowing
users to pause and resume replication via web admin UI or CLI. Chorus handles
initial replication of existing data in the background and can accept change
events via webhooks when proxy deployment isn't feasible, supporting S3 bucket
notifications and Swift access-log events.
