
Commit 714db54

docs: update discovery service docs to multi-doc
- We allow configuring multiple discovery service endpoints now, expressed through multiple DiscoveryServiceConfig documents.
- `.cluster.discovery` is deprecated, but it's the only way to perform discovery through Kubernetes registry (which is also deprecated).
- Moves discovery service to the top, and kubernetes registry to the bottom, in the "Registries" section. The discovery service is the most relevant nowadays.
- Removes the video walkthrough, as it instructs the user to use the deprecated `.cluster.discovery` block.

Signed-off-by: Maja Bojarska <maja.bojarska@siderolabs.com>
1 parent a74d188 commit 714db54

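The first bullet of the commit message is the functional change: a node can now use several discovery services at once. The Python sketch below is a rough mental model of the read side only, assuming each service returns a plain mapping of node ID to endpoints (real affiliate data is encrypted and richer). The function `aggregate_affiliates`, its fetcher callables, and the sample endpoints are hypothetical and do not correspond to Talos code.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical shapes: node ID -> set of "IP:port" endpoint strings.
Affiliates = Dict[str, FrozenSet[str]]
Fetcher = Callable[[], Affiliates]


def aggregate_affiliates(fetchers: Dict[str, Fetcher]) -> Tuple[Affiliates, List[str]]:
    """Merge affiliates reported by every configured discovery service.

    A service whose fetcher raises ConnectionError is recorded as failed and
    skipped, so peers are still found while at least one service answers.
    """
    merged: Dict[str, set] = {}
    failed: List[str] = []
    for name, fetch in fetchers.items():
        try:
            reported = fetch()
        except ConnectionError:
            failed.append(name)
            continue
        for node_id, endpoints in reported.items():
            # Union the endpoints that different services report for one node.
            merged.setdefault(node_id, set()).update(endpoints)
    return {node: frozenset(eps) for node, eps in merged.items()}, failed


def unreachable() -> Affiliates:
    raise ConnectionError("discovery-2 is down")


# Usage: two configured services, the second one unreachable.
peers, failed = aggregate_affiliates({
    "discovery-1": lambda: {
        "node-a": frozenset({"192.0.2.10:51820"}),
        "node-b": frozenset({"192.0.2.11:51820"}),
    },
    "discovery-2": unreachable,
})
```

Skipping an unreachable service instead of failing outright is the property the updated docs describe for highly available setups: discovery keeps working as long as at least one configured service is reachable.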
3 files changed

Lines changed: 97 additions & 57 deletions

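The relocated service-registry text in discovery.mdx explains that endpoint data is encrypted with AES in ECB mode so the discovery service can deduplicate endpoints it cannot read. Python's standard library has no AES, so the sketch below swaps in HMAC-SHA256 as a stand-in deterministic keyed transform (it is not encryption and is not what Talos uses) and contrasts it with a randomized variant that prepends a fresh nonce, loosely analogous to AES-GCM with a random nonce. All function names and sample endpoints are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Iterable, List


def deterministic_blob(key: bytes, endpoint: str) -> bytes:
    # Same key and same endpoint always give the same bytes (the ECB-like property).
    return hmac.new(key, endpoint.encode(), hashlib.sha256).digest()


def randomized_blob(key: bytes, endpoint: str) -> bytes:
    # A fresh random nonce makes every output unique, even for equal endpoints.
    nonce = os.urandom(12)
    return nonce + hmac.new(key, nonce + endpoint.encode(), hashlib.sha256).digest()


def server_dedupe(blobs: Iterable[bytes]) -> List[bytes]:
    # The server sees only opaque bytes; it can drop exact duplicates, nothing more.
    return list(dict.fromkeys(blobs))


key = os.urandom(32)  # stands in for the cluster's shared key
# Two nodes observe the same endpoint for a peer; a third endpoint is distinct.
reports = ["192.0.2.10:51820", "192.0.2.10:51820", "198.51.100.7:51820"]

deduped_deterministic = server_dedupe(deterministic_blob(key, e) for e in reports)
deduped_randomized = server_dedupe(randomized_blob(key, e) for e in reports)
```

This is the usual trade-off of deterministic encryption: the server learns which encrypted endpoints are equal, but not what they are, while randomized encryption (such as AES-GCM with a fresh nonce) reveals neither.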

public/talos/v1.14/configure-your-talos-cluster/system-configuration/discovery.mdx

Lines changed: 92 additions & 42 deletions
````diff
@@ -6,8 +6,8 @@ aliases:
 canonical: https://docs.siderolabs.com/talos/v1.13/configure-your-talos-cluster/system-configuration/discovery
 ---
 
-import { VersionWarningBanner } from "/snippets/version-warning-banner.jsx"
-import { release_v1_12 } from '/snippets/custom-variables.mdx';
+import { VersionWarningBanner } from "/snippets/version-warning-banner.jsx";
+import { release_v1_12 } from "/snippets/custom-variables.mdx";
 
 <VersionWarningBanner />
 
````
````diff
@@ -17,39 +17,111 @@ Without discovery, nodes have no built-in way to learn about other cluster membe
 
 When discovery is enabled, this information is shared and kept up to date across all nodes. This allows Talos to form a cluster and, when enabled, establish encrypted [KubeSpan](../../networking/kubespan) tunnels and support [KubePrism](../../../../kubernetes-guides/advanced-guides/kubeprism) peer endpoint discovery.
 
+## Registries
+
 Discovery works through a **registry**, a backend that nodes publish their connection information to and read peer information from. Talos supports two registry types:
 
-- **Service registry**: Nodes publish to and read from an external discovery service. This is enabled by default and does not depend on Kubernetes or etcd, so it continues to work even when Kubernetes is unavailable.
-- **Kubernetes registry**: Nodes publish discovery data as annotations on Kubernetes `Node` resources. This is disabled by default.
+- **Service registry**: Nodes publish to and read from an external discovery service. This is configured with a [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) document, enabled by default, and does not depend on Kubernetes or etcd, so it continues to work even when Kubernetes is unavailable.
+- **Kubernetes registry**: Nodes publish discovery data as annotations on Kubernetes `Node` resources. This is deprecated and disabled by default.
 
 <Warning>
-The Kubernetes registry is deprecated. Starting with Kubernetes 1.32, the `AuthorizeNodeWithSelectors` feature gate restricts `Node` resource read access in a way that prevents the Kubernetes registry from functioning correctly. Disabling the feature gate is not recommended as it removes other important security protections.
+The Kubernetes registry is deprecated. Starting with Kubernetes 1.32, the
+`AuthorizeNodeWithSelectors` feature gate restricts `Node` resource read
+access in a way that prevents the Kubernetes registry from functioning
+correctly. Disabling the feature gate is not recommended as it removes other
+important security protections.
 </Warning>
 
-## Video walkthrough
+By default, Talos uses the service registry. Peers are aggregated from all enabled registries.
 
-To see a live demo of cluster discovery, see the video below:
+### Service registry
 
-<iframe width="560" height="315" src="https://www.youtube.com/embed/GCBTrHhjawY" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+The service registry uses a public external discovery service to exchange encrypted information about cluster members. Sidero Labs maintains a public instance at `https://discovery.talos.dev/`. Organizations that require private infrastructure can self-host the discovery service under a commercial license.
 
-## Registries
+Cluster members use a globally unique shared key to coordinate basic connection information, the set of possible endpoints (IP:port pairs). Talos refers to this as **affiliate data**. All affiliate data is encrypted by Talos Linux before being sent to the discovery service and can only be decrypted by cluster members. The discovery service never has access to the encryption key.
+
+<Note>
+When KubeSpan is enabled, affiliate data also includes the node's WireGuard
+public key.
+</Note>
+
+Data is encrypted as follows:
+
+- Affiliate data is encrypted with AES-GCM encryption.
+- Endpoint data is separately encrypted with AES in ECB mode, allowing endpoints from different sources to be deduplicated server-side.
+
+Each node submits its own data plus the endpoints it observes from other peers. The discovery service aggregates this, deduplicates endpoints, and distributes updates to all connected peers. Peers decrypt the data locally and use it to drive cluster discovery and [KubeSpan](../../networking/kubespan).
+
+Data is stored only in memory, with encrypted snapshots written to disk to enable fast recovery after restarts.
+The cluster ID is a random value generated as part of the cluster secrets in the machine configuration. It is used by the discovery service to separate affiliates between different clusters.
+
+The discovery service is aware of the client version, cluster ID, number of affiliates, encrypted affiliate data, and a list of encrypted endpoints. However, it never has access to the actual node information.
+
+Nodes must be able to reach the discovery service on TCP port 443. For organizations that require it, the discovery service may be self-hosted under a commercial license and [downloaded from GitHub](https://github.com/siderolabs/discovery-service).
+
+The service registry is enabled by the presence of a [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) document. A freshly generated machine configuration includes one named `default` that points at the public discovery service:
+
+```yaml
+apiVersion: v1alpha1
+kind: DiscoveryServiceConfig
+name: default
+endpoint: https://discovery.talos.dev/
+```
+
+To use a self-hosted discovery service, append the following document to the machine configuration with the `endpoint` set to your own instance:
 
-By default, Talos uses the `service` registry. The `kubernetes` registry is disabled by default. Peers are aggregated from all enabled registries.
+```yaml
+apiVersion: v1alpha1
+kind: DiscoveryServiceConfig
+name: self-hosted
+endpoint: https://discovery.example.com/
+```
 
-To disable a registry, set `disabled: true` in the cluster configuration. For example, to disable the `service` registry:
+For a highly available setup, append additional [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) documents, each with a unique `name` and `endpoint`. Endpoints must use the `http://`, `https://` or `grpc://` scheme. Nodes publish to and read from every configured discovery service, so the cluster continues to discover peers as long as at least one of them is reachable:
+
+```yaml
+apiVersion: v1alpha1
+kind: DiscoveryServiceConfig
+name: discovery-1
+endpoint: https://discovery-1.example.com/
+---
+apiVersion: v1alpha1
+kind: DiscoveryServiceConfig
+name: discovery-2
+endpoint: https://discovery-2.example.com/
+```
+
+To disable the discovery service, remove all [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) documents from the machine configuration. With no service registry and no Kubernetes registry, [member discovery is effectively disabled](#what-changes-when-discovery-is-disabled).
+
+### Kubernetes registry
+
+The Kubernetes registry has no [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) equivalent and should not be used in new clusters. It can only be enabled through the deprecated `.cluster.discovery` configuration block, where it is disabled by default:
 
 ```yaml
 cluster:
   discovery:
     enabled: true
     registries:
-      service:
-        disabled: true
+      kubernetes:
+        disabled: false
 ```
 
-[Disabling all registries](#what-changes-when-discovery-is-disabled) effectively disables member discovery entirely.
+<Note>
+The [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) documents and the legacy `.cluster.discovery` configuration block are mutually exclusive. A machine configuration must not contain both.
+</Note>
 
-### Kubernetes registry
+A configuration that uses the legacy `.cluster.discovery` block must also configure the service registry there (under `registries.service`) rather than with a [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) document:
+
+```yaml
+cluster:
+  discovery:
+    enabled: true
+    registries:
+      kubernetes:
+        disabled: false
+      service:
+        endpoint: https://discovery.talos.dev/
+```
 
 The Kubernetes registry stores discovery data as annotations on Kubernetes `Node` resources:
 
````
````diff
@@ -66,33 +138,9 @@ Annotations: cluster.talos.dev/node-id: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yz
 ...
 ```
 
-### Service registry
-
-The service registry uses a public external discovery service to exchange encrypted information about cluster members. Sidero Labs maintains a public instance at `https://discovery.talos.dev/`. Organizations that require private infrastructure can self-host the discovery service under a commercial license.
-
-Cluster members use a globally unique shared key to coordinate basic connection information, the set of possible endpoints (IP:port pairs). Talos refers to this as **affiliate data**. All affiliate data is encrypted by Talos Linux before being sent to the discovery service and can only be decrypted by cluster members. The discovery service never has access to the encryption key.
-
-<Note>
-When KubeSpan is enabled, affiliate data also includes the node's WireGuard public key.
-</Note>
-
-Data is encrypted as follows:
-
-- Affiliate data is encrypted with AES-GCM encryption.
-- Endpoint data is separately encrypted with AES in ECB mode, allowing endpoints from different sources to be deduplicated server-side.
-
-Each node submits its own data plus the endpoints it observes from other peers. The discovery service aggregates this, deduplicates endpoints, and distributes updates to all connected peers. Peers decrypt the data locally and use it to drive cluster discovery and [KubeSpan](../../networking/kubespan).
-
-Data is stored only in memory, with encrypted snapshots written to disk to enable fast recovery after restarts.
-The cluster ID is a random value generated as part of the cluster secrets in the machine configuration. It is used by the discovery service to separate affiliates between different clusters.
-
-The discovery service is aware of the client version, cluster ID, number of affiliates, encrypted affiliate data, and a list of encrypted endpoints. However, it never has access to the actual node information.
-
-Nodes must be able to reach the discovery service on TCP port 443. For organisations that require it, the discovery service may be self-hosted under a commercial license and [downloaded from GitHub](https://github.com/siderolabs/discovery-service).
-
 ## What changes when discovery is disabled
 
-Talos can operate with discovery disabled, but this affects several features and behaviours:
+Discovery is disabled by removing all [`DiscoveryServiceConfig`](../../reference/configuration/cluster/discoveryserviceconfig) documents from the machine configuration (and not configuring the legacy `.cluster.discovery` block). Talos can operate with discovery disabled, but this affects several features and behaviours:
 
 - [KubeSpan](../../networking/kubespan) and KubePrism require discovery and do not function correctly without it.
 - Initial cluster bootstrap and recovery may take longer, as peer and control plane endpoints are not available from discovery.
````
````diff
@@ -116,7 +164,9 @@ If a node reboots while the discovery service is unavailable, it loses all in-me
 If the outage exceeds the TTL, all discovery records expire. When the service comes back online, it may return an empty dataset. Nodes receiving this update drop their existing peer information, which can temporarily disrupt KubeSpan connectivity. Recovery is automatic, nodes republish their data, peer information is rebuilt, and connectivity is restored without manual intervention.
 
 <Note>
-When KubeSpan is enabled, WireGuard keys are generated on boot and not persisted to disk. A rebooted node must publish its new public key via the discovery service before peers can establish tunnels to it.
+When KubeSpan is enabled, WireGuard keys are generated on boot and not
+persisted to disk. A rebooted node must publish its new public key via the
+discovery service before peers can establish tunnels to it.
 </Note>
 
 ## Inspect discovery resources
````
````diff
@@ -196,7 +246,7 @@ talosctl get members
 You should see an output similar to:
 
 <CodeBlock lang="sh">
-{`
+{`
 ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
 talos-default-controlplane-1 2 talos-default-controlplane-1 controlplane Talos ${release_v1_12} ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
 talos-default-controlplane-2 1 talos-default-controlplane-2 controlplane Talos ${release_v1_12} ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
````

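The updated discovery.mdx states several rules for `DiscoveryServiceConfig` documents: each needs a unique `name` and `endpoint`, endpoints use the `http://`, `https://` or `grpc://` scheme, the documents cannot be combined with the legacy `.cluster.discovery` block, and with none of them (and no legacy block) member discovery is effectively disabled. The Python sketch below encodes those rules as a pre-flight check, assuming each document has already been parsed from YAML into a dict; `check_discovery_docs` and `discovery_enabled` are hypothetical helpers, not part of `talosctl` or Talos.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "grpc"}


def check_discovery_docs(docs: List[Dict], legacy_discovery: Optional[Dict] = None) -> List[str]:
    """Return human-readable problems; an empty list means the rules are met."""
    problems: List[str] = []
    services = [d for d in docs if d.get("kind") == "DiscoveryServiceConfig"]
    seen_names, seen_endpoints = set(), set()
    for doc in services:
        name, endpoint = doc.get("name"), doc.get("endpoint", "")
        if name in seen_names:
            problems.append(f"duplicate name: {name}")
        if endpoint in seen_endpoints:
            problems.append(f"duplicate endpoint: {endpoint}")
        seen_names.add(name)
        seen_endpoints.add(endpoint)
        scheme = urlparse(endpoint).scheme
        if scheme not in ALLOWED_SCHEMES:
            problems.append(f"{name}: unsupported scheme {scheme!r}")
    if services and legacy_discovery is not None:
        problems.append("DiscoveryServiceConfig documents and .cluster.discovery are mutually exclusive")
    return problems


def discovery_enabled(docs: List[Dict], legacy_discovery: Optional[Dict] = None) -> bool:
    # Assumption: a legacy block counts only when its `enabled` key is true.
    if any(d.get("kind") == "DiscoveryServiceConfig" for d in docs):
        return True
    return bool(legacy_discovery and legacy_discovery.get("enabled"))
```

A real check would parse the multi-document YAML first (for example with a YAML library); that step is left out to keep the sketch standard-library only.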
public/talos/v1.14/networking/kubespan.mdx

Lines changed: 4 additions & 14 deletions
````diff
@@ -90,35 +90,25 @@ For advanced eBPF features without these constraints, use Cilium's built-in Wire
 
 ### Creating a new cluster
 
-To enable KubeSpan for a new cluster, we can use the `--with-kubespan` flag in `talosctl gen config`.
-This will enable peer discovery and KubeSpan.
+To enable KubeSpan for a new cluster, we can use the `--with-kubespan` flag in `talosctl gen config`. This will add the following `KubeSpanConfig` document to the machine config:
 
 ```yaml
-cluster:
-  discovery:
-    enabled: true
-    # Configure registries used for cluster member discovery.
-    registries:
-      kubernetes: # Kubernetes registry is problematic with KubeSpan, if the control plane endpoint is routable itself via KubeSpan.
-        disabled: true
-      service: {}
 ---
 apiVersion: v1alpha1
 kind: KubeSpanConfig
 enabled: true # Enable the KubeSpan feature.
 ```
 
+[Discovery](../configure-your-talos-cluster/system-configuration/discovery) is enabled by default for new clusters, so no additional configuration is needed.
+
 > The default discovery service is an external service hosted by Sidero Labs at `https://discovery.talos.dev/`.
 > Contact Sidero Labs if you need to run this service privately.
 
 ### Enabling for an existing cluster
 
-In order to enable KubeSpan on an existing cluster, add `KubeSpanConfig` and `discovery` settings in the machine config for each machine in the cluster (`discovery` is enabled by default):
+In order to enable KubeSpan on an existing cluster, add a `KubeSpanConfig` document to the machine config and ensure the discovery service is enabled for every machine in the cluster ([discovery service](../configure-your-talos-cluster/system-configuration/discovery) is enabled by default):
 
 ```yaml
-cluster:
-  discovery:
-    enabled: true
 ---
 apiVersion: v1alpha1
 kind: KubeSpanConfig
````

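Both updated KubeSpan instructions come down to the same pairing: an enabled `KubeSpanConfig` document plus working discovery, which for new configurations means at least one `DiscoveryServiceConfig` document. A minimal Python sketch of that check follows, assuming already-parsed documents as dicts; `kubespan_prerequisites` is a hypothetical helper, and it deliberately ignores the deprecated `.cluster.discovery` block.

```python
from typing import Dict, List, Tuple


def kubespan_prerequisites(docs: List[Dict]) -> Tuple[bool, str]:
    """Report whether a machine config has what KubeSpan needs (sketch only)."""
    kubespan_on = any(
        d.get("kind") == "KubeSpanConfig" and d.get("enabled") is True for d in docs
    )
    if not kubespan_on:
        return False, "no KubeSpanConfig document with enabled: true"
    has_discovery = any(d.get("kind") == "DiscoveryServiceConfig" for d in docs)
    if not has_discovery:
        # KubeSpan relies on discovery to learn peer endpoints and WireGuard keys.
        return False, "KubeSpan is enabled but no DiscoveryServiceConfig document is present"
    return True, "ok"
```

Because a freshly generated configuration already contains the `default` `DiscoveryServiceConfig` document, the second failure mostly catches configurations where that document was removed.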
public/talos/v1.14/platform-specific-installations/air-gapped.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -41,5 +41,5 @@ See the [guide on running Image Factory in air-gapped environments](../../../omn
 
 ## Discovery service
 
-Talos Linux by default uses the public Discovery Service at `discovery.talos.dev` to facilitate cluster bootstrapping and node discovery.
+Talos Linux by default uses the public [Discovery Service](../configure-your-talos-cluster/system-configuration/discovery) at `discovery.talos.dev` to facilitate cluster bootstrapping and node discovery.
 In air-gapped environments, it is recommended to run a self-hosted instance of the Discovery Service (requires a license from Sidero Labs).
````

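For the air-gapped case, the updated discovery page describes pointing nodes at a self-hosted instance by adding a `DiscoveryServiceConfig` document with your own `endpoint`. One reasonable approach is to also drop the generated `default` document that targets the public service, since an air-gapped node cannot reach it. The Python sketch below shows that swap, assuming parsed documents as dicts; `use_self_hosted` and the example endpoint are hypothetical.

```python
from typing import Dict, List

PUBLIC_ENDPOINT = "https://discovery.talos.dev/"


def use_self_hosted(docs: List[Dict], endpoint: str, name: str = "self-hosted") -> List[Dict]:
    """Return a new document list that targets only a self-hosted discovery service.

    DiscoveryServiceConfig documents pointing at the public service are dropped,
    every other document is kept, and the input list is not modified.
    """
    kept = [
        d for d in docs
        if not (d.get("kind") == "DiscoveryServiceConfig" and d.get("endpoint") == PUBLIC_ENDPOINT)
    ]
    kept.append({
        "apiVersion": "v1alpha1",
        "kind": "DiscoveryServiceConfig",
        "name": name,
        "endpoint": endpoint,
    })
    return kept
```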