Commit 2af60be

Update NPEP-133 with Admin-tier restriction, DNS security guidance, and CEL
The PR updates the details around domainNames as discussed in KubeCon Atlanta: https://docs.google.com/document/d/1AtWQy2fNa4qXRag9cCp5_HsefD7bxKe3ea2RPn8jnSs/edit?tab=t.k47ujuef4zxk#bookmark=id.hl0pbdvwfotd

- Restrict domainNames to Admin tier only with CEL validation (Baseline tier breaks the NetworkPolicy override model)
- Add CEL validation for Accept-only action restriction
- Update API examples to v1alpha2 ClusterNetworkPolicy format
- Add Expected Behavior points for DNS reachability and resolv.conf
- Add Recommended Behavior section covering DNS security concerns, grace period, and DNSSEC best practice
- Add Connection Lifecycle section for policy add/remove semantics

1 parent 1c9af17 commit 2af60be

2 files changed

Lines changed: 318 additions & 94 deletions

npeps/npep-133-fqdn-egress-selector.md

Lines changed: 159 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -16,27 +16,28 @@ Names](https://www.wikipedia.org/wiki/Fully_qualified_domain_name) (FQDNs).
1616
(for example `kubernetes.io`).
1717
* Support basic wildcard matching capabilities when specifying FQDNs (for
1818
example `*.cloud-provider.io`)
19-
* Currently only `Accept` type rules are proposed.
20-
* Safely enforcing `Deny` rules based on FQDN selectors is difficult as there
19+
* Currently only `ACCEPT` type rules are proposed.
20+
* Safely enforcing `DENY` rules based on FQDN selectors is difficult as there
2121
is no guarantee a Network Policy plugin is aware of all IPs backing a FQDN
2222
policy. If a Network Policy plugin has incomplete information, it may
2323
accidentally allow traffic to an IP belonging to a denied domain. This would
2424
constitute a security breach.
2525

26-
By contrast, `Accept` rules, which may also have an incomplete list of IPs,
26+
By contrast, `ACCEPT` rules, which may also have an incomplete list of IPs,
2727
would not create a security breach. In case of incomplete information, valid
2828
traffic would be dropped as the plugin believes the destination IP does not
2929
belong to the domain. While this is definitely undesirable, it is at least
3030
not an unsafe failure.
3131

32-
* Currently only Admin tier of ClusterNetworkPolicy is the intended scope for this
33-
proposal.
34-
* Since Kubernetes NetworkPolicy does not have a FQDN selector, adding this
35-
capability to Baseline tier could result in writing baseline rules that can't
36-
be replicated by an overriding NetworkPolicy. For example, if Baseline tier
37-
allows traffic to `example.io`, but the namespace admin installs a Kubernetes
38-
Network Policy, the namespace admin has no way to replicate the `example.io`
39-
selector using just Kubernetes Network Policies.
32+
* DomainNames is restricted to the Admin tier of ClusterNetworkPolicy only.
33+
* Since Kubernetes NetworkPolicy does not have a FQDN selector, using
34+
domainNames in the Baseline tier would allow writing baseline rules that
35+
can't be replicated by an overriding NetworkPolicy. For example, if a
36+
Baseline-tier ClusterNetworkPolicy allows traffic to `example.io`, but
37+
the namespace admin installs a Kubernetes NetworkPolicy, the namespace
38+
admin has no way to replicate the `example.io` selector using just
39+
Kubernetes NetworkPolicies. This breaks the fundamental tier override
40+
model where NetworkPolicy can always override Baseline-tier rules.
4041

4142
## Non-Goals
4243

@@ -93,15 +94,29 @@ goal in this case is to ensure we do not make these unimplementable down the
9394
line.
9495

9596
* As a cluster admin, I want to switch the default disposition of the cluster to
96-
be default deny. This is enforced using a `Baseline` tier in ClusterNetworkPolicy`.
97-
I also want individual namespace owners to be able to specify their egress peers.
98-
Namespace admins would then use a FQDN selector in the Kubernetes
99-
`NetworkPolicy` objects to allow `my-service.com`.
97+
be default deny. This is enforced using a Baseline-tier
98+
`ClusterNetworkPolicy`. I also want individual namespace owners to be able to
99+
specify their egress peers. Namespace admins would then use a FQDN selector
100+
in the Kubernetes `NetworkPolicy` objects to allow `my-service.com`.
100101

101102
## API
102103

103-
This NPEP proposes adding a new type of `ClusterNetworkPolicyEgressPeer` called
104-
`FQDNPeerSelector` which allows specifying domain names.
104+
This NPEP proposes adding a `DomainNames` field to
105+
`ClusterNetworkPolicyEgressPeer` which allows specifying domain names as
106+
egress peers. DomainNames is only available with Accept rules in the Admin
107+
tier of ClusterNetworkPolicy.
108+
109+
These restrictions are enforced via CEL validation on
110+
`ClusterNetworkPolicySpec` (Baseline tier) and
111+
`ClusterNetworkPolicyEgressRule` (Accept-only):
112+
113+
```go
114+
// +kubebuilder:validation:XValidation:rule="self.tier == 'Baseline' ? !self.egress.exists(rule, rule.to.exists(peer, has(peer.domainNames))) : true",message="domainNames cannot be used in Baseline tier as NetworkPolicy cannot override FQDN rules"
115+
type ClusterNetworkPolicySpec struct { ... }
116+
117+
// +kubebuilder:validation:XValidation:rule="self.action != 'Accept' ? !self.to.exists(peer, has(peer.domainNames)) : true",message="domainNames may only be used with Accept action"
118+
type ClusterNetworkPolicyEgressRule struct { ... }
119+
```
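
For readers less familiar with CEL, the two rules above can be restated in plain Go. This is a minimal sketch and not part of the commit: the `spec`, `egressRule`, and `peer` types and the `validateDomainNames` function are simplified, hypothetical stand-ins for the real API types, and actual enforcement happens through the CEL markers rather than Go code.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the real API types.
type peer struct {
	DomainNames []string
}

type egressRule struct {
	Action string
	To     []peer
}

type spec struct {
	Tier   string
	Egress []egressRule
}

// usesDomainNames reports whether any peer in the rule sets domainNames.
// With MinItems=1 on the field, "present" and "non-empty" coincide.
func usesDomainNames(r egressRule) bool {
	for _, p := range r.To {
		if len(p.DomainNames) > 0 {
			return true
		}
	}
	return false
}

// validateDomainNames mirrors the intent of the two CEL rules:
// no domainNames in the Baseline tier, and only on Accept rules.
func validateDomainNames(s spec) error {
	for _, r := range s.Egress {
		if !usesDomainNames(r) {
			continue
		}
		if s.Tier == "Baseline" {
			return errors.New("domainNames cannot be used in Baseline tier as NetworkPolicy cannot override FQDN rules")
		}
		if r.Action != "Accept" {
			return errors.New("domainNames may only be used with Accept action")
		}
	}
	return nil
}

func main() {
	bad := spec{
		Tier: "Baseline",
		Egress: []egressRule{
			{Action: "Accept", To: []peer{{DomainNames: []string{"example.io"}}}},
		},
	}
	fmt.Println(validateDomainNames(bad))
}
```

Rules without `domainNames` pass through untouched, so a Baseline-tier policy that only uses `networks` peers remains valid.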
105120

106121
```golang
107122

@@ -130,19 +145,24 @@ type DomainName string
130145
type ClusterNetworkPolicyEgressPeer struct {
131146
<snipped>
132147
// DomainNames provides a way to specify domain names as peers.
133-
//
134-
// DomainNames is only supported for Allow rules. In order to control
135-
// access, DomainNames Allow rules should be used with a lower priority
136-
// egress deny -- this allows the admin to maintain an explicit "allowlist"
137-
// of reachable domains.
148+
//
149+
// DomainNames is only supported for Accept rules in the Admin tier.
150+
// In order to control access, DomainNames Accept rules should be used
151+
// with a lower precedence egress deny -- this allows the admin to
152+
// maintain an explicit "allowlist" of reachable domains.
153+
//
154+
// DomainNames cannot be used in the Baseline tier because Kubernetes
155+
// NetworkPolicy has no FQDN selector, so a Baseline FQDN rule cannot
156+
// be overridden by a NetworkPolicy.
138157
//
139158
// Support: Extended
140159
//
141160
// <network-policy-api:experimental>
142161
// +optional
143162
// +listType=set
144163
// +kubebuilder:validation:MinItems=1
145-
DomainNames []Domain `json:"domainNames,omitempty"`
164+
// +kubebuilder:validation:MaxItems=25
165+
DomainNames []DomainName `json:"domainNames,omitempty"`
146166
}
147167
```
148168

@@ -151,11 +171,12 @@ type ClusterNetworkPolicyEgressPeer struct {
151171
#### Pods in `monitoring` namespace can talk to `my-service.com` and `*.cloud-provider.io`
152172

153173
```yaml
154-
apiVersion: policy.networking.k8s.io/v1alpha1
174+
apiVersion: policy.networking.k8s.io/v1alpha2
155175
kind: ClusterNetworkPolicy
156176
metadata:
157177
name: allow-my-service-egress
158178
spec:
179+
tier: Admin
159180
priority: 55
160181
tier: Admin
161182
subject:
@@ -164,7 +185,7 @@ spec:
164185
kubernetes.io/metadata.name: "monitoring"
165186
egress:
166187
- name: "allow-to-my-service"
167-
action: "Allow"
188+
action: "Accept"
168189
to:
169190
- domainNames:
170191
- "my-service.com"
@@ -179,14 +200,16 @@ spec:
179200
180201
There are a couple ways to maintain an allowlist:
181202
182-
This example, includes the Deny rule in the same CNP object. It's also possible
183-
to use another CNP object with a lower priority (e.g. `100` in this example):
203+
This example includes the Deny rule in the same ClusterNetworkPolicy object.
204+
It's also possible to use another ClusterNetworkPolicy object with a lower
205+
priority (e.g. `100` in this example):
184206
```yaml
185-
apiVersion: policy.networking.k8s.io/v1alpha1
207+
apiVersion: policy.networking.k8s.io/v1alpha2
186208
kind: ClusterNetworkPolicy
187209
metadata:
188210
name: allow-my-service-egress
189211
spec:
212+
tier: Admin
190213
priority: 55
191214
tier: Admin
192215
subject:
@@ -195,30 +218,32 @@ spec:
195218
kubernetes.io/metadata.name: "monitoring"
196219
egress:
197220
- name: "allow-to-my-service"
198-
action: "Allow"
221+
action: "Accept"
199222
to:
200223
- domainNames:
201224
- "my-service.com"
202225
- "*.cloud-provider.io"
203226
protocols:
204227
- tcp:
205-
destinationPort:
206-
number: 443
228+
destinationPort:
229+
number: 443
207230
- name: "default-deny"
208231
action: "Deny"
209232
to:
210233
- networks:
211234
- "0.0.0.0/0"
235+
- "::/0"
212236
```
213237

214-
This example uses a default-deny Baseline ClusterNetworkPolicy to create the
215-
allowlist:
238+
This example uses a Baseline-tier default-deny ClusterNetworkPolicy to create
239+
the allowlist:
216240
```yaml
217-
apiVersion: policy.networking.k8s.io/v1alpha1
241+
apiVersion: policy.networking.k8s.io/v1alpha2
218242
kind: ClusterNetworkPolicy
219243
metadata:
220244
name: allow-my-service-egress
221245
spec:
246+
tier: Admin
222247
priority: 55
223248
tier: Admin
224249
subject:
@@ -227,36 +252,40 @@ spec:
227252
kubernetes.io/metadata.name: "monitoring"
228253
egress:
229254
- name: "allow-to-my-service"
230-
action: "Allow"
255+
action: "Accept"
231256
to:
232257
- domainNames:
233258
- "my-service.com"
234259
- "*.cloud-provider.io"
235260
protocols:
236261
- tcp:
237-
destinationPort:
238-
number: 443
262+
destinationPort:
263+
number: 443
239264
---
240-
apiVersion: policy.networking.k8s.io/v1alpha1
265+
apiVersion: policy.networking.k8s.io/v1alpha2
241266
kind: ClusterNetworkPolicy
242267
metadata:
243-
name: default
268+
name: default-deny
244269
spec:
245270
tier: Baseline
271+
priority: 0
246272
subject:
247273
namespaces: {}
248-
ingress:
249-
- action: Deny
250-
to:
251-
- networks:
252-
- "0.0.0.0/0"
274+
egress:
275+
- name: "default-deny"
276+
action: "Deny"
277+
to:
278+
- networks:
279+
- "0.0.0.0/0"
280+
- "::/0"
253281
```
254282

255283
### Expected Behavior
256284

257285
1. A FQDN egress policy does not grant the workload permission to communicate
258286
with any in-cluster DNS services (like `kube-dns`). A separate rule needs to
259-
be configured to allow traffic to any DNS servers.
287+
be configured to allow traffic to any DNS servers. FQDN policies are not
288+
expected to work if the pod cannot reach DNS.
260289
1. FQDN policies should not affect the ability of workloads to resolve domains,
261290
only their ability to communicate with the IP backing them. Put another way,
262291
FQDN policies should not result in any form of DNS filtering.
@@ -268,6 +297,10 @@ spec:
268297
considered authoritative for resolving domain names. This could be the
269298
`kube-dns` Service or potentially some other DNS provider specified in the
270299
implementation's configuration.
300+
1. Pods are expected to use the DNS configuration provided via `resolv.conf`
301+
(i.e. the canonical cluster DNS server). If a pod uses a different DNS
302+
server (e.g. hardcoded `8.8.8.8`), FQDN rule processing is not guaranteed
303+
to work.
271304
1. DNS record querying and lifetimes:
272305
* Pods are expected to make a DNS query for a domain before sending traffic
273306
to it. If the Pod fails to send a DNS request and instead just sends
@@ -277,9 +310,12 @@ spec:
277310
establish new connection using DNS records that are expired is not
278311
guaranteed to work.
279312
* When the TTL for a DNS record expires, the implementor should stop
280-
allowing new connections to that IP. Existing connection will still be
313+
allowing new connections to that IP. Existing connections will still be
281314
allowed (that's consistent with NetworkPolicy behavior on long-running
282-
connections).
315+
connections).
316+
* If a DNS record is refreshed before the TTL expires (e.g., due to a new
317+
DNS query from the workload), the TTL timer should be reset based on the
318+
new response.
283319
1. Implementations must support at least 100 unique IPs (either IPv4 or IPv6)
284320
for each domain. This is true for both explicitly specified domains, as well
285321
as for each domain selected by a wild-card rule. For example, the rule
@@ -347,6 +383,82 @@ spec:
347383
The implementer can still deny traffic to `1.2.3.4` because no single
348384
response contained the full chain required to resolve the domain.
349385

386+
### Recommended Behavior
387+
388+
The following recommendations are based on operational experience from existing
389+
implementations and feedback gathered during KubeCon Atlanta 2025.
390+
391+
1. Pods should only make DNS requests to the canonical DNS server, because
392+
FQDN rules for a pod are only guaranteed to work after that pod makes an
393+
appropriate DNS request to the canonical DNS server (see [Expected
394+
Behavior](#expected-behavior)). If a pod queries an alternate DNS server
395+
(e.g. hardcoded `8.8.8.8`), the implementation may not observe the DNS
396+
response, and the FQDN Accept rule will simply not take effect. This is
397+
a functionality issue, not a security breach -- the pod's traffic is
398+
denied, not accidentally allowed. For example, when a pod queries the
399+
canonical DNS server for `my-service.com` and receives `1.2.3.4`, the
400+
implementation observes that response and learns that `my-service.com`
401+
maps to `1.2.3.4`. It can then enforce the FQDN Accept rule by allowing
402+
traffic to `1.2.3.4`. If the pod instead queries `8.8.8.8`, the
403+
implementation never sees the response and has no IP to allow -- the rule
404+
simply doesn't take effect.
405+
1. Administrators should create a high-precedence Admin-tier rule allowing
406+
egress DNS traffic to the canonical DNS server (e.g. `kube-dns` in
407+
`kube-system`), to ensure that DNS is not blocked by other deny rules.
408+
For example:
409+
```yaml
410+
apiVersion: policy.networking.k8s.io/v1alpha2
411+
kind: ClusterNetworkPolicy
412+
metadata:
413+
name: allow-dns
414+
spec:
415+
tier: Admin
416+
priority: 50
417+
subject:
418+
namespaces:
419+
matchLabels:
420+
requires-fqdn-policy: "true"
421+
egress:
422+
- name: "allow-dns"
423+
action: "Accept"
424+
to:
425+
- namespaces:
426+
matchLabels:
427+
kubernetes.io/metadata.name: "kube-system"
428+
protocols:
429+
- udp:
430+
destinationPort:
431+
number: 53
432+
- tcp:
433+
destinationPort:
434+
number: 53
435+
```
436+
1. Implementations that operate by snooping DNS responses on the wire MUST
437+
only trust responses originating from the canonical DNS server. Trusting
438+
responses from arbitrary sources is a security concern: a malicious actor
439+
could forge DNS responses to trick the implementation into allowing
440+
traffic to unintended IPs.
441+
1. Implementations MAY provide a configurable grace period beyond the TTL to
442+
accommodate DNS propagation delays and client-side caching.
443+
1. Although securing DNS resolution is a non-goal of this NPEP, implementations
444+
are recommended to consider mitigations for DNS cache poisoning (e.g.,
445+
DNSSEC validation by the canonical DNS server) when documenting their trust
446+
model.
447+
448+
## Connection Lifecycle on Policy Updates
449+
450+
When FQDN policies change, implementations must handle in-flight connections
451+
gracefully. See the [Expected Behavior](#expected-behavior) section for DNS
452+
record updates.
453+
454+
* **Policy addition**: New connections matching the FQDN are allowed once DNS
455+
resolution for the domain has been observed. Connections to IPs that haven't
456+
been observed via DNS are not guaranteed to be allowed.
457+
* **Policy removal**: Existing established connections SHOULD be allowed to
458+
complete gracefully. New connections SHOULD be denied after the policy is
459+
removed. This is consistent with the general NetworkPolicy behavior for
460+
long-running connections.
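
The add/remove semantics above can be summarized as a single decision function. This is a simplified sketch with hypothetical names (`conn`, `allowed`); it assumes the FQDN rule is the only rule that could match the traffic and that an established connection was permitted when it was opened.

```go
package main

import "fmt"

// conn is a hypothetical view of a connection as seen by the implementation.
type conn struct {
	dstIP       string
	established bool
}

// allowed sketches the lifecycle rules:
//   - established connections are left alone, even after policy removal
//   - new connections need an active policy and a destination IP that was
//     observed via DNS for one of the policy's domains
func allowed(c conn, policyActive bool, observedIPs map[string]bool) bool {
	if c.established {
		return true
	}
	return policyActive && observedIPs[c.dstIP]
}

func main() {
	observed := map[string]bool{"1.2.3.4": true}
	fmt.Println(allowed(conn{dstIP: "1.2.3.4"}, true, observed))
	fmt.Println(allowed(conn{dstIP: "1.2.3.4"}, false, observed))
	fmt.Println(allowed(conn{dstIP: "1.2.3.4", established: true}, false, observed))
}
```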
461+
350462
## Alternatives
351463

352464
### IP Block Selector
