Commit b8a2a6c

Update NPEP-133 with Admin-tier restriction, DNS security guidance, and v1alpha2 API
The PR updates the details around domainNames as discussed in KubeCon Atlanta: https://docs.google.com/document/d/1AtWQy2fNa4qXRag9cCp5_HsefD7bxKe3ea2RPn8jnSs/edit?tab=t.k47ujuef4zxk#bookmark=id.hl0pbdvwfotd

- Restrict domainNames to Admin tier only with CEL validation (Baseline tier breaks the NetworkPolicy override model)
- Add CEL validation for Accept-only action restriction
- Update API examples to v1alpha2 ClusterNetworkPolicy format
- Add Expected Behavior points for DNS reachability and resolv.conf
- Add Recommended Behavior section covering DNS security concerns, grace period, and DNSSEC best practice
- Add Connection Lifecycle section for policy add/remove semantics
1 parent 6bb86de commit b8a2a6c

2 files changed

Lines changed: 312 additions & 118 deletions

File tree

npeps/npep-133-fqdn-egress-selector.md

Lines changed: 156 additions & 59 deletions
@@ -16,33 +16,35 @@ Names](https://www.wikipedia.org/wiki/Fully_qualified_domain_name) (FQDNs).
1616
(for example `kubernetes.io`).
1717
* Support basic wildcard matching capabilities when specifying FQDNs (for
1818
example `*.cloud-provider.io`)
19-
* Currently only `ALLOW` type rules are proposed.
19+
* Currently only `ACCEPT` type rules are proposed.
2020
* Safely enforcing `DENY` rules based on FQDN selectors is difficult as there
2121
is no guarantee a Network Policy plugin is aware of all IPs backing a FQDN
2222
policy. If a Network Policy plugin has incomplete information, it may
2323
accidentally allow traffic to an IP belonging to a denied domain. This would
2424
constitute a security breach.
2525

26-
By contrast, `ALLOW` rules, which may also have an incomplete list of IPs,
26+
By contrast, `ACCEPT` rules, which may also have an incomplete list of IPs,
2727
would not create a security breach. In case of incomplete information, valid
2828
traffic would be dropped as the plugin believes the destination IP does not
2929
belong to the domain. While this is definitely undesirable, it is at least
3030
not an unsafe failure.
3131

32-
* Currently only AdminNetworkPolicy is the intended scope for this proposal.
33-
* Since Kubernetes NetworkPolicy does not have a FQDN selector, adding this
34-
capability to BaselineAdminNetworkPolicy could result in writing baseline
35-
rules that can't be replicated by an overriding NetworkPolicy. For example,
36-
if BANP allows traffic to `example.io`, but the namespace admin installs a
37-
Kubernetes Network Policy, the namespace admin has no way to replicate the
38-
`example.io` selector using just Kubernetes Network Policies.
32+
* DomainNames is restricted to the Admin tier of ClusterNetworkPolicy only.
33+
* Since Kubernetes NetworkPolicy does not have a FQDN selector, using
34+
domainNames in the Baseline tier would allow writing baseline rules that
35+
can't be replicated by an overriding NetworkPolicy. For example, if a
36+
Baseline-tier ClusterNetworkPolicy allows traffic to `example.io`, but
37+
the namespace admin installs a Kubernetes NetworkPolicy, the namespace
38+
admin has no way to replicate the `example.io` selector using just
39+
Kubernetes NetworkPolicies. This breaks the fundamental tier override
40+
model where NetworkPolicy can always override Baseline-tier rules.
3941

4042
## Non-Goals
4143

4244
* This enhancement does not include a FQDN selector for allowing ingress
4345
traffic.
4446
* This enhancement only describes enhancements to the existing L4 filtering as
45-
provided by AdminNetworkPolicy. It does not propose any new L7 matching or
47+
provided by ClusterNetworkPolicy. It does not propose any new L7 matching or
4648
filtering capabilities, like matching HTTP traffic or URL paths.
4749
* This selector should not control what DNS records are resolvable from a
4850
particular workload.
@@ -92,15 +94,29 @@ goal in this case is to ensure we do not make these unimplementable down the
9294
line.
9395

9496
* As a cluster admin, I want to switch the default disposition of the cluster to
95-
be default deny. This is enforced using a `BaselineAdminNetworkPolicy`. I also
96-
want individual namespace owners to be able to specify their egress peers.
97-
Namespace admins would then use a FQDN selector in the Kubernetes
98-
`NetworkPolicy` objects to allow `my-service.com`.
97+
be default deny. This is enforced using a Baseline-tier
98+
`ClusterNetworkPolicy`. I also want individual namespace owners to be able to
99+
specify their egress peers. Namespace admins would then use a FQDN selector
100+
in the Kubernetes `NetworkPolicy` objects to allow `my-service.com`.
99101

100102
## API
101103

102-
This NPEP proposes adding a new type of `AdminNetworkPolicyEgressPeer` called
103-
`FQDNPeerSelector` which allows specifying domain names.
104+
This NPEP proposes adding a `DomainNames` field to
105+
`ClusterNetworkPolicyEgressPeer` which allows specifying domain names as
106+
egress peers. DomainNames is only available with Accept rules in the Admin
107+
tier of ClusterNetworkPolicy.
108+
109+
These restrictions are enforced via CEL validation on
110+
`ClusterNetworkPolicySpec` (rejects domainNames in the Baseline tier) and
111+
`ClusterNetworkPolicyEgressRule` (allows domainNames only with the Accept action):
112+
113+
```go
114+
// +kubebuilder:validation:XValidation:rule="self.tier == 'Baseline' ? !self.egress.exists(rule, rule.to.exists(peer, has(peer.domainNames))) : true",message="domainNames cannot be used in Baseline tier as NetworkPolicy cannot override FQDN rules"
115+
type ClusterNetworkPolicySpec struct { ... }
116+
117+
// +kubebuilder:validation:XValidation:rule="self.to.exists(peer, has(peer.domainNames)) ? self.action == 'Accept' : true",message="domainNames may only be used with Accept action"
118+
type ClusterNetworkPolicyEgressRule struct { ... }
119+
```
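
The two CEL rules above can be read as plain Go. The following is a minimal sketch, not the API's implementation: `Spec`, `EgressRule`, `EgressPeer`, and `validateSpec` are simplified, hypothetical stand-ins that carry only the fields the rules inspect, and `has(peer.domainNames)` is approximated by a non-empty slice.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the API types; only the fields
// read by the two CEL rules are modelled.
type EgressPeer struct {
	DomainNames []string
}

type EgressRule struct {
	Action string
	To     []EgressPeer
}

type Spec struct {
	Tier   string
	Egress []EgressRule
}

// usesDomainNames reports whether any peer of the rule sets domainNames.
// Assumption: a non-empty slice stands in for CEL's has(peer.domainNames).
func usesDomainNames(rule EgressRule) bool {
	for _, peer := range rule.To {
		if len(peer.DomainNames) > 0 {
			return true
		}
	}
	return false
}

// validateSpec mirrors both rules: domainNames is rejected in the Baseline
// tier, and is accepted only on rules whose action is Accept.
func validateSpec(spec Spec) error {
	for _, rule := range spec.Egress {
		if !usesDomainNames(rule) {
			continue
		}
		if spec.Tier == "Baseline" {
			return errors.New("domainNames cannot be used in Baseline tier as NetworkPolicy cannot override FQDN rules")
		}
		if rule.Action != "Accept" {
			return errors.New("domainNames may only be used with Accept action")
		}
	}
	return nil
}

func main() {
	peers := []EgressPeer{{DomainNames: []string{"my-service.com"}}}
	admin := Spec{Tier: "Admin", Egress: []EgressRule{{Action: "Accept", To: peers}}}
	baseline := Spec{Tier: "Baseline", Egress: []EgressRule{{Action: "Accept", To: peers}}}
	fmt.Println("admin:", validateSpec(admin))
	fmt.Println("baseline:", validateSpec(baseline))
}
```

In the real API the checks run in the API server at admission time; the sketch only makes the accepted and rejected combinations easier to see.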
104120

105121
```golang
106122

@@ -126,22 +142,27 @@ This NPEP proposes adding a new type of `AdminNetworkPolicyEgressPeer` called
126142
// +kubebuilder:validation:Pattern=`^(\*\.)?([a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.?$`
127143
type DomainName string
128144

129-
type AdminNetworkPolicyEgressPeer struct {
145+
type ClusterNetworkPolicyEgressPeer struct {
130146
<snipped>
131147
// DomainNames provides a way to specify domain names as peers.
132-
//
133-
// DomainNames is only supported for Allow rules. In order to control
134-
// access, DomainNames Allow rules should be used with a lower priority
135-
// egress deny -- this allows the admin to maintain an explicit "allowlist"
136-
// of reachable domains.
148+
//
149+
// DomainNames is only supported for Accept rules in the Admin tier.
150+
// In order to control access, DomainNames Accept rules should be used
151+
// with a lower precedence egress deny -- this allows the admin to
152+
// maintain an explicit "allowlist" of reachable domains.
153+
//
154+
// DomainNames cannot be used in the Baseline tier because Kubernetes
155+
// NetworkPolicy has no FQDN selector, so a Baseline FQDN rule cannot
156+
// be overridden by a NetworkPolicy.
137157
//
138158
// Support: Extended
139159
//
140160
// <network-policy-api:experimental>
141161
// +optional
142162
// +listType=set
143163
// +kubebuilder:validation:MinItems=1
144-
DomainNames []Domain `json:"domainNames,omitempty"`
164+
// +kubebuilder:validation:MaxItems=25
165+
DomainNames []DomainName `json:"domainNames,omitempty"`
145166
}
146167
```
147168

@@ -150,108 +171,116 @@ type AdminNetworkPolicyEgressPeer struct {
150171
#### Pods in `monitoring` namespace can talk to `my-service.com` and `*.cloud-provider.io`
151172

152173
```yaml
153-
apiVersion: policy.networking.k8s.io/v1alpha1
154-
kind: AdminNetworkPolicy
174+
apiVersion: policy.networking.k8s.io/v1alpha2
175+
kind: ClusterNetworkPolicy
155176
metadata:
156177
name: allow-my-service-egress
157178
spec:
179+
tier: Admin
158180
priority: 55
159181
subject:
160182
namespaces:
161183
matchLabels:
162184
kubernetes.io/metadata.name: "monitoring"
163185
egress:
164186
- name: "allow-to-my-service"
165-
action: "Allow"
187+
action: "Accept"
166188
to:
167189
- domainNames:
168190
- "my-service.com"
169191
- "*.cloud-provider.io"
170-
ports:
171-
- portNumber:
172-
protocol: TCP
173-
port: 443
192+
protocols:
193+
- tcp:
194+
destinationPort:
195+
number: 443
174196
```
175197
176198
#### Maintaining an allowlist of domains
177199
178200
There are a couple of ways to maintain an allowlist:
179201
180-
This example, includes the DENY rule in the same ANP object. It's also possible
181-
to use another ANP object with a lower priority (e.g. `100` in this example):
202+
This example includes the Deny rule in the same ClusterNetworkPolicy object.
203+
It's also possible to use another ClusterNetworkPolicy object with a lower
204+
priority (e.g. `100` in this example):
182205
```yaml
183-
apiVersion: policy.networking.k8s.io/v1alpha1
184-
kind: AdminNetworkPolicy
206+
apiVersion: policy.networking.k8s.io/v1alpha2
207+
kind: ClusterNetworkPolicy
185208
metadata:
186209
name: allow-my-service-egress
187210
spec:
211+
tier: Admin
188212
priority: 55
189213
subject:
190214
namespaces:
191215
matchLabels:
192216
kubernetes.io/metadata.name: "monitoring"
193217
egress:
194218
- name: "allow-to-my-service"
195-
action: "Allow"
219+
action: "Accept"
196220
to:
197221
- domainNames:
198222
- "my-service.com"
199223
- "*.cloud-provider.io"
200-
ports:
201-
- portNumber:
202-
protocol: TCP
203-
port: 443
224+
protocols:
225+
- tcp:
226+
destinationPort:
227+
number: 443
204228
- name: "default-deny"
205229
action: "Deny"
206230
to:
207231
- networks:
208232
- "0.0.0.0/0"
209233
```
210234

211-
This example uses a default-deny BaselineAdminNetworkPolicy to create the
212-
allowlist:
235+
This example uses a Baseline-tier default-deny ClusterNetworkPolicy to create
236+
the allowlist:
213237
```yaml
214-
apiVersion: policy.networking.k8s.io/v1alpha1
215-
kind: AdminNetworkPolicy
238+
apiVersion: policy.networking.k8s.io/v1alpha2
239+
kind: ClusterNetworkPolicy
216240
metadata:
217241
name: allow-my-service-egress
218242
spec:
243+
tier: Admin
219244
priority: 55
220245
subject:
221246
namespaces:
222247
matchLabels:
223248
kubernetes.io/metadata.name: "monitoring"
224249
egress:
225250
- name: "allow-to-my-service"
226-
action: "Allow"
251+
action: "Accept"
227252
to:
228253
- domainNames:
229254
- "my-service.com"
230255
- "*.cloud-provider.io"
231-
ports:
232-
- portNumber:
233-
protocol: TCP
234-
port: 443
256+
protocols:
257+
- tcp:
258+
destinationPort:
259+
number: 443
235260
---
236-
apiVersion: policy.networking.k8s.io/v1alpha1
237-
kind: BaselineAdminNetworkPolicy
261+
apiVersion: policy.networking.k8s.io/v1alpha2
262+
kind: ClusterNetworkPolicy
238263
metadata:
239-
name: default
264+
name: default-deny
240265
spec:
266+
tier: Baseline
267+
priority: 0
241268
subject:
242269
namespaces: {}
243-
ingress:
244-
- action: Deny
245-
to:
246-
- networks:
247-
- "0.0.0.0/0"
270+
egress:
271+
- name: "default-deny"
272+
action: "Deny"
273+
to:
274+
- networks:
275+
- "0.0.0.0/0"
248276
```
249277

250278
### Expected Behavior
251279

252280
1. A FQDN egress policy does not grant the workload permission to communicate
253281
with any in-cluster DNS services (like `kube-dns`). A separate rule needs to
254-
be configured to allow traffic to any DNS servers.
282+
be configured to allow traffic to any DNS servers. FQDN policies are not
283+
expected to work if the pod cannot reach DNS.
255284
1. FQDN policies should not affect the ability of workloads to resolve domains,
256285
only their ability to communicate with the IP backing them. Put another way,
257286
FQDN policies should not result in any form of DNS filtering.
@@ -263,6 +292,10 @@ spec:
263292
considered authoritative for resolving domain names. This could be the
264293
`kube-dns` Service or potentially some other DNS provider specified in the
265294
implementation's configuration.
295+
1. Pods are expected to use the DNS configuration provided via `resolv.conf`
296+
(i.e. the canonical cluster DNS server). If a pod uses a different DNS
297+
server (e.g. hardcoded `8.8.8.8`), FQDN rule processing is not guaranteed
298+
to work.
266299
1. DNS record querying and lifetimes:
267300
* Pods are expected to make a DNS query for a domain before sending traffic
268301
to it. If the Pod fails to send a DNS request and instead just sends
@@ -272,9 +305,12 @@ spec:
272305
establish a new connection using DNS records that are expired is not
273306
guaranteed to work.
274307
* When the TTL for a DNS record expires, the implementor should stop
275-
allowing new connections to that IP. Existing connection will still be
308+
allowing new connections to that IP. Existing connections will still be
276309
allowed (that's consistent with NetworkPolicy behavior on long-running
277-
connections).
310+
connections).
311+
* If a DNS record is refreshed before the TTL expires (e.g., due to a new
312+
DNS query from the workload), the TTL timer should be reset based on the
313+
new response.
278314
1. Implementations must support at least 100 unique IPs (either IPv4 or IPv6)
279315
for each domain. This is true for both explicitly specified domains, as well
280316
as for each domain selected by a wild-card rule. For example, the rule
@@ -342,6 +378,67 @@ spec:
342378
The implementer can still deny traffic to `1.2.3.4` because no single
343379
response contained the full chain required to resolve the domain.
344380

381+
### Recommended Behavior
382+
383+
The following recommendations are based on operational experience from existing
384+
implementations and feedback gathered during KubeCon Atlanta 2025.
385+
386+
1. Allowing pods to reach arbitrary DNS servers is a security concern: an
387+
untrusted DNS server could return malicious IP mappings, potentially
388+
bypassing FQDN policy intent. The administrator should create a DNS allow
389+
rule that restricts egress DNS traffic to only the trusted canonical DNS
390+
server (e.g. `kube-dns` in `kube-system`), rather than allowing DNS to all
391+
destinations. For example:
392+
```yaml
393+
apiVersion: policy.networking.k8s.io/v1alpha2
394+
kind: ClusterNetworkPolicy
395+
metadata:
396+
name: allow-dns
397+
spec:
398+
tier: Admin
399+
priority: 50
400+
subject:
401+
namespaces:
402+
matchLabels:
403+
requires-fqdn-policy: "true"
404+
egress:
405+
- name: "allow-dns"
406+
action: "Accept"
407+
to:
408+
- namespaces:
409+
matchLabels:
410+
kubernetes.io/metadata.name: "kube-system"
411+
protocols:
412+
- udp:
413+
destinationPort:
414+
number: 53
415+
- tcp:
416+
destinationPort:
417+
number: 53
418+
```
419+
1. Implementations SHOULD NOT trust DNS responses that do not originate from
420+
the canonical DNS server, as these could be spoofed or manipulated.
421+
1. Implementations MAY provide a configurable grace period beyond the TTL to
422+
accommodate DNS propagation delays and client-side caching.
423+
1. Although securing DNS resolution is a non-goal of this NPEP, implementations
424+
are recommended to consider mitigations for DNS cache poisoning (e.g.,
425+
DNSSEC validation by the canonical DNS server) when documenting their trust
426+
model.
427+
428+
## Connection Lifecycle on Policy Updates
429+
430+
When FQDN policies change, implementations must handle in-flight connections
431+
gracefully. See the [Expected Behavior](#expected-behavior) section for DNS
432+
record updates.
433+
434+
* **Policy addition**: New connections matching the FQDN are allowed once DNS
435+
resolution for the domain has been observed. Connections to IPs that haven't
436+
been observed via DNS are not guaranteed to be allowed.
437+
* **Policy removal**: Existing established connections SHOULD be allowed to
438+
complete gracefully. New connections SHOULD be denied after the policy is
439+
removed. This is consistent with the general NetworkPolicy behavior for
440+
long-running connections.
441+
345442
## Alternatives
346443

347444
### IP Block Selector
