Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
195 changes: 147 additions & 48 deletions npeps/npep-133-fqdn-egress-selector.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,27 +16,24 @@ Names](https://www.wikipedia.org/wiki/Fully_qualified_domain_name) (FQDNs).
(for example `kubernetes.io`).
* Support basic wildcard matching capabilities when specifying FQDNs (for
example `*.cloud-provider.io`)
* Currently only `Accept` type rules are proposed.
* Safely enforcing `Deny` rules based on FQDN selectors is difficult as there
* Currently only `ACCEPT` type rules are proposed.
* Safely enforcing `DENY` rules based on FQDN selectors is difficult as there
is no guarantee a Network Policy plugin is aware of all IPs backing a FQDN
policy. If a Network Policy plugin has incomplete information, it may
accidentally allow traffic to an IP belonging to a denied domain. This would
constitute a security breach.

By contrast, `Accept` rules, which may also have an incomplete list of IPs,
By contrast, `ACCEPT` rules, which may also have an incomplete list of IPs,
would not create a security breach. In case of incomplete information, valid
traffic would be dropped as the plugin believes the destination IP does not
belong to the domain. While this is definitely undesirable, it is at least
not an unsafe failure.

* Currently only Admin tier of ClusterNetworkPolicy is the intended scope for this
proposal.
* Since Kubernetes NetworkPolicy does not have a FQDN selector, adding this
capability to Baseline tier could result in writing baseline rules that can't
be replicated by an overriding NetworkPolicy. For example, if Baseline tier
allows traffic to `example.io`, but the namespace admin installs a Kubernetes
Network Policy, the namespace admin has no way to replicate the `example.io`
selector using just Kubernetes Network Policies.
* DomainNames may only be used with Accept action (not Pass or Deny).
* The rationale for excluding Deny is described above.
* Pass is excluded for now because no existing implementations support
DomainNames with Pass and there is no strong use case for it. This
restriction may be relaxed in the future.

## Non-Goals

Expand Down Expand Up @@ -83,7 +80,7 @@ hampers the readability of the network policies.
entire tree of domains. For example, our CDN has domains of the format
`<session>.<random>.<region>.my-app.cdn.com`. I want to be able to use a
wild-card selector to allow the full tree of subdomains below
`**.my-app.cdn.com`.
`*.my-app.cdn.com`.

### Future User Stories

Expand All @@ -93,15 +90,24 @@ goal in this case is to ensure we do not make these unimplementable down the
line.

* As a cluster admin, I want to switch the default disposition of the cluster to
be default deny. This is enforced using a `Baseline` tier in ClusterNetworkPolicy`.
I also want individual namespace owners to be able to specify their egress peers.
Namespace admins would then use a FQDN selector in the Kubernetes
`NetworkPolicy` objects to allow `my-service.com`.
be default deny. This is enforced using a Baseline-tier
`ClusterNetworkPolicy`. I also want individual namespace owners to be able to
specify their egress peers. Namespace admins would then use a FQDN selector
in the Kubernetes `NetworkPolicy` objects to allow `my-service.com`.

## API

This NPEP proposes adding a new type of `ClusterNetworkPolicyEgressPeer` called
`FQDNPeerSelector` which allows specifying domain names.
This NPEP proposes adding a `DomainNames` field to
`ClusterNetworkPolicyEgressPeer` which allows specifying domain names as
egress peers. DomainNames is only available with Accept rules.

This restriction is enforced via CEL validation on
`ClusterNetworkPolicyEgressRule`:

```go
// +kubebuilder:validation:XValidation:rule="self.action != 'Accept' ? !self.to.exists(peer, has(peer.domainNames)) : true",message="domainNames may only be used with Accept action"
type ClusterNetworkPolicyEgressRule struct { ... }
```

```golang

Expand Down Expand Up @@ -130,19 +136,20 @@ type DomainName string
type ClusterNetworkPolicyEgressPeer struct {
<snipped>
// DomainNames provides a way to specify domain names as peers.
//
// DomainNames is only supported for Allow rules. In order to control
// access, DomainNames Allow rules should be used with a lower priority
// egress deny -- this allows the admin to maintain an explicit "allowlist"
// of reachable domains.
//
// DomainNames is only supported for Accept rules.
// In order to control access, DomainNames Accept rules should be used
// with a lower precedence egress deny -- this allows the admin to
// maintain an explicit "allowlist" of reachable domains.
//
// Support: Extended
//
// <network-policy-api:experimental>
// +optional
// +listType=set
// +kubebuilder:validation:MinItems=1
DomainNames []Domain `json:"domainNames,omitempty"`
// +kubebuilder:validation:MaxItems=25
DomainNames []DomainName `json:"domainNames,omitempty"`
}
```

Expand All @@ -151,11 +158,12 @@ type ClusterNetworkPolicyEgressPeer struct {
#### Pods in `monitoring` namespace can talk to `my-service.com` and `*.cloud-provider.io`

```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
name: allow-my-service-egress
spec:
tier: Admin
priority: 55
tier: Admin
subject:
Expand All @@ -164,7 +172,7 @@ spec:
kubernetes.io/metadata.name: "monitoring"
egress:
- name: "allow-to-my-service"
action: "Allow"
action: "Accept"
to:
- domainNames:
- "my-service.com"
Expand All @@ -179,14 +187,16 @@ spec:

There are a couple ways to maintain an allowlist:

This example, includes the Deny rule in the same CNP object. It's also possible
to use another CNP object with a lower priority (e.g. `100` in this example):
This example includes the Deny rule in the same ClusterNetworkPolicy object.
It's also possible to use another ClusterNetworkPolicy object with a lower
priority (e.g. `100` in this example):
```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
name: allow-my-service-egress
spec:
tier: Admin
priority: 55
tier: Admin
subject:
Expand All @@ -195,30 +205,32 @@ spec:
kubernetes.io/metadata.name: "monitoring"
egress:
- name: "allow-to-my-service"
action: "Allow"
action: "Accept"
to:
- domainNames:
- "my-service.com"
- "*.cloud-provider.io"
protocols:
- tcp:
destinationPort:
number: 443
destinationPort:
number: 443
- name: "default-deny"
action: "Deny"
to:
- networks:
- "0.0.0.0/0"
Comment thread
tssurya marked this conversation as resolved.
- "::/0"
```

This example uses a default-deny Baseline ClusterNetworkPolicy to create the
allowlist:
This example uses a Baseline-tier default-deny ClusterNetworkPolicy to create
the allowlist:
```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
name: allow-my-service-egress
spec:
tier: Admin
priority: 55
tier: Admin
subject:
Expand All @@ -227,36 +239,40 @@ spec:
kubernetes.io/metadata.name: "monitoring"
egress:
- name: "allow-to-my-service"
action: "Allow"
action: "Accept"
to:
- domainNames:
- "my-service.com"
- "*.cloud-provider.io"
protocols:
- tcp:
destinationPort:
number: 443
destinationPort:
number: 443
---
apiVersion: policy.networking.k8s.io/v1alpha1
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
name: default
name: default-deny
spec:
tier: Baseline
priority: 0
subject:
namespaces: {}
ingress:
- action: Deny
to:
- networks:
- "0.0.0.0/0"
egress:
- name: "default-deny"
action: "Deny"
to:
- networks:
- "0.0.0.0/0"
- "::/0"
```

### Expected Behavior

1. A FQDN egress policy does not grant the workload permission to communicate
with any in-cluster DNS services (like `kube-dns`). A separate rule needs to
be configured to allow traffic to any DNS servers.
be configured to allow traffic to any DNS servers. FQDN policies are not
expected to work if the pod cannot reach DNS.
1. FQDN policies should not affect the ability of workloads to resolve domains,
only their ability to communicate with the IP backing them. Put another way,
FQDN policies should not result in any form of DNS filtering.
Expand All @@ -268,6 +284,10 @@ spec:
considered authoritative for resolving domain names. This could be the
`kube-dns` Service or potentially some other DNS provider specified in the
implementation's configuration.
1. Pods are expected to use the DNS configuration provided via `resolv.conf`
(i.e. the canonical cluster DNS server). If a pod uses a different DNS
server (e.g. hardcoded `8.8.8.8`), FQDN rule processing is not guaranteed
to work.
1. DNS record querying and lifetimes:
* Pods are expected to make a DNS query for a domain before sending traffic
to it. If the Pod fails to send a DNS request and instead just sends
Expand All @@ -277,9 +297,12 @@ spec:
establish new connection using DNS records that are expired is not
guaranteed to work.
* When the TTL for a DNS record expires, the implementor should stop
allowing new connections to that IP. Existing connection will still be
allowing new connections to that IP. Existing connections will still be
allowed (that's consistent with NetworkPolicy behavior on long-running
connections).
connections).
* If a DNS record is refreshed before the TTL expires (e.g., due to a new
DNS query from the workload), the TTL timer should be reset based on the
new response.
1. Implementations must support at least 100 unique IPs (either IPv4 or IPv6)
for each domain. This is true for both explicitly specified domains, as well
as for each domain selected by a wild-card rule. For example, the rule
Expand Down Expand Up @@ -347,6 +370,82 @@ spec:
The implementer can still deny traffic to `1.2.3.4` because no single
response contained the full chain required to resolve the domain.

### Recommended Behavior

The following recommendations are based on operational experience from existing
implementations and feedback gathered during KubeCon Atlanta 2025.

Comment thread
tssurya marked this conversation as resolved.
1. Pods should only make DNS requests to the canonical DNS server, because
FQDN rules for a pod are only guaranteed to work after that pod makes an
appropriate DNS request to the canonical DNS server (see [Expected
Behavior](#expected-behavior)). If a pod queries an alternate DNS server
(e.g. hardcoded `8.8.8.8`), the implementation may not observe the DNS
response, and the FQDN Accept rule will simply not take effect. This is
a functionality issue, not a security breach -- the pod's traffic is
denied, not accidentally allowed. For example, when a pod queries the
canonical DNS server for `my-service.com` and receives `1.2.3.4`, the
implementation observes that response and learns that `my-service.com`
maps to `1.2.3.4`. It can then enforce the FQDN Accept rule by allowing
traffic to `1.2.3.4`. If the pod instead queries `8.8.8.8`, the
implementation never sees the response and has no IP to allow -- the rule
simply doesn't take effect.
1. Administrators should create a high-precedence Admin-tier rule allowing
egress DNS traffic to the canonical DNS server (e.g. `kube-dns` in
`kube-system`), to ensure that DNS is not blocked by other deny rules.
For example:
```yaml
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
name: allow-dns
spec:
tier: Admin
priority: 50
subject:
namespaces:
matchLabels:
requires-fqdn-policy: "true"
egress:
- name: "allow-dns"
action: "Accept"
to:
- namespaces:
matchLabels:
kubernetes.io/metadata.name: "kube-system"
protocols:
- udp:
destinationPort:
number: 53
- tcp:
destinationPort:
number: 53
```
1. Implementations that operate by snooping DNS responses on the wire MUST
only trust responses originating from the canonical DNS server. Trusting
responses from arbitrary sources is a security concern: a malicious actor
could forge DNS responses to trick the implementation into allowing
traffic to unintended IPs.
1. Implementations MAY provide a configurable grace period beyond the TTL to
accommodate DNS propagation delays and client-side caching.
1. Although securing DNS resolution is a non-goal of this NPEP, implementations
are recommended to consider mitigations for DNS cache poisoning (e.g.,
DNSSEC validation by the canonical DNS server) when documenting their trust
model.

## Connection Lifecycle on Policy Updates

When FQDN policies change, implementations must handle in-flight connections
gracefully. See the [Expected Behavior](#expected-behavior) section for DNS
record updates.

* **Policy addition**: New connections matching the FQDN are allowed once DNS
resolution for the domain has been observed. Connections to IPs that haven't
been observed via DNS are not guaranteed to be allowed.
* **Policy removal**: Existing established connections SHOULD be allowed to
complete gracefully. New connections SHOULD be denied after the policy is
removed. This is consistent with the general NetworkPolicy behavior for
long-running connections.

## Alternatives

### IP Block Selector
Expand Down
Loading
Loading