
Add deny by default to netpol#2079

Open
aauren wants to merge 3 commits into master from add_deny_by_default_to_netpol

Collaborator

aauren commented May 18, 2026

What type of PR is this?

feature

What this PR does / why we need it:

Closes the race window in which a new pod is routable on the node before its KUBE-POD-FW-* chain has been programmed, by introducing the --netpol-default-deny command-line option. The initial commit on this branch added CIDR-scoped REJECTs at the tail of KUBE-NWPLCY-TAIL, but those failed for pod-to-pod traffic on the same node: every per-pod chain marks accepted packets with 0x20000, so the KUBE-NWPLCY-TAIL ACCEPT-on-mark rule fired before the CIDR REJECTs could.

This PR finishes the feature by introducing a per-IP-family kube-router-local-pods ipset of pod IPs whose firewall chain has been programmed this sync. Two new ipset-gated REJECT rules are prepended to KUBE-NWPLCY-TAIL ahead of the ACCEPT-on-mark; they fire for any local pod IP not yet in the set. The existing CIDR-scoped REJECTs stay at the tail as defense-in-depth.
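To illustrate why the rule position matters, here is a minimal Go sketch that models first-match evaluation of a chain. It is not kube-router code: the `packet` and `rule` types, the function names, and the boolean simplification of each rule's match are all assumptions made for illustration. It only shows that a marked packet from a not-yet-programmed local pod is accepted when the REJECT sits after the ACCEPT-on-mark, and rejected when the set-gated REJECT comes first.

```go
package main

import "fmt"

// packet is a simplified, hypothetical stand-in for a packet reaching the tail chain.
type packet struct {
	marked     bool // a per-pod chain already set the accept mark (0x20000 in the PR text)
	localPod   bool // source or destination is a pod IP inside this node's pod CIDRs
	programmed bool // that pod IP is already in the local-pods set
}

// rule pairs a match predicate with a terminating verdict.
type rule struct {
	name    string
	verdict string
	match   func(packet) bool
}

// evaluate returns the verdict of the first matching rule, mirroring how
// terminating targets end traversal of a chain.
func evaluate(chain []rule, p packet) string {
	for _, r := range chain {
		if r.match(p) {
			return r.verdict
		}
	}
	return "RETURN"
}

var (
	acceptOnMark = rule{"accept-on-mark", "ACCEPT", func(p packet) bool { return p.marked }}
	cidrReject   = rule{"cidr-reject", "REJECT", func(p packet) bool { return p.localPod }}
	ipsetGate    = rule{"ipset-gate", "REJECT", func(p packet) bool { return p.localPod && !p.programmed }}
)

// firstCommitLayout models the layout from the initial commit: REJECT only at the tail.
func firstCommitLayout() []rule { return []rule{acceptOnMark, cidrReject} }

// gatedLayout models the layout described above: set-gated REJECT ahead of ACCEPT-on-mark.
func gatedLayout() []rule { return []rule{ipsetGate, acceptOnMark, cidrReject} }

func main() {
	// Same-node traffic: the established peer's chain marked the packet,
	// but the new pod's own chain has not been programmed yet.
	racing := packet{marked: true, localPod: true, programmed: false}
	fmt.Println("first-commit layout:", evaluate(firstCommitLayout(), racing))
	fmt.Println("gated layout:       ", evaluate(gatedLayout(), racing))
}
```

Once the pod's IP is in the set (`programmed: true`), the gate no longer matches and the packet falls through to the ACCEPT-on-mark rule as before.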

When --netpol-default-deny is disabled the ipset is never created and the rule layout is unchanged from today. The feature self-disables (with an error log) when this node's pod CIDRs cannot be detected via node.spec.podCIDRs or the kube-router.io/pod-cidrs annotation.
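The detection fallback could look roughly like the following Go sketch. `detectPodCIDRs` is a hypothetical helper, not the function in this PR. The precedence (node spec first, then annotation) and the comma-separated annotation format are assumptions made for the example; only the two sources themselves come from the description above.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// podCIDRAnnotation is the annotation named in the PR description.
const podCIDRAnnotation = "kube-router.io/pod-cidrs"

// detectPodCIDRs is a hypothetical helper. It takes the values of
// node.spec.podCIDRs and the node's annotations and returns parsed prefixes,
// or an error when nothing usable is found.
func detectPodCIDRs(specPodCIDRs []string, annotations map[string]string) ([]netip.Prefix, error) {
	raw := specPodCIDRs
	if len(raw) == 0 {
		// ASSUMPTION: the annotation holds a comma-separated list and is
		// consulted only when the node spec is empty.
		if v, ok := annotations[podCIDRAnnotation]; ok {
			raw = strings.Split(v, ",")
		}
	}

	var out []netip.Prefix
	for _, s := range raw {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return nil, fmt.Errorf("invalid pod CIDR %q: %w", s, err)
		}
		out = append(out, p.Masked())
	}
	if len(out) == 0 {
		return nil, fmt.Errorf("no pod CIDRs found for this node")
	}
	return out, nil
}

func main() {
	annotations := map[string]string{podCIDRAnnotation: "10.244.1.0/24, fd00:10:244:1::/64"}
	cidrs, err := detectPodCIDRs(nil, annotations)
	if err != nil {
		// Mirrors the described behaviour: log and leave the feature off.
		fmt.Println("default deny disabled:", err)
		return
	}
	fmt.Println("gating pod CIDRs:", cidrs)
}
```

Returning an error rather than an empty slice lets the caller decide to log and disable the feature, which matches the self-disabling behaviour described above.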

The user-guide gets a brief section on the feature; the troubleshooting guide gets a symptom-first entry ("NetworkPolicy not enforced on freshly-launched pods") with the operational details.

Which issue(s) this PR is related to:

Further protects against findings discussed in #873

Was AI used during the creation of this PR?

  • What tool was used: Claude (opencode)
  • To what extent was the tool used? Drafted the implementation, tests, and docs on the second commit (feat(npc): enable default deny for pod<->pod) and the third commit (doc: how --netpol-default-deny works) from a detailed plan. The first commit on the branch was hand-written.
  • If drafted, how detailed of a plan did you create for the AI? Very detailed
  • Help us understand if a human was in the loop or not for this PR? Yes — every change was reviewed and the AI was redirected several times (troubleshooting framing, helper consolidation, function renames, docstring brevity).

What, if any, amount of integration testing was done with this change in a Kubernetes environment?

Unit tests cover the new rule layout, ipset refresh, drift recovery, IPv6 prefix handling, and cleanup. Manual cluster validation (deploying the netpol-race-tester DaemonSet from kube-router-automation, verifying iptables -L KUBE-NWPLCY-TAIL and ipset list kube-router-local-pods, toggling the flag off and confirming the ipset and extra rules are gone) is the remaining checklist item before merge.

I also tested against a local Kubernetes cluster with a custom pod workload that attempted to make early connections both to cluster internal and cluster external endpoints. I toggled default-deny on and off and watched the workloads.

Does this PR introduce a breaking change?

NONE

Anything else the reviewer should know that wasn't already covered?

  • The feature is gated behind --netpol-default-deny (off by default), so this is purely additive for existing deployments.
  • This feature ONLY works when kube-router is aware of the pod CIDRs on a per-node basis. This happens by default when kube-router is operating the pod routing, and it can also happen with other cluster networking plugins that use --allocate-node-cidrs=true from kube-controller-manager. It can also be used by other, more custom configurations as long as the user annotates the node with the kube-router.io/pod-cidrs annotation. If neither of these is configured, kube-router cannot safely gate traffic and will warn and continue.
  • The per-sync ipset refresh runs before ensureTailChainPosition installs the TAIL jump in the iptables-restore buffer, so the new rules cannot fire against a stale ipset during a fresh kube-router start.

aauren added 3 commits May 18, 2026 11:15
This is the initial work done to optionally stop pod race conditions
where the CNI has already allowed the pod to start, but a policy sync
has not run yet. This potentially allows pods to begin traffic patterns
that would otherwise be prohibited by networkpolicy.

Since kube-router does not control the CNI layer like other k8s network
plugins, we don't have the ability to gate the return of the pod network
sandbox on policy synchronization. So we essentially have two options:

* Fail open (which is what the project has done so far)
* Fail closed (which can now be optionally set via the
  --netpol-default-deny parameter)

This essentially gives cluster administrators more options.

The previous attempt only blocked traffic for non-cluster ingress /
egress. Workload to workload communication had problems because traffic
would already be marked as long as the existing workload's networkpolicy
did not deny it during startup, thus the traffic would be accepted by
the mark on the packet before getting to our reject rules.

This change fixes that by preemptively rejecting traffic that has a
source or destination within the node's PodCIDR, but whose pod IP has
not yet been synced (as tracked by an ipset).