cilium-envoy RELEASE_ASSERT crash in PortNetworkPolicy when NPDS delivers overlapping port ranges with different endPort #45811

@ecthelion77

Description

What happened?

cilium-envoy crashes with a RELEASE_ASSERT in the NPDS (Network Policy Discovery Service) policy ingestion path when the cilium-agent sends overlapping port ranges with different endPort values for the same start port.

The crash occurs in PortNetworkPolicy constructor (cilium/network_policy.cc:1663) when the range-splitting logic attempts to insert a new sub-range into the btree_map but encounters a duplicate key.

Root cause: When a CiliumNetworkPolicy has two ingress (or egress) rules that both specify the same port number but with different endPort values (e.g., port 10000 with endPort 10001 in one rule and endPort 10200 in another), the cilium-agent sends these as separate overlapping port ranges via NPDS without merging them first. The proxy's PortNetworkPolicy constructor then tries to split these ranges into disjoint sub-ranges, but the algorithm produces a duplicate key insertion, triggering the RELEASE_ASSERT.

Expected behavior: Either:

  1. The cilium-agent should merge overlapping port ranges before sending them to the proxy, OR
  2. The proxy should handle overlapping ranges gracefully (merge or reject with a proper error) instead of crashing with a fatal assertion.

How can we reproduce the issue?

  1. Deploy Cilium 1.19.x with the standalone cilium-envoy DaemonSet
  2. Create a CiliumNetworkPolicy with two ingress rules targeting the same port but with different endPort values:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: reproduce-envoy-crash
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: test-pod
  ingress:
    - fromCIDRSet:
        - cidr: 10.0.0.0/8
      toPorts:
        - ports:
            - port: "10000"
              endPort: 10001
              protocol: UDP
    - fromEndpoints:
        - matchLabels:
            app: client
      toPorts:
        - ports:
            - port: "10000"
              endPort: 10200
              protocol: UDP
  3. Ensure a pod matching app: test-pod exists on the target node
  4. The cilium-envoy pod on that node will crash immediately upon receiving the NPDS update

Key observation: The issue also manifests in cross-policy scenarios where a CiliumNetworkPolicy and a CiliumClusterwideNetworkPolicy both apply to the same pod and define the same port with different endPort ranges. The cilium-agent aggregates all applicable policies per-pod and sends the merged (but non-deduplicated) port ranges to envoy via NPDS.

Cilium Version

Client: 1.19.3 f5eb641b 2026-04-14T12:00:16+00:00 go version go1.25.9 linux/amd64
Daemon: 1.19.3 f5eb641b 2026-04-14T12:00:16+00:00 go version go1.25.9 linux/amd64

cilium-envoy image: quay.io/cilium/cilium-envoy:v1.36.6-1776000132-2437d2edeaf4d9b56ef279bd0d71127440c067aa

Kernel Version

6.18.9-talos (Talos Linux)

Kubernetes Version

Client Version: v1.33.2
Server Version: v1.35.2

Regression

Unknown. The assertion also exists on the current main branch of cilium/proxy. The bug has likely been present since port range support was introduced in the NPDS protocol (Cilium 1.14+).

Sysdump

Not attached (production cluster with sensitive data). The crash is fully reproducible with the minimal CNP above.

Relevant log output

[2026-05-06 12:21:11.862][7][critical][assert] [cilium/network_policy.cc:1663] assert failure: new_pair.second. Details: duplicate entry at end when explicitly adding a new range!
[2026-05-06 12:21:11.862][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:38] stacktrace for envoy bug
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #0 UNKNOWN [0x55c70e7eef42]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #1 UNKNOWN [0x55c70e7e6bb1]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #2 UNKNOWN [0x55c710444fb1]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #3 UNKNOWN [0x55c710445652]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #4 UNKNOWN [0x55c7104496bf]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #5 UNKNOWN [0x55c71044bdae]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #6 UNKNOWN [0x55c71044f21f]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #7 UNKNOWN [0x55c710a57fdd]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #8 UNKNOWN [0x55c710a5fe89]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #9 UNKNOWN [0x55c710bd13c0]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #10 UNKNOWN [0x55c710bfb885]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #11 UNKNOWN [0x55c71093ca8d]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #12 UNKNOWN [0x55c710b7d045]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #13 UNKNOWN [0x55c710b899fd]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #14 UNKNOWN [0x55c710b958d7]
[2026-05-06 12:21:11.863][7][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #15 UNKNOWN [0x55c710ec4107]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:129] Caught Aborted, suspect faulting address 0x7
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:113] Backtrace (use tools/stack_decode.py to get line numbers):
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:114] Envoy version: 2437d2edeaf4d9b56ef279bd0d71127440c067aa/1.36.6/Distribution/RELEASE/BoringSSL
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:116] Address mapping: 55c70e734000-55c7116e7000 /usr/bin/cilium-envoy
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:123] #0: [0x7f333960d330]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:121] #1: raise [0x7f333960d27e]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:121] #2: abort [0x7f33395f08ff]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:123] #3: [0x55c70e7f1bbf]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:123] #4: [0x55c70e7eef42]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:123] #5: [0x55c70e7e6bb1]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:123] #6: [0x55c710444fb1]
[2026-05-06 12:21:11.863][7][critical][backtrace] [external/envoy/source/server/backtrace.h:123] #7: [0x55c710445652]

Anything else?

Source code reference: The crash is in cilium/network_policy.cc, line 1663, in the cilium/proxy repository. The PortNetworkPolicy constructor iterates over the port ranges and splits overlapping ones into disjoint sub-ranges. When two ranges share the same start port (e.g., [10000, 10001] and [10000, 10200]), the "create a new entry covering the end" branch produces a key that already exists in the btree_map, which triggers the assertion.
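For illustration, here is a minimal sketch of the disjoint split the PortNetworkPolicy constructor is expected to produce. It assumes each input range is tagged with a single rule ID. `PortRange`, `SubRange`, `rule_id` and `SplitDisjoint` are hypothetical names, and the sketch uses a plain boundary sweep over a `std::vector` rather than the btree_map-based logic in cilium/network_policy.cc. Its only purpose is to show that two ranges sharing a start port can be split without ever creating the same entry twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical input: one port range [start, end] contributed by a policy rule.
struct PortRange {
  uint16_t start;
  uint16_t end;  // inclusive; equals start for a single port
  int rule_id;   // stand-in for the rule(s) attached to this range
};

// One disjoint output range plus the set of rules that apply to it.
struct SubRange {
  uint16_t start;
  uint16_t end;
  std::set<int> rules;
};

// Split possibly-overlapping ranges into disjoint sub-ranges. Every port that
// is covered by at least one input ends up in exactly one output sub-range,
// carrying the union of the rules of all inputs that cover it.
std::vector<SubRange> SplitDisjoint(const std::vector<PortRange>& in) {
  // Candidate boundaries: a range starts covering at `start` and stops at
  // `end + 1`. uint32_t avoids overflow when end == 65535.
  std::vector<uint32_t> bounds;
  for (const auto& r : in) {
    bounds.push_back(r.start);
    bounds.push_back(static_cast<uint32_t>(r.end) + 1);
  }
  std::sort(bounds.begin(), bounds.end());
  bounds.erase(std::unique(bounds.begin(), bounds.end()), bounds.end());

  std::vector<SubRange> out;
  for (std::size_t i = 0; i + 1 < bounds.size(); ++i) {
    const uint32_t lo = bounds[i];
    const uint32_t hi = bounds[i + 1] - 1;  // inclusive end of this slice
    std::set<int> rules;
    for (const auto& r : in) {
      // No input range starts or ends strictly inside [lo, hi], so a range
      // covers the whole slice iff it covers the slice's first port.
      if (r.start <= lo && lo <= r.end) rules.insert(r.rule_id);
    }
    if (!rules.empty()) {  // skip gaps not covered by any input range
      out.push_back({static_cast<uint16_t>(lo), static_cast<uint16_t>(hi), rules});
    }
  }
  return out;
}
```

For the two ranges from the reproducer ([10000, 10001] from the first rule and [10000, 10200] from the second), this yields [10000, 10001] carrying both rules and [10002, 10200] carrying only the second. The per-slice scan is quadratic, which is fine for a sketch but not necessarily for large policies.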

Impact: This is a production-severity issue. The RELEASE_ASSERT fires in release builds (not debug-only), causing an immediate SIGABRT. The cilium-envoy pod enters CrashLoopBackOff and the crash repeats on every restart since the same conflicting policy is re-sent by the agent each time.

Possible fix locations:

  1. cilium-agent (pkg/proxy/endpoint/ or policy resolution): Merge overlapping port ranges before encoding them into the NPDS protobuf message sent to envoy. This is the safest fix as it prevents the proxy from ever receiving conflicting ranges.
  2. cilium/proxy (cilium/network_policy.cc): Replace the RELEASE_ASSERT with graceful handling — either merge the conflicting ranges or skip the duplicate insertion with a warning log.
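For option 2, a non-fatal insertion might take roughly the following shape. This sketch assumes a map keyed by an inclusive port range and uses `std::map` as a stand-in for `absl::btree_map`. Both return a `std::pair<iterator, bool>` from `insert`, and that bool is the `new_pair.second` value the failing assertion checks. `RangeKey`, `RangeMap` and `InsertOrMerge` are hypothetical names, and the real key layout in cilium/proxy may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <utility>

// Hypothetical key: an inclusive port range [start, end]. The real cilium/proxy
// key type may differ; std::map is a stand-in for absl::btree_map here.
using RangeKey = std::pair<uint16_t, uint16_t>;
using RangeMap = std::map<RangeKey, std::set<int>>;

// Insert `rules` for `key`. Instead of aborting when the key already exists
// (the `new_pair.second == false` case), merge the rules into the existing
// entry and log a warning. Returns true if a new entry was created.
bool InsertOrMerge(RangeMap& map, const RangeKey& key, const std::set<int>& rules) {
  auto new_pair = map.insert({key, rules});
  if (!new_pair.second) {
    std::cerr << "warning: duplicate port range [" << key.first << ", "
              << key.second << "], merging rules\n";
    new_pair.first->second.insert(rules.begin(), rules.end());
  }
  return new_pair.second;
}
```

Whether merging is the right semantics depends on how the proxy is meant to combine rules for identical ranges. Rejecting the whole update with an error would be the more conservative alternative.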

Workaround: Ensure that no CiliumNetworkPolicy (or combination of CNP and CiliumClusterwideNetworkPolicy) defines the same port with different endPort values for pods on the same node. This requires a manual audit of all network policies.
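Part of that audit could be automated once the `toPorts` entries of every policy selecting a given pod have been flattened into a list (that extraction step is not shown). The sketch below flags `(protocol, port)` pairs that appear with more than one distinct endPort. `PortEntry` and `FindConflictingEndPorts` are hypothetical names. Treating an unset endPort as equal to `port` is an assumption, chosen so that a single port and a range starting at that same port are also flagged, conservatively.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened view of one toPorts entry, taken from any CNP/CCNP
// that selects the pod being audited.
struct PortEntry {
  std::string protocol;  // "TCP", "UDP", ...
  uint16_t port;
  uint16_t end_port;  // 0 when endPort is not set
};

// Returns (protocol, port) pairs that appear with more than one distinct
// endPort, i.e. the pattern this report identifies as triggering the crash.
std::vector<std::pair<std::string, uint16_t>> FindConflictingEndPorts(
    const std::vector<PortEntry>& entries) {
  std::map<std::pair<std::string, uint16_t>, std::set<uint16_t>> end_ports;
  for (const auto& e : entries) {
    // Assumption: an unset endPort (0) is treated as endPort == port.
    const uint16_t end = e.end_port == 0 ? e.port : e.end_port;
    end_ports[{e.protocol, e.port}].insert(end);
  }
  std::vector<std::pair<std::string, uint16_t>> conflicts;
  for (const auto& [key, ends] : end_ports) {
    if (ends.size() > 1) conflicts.push_back(key);
  }
  return conflicts;
}
```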

Metadata

Labels

area/proxy: Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.
kind/bug: This is a bug in the Cilium logic.
kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
sig/policy: Impacts whether traffic is allowed or denied based on user-defined policies.
