Skip to content

add defaultTrafficPolicy to MeshConfig#3720

Open
PetrMc wants to merge 1 commit into
istio:masterfrom
PetrMc:petrmc/default-traffic-policy
Open

add defaultTrafficPolicy to MeshConfig#3720
PetrMc wants to merge 1 commit into
istio:masterfrom
PetrMc:petrmc/default-traffic-policy

Conversation

@PetrMc
Copy link
Copy Markdown
Member

@PetrMc PetrMc commented Jun 2, 2026

adds a mesh-wide baseline traffic policy that DestinationRules inherit and override per-field. today a DestinationRule traffic policy replaces the policy wholesale, so any field it leaves unset falls back to istio's hardcoded defaults (max-uint32 connection limits, no outlier detection) rather than an operator baseline. the field reuses the full networking.v1alpha3.TrafficPolicy type; the planned istio implementation will initially honor only connectionPool and outlierDetection, with the remaining sub-policies warned on and support added later without an API change. mirrors the existing defaultHttpRetryPolicy pattern.

… policy

Signed-off-by: Petr McAllister <petr.mcallister@gmail.com>
@PetrMc PetrMc requested a review from a team as a code owner June 2, 2026 21:29
@istio-testing istio-testing added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 2, 2026
@hzxuzhonghu
Copy link
Copy Markdown
Member

I am afraid this will break existing default traffic policy settings, previoulsy if the dr.trafficpolicy not set, we have many different ways to get some timeouts(i donot have a thorough look)

@PetrMc
Copy link
Copy Markdown
Member Author

PetrMc commented Jun 3, 2026

I am afraid this will break existing default traffic policy settings, previoulsy if the dr.trafficpolicy not set, we have many different ways to get some timeouts(i donot have a thorough look)

thanks @hzxuzhonghu, few points:

  • it's opt-in and nil by default. if defaultTrafficPolicy isn't set, nothing changes. the existing default paths (mesh connectTimeout, tcpKeepalive, the hardcoded CB thresholds) all behave exactly as today. no existing deployment is affected unless the field is explicitly sets

  • when it is set, it slots into a precedence chain/ladder that already exists for these fields. today cluster.ConnectTimeout is seeded from Mesh.ConnectTimeout and then overridden by the DR's connectionPool.tcp.connectTimeout (cluster_builder.go here + cluster_traffic_policy.go - 1 2). so there's already a mesh-baseline -> DR-override model. defaultTrafficPolicy just makes that baseline configurable for connectionPool + outlierDetection, sitting between the legacy mesh fields and any DR:

hardcoded default -> MeshConfig.connectTimeout/tcpKeepalive (legacy) -> defaultTrafficPolicy -> DR trafficPolicy -> DR portLevelSettings
  • each layer only overrides the fields it actually sets. legacy MeshConfig.connectTimeout keeps applying whenever defaultTrafficPolicy doesn't set tcp.connectTimeout, so nothing existing breaks.

  • this scope is intentionally just connectionPool + outlierDetection (planned impl warns on and ignores the rest). those are exactly the fields that already follow the mesh-baseline -> DR-override pattern, so this isn't a new merge model, just extending an existing one. same shape as defaultHttpRetryPolicy.

the actual merge, precedence, and backward-compat tests will land in the istio/istio impl PR, not here.

please let me know if it helps or if there a specific default path you're most worried about (timeouts, lb, outlier) so i can confirm it's covered?

@hzxuzhonghu
Copy link
Copy Markdown
Member

Thank for the thorough analysis.

hardcoded default -> MeshConfig.connectTimeout/tcpKeepalive (legacy) -> defaultTrafficPolicy -> DR trafficPolicy -> DR portLevelSettings

The tier is super long, it would be tricky to debug for end user.

The TrafficPolicy is such a big struct, covers a lot of attributes, some may not be supported,
some have default values, how would we handle the default values like below?
Though it provide the flexibility, i think it also brings complexity.

// Settings common to both HTTP and TCP upstream connections.
type ConnectionPoolSettings_TCPSettings struct {
	state protoimpl.MessageState `protogen:"open.v1"`
	// Maximum number of HTTP1 /TCP connections to a destination host. Default 2^32-1.
	MaxConnections int32 `protobuf:"varint,1,opt,name=max_connections,json=maxConnections,proto3" json:"max_connections,omitempty"`
	// TCP connection timeout. format:
	// 1h/1m/1s/1ms. MUST be >=1ms. Default is 10s.
	ConnectTimeout *duration.Duration `protobuf:"bytes,2,opt,name=connect_timeout,json=connectTimeout,proto3" json:"connect_timeout,omitempty"`

@PetrMc
Copy link
Copy Markdown
Member Author

PetrMc commented Jun 4, 2026

thanks @hzxuzhonghu

both come down to merge granularity, and we reuse the existing mergeTrafficPolicy (https://github.com/istio/istio/blob/e9c2753b9f66749d6c28faddb641f57f2b6592f2/pilot/pkg/networking/util/util.go#L859) instead of inventing one. it merges at the block level: connectionPool and outlierDetection taken as whole units, not field-by-field inside them. so the MaxConnections 0-vs-unset problem doesn't arise, and the documented defaults (10s, 2^32-1) are unchanged, they stay the fallback when nothing in the chain sets the block. set nothing, behaves exactly as today.

scope is just connectionPool + outlierDetection, the rest is warned on and ignored.

on the long ladder: the resolved per-cluster policy is already visible via istioctl proxy-config cluster, so debugging is unchanged. proposal to keep it simple: block-level merge only.

does that work, or were you expecting field-level merge inside the blocks?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants