Error: UPGRADE FAILED: cannot patch "llm-d-wrfpk0ocoj-vllm-sim" with kind DestinationRule: Internal error occurred: failed calling webhook "validation.istio.io": failed to call webhook: Post "https://istiod.istio-system.svc:443/validate?timeout=10s": dial tcp 10.96.234.92:443: connect: connection refused
2025-06-27T03:16:27.7319670Z
2025-06-24T04:31:43.5637747Z 2025-06-24T04:30:47.461311Z info delta CDS: PUSH for node:llm-d-ggrsca8gs3-inference-gateway-istio-5775698557-jswrq.llm-d-ggrsca8gs3 resources:17 removed:0 size:18.9kB cached:15/16
2025-06-24T04:31:43.5638996Z 2025-06-24T04:30:47.461909Z info delta LDS: PUSH for node:llm-d-ggrsca8gs3-inference-gateway-istio-5775698557-jswrq.llm-d-ggrsca8gs3 resources:1 removed:0 size:2.3kB
2025-06-24T04:31:43.5639720Z panic: runtime error: invalid memory address or nil pointer dereference
2025-06-24T04:31:43.5640149Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x29a7902]
2025-06-24T04:31:43.5640401Z
2025-06-24T04:31:43.5640481Z goroutine 1169 [running]:
2025-06-24T04:31:43.5640927Z istio.io/istio/pilot/pkg/networking/core/route.HashForHTTPDestination(0xc000e2a9c0?, 0x5b?, 0xc000e2a9c0?)
2025-06-24T04:31:43.5641636Z istio.io/istio/pilot/pkg/networking/core/route/route.go:1559 +0x42
2025-06-24T04:31:43.5642368Z istio.io/istio/pilot/pkg/networking/core/route.hashForVirtualService(0xc001f8bdc0, _, {{{{0x36f555a, 0x13}, {0x36d3966, 0x2}, {0x36e91b7, 0xe}}, {0x0, 0x0}, ...}, ...})
2025-06-24T04:31:43.5643229Z istio.io/istio/pilot/pkg/networking/core/route/route.go:1535 +0x145
2025-06-24T04:31:43.5643736Z istio.io/istio/pilot/pkg/networking/core/route.GetConsistentHashForVirtualService(...)
2025-06-24T04:31:43.5644218Z istio.io/istio/pilot/pkg/networking/core/route/route.go:1546
2025-06-24T04:31:43.5644920Z istio.io/istio/pilot/pkg/networking/core.(*ConfigGeneratorImpl).buildGatewayHTTPRouteConfig(0xc001f8bdc0?, 0xc00136fb80, 0xc001f8bdc0, {0xc002388d97, 0x7})
2025-06-24T04:31:43.5645606Z istio.io/istio/pilot/pkg/networking/core/gateway.go:451 +0xe88
2025-06-24T04:31:43.5646264Z istio.io/istio/pilot/pkg/networking/core.(*ConfigGeneratorImpl).BuildHTTPRoutes(0xc000bcd8f0, 0xc00136fb80, 0xc0017f21e0, {0xc001dfef70, 0x1, 0x3c?})
2025-06-24T04:31:43.5647178Z istio.io/istio/pilot/pkg/networking/core/httproute.go:98 +0x5fb
2025-06-24T04:31:43.5647748Z istio.io/istio/pilot/pkg/xds.RdsGenerator.Generate({{0x3c5dd70?, 0xc000bcd8f0?}}, 0xc00136fb80, 0xc0013aaab0?, 0xc0017f21e0)
2025-06-24T04:31:43.5648265Z istio.io/istio/pilot/pkg/xds/rds.go:63 +0x376
2025-06-24T04:31:43.5648748Z istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).pushDeltaXds(0xc000744f00, 0xc001518c00, 0xc0013aaab0, 0xc0017f21e0)
2025-06-24T04:31:43.5649238Z istio.io/istio/pilot/pkg/xds/delta.go:504 +0x2cf1
2025-06-24T04:31:43.5649718Z istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).pushConnectionDelta(0xc000744f00, 0xc001518c00, 0xc001518c00?)
2025-06-24T04:31:43.5650195Z istio.io/istio/pilot/pkg/xds/delta.go:169 +0x105
2025-06-24T04:31:43.5650631Z istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).StreamDeltas(0xc000744f00, {0x3c62158, 0xc001a7e8f0})
2025-06-24T04:31:43.5651073Z istio.io/istio/pilot/pkg/xds/delta.go:139 +0x951
2025-06-24T04:31:43.5651557Z istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).DeltaAggregatedResources(0x5e11080?, {0x3c62158?, 0xc001a7e8f0?})
2025-06-24T04:31:43.5652167Z istio.io/istio/pilot/pkg/xds/ads.go:467 +0x1d
2025-06-24T04:31:43.5652915Z github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3._AggregatedDiscoveryService_DeltaAggregatedResources_Handler({0x36cea40?, 0xc000744f00}, {0x3c5a428, 0xc002640620})
2025-06-24T04:31:43.5653914Z github.com/envoyproxy/go-control-plane/envoy@v1.32.5-0.20250228031205-63a55395d7a3/service/discovery/v3/ads_grpc.pb.go:163 +0xd8
2025-06-24T04:31:43.5654875Z google.golang.org/grpc.(*Server).processStreamingRPC(0xc00092f600, {0x3c4f270, 0xc000a0a2d0}, 0xc001b5fec0, 0xc000c6eed0, 0x5e4dd20, 0x0)
2025-06-24T04:31:43.5655447Z google.golang.org/grpc@v1.70.0/server.go:1690 +0x1252
2025-06-24T04:31:43.5655894Z google.golang.org/grpc.(*Server).handleStream(0xc00092f600, {0x3c50ae8, 0xc001613040}, 0xc001b5fec0)
2025-06-24T04:31:43.5656380Z google.golang.org/grpc@v1.70.0/server.go:1814 +0xb47
2025-06-24T04:31:43.5656893Z google.golang.org/grpc.(*Server).serveStreams.func2.1()
2025-06-24T04:31:43.5657229Z google.golang.org/grpc@v1.70.0/server.go:1030 +0x7f
2025-06-24T04:31:43.5657614Z created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 1149
2025-06-24T04:31:43.5658005Z google.golang.org/grpc@v1.70.0/server.go:1041 +0x11d
2025-06-24T04:31:43.5658213Z
Component
Helm Chart
Describe the bug
The crash is triggered when a Gateway + HTTPRoute set is updated or deleted while the corresponding Service/Endpoints objects are also being modified (our chart does create-update-delete cycles for e2e tests).
Each panic restarts Pilot (istiod). While it is restarting, the validation.istio.io webhook refuses connections, so Helm upgrades fail.
We should have a new image by the end of the week. Until then, chart-testing CI will keep failing, but the general deployment does not create the race condition, so I don't think we need to switch the default to kgateway in the meantime.
The patch that fixes this is istio/istio#56632.
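Until the patched image is available, one possible stopgap for CI is to retry the Helm step only when the failure matches the webhook outage described here. This is a hedged sketch, not something the chart or CI currently does, and a retry may not help if Pilot keeps panicking. The function names, the retry count, and the delay are illustrative assumptions. The marker strings are copied from the Helm error in this report.

```python
import subprocess
import time

# Substrings taken from the Helm error in this report; matching on them is a heuristic.
WEBHOOK_MARKERS = (
    'failed calling webhook "validation.istio.io"',
    "connection refused",
)


def is_webhook_unavailable(stderr):
    """Return True if stderr looks like the transient istiod webhook outage."""
    return all(marker in stderr for marker in WEBHOOK_MARKERS)


def run_with_retry(cmd, attempts=5, delay_s=15.0, runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying only when the failure matches the webhook outage.

    attempts and delay_s are arbitrary illustrative defaults.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0 or not is_webhook_unavailable(result.stderr):
            return result  # success, or a failure that a retry will not fix
        if attempt < attempts:
            sleep(delay_s)  # give istiod time to come back up
    return result
```

A call might look like run_with_retry(["helm", "upgrade", "--install", "<release>", "<chart>"]), where the release and chart arguments are placeholders. Any other failure is returned immediately, so unrelated errors are not masked.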
The CI chart-testing job fails with the UPGRADE FAILED error shown above, and the Pilot pod log shows the panic and stack trace shown above.