xds: XdsDepManager should ignore updates after shutdown#12034
xds: XdsDepManager should ignore updates after shutdown#12034ejona86 merged 2 commits intogrpc:masterfrom
Conversation
This prevents a NPE and subsequent channel panic when trying to build a
config (because there are no watchers, so waitingOnResource==false)
without any listener and route.
```
java.lang.NullPointerException: Cannot invoke "io.grpc.xds.XdsDependencyManager$RdsUpdateSupplier.getRdsUpdate()" because "routeSource" is null
at io.grpc.xds.XdsDependencyManager.buildUpdate(XdsDependencyManager.java:295)
at io.grpc.xds.XdsDependencyManager.maybePublishConfig(XdsDependencyManager.java:266)
at io.grpc.xds.XdsDependencyManager$EdsWatcher.onChanged(XdsDependencyManager.java:899)
at io.grpc.xds.XdsDependencyManager$EdsWatcher.onChanged(XdsDependencyManager.java:888)
at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.notifyWatcher(XdsClientImpl.java:929)
at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.lambda$onData$0(XdsClientImpl.java:837)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:96)
```
I think this fully-fixes the problem today, but not tomorrow.
subscribeToCluster() is racy as well, but not yet used.
This was noticed when idleTimeout was firing, with some other code
calling getState(true) to wake the channel back up. That may have made
this panic more visible than it would be otherwise, but that has not
been investigated.
b/412474567
Flow control prevents all but the first update from occurring before the XdsClient unsubscriptions take effect.
| @Override | ||
| public void onError(Status error) { | ||
| checkNotNull(error, "error"); | ||
| if (cancelled) { |
There was a problem hiding this comment.
Hey Eric, one small doubt in this. How did we decide to put this after checkNotNull? I thought if we have already cancelled the watcher, we should just return. Checking not null should probably go after cancelled check?
There was a problem hiding this comment.
checkNotNull() is checking the API was called correctly. Enforcing API requirements is no less relevant if the watcher is cancelled, although it is true that this implementation need not the results. If it is null here, it is a bug that needs fixing, independent of XdsDependencyManager's state. The robustness principal has been largely found to be a mistake. It is better to be strict of inputs, so bugs have a short lifetime and the system works as one expects. I want things to fail as early as possible, as there they are most obvious.
(I'd agree that the ordering could have been the other way here, and I did consider it. But what value is skipping the null check?)
This prevents a NPE and subsequent channel panic when trying to build a config (because there are no watchers, so waitingOnResource==false) without any listener and route.
I think this fully-fixes the problem today, but not tomorrow. subscribeToCluster() is racy as well, but not yet used.
This was noticed when idleTimeout was firing, with some other code calling getState(true) to wake the channel back up. That may have made this panic more visible than it would be otherwise, but that has not been investigated.
b/412474567