Dial EC endpoints with a timeout in ExecutionClientManager#821
Dial EC endpoints with a timeout in ExecutionClientManager#8210xtrooper wants to merge 2 commits into
Conversation
When establishing connections this avoids having the function stuck trying to connect to the clients
|
So, I talked to fornax a bit about this offline, but this is sort of a bandaid that won't actually fix the larger problem- that we abstracted away the When the rocketpool binary starts, we should create a cancelable parent context for all actions, and make children using WithTimeout. This would be an enormous refactor, but it would mean that context-aware timeouts would be possible. 5 seconds seems too short for a global dial timeout. ECs that are in the middle of syncing, for example, are still dialed to get sync status, but may take more than 5 seconds to reply. I think it's a little too dangerous to apply a unilateral 5 second timeout here. |
|
I agree that a global context would make a lot of sense. Would it be reasonable to have a (longer) timeout for the fallback client only? Feels a bit strange to have the UI not respond because the fallback EC has some issues. |
Sure, that would make sense. I think at some point I also want to add a feature that treats the fallback ec as the primary for queries in rocketpool_node and rocketpool_api so that things like collecting metrics don't use the primary ec if a fallback is available, but adding a timeout shouldn't really hurt that use-case. |
What:
Use a 5-second context timeout when dialing primary and fallback EC URLs to avoid blocking indefinitely.
Why:
Prevents startup hangs if an endpoint is down or unresponsive.
How:
Introduce
dialWithTimeout(url string)helper that utilizesethclient.DialContext(..)and apply to both primary and fallback dials in NewExecutionClientManager.