Move waiting logic for host orchestrator settlement to backend#521
Move waiting logic for host orchestrator settlement to backend#521k311093 wants to merge 2 commits into
Conversation
deec9a3 to
2091716
Compare
| if err != nil { | ||
| return nil, err | ||
| } | ||
| client, err := g.InstanceManager.GetHostClient(g.Zone, host.Name) |
There was a problem hiding this comment.
From GceIM, GetHostClient internally invokes getHostInstance, which seems to be calling GCE API to retrieve the host. However, this method already has ins which is typed as *compute.Instance.
I think it's better not to get *compute.Instance twice, also not to bring new fields(InstanceManager, Zone) into opResultGetter.
There was a problem hiding this comment.
Changed logic to pass *compute.Instance and reuse it if specified.
493fffb to
b5e8c57
Compare
There was a problem hiding this comment.
Adding an inside logic to wait for when the host is ready to interact with shouldn't be part of the CreateHost request logic on the backend side.
CreateHost returns success when the host is created and host resources are allocated, not when the host is ready to interact with. To do this properly, you'd need to add and implement a new http endpoint that tests whether the host is ready, still clients would need to make these calls after CreateHost returned in order to determine whether the host is ready to interact with. However, it's up to every client implementation how they want to handle this, they can always keep doing what the current client is doing.
What use is a host that doesn't have a host orchestrator? If the answer is "none", then it would make sense to only mark the operation to create a host as completed once the host orchestrator is ready. It simplifies the api and puts the need for waiting in a single place (the cloud orchestrator) rather than in every client. The CO is already waiting for the host to be created, it might as well wait a bit longer. |
Please add more details to the description and the title of the PR. What is "host orchestrator settlement"? Which backend? The HO or the CO or something else? Why is this change useful/needed? What are the pros and cons? Even if you write a description in the bug that's not publicly available like this PR is. |
Updated commit message with descriptions. |
+1. I expect users are understanding the meaning of host as a host environment to launch Cuttlefish instances, and the route of orchestrating CF instances is Host Orchestrator. I think health of Host Orchestrator is more representative than health of computing unit(e.g. GCE or docker instance). From API view, I expect users create a host and verify with 3 steps; |
b3d13e7 to
8d2831f
Compare
c4d69b0 to
ce31cf8
Compare
427a0a1 to
0ef723f
Compare
| return "", err | ||
| } | ||
| ilen := len(instance.NetworkInterfaces) | ||
| func getHostAddrWithIns(ins *compute.Instance) (string, error) { |
There was a problem hiding this comment.
Move these new set of helper functions as a continuations block in the end of the file getHostAddrWithIns, getHostURLWithIns to the end. So methods like func (m *GCEInstanceManager) GetHostAddr and func (m *GCEInstanceManager) GetHostURL continued to be grouped in the same block.
There was a problem hiding this comment.
Moved function near its context to make it more easy read.
60566b9 to
98a9f86
Compare
1d29e2e to
4dff5ee
Compare
a7c8ab3 to
1ec8af0
Compare
While interacting with host orchestrator right after creating host API is returned, there is a possibility of getting 503 service temporarily unavailable error because host is created and booted but host orchestrator is not executed / ready yet. Handling logic for waiting host orchestrator readiness is currently in the client logic. Moving this logic into cloud orchestrator API endpoint makes the client simpler and reduces redundant API endpoints. Apply it first for Docker instance manager. Context: b/501288123
While interacting with host orchestrator right after creating host API is returned, there is a possibility of getting 503 service temporarily unavailable error because host is created and booted but host orchestrator is not executed / ready yet. Handling logic for waiting host orchestrator readiness is currently in the client logic. Moving this logic into cloud orchestrator API endpoint makes the client simpler and reduces redundant API endpoints. Apply it for GCE instance manager. Context: b/501288123
| } | ||
| time.Sleep(retryDelay) | ||
| } | ||
| return errors.NewServiceUnavailableError("wait for host orchestrator timed out", nil) |
There was a problem hiding this comment.
Let's now return NewInternalError after time out, otherwise the client will retry again.
| return nil, err | ||
| } | ||
| return url.Parse(fmt.Sprintf("%s://%s:%d", m.Config.HostOrchestratorProtocol, addr, m.Config.GCP.HostOrchestratorPort)) | ||
| return getHostURLWithIns(ins, &m.Config) |
There was a problem hiding this comment.
Can you have getHostURLWithIns as a method of GCEInstanceManager in order to reuse its Config field rather than have it to pass as a parameter? Same for other similar functions in this PR
While interacting with host orchestrator right after creating host
API is returned, there is a possibility of getting 503 service
temporarily unavailable error because host it created and booted but
host orchestrator is not executed / ready yet.
Handling logic for waiting host orchestrator readiness is currently
in the client logic. Moving this logic into cloud orchestrator API
endpoint would make the client that uses host chestrator API simpler.
Context: b/501288123