Move waiting logic for host orchestrator settlement to backend by k311093 · Pull Request #521 · google/cloud-android-orchestration

k311093 · 2026-04-15T02:23:16Z

While interacting with host orchestrator right after creating host
API is returned, there is a possibility of getting 503 service
temporarily unavailable error because host it created and booted but
host orchestrator is not executed / ready yet.

Handling logic for waiting host orchestrator readiness is currently
in the client logic. Moving this logic into cloud orchestrator API
endpoint would make the client that uses host chestrator API simpler.

Context: b/501288123

0405ysj · 2026-04-15T05:09:39Z

+	if err != nil {
+		return nil, err
+	}
+	client, err := g.InstanceManager.GetHostClient(g.Zone, host.Name)


From GceIM, GetHostClient internally invokes getHostInstance, which seems to be calling GCE API to retrieve the host. However, this method already has ins which is typed as *compute.Instance.

I think it's better not to get *compute.Instance twice, also not to bring new fields(InstanceManager, Zone) into opResultGetter.

Changed logic to pass *compute.Instance and reuse it if specified.

ser-io

Adding an inside logic to wait for when the host is ready to interact with shouldn't be part of the CreateHost request logic on the backend side.

CreateHost returns success when the host is created and host resources are allocated, not when the host is ready to interact with. To do this properly, you'd need to add and implement a new http endpoint that tests whether the host is ready, still clients would need to make these calls after CreateHost returned in order to determine whether the host is ready to interact with. However, it's up to every client implementation how they want to handle this, they can always keep doing what the current client is doing.

jemoreira · 2026-04-15T17:35:51Z

Adding an inside logic to wait for when the host is ready to interact with shouldn't be part of the CreateHost request logic on the backend side.

CreateHost returns success when the host is created and host resources are allocated, not when the host is ready to interact with.

What use is a host that doesn't have a host orchestrator? If the answer is "none", then it would make sense to only mark the operation to create a host as completed once the host orchestrator is ready. It simplifies the api and puts the need for waiting in a single place (the cloud orchestrator) rather than in every client. The CO is already waiting for the host to be created, it might as well wait a bit longer.

jemoreira · 2026-04-15T17:38:00Z

Context: b/501288123

Please add more details to the description and the title of the PR.

What is "host orchestrator settlement"?

Which backend? The HO or the CO or something else?

Why is this change useful/needed? What are the pros and cons? Even if you write a description in the bug that's not publicly available like this PR is.

k311093 · 2026-04-16T01:29:28Z

Context: b/501288123

Please add more details to the description and the title of the PR.

What is "host orchestrator settlement"?

Which backend? The HO or the CO or something else?

Why is this change useful/needed? What are the pros and cons? Even if you write a description in the bug that's not publicly available like this PR is.

Updated commit message with descriptions.

0405ysj · 2026-04-16T01:56:20Z

What use is a host that doesn't have a host orchestrator? If the answer is "none"

+1. I expect users are understanding the meaning of host as a host environment to launch Cuttlefish instances, and the route of orchestrating CF instances is Host Orchestrator. I think health of Host Orchestrator is more representative than health of computing unit(e.g. GCE or docker instance).

From API view, I expect users create a host and verify with 3 steps; POST /v1/zones/{zone}/hosts, POST /v1/zones/{zone}/operations/{operation}/:wait, GET /v1/zones/{zone}/hosts/{host}/. Invoking those endpoints should be responsible by clients though. But having another round of wait by setting retry count for requesting GET /v1/zones/{zone}/hosts/{host}/ would be unexpected behavior in users' perspective, I think.

ser-io · 2026-06-11T17:31:54Z

-		return "", err
-	}
-	ilen := len(instance.NetworkInterfaces)
+func getHostAddrWithIns(ins *compute.Instance) (string, error) {


Move these new set of helper functions as a continuations block in the end of the file getHostAddrWithIns, getHostURLWithIns to the end. So methods like func (m *GCEInstanceManager) GetHostAddr and func (m *GCEInstanceManager) GetHostURL continued to be grouped in the same block.

Moved function near its context to make it more easy read.

While interacting with host orchestrator right after creating host API is returned, there is a possibility of getting 503 service temporarily unavailable error because host is created and booted but host orchestrator is not executed / ready yet. Handling logic for waiting host orchestrator readiness is currently in the client logic. Moving this logic into cloud orchestrator API endpoint makes the client simpler and reduces redundant API endpoints. Apply it first for Docker instance manager. Context: b/501288123

While interacting with host orchestrator right after creating host API is returned, there is a possibility of getting 503 service temporarily unavailable error because host is created and booted but host orchestrator is not executed / ready yet. Handling logic for waiting host orchestrator readiness is currently in the client logic. Moving this logic into cloud orchestrator API endpoint makes the client simpler and reduces redundant API endpoints. Apply it for GCE instance manager. Context: b/501288123

ser-io · 2026-06-17T21:34:09Z

+		}
+		time.Sleep(retryDelay)
+	}
+	return errors.NewServiceUnavailableError("wait for host orchestrator timed out", nil)


Let's now return NewInternalError after time out, otherwise the client will retry again.

ser-io · 2026-06-17T21:43:56Z

 		return nil, err
 	}
-	return url.Parse(fmt.Sprintf("%s://%s:%d", m.Config.HostOrchestratorProtocol, addr, m.Config.GCP.HostOrchestratorPort))
+	return getHostURLWithIns(ins, &m.Config)


Can you have getHostURLWithIns as a method of GCEInstanceManager in order to reuse its Config field rather than have it to pass as a parameter? Same for other similar functions in this PR

k311093 force-pushed the wait branch 5 times, most recently from deec9a3 to 2091716 Compare April 15, 2026 04:34

k311093 marked this pull request as ready for review April 15, 2026 04:45

k311093 requested review from Databean, adelva1984, jemoreira, jmacnak, rmuthiah and ser-io as code owners April 15, 2026 04:45

k311093 requested review from 0405ysj and ikicha April 15, 2026 04:45

0405ysj reviewed Apr 15, 2026

View reviewed changes

k311093 force-pushed the wait branch 2 times, most recently from 493fffb to b5e8c57 Compare April 15, 2026 05:59

0405ysj reviewed Apr 15, 2026

View reviewed changes

Comment thread pkg/app/instances/gce.go Outdated

ser-io requested changes Apr 15, 2026

View reviewed changes

Comment thread pkg/app/instances/gce.go Outdated

k311093 force-pushed the wait branch from b5e8c57 to c8ef926 Compare April 16, 2026 01:15

k311093 force-pushed the wait branch 2 times, most recently from b3d13e7 to 8d2831f Compare April 16, 2026 03:55

k311093 requested review from 0405ysj and ser-io April 16, 2026 04:47

0405ysj reviewed Apr 16, 2026

View reviewed changes

Comment thread pkg/app/instances/gce.go Outdated

Comment thread pkg/app/instances/gce.go Outdated

k311093 requested a review from ser-io May 27, 2026 05:18

ser-io requested changes May 27, 2026

View reviewed changes

Comment thread pkg/app/app.go Outdated

k311093 force-pushed the wait branch 5 times, most recently from c4d69b0 to ce31cf8 Compare June 4, 2026 06:07

k311093 force-pushed the wait branch 6 times, most recently from 427a0a1 to 0ef723f Compare June 8, 2026 05:02

k311093 requested a review from ser-io June 8, 2026 05:05

ser-io requested changes Jun 11, 2026

View reviewed changes

k311093 force-pushed the wait branch 2 times, most recently from 60566b9 to 98a9f86 Compare June 15, 2026 05:39

k311093 requested a review from ser-io June 15, 2026 06:25

k311093 force-pushed the wait branch from 98a9f86 to 52c26c0 Compare June 15, 2026 07:02

ser-io requested changes Jun 15, 2026

View reviewed changes

Comment thread pkg/app/instances/instances.go Outdated

Comment thread pkg/app/instances/instances.go Outdated

Comment thread pkg/app/instances/instances.go Outdated

k311093 force-pushed the wait branch 2 times, most recently from 1d29e2e to 4dff5ee Compare June 16, 2026 04:48

k311093 requested a review from ser-io June 16, 2026 05:05

ser-io requested changes Jun 16, 2026

View reviewed changes

k311093 force-pushed the wait branch 2 times, most recently from a7c8ab3 to 1ec8af0 Compare June 17, 2026 08:10

k311093 added 2 commits June 17, 2026 17:10

k311093 force-pushed the wait branch from 1ec8af0 to 151582a Compare June 17, 2026 08:10

ser-io requested changes Jun 17, 2026

View reviewed changes

Conversation

k311093 commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

0405ysj Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

k311093 Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ser-io left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jemoreira commented Apr 15, 2026

Uh oh!

jemoreira commented Apr 15, 2026

Uh oh!

k311093 commented Apr 16, 2026

Uh oh!

0405ysj commented Apr 16, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ser-io Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

k311093 Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ser-io Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

ser-io Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

k311093 commented Apr 15, 2026 •

edited

Loading

ser-io left a comment •

edited

Loading

k311093 Jun 15, 2026 •

edited

Loading