[multicast,instance,test-flake] reorder instance_stop and send stop before tearing down multicast member state by zeeshanlakhani · Pull Request #10402 · oxidecomputer/omicron

zeeshanlakhani · 2026-05-07T16:32:57Z

Handles (closes) #9711 (was fixed downstream in a PR'ed branch).

zeeshanlakhani · 2026-05-22T00:38:58Z

@jgallagher ~~minor one up here btw~~ Not so minor, but I think this is much cleaner now. I ran the multicast suite on this locally, since it is turned off on branches not affiliated with #9912.

This addresses a review comment on the synchronous `multicast_group_members_detach_by_instance` call introduced by the prior reorder: a transient DB failure would return a 500 to the caller for a stop that had already succeeded at the sled, and would also short-circuit past the reconciler activation. The state would self-heal via the reconciler, but a visible failure for a successful operation would be a regression. Instead, we move the detach into the `instance_update` saga's `siu_commit_instance_updates` action, gated on `update.deprovision.is_some()`, which is how the saga signals that an instance has reached the no-active-VMM state. That saga already orchestrates terminal-VMM cleanup, so the detach fits naturally there. As a side effect, this also covers the guest-initiated shutdown and sled-agent-reported failure paths that the `instance_stop` callsite did not cover, and the reconciler remains in place as a backstop. This change also adds a reconciler nudge in `instance_stop` for the case where the saga does not fire because there was no terminal VMM transition to drive it in the first place, so cleanup does not have to wait for the full reconciler pass on the next tick.
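As a rough illustration of the gating described above, here is a minimal, synchronous sketch. It is not the omicron code: `Update`, `Deprovision`, `MemberStore`, `DetachOutcome`, `FlakyStore`, and this `commit_instance_updates` signature are hypothetical stand-ins, the real datastore call is async, and `eprintln!` stands in for the slog `info!` call. The point it tries to show is that a detach failure is logged and left to the reconciler rather than surfaced to the caller.

```rust
use std::cell::Cell;

/// Hypothetical marker for "this update deprovisions the instance".
pub struct Deprovision;

/// Hypothetical stand-in for the update computed by the saga.
pub struct Update {
    /// `Some` once the instance has reached the no-active-VMM state.
    pub deprovision: Option<Deprovision>,
}

/// What happened to the multicast memberships during the commit step.
#[derive(Debug, PartialEq)]
pub enum DetachOutcome {
    /// The instance still has an active VMM, so nothing was detached.
    Skipped,
    /// Memberships were detached as part of the commit.
    Detached,
    /// The detach failed; cleanup is left to the next reconciler pass.
    DeferredToReconciler,
}

/// Hypothetical datastore interface (synchronous here for brevity).
pub trait MemberStore {
    fn detach_by_instance(&self, instance_id: u64) -> Result<(), String>;
}

/// Sketch of the commit step: detach only on deprovision, and never
/// propagate a detach failure to the caller.
pub fn commit_instance_updates<S: MemberStore>(
    store: &S,
    instance_id: u64,
    update: &Update,
) -> DetachOutcome {
    if update.deprovision.is_none() {
        return DetachOutcome::Skipped;
    }
    match store.detach_by_instance(instance_id) {
        Ok(()) => DetachOutcome::Detached,
        Err(err) => {
            // Log and continue: the stop itself already succeeded.
            eprintln!(
                "instance update: failed to detach multicast members \
                 on deprovision, next reconciler pass will retry: {err}"
            );
            DetachOutcome::DeferredToReconciler
        }
    }
}

/// Test double that counts calls and optionally fails every one.
pub struct FlakyStore {
    pub fail: bool,
    pub calls: Cell<u32>,
}

impl MemberStore for FlakyStore {
    fn detach_by_instance(&self, _instance_id: u64) -> Result<(), String> {
        self.calls.set(self.calls.get() + 1);
        if self.fail {
            Err("transient database error".to_string())
        } else {
            Ok(())
        }
    }
}
```

With a `FlakyStore` that always fails, committing an update whose `deprovision` is `Some` yields `DetachOutcome::DeferredToReconciler` instead of an error, while an update with `deprovision: None` never touches the store.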

zeeshanlakhani · 2026-05-23T04:10:38Z

This included a merge of main over the top of the fixes.

jgallagher

I don't do much with instance sagas or multicast, but the changes LGTM and definitely look more correct than the original version. Happy to approve based on your testing, but I won't be offended if you want to wait for an approval from someone else with more context.

jgallagher · 2026-05-26T14:50:58Z

+            .await
+        {
+            info!(log,
+                  "instance update: failed to detach multicast members on deprovision, next reconciler pass will retry";


Trivial nit with my apologies - mind wrapping this to 80 columns? rustfmt doesn't try to split long strings, or do much with slog macros :(

oh for sure (this is not a standard across the org repos, but I usually come back to these, so good catch).
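For reference, one way to wrap a long message like the one in this thread is Rust's string continuation escape: a trailing backslash inside a string literal drops the newline and the leading whitespace of the next line. The sketch below only illustrates that mechanism; the constant names are hypothetical, and the actual commit may have wrapped the string differently.

```rust
/// The message on a single line, as quoted in the review thread.
pub const UNWRAPPED: &str = "instance update: failed to detach multicast members on deprovision, next reconciler pass will retry";

/// The same message wrapped to fit in 80 columns. The trailing
/// backslash removes the line break and the indentation that follows
/// it, so the space before the backslash is the only separator kept.
pub const WRAPPED: &str = "instance update: failed to detach multicast \
    members on deprovision, next reconciler pass will retry";
```

Both constants hold the same string, so the wrapping changes only the source layout and not the logged message.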

[multicast,instance,test-flake] reorder instance_stop and send stop before tearing down multicast member state

7af0a7f

zeeshanlakhani requested review from FelixMcFelix and jgallagher May 7, 2026 16:32

zeeshanlakhani self-assigned this May 10, 2026

merge main

286c487

jgallagher reviewed May 22, 2026

Comment thread nexus/src/app/instance.rs Outdated

zeeshanlakhani added 2 commits May 23, 2026 04:00

merge main

b0603bf

jgallagher approved these changes May 26, 2026

[review] address 80 cols

d48205f

zeeshanlakhani wants to merge 5 commits into main from zl/flake-test_join_by_ip_existing_group
