Skip to content

fm: minimum viable sitrep equality check#10322

Closed
hawkw wants to merge 2 commits intomainfrom
eliza/sitrep-eq
Closed

fm: minimum viable sitrep equality check#10322
hawkw wants to merge 2 commits intomainfrom
eliza/sitrep-eq

Conversation

@hawkw
Copy link
Copy Markdown
Member

@hawkw hawkw commented Apr 24, 2026

This branch implements the sitrep equality check that I left unimplemented in #10258. Now, when a new sitrep is generated, we check if it represents the same situation as the parent sitrep, and if it does not, we will insert it into the database. This is done using a normal PartialEq implementation. I punted on implementing proper diffing support (see #10259), since it's not actually necessary to test if two sitreps are equivalent.

In order to make this a bit easier to implement correctly, I refactored the nexus_types::fm::Sitrep struct to pull the non-metadata fields (the map of ereports and the map of cases, currently) into a new SitrepData struct, so that the top-level Sitrep now consists of SitrepMetadata and SitrepData. This is intended to make it easier to implement the comparison we need: checking that the sitrep contains identical cases and ereports to the parent, while ignoring the sitrep's ID, timestamp, and nexus ID (as these will differ for otherwise equal sitreps). I decided to factor out the data into its own struct so that the struct can just derive PartialEq, and we can test that the data is equal. This way, we don't have to individually check that the non-metadata fields are equal, which felt error-prone --- we don't want to have to remember to add any new non-metadata fields to the comparison later, since forgetting to do so might result in a non-equal sitrep being accidentally considered unchanged.

@hawkw hawkw requested review from mergeconflict and smklein April 24, 2026 16:37
Copy link
Copy Markdown
Contributor

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like it. Just a few minorish things.

pub async fn fm_sitrep_insert(
&self,
opctx: &OpContext,
sitrep: Sitrep,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: would it make sense to exhaustively destructure sitrep into data and metadata up front here?

Comment thread nexus/types/src/fm.rs
Comment on lines +109 to +110
/// A map of all ereports associated with cases in this sitrep.
pub ereports_by_id: IdOrdMap<Arc<Ereport>>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should still have ereports_by_id or, if we really need it, whether it should live somewhere else. I'm thinking about it because the PartialEq is looking at the set of ereports twice: once when traversing the cases, and again in aggregate here. If the two could ever drift somehow, then we'd have weirdness.

Could we just get away with an ereports_by_id() method that lazily computes the map?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the map is used for determining which ereports have been included in the sitrep when running the next preparation step, we do need it. I'd prefer not to have to re-build the map every time we do that. I'd also really prefer to not build it when the sitrep is constructed, since the sitrep loader already constructs this map while loading the sitrep parts from the database, here:

// Fetch all ereports assigned to cases in this sitrep. We do this by
// querying the `fm_ereport_in_case` table for all entries with this
// sitrep ID, paginated by the ereport assignment's UUID. This query is
// `INNER JOIN`ed with the `fm_ereport` table to fetch the ereport's
// data for that assignment.
//
// We use the results of this query to populate a map of case UUIDs to
// the map of ereports assigned to that case. Ereports are de-duplicated
// using an additional map of `Arc`ed ereports, to reduce the in-memory
// size of the sitrep when an ereport is assigned to multiple cases. the
// JOINed query *will* potentially load the same ereport multiple times
// in that case, but this is still probably much more efficient than
// issuing a bunch of smaller queries to load ereports individually.
let mut ereports = iddqd::IdOrdMap::<Arc<fm::Ereport>>::new();

It feels pretty sad to build the map once while loading the sitrep, throw it away, and then have a Sitrep::new that immediately iterates over all the cases and re-builds a new copy of the same map we just made while loading and then threw away.

I think it would be fine to move this out of sitrep_data and into a top-level field, so that it's not part of the equality check; you're right that checking it separately from checing the cases should always be superfluous. Then, SitrepData has only one field in it, which feels silly...but the main purpose of that struct was just to ensure that any other non-case top-level sitrep objects go there, and we don't have any of those yet.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is making me reconsider. I'm starting to think this data structure split is just in service of being able to derive the desired PartialEq, rather than being semantically meaningful. Like you point out, ereports_by_id is a useful view of the data — it's not metadata, and you don't want to recompute it every time you need it — but we don't really want to include it in our PartialEq implementation either.

How would you feel about just writing impl PartialEq for Sitrep by hand? You'd want to destructure it completely and then just compare the cases; that doesn't seem too bad.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, lemme think about it. On one hand, you're right that we could write an impl PartialEq for Sitrep that just ignores some fields, and uses exhaustive destructuring to ensure we don't forget to add any new fields that should be part of the equality check. This would be a much less invasive change than adding the SitrepData struct and having to change everything to access fields through it. On the other hand, I felt like there might be cases where we do want the ability to test whether two Sitrep structs are exactly equal, including their metadata --- I nearly offered tests as an example of this, but remembered that most of the tests that roundtrip through the database don't currently use the impl PartialEq for Sitrep due to the timestamps losing precision when read back from CRDB.

In general, I don't love the idea of having "opinionated" PartialEq implementations that skip potentially unequal fields, since it feels like one could just assume that sitrep_1 == sitrep_2 checks that they are exactly the same and not be aware that actually, the equality only checks that the cases are the same. The nice thing about writing sitrep_1.data == sitrep_2.data is that you are explicitly saying "I don't want to compare the metadata here" at the call site, rather than having to remember what the PartialEq impl for the whole struct does. I do have an idea for a potential third option, though; lemme sketch it out for you...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an alternate design for this that makes it explicit when we are comparing the represented state, but also doesn't require changing the struct layout: https://github.com/oxidecomputer/omicron/pull/10325/changes

What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that works! Could also just be a plain function, whatever makes the call site look nice.

start_time,
end_time,
report,
outcome: status::AnalysisOutcome::Unchanged,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What test coverage do we have for the different AnalysisOutcome variants? For example, if the equality comparison returns false, do we test that the outcome is Committed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't easily test this yet as we never actually perform any analysis and it will always be "analysis failed". Once we are actually performing some analysis, I'll add tests for analysis steps that don't change the sitrep. We could try to refactor this code a bit more so that it's easier to unit test everything that happens after the diagnosis engines actually run, by manually constructing sitreps which should or should not be Unchanged, but I'm not convinced it's worth it.

Comment thread nexus/types/src/fm.rs
}
}

/// The actual sitrep.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ehhh what is "actual," actually?

@hawkw hawkw closed this in #10325 Apr 25, 2026
hawkw added a commit that referenced this pull request Apr 25, 2026
This branch implements the sitrep equality check that I left
unimplemented in #10258. Now, when a new sitrep is generated, we check
if it represents the same situation as the parent sitrep, and if it does
not, we will insert it into the database. This is done using a normal
`PartialEq` implementation. I punted on implementing proper diffing
support (see #10259), since it's not actually necessary to test if two
sitreps are equivalent.

This is an alternate design from the one I implemented in #10322. Rather
than factoring out the sitrep's state into a new struct, which was a bit
awkward, we just add a wrapper type for doing state equivalence
comparisons.

Closes #10322.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants