Skip to content

Commit d6612c5

Browse files
authored
[fm] only you can prevent inventory collection time travel (#10266)
Depends on #10258. As described in #10260 (and [§8.2 RFD 603]), **only _you_ can prevent inventory collection time travel**: because each Nexus that might participate in FM analysis has its own copy of the inventory which it considers to be current (loaded by its `inventory_load` background task). Since we must allow for the possibility of a particular Nexus being arbitrarily behind, we can easily imagine a scenario where a Nexus attempts to perform FM analysis and generate a sitrep while operating on a copy of the inventory that is actually *older* than the inventory that was used to produce the current sitrep. If this happens, we might incorrectly think that the state of the system has _changed_ from the state in the current sitrep, while actually, we are just operating on an outdated understanding. This branch fixes that. In #10260, I proposed addressing this by making the CTE that inserts a sitrep into the sitrep history only succeed if the inventory collection UUID in that sitrep's metadata is newer than the one in the current sitrep's metadata. This would probably have worked, but putting that check in the already somewhat hairy CTE was not actually necessary. Instead, we can just perform this check *before* actually doing any analysis, which feels a lot nicer. In order to do this, it was necessary to add a query to just go and fetch only the timestamps from an inventory by its UUID, which was not a huge deal. I think it is plausible that the "is strictly newer than" check for inventory collections may be useful elsewhere as well. Fixes #10260 [§8.2 RFD 603]: https://rfd.shared.oxide.computer/rfd/0603#_inventory
1 parent a961f76 commit d6612c5

21 files changed

Lines changed: 469 additions & 33 deletions

File tree

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

dev-tools/omdb/src/bin/omdb/db/sitrep.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -246,6 +246,7 @@ async fn cmd_db_sitrep_show(
246246
parent_sitrep_id,
247247
inv_collection_id,
248248
comment,
249+
next_inv_min_time_started,
249250
} = metadata;
250251

251252
const ID: &'static str = "ID";
@@ -259,6 +260,8 @@ async fn cmd_db_sitrep_show(
259260
const INV_COLLECTION_ID: &'static str = "inventory collection ID";
260261
const INV_STARTED_AT: &'static str = " started at";
261262
const INV_FINISHED_AT: &'static str = " finished at";
263+
const NEXT_INV_MIN_START: &'static str =
264+
" next inventory minimum start time";
262265
const TOTAL_EREPORTS: &'static str = "ereports in this sitrep";
263266

264267
const WIDTH: usize = const_max_len(&[
@@ -273,6 +276,7 @@ async fn cmd_db_sitrep_show(
273276
INV_COLLECTION_ID,
274277
INV_STARTED_AT,
275278
INV_FINISHED_AT,
279+
NEXT_INV_MIN_START,
276280
TOTAL_EREPORTS,
277281
]);
278282

@@ -346,6 +350,8 @@ async fn cmd_db_sitrep_show(
346350
)
347351
}
348352
}
353+
println!(" {NEXT_INV_MIN_START:>WIDTH$}: {next_inv_min_time_started}");
354+
println!(" ");
349355
println!(" {TOTAL_EREPORTS}: {}", ereports_by_id.len());
350356
// TODO(eliza): perhaps display a table summarizing those ereports? possibly
351357
// behind a verbose flag?

dev-tools/omdb/src/bin/omdb/nexus.rs

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3485,6 +3485,7 @@ fn print_task_fm_analysis(details: &serde_json::Value) {
34853485
use nexus_types::internal_api::background::fm_analysis::{
34863486
AnalysisOutcome, AnalysisStatus, Outcome, PreparationStatus,
34873487
};
3488+
34883489
let FmAnalysisStatus { parent_sitrep_id, inv_collection_id, outcome } =
34893490
match serde_json::from_value::<FmAnalysisStatus>(details.clone()) {
34903491
Err(error) => {
@@ -3512,6 +3513,30 @@ fn print_task_fm_analysis(details: &serde_json::Value) {
35123513
);
35133514
return;
35143515
}
3516+
Outcome::WaitingForNewerInventory {
3517+
parent_inv_id,
3518+
next_inv_min_time_started,
3519+
input_inv_time_started,
3520+
} => {
3521+
const PARENT_INV: &str = "parent sitrep's inventory ID:";
3522+
const MIN_STARTED: &str = "earliest start time for next inventory:";
3523+
const LOADED_STARTED: &str = "loaded inventory started at:";
3524+
const WIDTH: usize =
3525+
const_max_len(&[PARENT_INV, MIN_STARTED, LOADED_STARTED]) + 1;
3526+
println!(
3527+
" waiting for a newer inventory collection than the one \
3528+
in the parent sitrep"
3529+
);
3530+
let min_started = humantime::format_rfc3339_millis(
3531+
next_inv_min_time_started.into(),
3532+
);
3533+
let loaded_started =
3534+
humantime::format_rfc3339_millis(input_inv_time_started.into());
3535+
println!(" {PARENT_INV:<WIDTH$}{parent_inv_id}");
3536+
println!(" {MIN_STARTED:<WIDTH$}{min_started}");
3537+
println!(" {LOADED_STARTED:<WIDTH$}{loaded_started}");
3538+
return;
3539+
}
35153540
Outcome::PreparationError(error) => {
35163541
println!(
35173542
"{ERRICON} failed to prepare analysis inputs:\n {error}"

nexus/db-model/src/fm.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,7 @@ pub struct SitrepMetadata {
3838
pub time_created: DateTime<Utc>,
3939
pub creator_id: DbTypedUuid<OmicronZoneKind>,
4040
pub comment: String,
41+
pub next_inv_min_time_started: DateTime<Utc>,
4142
}
4243

4344
impl From<SitrepMetadata> for nexus_types::fm::SitrepMetadata {
@@ -49,12 +50,14 @@ impl From<SitrepMetadata> for nexus_types::fm::SitrepMetadata {
4950
creator_id,
5051
comment,
5152
time_created,
53+
next_inv_min_time_started,
5254
} = db_meta;
5355
Self {
5456
id: id.into(),
5557
parent_sitrep_id: parent_sitrep_id.map(Into::into),
5658
inv_collection_id: inv_collection_id.into(),
5759
creator_id: creator_id.into(),
60+
next_inv_min_time_started,
5861
comment,
5962
time_created,
6063
}
@@ -70,6 +73,7 @@ impl From<nexus_types::fm::SitrepMetadata> for SitrepMetadata {
7073
creator_id,
7174
comment,
7275
time_created,
76+
next_inv_min_time_started,
7377
} = db_meta;
7478
Self {
7579
id: id.into(),
@@ -78,6 +82,7 @@ impl From<nexus_types::fm::SitrepMetadata> for SitrepMetadata {
7882
creator_id: creator_id.into(),
7983
comment,
8084
time_created,
85+
next_inv_min_time_started,
8186
}
8287
}
8388
}

nexus/db-model/src/schema_versions.rs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ use std::{collections::BTreeMap, sync::LazyLock};
1616
///
1717
/// This must be updated when you change the database schema. Refer to
1818
/// schema/crdb/README.adoc in the root of this repository for details.
19-
pub const SCHEMA_VERSION: Version = Version::new(250, 0, 0);
19+
pub const SCHEMA_VERSION: Version = Version::new(251, 0, 0);
2020

2121
/// List of all past database schema versions, in *reverse* order
2222
///
@@ -28,6 +28,7 @@ pub static KNOWN_VERSIONS: LazyLock<Vec<KnownVersion>> = LazyLock::new(|| {
2828
// | leaving the first copy as an example for the next person.
2929
// v
3030
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
31+
KnownVersion::new(251, "fm-sitrep-next-inv-min-time-started"),
3132
KnownVersion::new(250, "inv-svc-enabled-not-online"),
3233
KnownVersion::new(249, "fm-support-bundle-request"),
3334
KnownVersion::new(248, "cleanup-orphaned-subnet-pool-silo-links"),

nexus/db-queries/src/db/datastore/fm.rs

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1779,6 +1779,7 @@ mod tests {
17791779
comment: "TEST SITREP PLEASE IGNORE".to_string(),
17801780
time_created: Utc::now(),
17811781
parent_sitrep_id: None,
1782+
next_inv_min_time_started: Utc::now(),
17821783
},
17831784
cases: Default::default(),
17841785
ereports_by_id: Default::default(),
@@ -1829,6 +1830,7 @@ mod tests {
18291830
comment: "TEST SITREP 1".to_string(),
18301831
time_created: Utc::now(),
18311832
parent_sitrep_id: None,
1833+
next_inv_min_time_started: Utc::now(),
18321834
},
18331835
cases: Default::default(),
18341836
ereports_by_id: Default::default(),
@@ -1844,6 +1846,7 @@ mod tests {
18441846
comment: "TEST SITREP 2".to_string(),
18451847
time_created: Utc::now(),
18461848
parent_sitrep_id: Some(sitrep1.id()),
1849+
next_inv_min_time_started: Utc::now(),
18471850
},
18481851
cases: Default::default(),
18491852
ereports_by_id: Default::default(),
@@ -1886,6 +1889,7 @@ mod tests {
18861889
comment: "TEST SITREP 1".to_string(),
18871890
time_created: Utc::now(),
18881891
parent_sitrep_id: None,
1892+
next_inv_min_time_started: Utc::now(),
18891893
},
18901894
cases: Default::default(),
18911895
ereports_by_id: Default::default(),
@@ -1902,6 +1906,7 @@ mod tests {
19021906
comment: "TEST SITREP WITH BAD PARENT".to_string(),
19031907
time_created: Utc::now(),
19041908
parent_sitrep_id: Some(nonexistent_id),
1909+
next_inv_min_time_started: Utc::now(),
19051910
},
19061911
cases: Default::default(),
19071912
ereports_by_id: Default::default(),
@@ -1938,6 +1943,7 @@ mod tests {
19381943
comment: "TEST SITREP 1".to_string(),
19391944
time_created: Utc::now(),
19401945
parent_sitrep_id: None,
1946+
next_inv_min_time_started: Utc::now(),
19411947
},
19421948
cases: Default::default(),
19431949
ereports_by_id: Default::default(),
@@ -1953,6 +1959,7 @@ mod tests {
19531959
comment: "TEST SITREP 2".to_string(),
19541960
time_created: Utc::now(),
19551961
parent_sitrep_id: Some(sitrep1.id()),
1962+
next_inv_min_time_started: Utc::now(),
19561963
},
19571964
cases: Default::default(),
19581965
ereports_by_id: Default::default(),
@@ -1969,6 +1976,7 @@ mod tests {
19691976
comment: "TEST SITREP 3 WITH OUTDATED PARENT".to_string(),
19701977
time_created: Utc::now(),
19711978
parent_sitrep_id: Some(sitrep1.id()),
1979+
next_inv_min_time_started: Utc::now(),
19721980
},
19731981
cases: Default::default(),
19741982
ereports_by_id: Default::default(),
@@ -2296,6 +2304,7 @@ mod tests {
22962304
.to_string(),
22972305
time_created: Utc::now(),
22982306
parent_sitrep_id: None,
2307+
next_inv_min_time_started: Utc::now(),
22992308
},
23002309
cases,
23012310
ereports_by_id,
@@ -2405,6 +2414,7 @@ mod tests {
24052414
comment: "sitrep with support bundle requests".to_string(),
24062415
time_created: Utc::now(),
24072416
parent_sitrep_id: None,
2417+
next_inv_min_time_started: Utc::now(),
24082418
},
24092419
cases,
24102420
ereports_by_id: Default::default(),
@@ -2456,6 +2466,7 @@ mod tests {
24562466
creator_id: OmicronZoneUuid::new_v4(),
24572467
comment: "my cool sitrep".to_string(),
24582468
inv_collection_id: CollectionUuid::new_v4(),
2469+
next_inv_min_time_started: Utc::now(),
24592470
},
24602471
cases: Default::default(),
24612472
ereports_by_id: Default::default(),
@@ -2612,6 +2623,7 @@ mod tests {
26122623
creator_id: OmicronZoneUuid::new_v4(),
26132624
comment: "my cool sitrep".to_string(),
26142625
inv_collection_id: CollectionUuid::new_v4(),
2626+
next_inv_min_time_started: Utc::now(),
26152627
},
26162628
cases: Default::default(),
26172629
ereports_by_id: Default::default(),
@@ -2745,6 +2757,7 @@ mod tests {
27452757
comment: "test sitrep".to_string(),
27462758
time_created: Utc::now(),
27472759
parent_sitrep_id: None,
2760+
next_inv_min_time_started: Utc::now(),
27482761
},
27492762
cases: Default::default(),
27502763
ereports_by_id: Default::default(),
@@ -2979,6 +2992,7 @@ mod tests {
29792992
comment: "test sitrep".to_string(),
29802993
time_created: Utc::now(),
29812994
parent_sitrep_id: None,
2995+
next_inv_min_time_started: Utc::now(),
29822996
},
29832997
cases: Default::default(),
29842998
ereports_by_id: Default::default(),
@@ -3105,6 +3119,7 @@ mod tests {
31053119
comment: "test sitrep".to_string(),
31063120
time_created: Utc::now(),
31073121
parent_sitrep_id: None,
3122+
next_inv_min_time_started: Utc::now(),
31083123
},
31093124
cases: Default::default(),
31103125
ereports_by_id: Default::default(),
@@ -3399,6 +3414,7 @@ mod tests {
33993414
creator_id: OmicronZoneUuid::new_v4(),
34003415
comment: "child sitrep".to_string(),
34013416
inv_collection_id: CollectionUuid::new_v4(),
3417+
next_inv_min_time_started: Utc::now(),
34023418
},
34033419
cases: Default::default(),
34043420
ereports_by_id: Default::default(),

nexus/db-schema/src/schema.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3135,6 +3135,7 @@ table! {
31353135
time_created -> Timestamptz,
31363136
creator_id -> Uuid,
31373137
comment -> Text,
3138+
next_inv_min_time_started -> Timestamptz,
31383139
}
31393140
}
31403141

nexus/fm/Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ serde.workspace = true
2424
serde_json.workspace = true
2525
slog.workspace = true
2626
slog-error-chain.workspace = true
27+
thiserror.workspace = true
2728
typed-rng.workspace = true
2829

2930
# deps for test utils

nexus/fm/src/analysis_input.rs

Lines changed: 43 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,12 @@
44

55
//! Inputs to fault management analysis.
66
7+
use chrono::{DateTime, Utc};
78
use iddqd::IdOrdMap;
89
use nexus_types::fm::analysis_reports::ClosedCaseReport;
910
use nexus_types::fm::{self, Sitrep, SitrepVersion};
1011
use nexus_types::inventory;
12+
use omicron_uuid_kinds::CollectionUuid;
1113
use std::collections::BTreeMap;
1214
use std::collections::BTreeSet;
1315
use std::sync::Arc;
@@ -65,16 +67,49 @@ impl Input {
6567
pub fn builder(
6668
parent_sitrep: Option<Arc<(SitrepVersion, Sitrep)>>,
6769
inv: Arc<inventory::Collection>,
68-
) -> Builder {
69-
Builder {
70+
) -> Result<Builder, InvalidInputs> {
71+
// Before preparing analysis inputs, check that the proposed input
72+
// inventory collection is at least as new as the parent sitrep's
73+
// inventory collection.
74+
if let Some((_, ref parent)) = parent_sitrep.as_deref() {
75+
let parent = &parent.metadata;
76+
// It is always okay to produce a new sitrep based on the same
77+
// inventory collection as the parent sitrep...
78+
if parent.inv_collection_id != inv.id
79+
// ...but if they are not equal, the new inventory collection must
80+
// have started after the minimum start time to be considered
81+
// newer than the parent's.
82+
&& inv.time_started < parent.next_inv_min_time_started
83+
{
84+
return Err(InvalidInputs::InventoryStale {
85+
parent_inv_id: parent.inv_collection_id,
86+
next_inv_min_time_started: parent.next_inv_min_time_started,
87+
input_inv_time_started: inv.time_started,
88+
});
89+
}
90+
}
91+
Ok(Builder {
7092
parent_sitrep,
7193
inv,
7294
new_ereports: IdOrdMap::default(),
7395
unmarked_seen_ereports: BTreeSet::default(),
74-
}
96+
})
7597
}
7698
}
7799

100+
#[derive(Debug, Eq, PartialEq, thiserror::Error)]
101+
pub enum InvalidInputs {
102+
#[error(
103+
"inventory collection from {input_inv_time_started} is not new \
104+
enough: must have started at {next_inv_min_time_started} or later"
105+
)]
106+
InventoryStale {
107+
parent_inv_id: CollectionUuid,
108+
next_inv_min_time_started: DateTime<Utc>,
109+
input_inv_time_started: DateTime<Utc>,
110+
},
111+
}
112+
78113
#[must_use]
79114
pub struct Builder {
80115
parent_sitrep: Option<Arc<(SitrepVersion, Sitrep)>>,
@@ -213,7 +248,7 @@ mod tests {
213248
use nexus_types::fm::{DiagnosisEngineKind, SitrepVersion};
214249
use nexus_types::inventory::SpType;
215250
use omicron_uuid_kinds::{
216-
CaseEreportUuid, CaseUuid, CollectionUuid, OmicronZoneUuid, SitrepUuid,
251+
CaseEreportUuid, CaseUuid, OmicronZoneUuid, SitrepUuid,
217252
};
218253
use std::sync::Arc;
219254

@@ -404,10 +439,11 @@ mod tests {
404439
metadata: fm::SitrepMetadata {
405440
id: parent_sitrep_id,
406441
parent_sitrep_id: Some(SitrepUuid::new_v4()),
407-
inv_collection_id: CollectionUuid::new_v4(),
442+
inv_collection_id: inv.id,
408443
creator_id: OmicronZoneUuid::new_v4(),
409444
comment: "parent sitrep for test".to_string(),
410445
time_created: chrono::Utc::now(),
446+
next_inv_min_time_started: inv.time_done,
411447
},
412448
cases,
413449
ereports_by_id,
@@ -424,7 +460,8 @@ mod tests {
424460

425461
// Build analysis input
426462
let (input, report) = {
427-
let mut builder = Input::builder(Some(parent_sitrep), inv);
463+
let mut builder = Input::builder(Some(parent_sitrep), inv)
464+
.expect("collection start time check should always pass");
428465
// Pass in four ereports:
429466
// - two that are in the open cases of the parent sitrep
430467
// - one that is in the (to-be-copied-forward) closed case

nexus/fm/src/builder.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -124,6 +124,11 @@ impl<'a> SitrepBuilder<'a> {
124124
creator_id,
125125
comment: self.comment,
126126
time_created,
127+
// When creating a new sitrep that is a child of this sitrep,
128+
// the input inventory collection must either be the same
129+
// inventory as this sitrep, or have started after this sitrep's
130+
// inventory collection ended.
131+
next_inv_min_time_started: self.inventory.time_done,
127132
},
128133
cases,
129134
ereports_by_id,

0 commit comments

Comments
 (0)