[3/3] Add omdb support-bundle collect subcommand#10376
Conversation
b8d58f2 to
adecbd6
Compare
5eefdfd to
9022f6d
Compare
adecbd6 to
d20d559
Compare
9022f6d to
046faaa
Compare
Replaces `BundleCollection.bundle: SupportBundle` with a slim
`BundleInfo { id, reason_for_creation }`. Moves the sled-storage
chunked transfer (`store_bundle_on_sled`), zip helpers
(`bundle_to_zipfile`, `recursively_add_directory_to_zipfile`,
`sha2_hash`), the `CHUNK_SIZE` and `TEMPDIR` constants, and the
DB-polling cancellation (`check_for_cancellation`) out of the inner
`support_bundle/` module and into `support_bundle_collector.rs`.
After this change the inner layer is a pure mechanism: it never reads
the `support_bundle` DB row, never talks to a sled-agent's bundle
storage endpoints, and treats CRDB only as a source of facts about
sleds, ereports, and blueprints. The outer collector remains the
manager of the bundle lifecycle.
This is the first step toward a future shared crate that omdb can use
to collect bundles when Nexus is down.
Lifts `nexus/src/app/background/tasks/support_bundle/` (the mechanism layer) into a new top-level crate `support-bundle-collection` so that both Nexus and omdb can call it. No logic changes; pure relocation plus import rewriting.
d20d559 to
863c24b
Compare
1310da4 to
b26398c
Compare
Wires a new subcommand on omdb that calls into the `support-bundle-collection` crate to gather a bundle locally. Unlike the Nexus background task, this path does not register a row in the `support_bundle` table, does not transfer the bundle to a sled agent, and does not require Nexus to be up — it only needs CRDB, internal DNS, MGS, and sled-agents reachable on the underlay. This is intended for incident response: when Nexus is down (the most important time to gather a bundle), an operator can still produce one locally.
b26398c to
d180d48
Compare
wfchandler
left a comment
There was a problem hiding this comment.
Thanks for putting this together! Just a couple questions.
| #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, ValueEnum)] | ||
| enum BundleCategory { | ||
| Reconfigurator, | ||
| HostInfo, | ||
| SledCubbyInfo, | ||
| SpDumps, | ||
| Ereports, | ||
| } |
There was a problem hiding this comment.
This seems like something that could fall out of sync pretty easily. Could we add a test asserting our copy has the same variants as the original?
There was a problem hiding this comment.
I think our dependencies line up in such a way that it's actually not too bad to just "directly use the nexus_types type". I'll do that instead.
| /// Path where the resulting bundle zip will be written. | ||
| #[clap(long, short = 'o')] | ||
| output: Utf8PathBuf, |
There was a problem hiding this comment.
It would be useful to have an option for streaming the bytes of the bundle directly to stdout. I think we had discussed the possibility of in our meeting a few weeks ago.
Using the switch zone to store both the final archive bundle and its constituent parts in tempdir feels like a bit of a risk when requesting a full bundle.
There was a problem hiding this comment.
Sure thing, added the ability to stream to stdout in c1a72bb (avoiding the intermediate file)
| if let Some(ereports) = &report.ereports { | ||
| eprintln!( | ||
| "ereports: {} found, {} collected, {} errors", | ||
| ereports.n_found, | ||
| ereports.n_collected, | ||
| ereports.errors.len(), | ||
| ); | ||
| } |
There was a problem hiding this comment.
I'm not necessarily opposed to this, but I'm curious why ereports merit a special call-out here, versus, say, the number of SP dumps.
There was a problem hiding this comment.
it is a little arbitrary - ereports are currently the only support bundle collection step that can silently skip individual items (where "n_found != n_collected", with ereport.errors justifying "why"). Other collection steps are more pass/fail, so their steps describe them.
I could try to store this some other way, if we want, but it's currently describing the format of SupportBundleCollectionReport.
I could try to save this summary info within the bundle itself, but maybe that should be a follow-up PR, as it changes the contents of bundles themselves?
There was a problem hiding this comment.
Thanks for explaining. Agree nothing needs to change here, should we file an issue for adding a bundle summary?
wfchandler
left a comment
There was a problem hiding this comment.
Thanks again! Nothing further from me
| if let Some(ereports) = &report.ereports { | ||
| eprintln!( | ||
| "ereports: {} found, {} collected, {} errors", | ||
| ereports.n_found, | ||
| ereports.n_collected, | ||
| ereports.errors.len(), | ||
| ); | ||
| } |
There was a problem hiding this comment.
Thanks for explaining. Agree nothing needs to change here, should we file an issue for adding a bundle summary?
Wires a new subcommand on omdb that calls into the
support-bundle-collectioncrate to gather a bundle locally. Unlike the Nexus background task, this path does not register a row in thesupport_bundletable, does not transfer the bundle to a sled agent, and does not require Nexus to be up — it only needs CRDB, internal DNS, MGS, and sled-agents reachable on the underlay.When Nexus is down (an important time to gather a bundle), an operator can still produce one locally.
Builds on #10374 and #10375
Fixes #10371