fix: GBx00 misclassified as DPU during PXE boot #2932
Signed-off-by: Krish Dandiwala <kdandiwala@nvidia.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/api-core/src/ipxe.rs (1)
369-386: 🎯 Functional Correctness | 🟠 Major
Add a regression test for the ARM non-DPU fallback.
`endpoint.report.is_dpu()` is the right predicate here, but there is still no test for the exact case this fixes: ARM PXE with a BlueField part number in inventory where the endpoint is not a DPU BMC. The current coverage exercises the DPU path via `target.product`, so this branch can still regress silently. Add a case that asserts `aarch64/scout.efi` for that scenario.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/api-core/src/ipxe.rs`:
- Around line 369-386: Add a regression test covering the ARM non-DPU fallback
in the PXE logic so this branch cannot regress silently. Update the test
coverage around `PxeInstructions::get_pxe_instruction_for_arch` in `ipxe.rs` to
use an ARM target with a BlueField part number in inventory while
`endpoint.report.is_dpu()` is false, and assert it selects the ARM fallback
output `aarch64/scout.efi`. Keep the existing DPU-path coverage intact and add
this as the distinct non-DPU case.
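The regression case requested above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `crates/api-core/src/ipxe.rs`: the `Report` and `ChassisPart` types, the `arm_fallback_image` helper, the two fixture functions, and the placeholder part numbers and system IDs are all hypothetical stand-ins. Only the predicate names `has_bluefield_part_number()` and `is_dpu()`, the `Systems[0].Id == "Bluefield"` check, and the `aarch64/scout.efi` and `carbide.efi` image names come from this PR; the exact path served for the DPU image is not shown here, so the bare `carbide.efi` string is an assumption.

```rust
// Hypothetical stand-ins for the explored-endpoint report. The real types in
// crates/api-core are not shown in this PR and will differ.
#[derive(Debug, Clone)]
pub struct ChassisPart {
    pub part_number: String,
    /// Assumption: whether this inventory entry is a BlueField part.
    pub is_bluefield: bool,
}

#[derive(Debug, Clone, Default)]
pub struct Report {
    /// `Id` of each entry in the BMC's Redfish `Systems` collection.
    pub system_ids: Vec<String>,
    /// Parts reported in the chassis inventory.
    pub chassis: Vec<ChassisPart>,
}

impl Report {
    /// Old predicate: true if any chassis entry is a BlueField part.
    pub fn has_bluefield_part_number(&self) -> bool {
        self.chassis.iter().any(|part| part.is_bluefield)
    }

    /// New predicate: true only if the endpoint is itself a DPU BMC,
    /// i.e. `Systems[0].Id == "Bluefield"`.
    pub fn is_dpu(&self) -> bool {
        self.system_ids
            .first()
            .is_some_and(|id| id.as_str() == "Bluefield")
    }
}

/// Hypothetical helper mirroring the ARM fallback decision after the fix.
pub fn arm_fallback_image(report: &Report) -> &'static str {
    if report.is_dpu() {
        "carbide.efi" // DPU OS (path prefix is an assumption)
    } else {
        "aarch64/scout.efi" // host image
    }
}

/// Fixture: a host BMC that lists a BlueField as a chassis object.
/// The system ID and part number are placeholders, not real values.
pub fn gbx00_like_host() -> Report {
    Report {
        system_ids: vec!["HostSystemPlaceholder".to_string()],
        chassis: vec![ChassisPart {
            part_number: "BF3-PLACEHOLDER".to_string(),
            is_bluefield: true,
        }],
    }
}

/// Fixture: a DPU BMC, identified by its first system ID.
pub fn dpu_bmc() -> Report {
    Report {
        system_ids: vec!["Bluefield".to_string()],
        chassis: vec![],
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn arm_host_with_bluefield_in_inventory_gets_scout() {
        let host = gbx00_like_host();
        // The old predicate would have matched this host...
        assert!(host.has_bluefield_part_number());
        // ...but the new one does not, so the host image is selected.
        assert!(!host.is_dpu());
        assert_eq!(arm_fallback_image(&host), "aarch64/scout.efi");
    }
}
```

In the real test, the same three assertions would presumably be made against `PxeInstructions::get_pxe_instruction_for_arch` with an ARM target, kept separate from the existing DPU-path case so that each branch fails independently.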
Problem
On ARM hosts that do not yet have a minted machine ID, the PXE boot path classified a machine as a DPU if its explored endpoint had any BlueField part number in its chassis inventory (`has_bluefield_part_number()`). The GBx00 host BMC reports its BlueField-3 as a chassis object, so the host was misclassified as a DPU and served the DPU OS (`carbide.efi`/`carbide.root`) instead of the host image (`scout.efi`).
Fix
In `crates/api-core/src/ipxe.rs`, we now treat an endpoint as a DPU only when the explored endpoint is itself a DPU BMC (`endpoint.report.is_dpu()`, i.e. `Systems[0].Id == "Bluefield"`) rather than merely containing a BlueField part number.
Related issues
#2930
Type of Change
Breaking Changes
Testing
Additional Notes