Conversation
07cd55a to
8e74111
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5758 +/- ##
==========================================
- Coverage 82.99% 82.91% -0.09%
==========================================
Files 275 275
Lines 29399 29380 -19
==========================================
- Hits 24401 24361 -40
- Misses 4998 5019 +21
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| } else { | ||
| self.configuration | ||
| .write_config_register(reg_idx, offset, data); | ||
| if BAR0_REG as usize <= reg_idx && reg_idx < (BAR0_REG + NUM_BAR_REGS) as usize { |
There was a problem hiding this comment.
nit: would a match or a if/else if/else be more clear in this case?
There was a problem hiding this comment.
if self.cap_pci_cfg_info.offset would be changed to be a register and not a byte offset, then this can be a match. But I did not want to touch unrelated code here. In general, I think we should agree on if we use registers or byte_offsets in the PCI code. I can do a follow up PR with this (or push it into #5752).
There was a problem hiding this comment.
what I was thinking was something like
let cap_range = self.cap_pci_cfg_info.offset
..self.cap_pci_cfg_info.offset + self.cap_pci_cfg_info.cap.bytes().len();
match reg_idx {
r if (BAR0_REG as usize..(BAR0_REG + NUM_BAR_REGS) as usize).contains(&r) => {
#[allow(clippy::cast_possible_truncation)]
let bar_idx = (r - BAR0_REG as usize) as u8;
#[allow(clippy::cast_possible_truncation)]
let offset = offset as u8;
self.bars.write(bar_idx, offset, data);
None
}
r if r * 4 == self.msix_config_offset as usize => {
#[allow(clippy::cast_possible_truncation)]
let offset = offset as u8;
self.msix_config
.lock()
.expect("Poisoned lock")
.write_msg_ctl_register(offset, data);
self.configuration
.write_config_register(reg_idx, offset.into(), data);
None
}
r if cap_range.contains(&(r * 4 + u64_to_usize(offset))) => {
let offset = r * 4 + u64_to_usize(offset) - self.cap_pci_cfg_info.offset;
self.write_cap_pci_cfg(offset, data)
}
_ => {
self.configuration
.write_config_register(reg_idx, offset, data);
None
}
}
There was a problem hiding this comment.
Can we define 3 variables (or use functions/macros) and do if in_bar_range {} else if in_cap_pci_cfg_range {} else {}?
There was a problem hiding this comment.
done in the last commit
| if data.len() == 4 { | ||
| if bar.about_to_be_read { | ||
| data.copy_from_slice(bar.size.as_bytes()); | ||
| bar.about_to_be_read = false; |
There was a problem hiding this comment.
I'm not sure this is exactly equivalent as the current code, where we actually have 2 fields, the current programmed addr and the register value. When the guest writes all 1s, only those beyond the size mask are persisted and the reprogramming detection is not kicked. When the guest writes something else, then we store the actual value and detect if the BAR was moved. How does reprogram detection works here?
There was a problem hiding this comment.
The logic here is to always ignore writes (unless it is a size reading write). This makes the BAR relocation be ignored. If the driver will try to relocate it, it will write to the BAR (and this write will be ignored) and then it will read the addr back to validate the write. Since we ignore it, it will read the old addr and assume that the relocation failed. At least Linux kernel does this: https://elixir.bootlin.com/linux/v6.19.8/source/drivers/pci/setup-res.c#L107.
Basically I tried to minimize the logic here to only handle the sizing write since this is all we can currently support. The BAR relocation (or even resizing) can be implemented later (and it will be a bit different for virtio and vfio devices anyway)
There was a problem hiding this comment.
right, but won't we need to change the implementation in the future to allow relocations? Likely back to what it originally was? I'm not saying the existing code is perfect, but it was following what an actual device would do, rather than introducing extra variables (about_to_be_read) and states which are not part of the spec.
All we actually need is a write mask wmask to know which bits are actually writeable by the guest. The original bug is due to the fact that we never change back the config space if the relocation fails or is unsupported, which is what I think we should do.
There was a problem hiding this comment.
For relocation we will add relocating_to: Option<NoneZeroU32> and write to it (since it is only needed for 64bit BAR relocation, NonZeroU32 will work file since there will a at least 1 bit set in the value). Then add a couple of checks to the write function.
Old version was relying of BAR values being duplicated in the registers array an in the Vec<Bar> array. Since for VFIO there is no registers array, we use Bars to do it. This way all Bar handling is in one place and not spread all over the place
8b1b015 to
8050f34
Compare
dda34d8 to
a615239
Compare
| self.configuration.detect_bar_reprogramming(reg_idx, data) | ||
| } | ||
|
|
||
| fn move_bar(&mut self, old_base: u64, new_base: u64) -> Result<(), DeviceRelocationError> { |
There was a problem hiding this comment.
We did not get a dead code warning?
There was a problem hiding this comment.
And can this be merged with "pci: virtio: refactor: remove BAR handling from PciConfiguration"?
There was a problem hiding this comment.
I removed this commit in general as move_bar is actually a trait function
src/vmm/src/pci/configuration.rs
Outdated
| /// Bits 31:4: Base Address (read/write, writable bits depend on size) | ||
| #[derive(Debug, Default, Clone, Copy, Serialize, Deserialize)] | ||
| pub struct Bar { | ||
| /// Encoded address value of the register. |
There was a problem hiding this comment.
nit: This comment could clarify what 'encoded' means in this case (lower bit carrying information)
| /// Encoded size value of the register. | ||
| pub encoded_size: u32, | ||
| /// Indicator if the register was prepared to be read as the `size` instead of `addr` | ||
| pub about_to_be_read: bool, |
There was a problem hiding this comment.
Would size_read be more descriptive?
There was a problem hiding this comment.
For me size_read sounds like: "we did read of the size and here it is".
| if let Ok(value) = u32::read_from_bytes(data) | ||
| && value == 0xffff_ffff | ||
| { | ||
| assert!(offset == 0); |
There was a problem hiding this comment.
Should we assert here()? Can the guest trigger this?
There was a problem hiding this comment.
these read/write functions supposed to only be called by internal code. And that internal code will need to ensure that guest provided values make sense.
There was a problem hiding this comment.
does it? why are we adding so many assumptions in code so low in the stack?
src/vmm/src/pci/configuration.rs
Outdated
| assert!(offset == 0); | ||
| self.bars[bar_idx as usize].about_to_be_read = true; | ||
| } else { | ||
| // No BAR relocation/resizing support. Ignore write. |
There was a problem hiding this comment.
Can we add a comment saying that the PCIe spec doesn't offer any way to reject this, but in practice Linux checks if it succeeded?
| data.copy_from_slice(bar.encoded_addr.as_bytes()); | ||
| } | ||
| } else { | ||
| assert!(offset < 4); |
There was a problem hiding this comment.
Again, do these asserts here mean that a "wrong" guest read will trigger a crash?
There was a problem hiding this comment.
same as previous comment
src/vmm/src/pci/configuration.rs
Outdated
| registers: [u32; NUM_CONFIGURATION_REGISTERS], | ||
| writable_bits: [u32; NUM_CONFIGURATION_REGISTERS], // writable bits for each register. | ||
| bars: [PciBar; NUM_BAR_REGS], | ||
| bars: [PciBar; NUM_BAR_REGS as usize], |
There was a problem hiding this comment.
Can we avoid changing this type to minimize this diff? Otherwise do it in a separate patch, that patch is already quite long.
There was a problem hiding this comment.
split into multiple commits
| VIRTIO_BAR_INDEX, | ||
| virtio_pci_bar_addr, | ||
| CAPABILITY_BAR_SIZE, | ||
| false, |
There was a problem hiding this comment.
nit: Perhaps an enum type Prefetchable / Non-prefetchable would be more descriptive and avoid the comment above?
There was a problem hiding this comment.
I would use enum if there were more than 2 options, but here it does not seem like it would change much.
| /// BARs | ||
| pub bars: [Bar; 6], | ||
| } | ||
| impl Bars { |
There was a problem hiding this comment.
I would like this patch to get smaller so that it's easier to review and argue about what it's doing. Can we have a patch that adds the new implementation along with the tests (and tell clippy to ignore dead code temporarily if needed)? And then in future commits we swap to using it?
| } else { | ||
| self.configuration | ||
| .write_config_register(reg_idx, offset, data); | ||
| if BAR0_REG as usize <= reg_idx && reg_idx < (BAR0_REG + NUM_BAR_REGS) as usize { |
There was a problem hiding this comment.
So this overlaps with the code in PciConfigMmio::config_space_write()? The commit title only talks about PciConfiguration. I would prefer it to be more explicit about this overlap.
| } else { | ||
| self.configuration | ||
| .write_config_register(reg_idx, offset, data); | ||
| if BAR0_REG as usize <= reg_idx && reg_idx < (BAR0_REG + NUM_BAR_REGS) as usize { |
There was a problem hiding this comment.
Can we define 3 variables (or use functions/macros) and do if in_bar_range {} else if in_cap_pci_cfg_range {} else {}?
4e30a8d to
473e3ff
Compare
| @@ -550,10 +541,10 @@ mod tests { | |||
|
|
|||
| fn detect_bar_reprogramming( | |||
There was a problem hiding this comment.
this is not really about VFIO preparation. I can remove this in another PR if you want.
| self.configuration.detect_bar_reprogramming(reg_idx, data) | ||
| } | ||
|
|
||
| fn move_bar(&mut self, old_base: u64, new_base: u64) -> Result<(), DeviceRelocationError> { |
There was a problem hiding this comment.
And can this be merged with "pci: virtio: refactor: remove BAR handling from PciConfiguration"?
| cap_pci_cfg_info: VirtioPciCfgCapInfo::default(), | ||
| bars: Bars::default(), | ||
| msix_config, | ||
| msix_config_offset: 0, |
There was a problem hiding this comment.
I think from this name it's ambiguous whether this is a capability offset or a BAR offset for the MSI-x table.
There was a problem hiding this comment.
renamed to msix_config_cap_offset
| let base = reg_idx as usize * 4; | ||
| if base + offset as usize >= self.cap_pci_cfg_info.offset as usize | ||
| && base + offset as usize + data.len() | ||
| // Handle access to the header of the MsixCapability. The rest of the |
There was a problem hiding this comment.
As discussed offline I find this a bit weird so I'll wait to hear for more opinions on this.
There was a problem hiding this comment.
It's hard to follow as this comment now references outdated code. What do you find weird?
1754d4a to
7c53d11
Compare
For some reason we had 2 constants that we used interchangeably to specify the BAR for virtio device. Use the more reasonably named one and delete the old one. Also delete VIRTIO_SHM_BAR_INDEX since it is never used. No functional changes. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a new Bars type that handles basic interactions with PCI BAR registers. This type is separate from PciConfiguration and can be reused for future VFIO work. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Change BAR0_REG and NUM_BAR_REGS u8. Update PciConfiguration code and tests to use the new types. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Make VirtioPciDevice use the new Bars type instead of relying on PciConfiguration for BAR handling. This separates BAR management from PCI configuration space, making PciConfiguration unaware of BARs. The VirtioPciDevice now directly handles BAR reads/writes and uses Bars for address tracking instead of a bare bar_address field. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Previous commit created a separate `Bars` type to handle BARs region and made VirtioPciDevice type to use it instead of PciConfiguration to handle BARs region. This made BARs handling in PciConfiguration redundant, so this commit removes it. This also removes `detect_bar_reprogramming` support from PciConfiguration. But Firecracker does not support BAR relocation anyway, this does not affect functionality. The removal of `detect_bar_reprogramming` also required to remove the unit test for it in the pci/bus.rs. No functional change. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Following previous commits notion, move MsixConfig handling out of PciConfiguration into VirtioPciDevice. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
This constant is only used as u16, so change it's type from u8 to u16 to avoid unnecessary conversions. Also rename with `_IDX` to have similarity with `reg_idx` we use in PCI code. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
f229ae5 to
40c123b
Compare
Manciukic
left a comment
There was a problem hiding this comment.
This PR is very dense so I only managed to have a surface look at it. I have a few concerns in how we manage the Bars, how we changed the integer arithmetic to use smaller types which could overflow, and on all the asserts we sprinkled on guest-controlled inputs. I also need to understand the discussion around MSI/MSI-X and the overall goal of this PR.
| } | ||
| if base + u64_to_usize(offset) >= self.cap_pci_cfg_info.offset |
There was a problem hiding this comment.
| } | |
| if base + u64_to_usize(offset) >= self.cap_pci_cfg_info.offset | |
| } else if base + u64_to_usize(offset) >= self.cap_pci_cfg_info.offset |
There was a problem hiding this comment.
I cannot find the code snippet you refer to here.
| } else { | ||
| self.configuration | ||
| .write_config_register(reg_idx, offset, data); | ||
| if BAR0_REG as usize <= reg_idx && reg_idx < (BAR0_REG + NUM_BAR_REGS) as usize { |
There was a problem hiding this comment.
what I was thinking was something like
let cap_range = self.cap_pci_cfg_info.offset
..self.cap_pci_cfg_info.offset + self.cap_pci_cfg_info.cap.bytes().len();
match reg_idx {
r if (BAR0_REG as usize..(BAR0_REG + NUM_BAR_REGS) as usize).contains(&r) => {
#[allow(clippy::cast_possible_truncation)]
let bar_idx = (r - BAR0_REG as usize) as u8;
#[allow(clippy::cast_possible_truncation)]
let offset = offset as u8;
self.bars.write(bar_idx, offset, data);
None
}
r if r * 4 == self.msix_config_offset as usize => {
#[allow(clippy::cast_possible_truncation)]
let offset = offset as u8;
self.msix_config
.lock()
.expect("Poisoned lock")
.write_msg_ctl_register(offset, data);
self.configuration
.write_config_register(reg_idx, offset.into(), data);
None
}
r if cap_range.contains(&(r * 4 + u64_to_usize(offset))) => {
let offset = r * 4 + u64_to_usize(offset) - self.cap_pci_cfg_info.offset;
self.write_cap_pci_cfg(offset, data)
}
_ => {
self.configuration
.write_config_register(reg_idx, offset, data);
None
}
}
| if data.len() == 4 { | ||
| if bar.about_to_be_read { | ||
| data.copy_from_slice(bar.size.as_bytes()); | ||
| bar.about_to_be_read = false; |
There was a problem hiding this comment.
right, but won't we need to change the implementation in the future to allow relocations? Likely back to what it originally was? I'm not saying the existing code is perfect, but it was following what an actual device would do, rather than introducing extra variables (about_to_be_read) and states which are not part of the spec.
All we actually need is a write mask wmask to know which bits are actually writeable by the guest. The original bug is due to the fact that we never change back the config space if the relocation fails or is unsupported, which is what I think we should do.
| let base = reg_idx * 4; | ||
| let cap_start = self.cap_pci_cfg_info.offset; | ||
| let cap_end = | ||
| self.cap_pci_cfg_info.offset as usize + self.cap_pci_cfg_info.cap.bytes().len(); | ||
| let write_start = base + u16::from(offset); | ||
| let write_end = (base + u16::from(offset)) as usize + data.len(); | ||
| cap_start < write_start && write_end <= cap_end |
There was a problem hiding this comment.
we're mixing u16 and usize here which could lead to overflows. unlikely but not impossible. the previous code was correclty upcasting to usize
There was a problem hiding this comment.
type system does not allow mixing of types. Here the cap_start < write_start is done as u16 and write_start <= cap_end is in usize. The only reason we use usize here is just because .len() returns it. I don't see how anything can overflow. reg_idx is within 0..1024 (since 4K region only has 1024 regs). Offset is also 0..4096 since this is the size of the config region. I can add asserts that make sure these ranges hold or use checked_add().expect()(but this one is more ugly)
| let base = reg_idx * 4; | ||
| let cap_start = self.cap_pci_cfg_info.offset; | ||
| let cap_end = | ||
| self.cap_pci_cfg_info.offset as usize + self.cap_pci_cfg_info.cap.bytes().len(); | ||
| let write_start = base + u16::from(offset); | ||
| let write_end = (base + u16::from(offset)) as usize + data.len(); | ||
| cap_start < write_start && write_end <= cap_end |
There was a problem hiding this comment.
wasn't this <= in the original code? which is correct?
There was a problem hiding this comment.
good find. the >= always confuse me.
| if let Ok(value) = u32::read_from_bytes(data) | ||
| && value == 0xffff_ffff | ||
| { | ||
| assert!(offset == 0); |
There was a problem hiding this comment.
isn't this a guest-triggerable crash? In general I'm quite unconfortable with all these extra asserts we're adding.
There was a problem hiding this comment.
The code above validates all the guest inputs, so this should not be trigger-able.
In general, I think the lowers level accesses should have asserts to catch inadequate validation from upper layers. These asserts also serve as a last level of defense.
| if let Ok(value) = u32::read_from_bytes(data) | ||
| && value == 0xffff_ffff | ||
| { | ||
| assert!(offset == 0); |
There was a problem hiding this comment.
does it? why are we adding so many assumptions in code so low in the stack?
| let base = reg_idx as usize * 4; | ||
| if base + offset as usize >= self.cap_pci_cfg_info.offset as usize | ||
| && base + offset as usize + data.len() | ||
| // Handle access to the header of the MsixCapability. The rest of the |
There was a problem hiding this comment.
It's hard to follow as this comment now references outdated code. What do you find weird?
|
I don't know why it published again comments from my previous review. |
Move the comparisons into boolean vars to make dispatch code a bit more readable. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
40c123b to
e4efed3
Compare
Changes
Main change here is the detachment of BAR and MsixConfig handling from the
PciConfigurationtype.BARs andMsixConfigwill be used as separate components by theVFIO, so to prevent duplication, move them away fromPciConfiguration.Reason
Preparation for VFIO
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.