virtio-pmem: fixes and improvements#5789
virtio-pmem: fixes and improvements#5789ShadowCurse wants to merge 7 commits intofirecracker-microvm:mainfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5789 +/- ##
==========================================
- Coverage 83.07% 82.88% -0.20%
==========================================
Files 275 275
Lines 29462 29586 +124
==========================================
+ Hits 24476 24522 +46
- Misses 4986 5064 +78
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
46cccd5 to
813efa7
Compare
d140771 to
e48ab2a
Compare
e4ccee4 to
24441ed
Compare
docs/pmem.md
Outdated
| available for block devices. It throttles flush requests using two token | ||
| buckets: | ||
|
|
||
| - `bandwidth` — limits the total bytes flushed per refill interval. Each flush |
There was a problem hiding this comment.
the very first sentence is incorrect as we don't know how much each msync is going to flush. We should make it more clear that it applies the full file size at all times. Maybe we should start with that or not have a bw limiter at all for this.
Maybe more clear would be: "bandwidth — consumes tokens equal to the full backing file size for each msync call, effectively limiting how often dirty pages can be flushed to disk."
There was a problem hiding this comment.
Updated the first sentence a bit.
There was a problem hiding this comment.
what value does bandwidth actually add to the users (compared to ops) if it's imprecise? How would users know how to configure it properly?
There was a problem hiding this comment.
ok, I saw the guidance we give in the doc, but still not entirely convinced bandwidth adds a value
There was a problem hiding this comment.
our default rate-limiter configuration scheme is to give ability to set both (e.g. we use RateLimiterConfig everywhere) so at least for consistency it makes sense. Also I think bandwidth maybe makes more sense since figuring out how much requests guest sends (for any device at this point) is a questionable business, but bandwidth is something people is more used to thinking about.
There was a problem hiding this comment.
Yes, uniformity here is an argument for me. Users can choose whatever they are more comfortable with.
|
|
||
| // Rate limit: consume 1 op and file_len bytes for the coalesced msync. | ||
| // If the rate limiter is blocked, defer processing until the timer fires. | ||
| if !self.rate_limiter.consume(1, TokenType::Ops) { |
There was a problem hiding this comment.
we're consuming tokens before even knowing if there's an item in the queue to process
There was a problem hiding this comment.
Considering there is only one valid type of request, I think this is fine. Valid guests will only notify if there is something for us to process.
There was a problem hiding this comment.
FC notifies on resume irrespectively on whether there's a request or not in the queue
There was a problem hiding this comment.
ok, moved these checks past the queue processing step, before the notification.
a0aae35 to
a9a3d01
Compare
| - name: body | ||
| in: body | ||
| description: Pmem rate limiter properties | ||
| required: true |
There was a problem hiding this comment.
does it mean that users can't remove the rate limiter later on? Other devices seem to allow that.
There was a problem hiding this comment.
What do you mean? The description here seems to be same as for other devices. Pmem logic for updating rate-limiter is same as for other devices.
There was a problem hiding this comment.
What I mean is, for example, block doesn't make rate limiter required:
PartialDrive:
type: object
required:
- drive_id
properties:
drive_id:
type: string
path_on_host:
type: string
description:
Host level path for the guest drive.
This field is optional for virtio-block config and should be omitted for vhost-user-block configuration.
rate_limiter:
$ref: "#/definitions/RateLimiter"while pmem does:
PartialPmem:
type: object
description:
Defines a partial pmem device structure, used to update the rate limiter
for that device, after microvm start.
required:
- id
- rate_limiter <==== here
properties:
id:
type: string
rate_limiter:
$ref: "#/definitions/RateLimiter"There was a problem hiding this comment.
oh, this is simply because rate-limiter is the only thing that can be changed. There is no reason to PATCH it if not to change the rate-limiter. For other devices I was looking at the ...UpdateConfig types and since there are always a couple things that can change there it makes sense to make them optional. Do you think we should allow empty PATCH requests for pmem?
There was a problem hiding this comment.
I'm just thinking it could be the only way to disable the rate limiter later on. Not sure if it's a common use case though. Would probably make sense to keep it on par with other devices in that sense.
There was a problem hiding this comment.
ok, made it optional
| } | ||
| } | ||
|
|
||
| pub(crate) fn parse_patch_pmem( |
There was a problem hiding this comment.
we seem to have unittests for parse_put_pmem, but not for parse_patch_pmem
docs/pmem.md
Outdated
| available for block devices. It throttles flush requests using two token | ||
| buckets: | ||
|
|
||
| - `bandwidth` — limits the total bytes flushed per refill interval. Each flush |
There was a problem hiding this comment.
ok, I saw the guidance we give in the doc, but still not entirely convinced bandwidth adds a value
CHANGELOG.md
Outdated
| information can be found in the | ||
| [docs](docs/vsock.md/#unix-domain-socket-renaming). | ||
| - [#5789](https://github.com/firecracker-microvm/firecracker/pull/5789): Add | ||
| rate-limiter support to virtio-pmem devie to allow control over I/O bandwidth |
2c56730 to
eb02623
Compare
Head and status descriptors must be 4 bytes long by the spec. Add validation for this. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
There is only one type of request virto-pmem accepts, but guest still can issue many of them and Firecracker will need to process it all. Instead of doing one `msync` per request, it is less resource intensive to do it once on the first valid descriptor and then duplicate the result to other descriptors. This is safe since the guest will only know the result of the execution after Firecracker will signal it, which will only happen after all descriptors are processed. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add rate-limiter support to the virtio-pmem device to allow users to configure limits of the I/O bandwidth generated by the `msync` call in the device which could be triggered by the guest FLUSH requests. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a section in the pmem.md file describing a way of limiting I/O usage of `msync` calls from a virtio-pmem device. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Mention new rate-limiter API for the virtio-pmem device in the CHANGELOG.md Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Fix incorrect NOTE section formatting in memory usage section Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Update device-api.md with rate-limiter for virtio-pmem. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Changes
lenfield to be 4 since the code expects thismsyncresult for better efficiency in case multiple flush requests are presented at oncerate-limitersupportReason
Edge case handling and addition of missing features
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.