VMs, which could be exploited as a side channel by an attacker inside the
microVM. Users that want to use `virtio-pmem` to share memory are encouraged to
carefully evaluate the security risk according to their threat model.

### Limiting `msync` write bandwidth

When a guest issues a flush request to the `virtio-pmem` device (via a
`VIRTIO_PMEM_REQ_TYPE_FLUSH` request), Firecracker calls `msync(MS_SYNC)` on the
backing file to persist dirty pages to disk. A malicious guest can issue a high
volume of flush requests, leading to excessive host I/O usage.

There are two ways to mitigate this:

#### Firecracker rate limiter

The `virtio-pmem` device supports a built-in rate limiter, identical to the one
available for block devices. It throttles flush requests using two token
buckets:

- `bandwidth` — limits the total number of bytes passed to `msync` per refill
  interval. Each flush consumes tokens equal to the **full backing file size**,
  because `msync` is called over the entire mapped region. For example, with a
  256 MiB backing file and `size` set to `268435456` (256 MiB), at most one
  flush is allowed per `refill_time` milliseconds.
- `ops` — limits the number of `msync` calls per refill interval (after
  coalescing multiple flush requests within a single queue notification into
  one call).

The rate limiter can be configured at device creation time. The following
example allows at most 1 flush per second for a 256 MiB backing file
(`bandwidth.size` = 256 MiB = 268435456 bytes), and at most 10 `msync`
operations per second:

```json
"pmem": [
  {
    "id": "pmem0",
    "path_on_host": "./backing_file_256m",
    "rate_limiter": {
      "bandwidth": { "size": 268435456, "refill_time": 1000 },
      "ops": { "size": 10, "refill_time": 1000 }
    }
  }
]
```

It can also be updated at runtime via the API:

```console
curl --unix-socket $socket_location -i \
  -X PATCH 'http://localhost/pmem/pmem0' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "pmem0",
    "rate_limiter": {
      "bandwidth": { "size": 268435456, "refill_time": 1000 },
      "ops": { "size": 10, "refill_time": 1000 }
    }
  }'
```

216+
217+ > [ !NOTE]
218+ >
219+ > Since each flush always costs exactly one op and exactly ` file_size ` bytes,
220+ > the ` bandwidth ` and ` ops ` buckets are correlated: setting ` bandwidth.size ` to
221+ > ` file_size ` with a given ` refill_time ` is equivalent to setting ` ops.size ` to
222+ > ` 1 ` with the same ` refill_time ` — both allow one flush per interval. In
223+ > practice, configuring only one of the two buckets is sufficient. Use ` ops ` for
224+ > a simple "N flushes per interval" limit, or ` bandwidth ` if you want to express
225+ > the limit in terms of I/O throughput.
226+
#### Cgroup v2 IO controller

Alternatively, the **cgroup v2 IO controller** can throttle write bandwidth on
the block device that hosts the `virtio-pmem` backing file:

```bash
# Identify the block device MAJOR:MINOR for the backing file
dev=$(stat -c '%d' /path/to/backing_file)
echo "$((dev >> 8)):$((dev & 0xff))"

# Enable the io controller in the parent cgroup, so io.max becomes
# available inside <vm_cgroup>
echo "+io" | sudo tee /sys/fs/cgroup/cgroup.subtree_control

# Limit write bandwidth (e.g. 10 MiB/s) on device MAJOR:MINOR
echo "MAJOR:MINOR wbps=10485760" | sudo tee /sys/fs/cgroup/<vm_cgroup>/io.max
```

243+
244+ > [ !NOTE]
245+ >
246+ > - This requires ** cgroup v2** with a filesystem that supports cgroup-aware
247+ > writeback (e.g. ext4, btrfs).
248+ > - The limit applies to all I/O from the cgroup to that device, not only
249+ > ` msync ` flushes.
250+ > - When using the [ Jailer] ( jailer.md ) , the Firecracker process is already
251+ > placed in a cgroup. You can configure ` io.max ` on that cgroup before
252+ > starting the microVM.
253+
## Snapshot support

`virtio-pmem` works with snapshot functionality of Firecracker. Snapshot will