test(vsock): add latency test#5603
test(vsock): add latency test#5603aaron-ang wants to merge 3 commits intofirecracker-microvm:mainfrom
Conversation
90ff871 to
75e4ea8
Compare
|
hi @raduiliescu @Manciukic could you please take a look? |
|
Hey @aaron-ang, thanks for the contribution! I only had a very quick look, but I was wondering whether we need both |
|
Hi @aaron-ang, Just wanted to see if you had a chance to read our previous update? Let us know if you need anything more from us. Thank you! |
|
Hi @JamesC1305, apologies as I have been really busy the past few weeks. I will look into this within this week. |
75e4ea8 to
1a80bd9
Compare
|
My setup: I did some stats aggregation (ping-uds has slightly higher overall latency):
I think both should be kept. |
Manciukic
left a comment
There was a problem hiding this comment.
Overall LGTM, just a nit to reduce boilerplate code. I need to double check this works, I will kickoff our performance CI as well so we can run these tests
|
|
||
| @pytest.mark.nonci | ||
| @pytest.mark.parametrize("vcpus", [1, 2], ids=["1vcpu", "2vcpu"]) | ||
| def test_vsock_latency_g2h(uvm_plain_acpi, vcpus, metrics, bin_vsock_path): |
There was a problem hiding this comment.
nit: we can avoid some duped code by having the test setup in a separate fixture for all the tests in the file
@pytest.fixture
def vsock_uvm(uvm_plain_acpi, request):
"""Fixture to initialize a microVM with vsock device."""
vcpus = request.param if hasattr(request, "param") else 1
mem_size_mib = 1024
vm = uvm_plain_acpi
vm.spawn(log_level="Info", emit_metrics=True)
vm.basic_config(vcpu_count=vcpus, mem_size_mib=mem_size_mib)
vm.add_net_iface()
vm.api.vsock.put(vsock_id="vsock0", guest_cid=3, uds_path="/" + VSOCK_UDS_PATH)
vm.start()
vm.pin_threads(0)
return vm
then it can be used as
@pytest.mark.parametrize("vsock_uvm", [1, 2], indirect=True, ids=["1vcpu", "2vcpu"])
There was a problem hiding this comment.
@aaron-ang can you use the fixture here as well?
|
Perf test dry-run: https://buildkite.com/firecracker/performance-a-b-tests/builds/833 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5603 +/- ##
=======================================
Coverage 82.79% 82.79%
=======================================
Files 276 276
Lines 29764 29764
=======================================
Hits 24643 24643
Misses 5121 5121
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
e7912cd to
dd13239
Compare
|
@Manciukic please review |
dd13239 to
aa6f57b
Compare
Manciukic
left a comment
There was a problem hiding this comment.
Just one nit but I haven't looked too deeply. I'll give it another go in our perf pipeline
|
|
||
| @pytest.mark.nonci | ||
| @pytest.mark.parametrize("vcpus", [1, 2], ids=["1vcpu", "2vcpu"]) | ||
| def test_vsock_latency_g2h(uvm_plain_acpi, vcpus, metrics, bin_vsock_path): |
There was a problem hiding this comment.
@aaron-ang can you use the fixture here as well?
|
Perf test dry run : https://buildkite.com/firecracker/performance-a-b-tests/builds/866/ |
aa6f57b to
e273e3b
Compare
|
@aaron-ang I kicked off CI again and there's another style issue. You can run this checks on your side before submitting running |
08f5641 to
0aebe4a
Compare
rebased with main and resolved git format error, but i'm still getting unrelated errors when running |
|
weird, in any case it passed on the CI. Maybe there's an issue with that command on the local environment. |
|
is this good to merge? |
Manciukic
left a comment
There was a problem hiding this comment.
sorry for the delay! Overall LGTM, just a few more ironing to get it in good shape
| long long end_us = end.tv_sec * 1000000LL + end.tv_nsec / 1000; | ||
| long long rtt_us = end_us - start_us; | ||
|
|
||
| printf("rtt=%.3f us seq=%d\n", (double)rtt_us, seq); |
There was a problem hiding this comment.
We can just write the integer here as it'd always be .000.
| printf("rtt=%.3f us seq=%d\n", (double)rtt_us, seq); | |
| printf("rtt=%lld us seq=%d\n", rtt_us, seq); |
| long long end_us = end.tv_sec * 1000000LL + end.tv_nsec / 1000; | ||
| long long rtt_us = end_us - start_us; | ||
|
|
||
| printf("rtt=%.3f us seq=%d\n", (double)rtt_us, seq); |
There was a problem hiding this comment.
ditto
| printf("rtt=%.3f us seq=%d\n", (double)rtt_us, seq); | |
| printf("rtt=%lld us seq=%d\n", rtt_us, seq); |
There was a problem hiding this comment.
these changes are unrelated. Could you move them to another commit? Also, if we could share some code between the functional and the performance test, it could be interesting.
There was a problem hiding this comment.
split the functional test change to a new commit and moved the helper for the fixture to utils_vsock.py
| f"/tmp/vsock_helper ping 2 {ECHO_SERVER_PORT} {requests_per_round}" | ||
| ) | ||
| samples.extend(consume_vsock_ping_output(ping_output)) | ||
|
|
There was a problem hiding this comment.
we should verify we actually got the expected number of records, just in case
assert len(samples) == rounds * requests_per_round, (
f"Expected {rounds * requests_per_round} samples, got {len(samples)}"
)
| check=True, | ||
| ) | ||
| samples.extend(consume_vsock_ping_output(result.stdout)) | ||
|
|
There was a problem hiding this comment.
ditto
assert len(samples) == rounds * requests_per_round, (
f"Expected {rounds * requests_per_round} samples, got {len(samples)}"
)
232274a to
97c26aa
Compare
thanks for the comments, i've addressed them. please take a look :) |
| for _ in range(rounds): | ||
| _, ping_output, _ = vm.ssh.check_output( | ||
| f"/tmp/vsock_helper ping 2 {ECHO_SERVER_PORT} {requests_per_round}" | ||
| ) |
There was a problem hiding this comment.
I noticed that there's a spike in latency at the beginning of every round. On my test instance (c7i.metal-24xl) the median was ~50us, but every 30th sample was ~1000us. Perhaps we can add an extra request at the beginning of the sample that gets excluded to negate this warmup?
This is for both h2g and g2h
There was a problem hiding this comment.
implemented by adding 1 to requests_per_round then filtering out the first sample when adding it so samples list.
| for _ in range(rounds): | ||
| result = subprocess.run( | ||
| [ | ||
| bin_vsock_path, | ||
| "ping-uds", | ||
| uds_path, | ||
| str(ECHO_SERVER_PORT), | ||
| str(requests_per_round), | ||
| ], | ||
| capture_output=True, | ||
| text=True, | ||
| check=True, | ||
| ) | ||
| samples.extend(consume_vsock_ping_output(result.stdout)) |
There was a problem hiding this comment.
Add ping and ping-uds RTT commands to vsock_helper, plus g2h and h2g latency tests that drop the first sample per round to negate cold-start warmup spikes. Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>
Use shared boot_vsock_vm helper for fixture setup. Add functional tests for vsock uds path override on snapshot restore. Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>
97c26aa to
577bbd3
Compare



Create simple ping latency test in vsock_helper host tools
Changes
Close #2353.
Reason
vsock_helper.cwithpingandping-udscommands that emit per‑request RTTstest_vsock.pythat exercise guest -> host and host -> guest paths.License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.