Tracing tests, fixes, and optimizations#8223
Merged
rustyrussell merged 10 commits intoApr 23, 2025
Merged
Conversation
e60e873 to
ab4d1be
Compare
cdecker
approved these changes
Apr 20, 2025
| dev_trace_file, (unsigned)getpid()); | ||
| trace_to_file = fopen(fname, "a+"); | ||
| if (!trace_to_file) | ||
| err(1, "Opening DEV_CLN_TRACE_FILE %s", fname); |
Member
There was a problem hiding this comment.
Bike shedding as usual, but should we adopt CLN_ as a common prefix for our envvars going forward? It's a minor thing, but could simplify things a bit going forward.
Great addition of a trace file by the way, very useful for simple setups.
Contributor
Author
There was a problem hiding this comment.
Hey, trace file was your idea! And I agree with renaming env vars.
With an average runtime of 18.7674, this implies 1876ns per trace, which is far in excess of the 370ns claimed in doc/developers-guide/tracing-cln-performance.md. We also add a tag in there, so we measure that! Results on my laptop: real 0m18.524000-19.100000(18.7674+/-0.21)s user 0m16.171000-16.833000(16.424+/-0.26)s sys 0m2.259000-2.400000(2.337+/-0.059)s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I added this debugging because the next test revealed a mismatch, so I wanted to see where it was happening. The comment in lightningd suggests it's possible, but I can't see any code which suspends in the lightningd io_loop, so I cannot see how this is triggered. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Suggested-by: Christian Decker Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Testing parenting handling revealed several issues: 1. By calling "trace_span_start" when CLN_TRACEPARENT is set produces a bogus entry, for which the span_id is overwritten so we never end it. 2. We don't need to close the remote parent when we close the first child: in fact, this causes the remaining traces to be detached from the parent! 3. Suspension should return current to the parent, not to NULL. Now the traces balance as we expect. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. trace_span_start() is always called with a string literal, so no copy needed (and we can use a macro to enforce this). 2. trace_span_tag() name and value are always longer-lived than the span, so no need to copy these either. Before: real 0m18.524000-19.100000(18.7674+/-0.21)s user 0m16.171000-16.833000(16.424+/-0.26)s sys 0m2.259000-2.400000(2.337+/-0.059)s After: real 0m16.421000-18.407000(17.8128+/-0.72)s user 0m14.242000-16.041000(15.5382+/-0.67)s sys 0m2.179000-2.363000(2.273+/-0.061)s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Avoids allocations. Also assume that name and value parameters outlive the trace span, so don't copy. Before: real 0m16.421000-18.407000(17.8128+/-0.72)s user 0m14.242000-16.041000(15.5382+/-0.67)s sys 0m2.179000-2.363000(2.273+/-0.061)s After: real 0m13.441000-14.592000(14.2686+/-0.43)s user 0m11.265000-12.289000(11.9626+/-0.37)s sys 0m2.175000-2.381000(2.3048+/-0.072)s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's an EBPF limit anyway, so stick with a 512-byte buffer. This brings us back to 621ns per trace: Before: real 0m13.441000-14.592000(14.2686+/-0.43)s user 0m11.265000-12.289000(11.9626+/-0.37)s sys 0m2.175000-2.381000(2.3048+/-0.072)s After: real 0m5.819000-6.472000(6.2064+/-0.26)s user 0m3.779000-4.101000(3.956+/-0.12)s sys 0m2.040000-2.431000(2.2496+/-0.15)s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is much faster to give 64 bits of data, and we don't need cryptographic randomness. This brings us back to 413ns per trace. Before: real 0m5.819000-6.472000(6.2064+/-0.26)s user 0m3.779000-4.101000(3.956+/-0.12)s sys 0m2.040000-2.431000(2.2496+/-0.15)s After: real 0m3.981000-4.247000(4.1276+/-0.11)s user 0m3.979000-4.245000(4.126+/-0.11)s sys 0m0.000000-0.002000(0.001+/-0.00063)s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: lightingd: trimmed overhead of tracing infrastructure.
We don't ever actually close the remote span (we don't have its key, after all), and we keep a pointer to the parent. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't happen very often, but can with autoclean. However, we rarely traverse to the end, since we always expect to find what we're looking for, and we fill from the front. So even a large array (unless it's used) is fine. Subtle: when we injected a parent, we used "active_spans" as the (arbitrary) key. That can now change with reallocation, and so if that memory were reused we could have a key clash. So we use "&active_spans" which doesn't change. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ab4d1be to
68da93d
Compare
Contributor
Author
|
Trivial rebase, and replaced DEV_CLN_TRACE_FILE with CLN_DEV_TRACE_FILE as per @cdecker suggestion. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Recommended by @cdecker , tests were a great idea!