removed usage of static buffers#7005
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #7005 +/- ##
==========================================
- Coverage 98.69% 98.69% -0.01%
==========================================
Files 79 79
Lines 14680 14678 -2
==========================================
- Hits 14489 14487 -2
Misses 191 191 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
No obvious timing issues in HEAD=staticBufferRemoval Generated via commit 89dee7d Download link for the artifact containing the test results: ↓ atime-results.zip
|
|
@aitap I moved this to a local branch as requested. Can we get it merged? |
|
this may have performance implications. |
|
@tkhock the performance gain would be tiny because it doesn't happen in a loop. I still insist this is best practice, and severely minimizes the potential for bugs in the future while still being much simpler. Fewer lines. |
|
Is this pr dead? I'm not sure of the status. I have 4 outstanding PRs, none of them have comments from maintainers. Is there something wrong with them? I'm happy to adjust if necessary. |
|
removing static means new allocations for every function call, and maybe a performance hit (not benefit). also it is normal to wait some time before code review. |
|
I would prefer Ivan to weigh in here, I would need to do a lot of background reading to review competently. We have no review time SLO. For a volunteer project, a review period of ~1 week is pretty acceptable IMO. We do not meet this in general at the moment, something we are working to improve. We are also a review team of primarily R authors -- changes like this touch on deeper aspects of C coding that are outside my (I won't speak directly for others here) comfort zone. They are very welcome improvements, but also take more time to review. Thank you for your contributions & for your patience. |
|
@tdhock It doesn't mean new allocations, it's all on the stack. Because it's of a known size, it's part of the stack frame, it doesn't require fiddling with the frame pointer like If effectively just changes what part of memory is being used. It's always better to use the hot stack instead of global memory. It's also better practice in this case, because it means that the function is effectively pure. No need to worry about overriding a previous buffer without noticing (this is something that would not get picked up by static analysis, because it's well defined behavior.) |
|
Read a bit more carefully. I share Toby's concern about performance -- we have to think carefully about the tradeoff here since the static buffer is only allocated once, whereas this change entails allocation + However, all of these helpers are only invoked in error/warning/
This part, I'm not so sure. It's a bit harder to reason about a static buffer, but the compound literal is a bit atypical and ugly, as well as being a C99 feature (I forget off the top of my head if we rely on C99 features elsewhere...) I don't think this changes my assessment though -- I think the change is good to go. Note that @aitap just discussed the drawbacks of the static buffer approach very recently: #6987 (comment). With the change to a local buffer, we can probably make the verbose messages more helpful as well, as intimated there. I'll leave that for a different PR. I will leave this for a few more days for Ivan to weigh in if he finds the time. |
|
@MichaelChirico when you say c99 features, do you mean features that aren't in c89 or features that aren't in c11? I'm confident that this is future proofed all the way to c23. |
|
I guess it would be c89 at issue, yes, the page I linked gives references to c99, c11, c17 and c23. |
|
This codebase already relies on features that aren't in c89 |
|
Sorry, no time to log in, writing this at zero dark o'clock between
other commitments. To compare: R itself has an internal function
EncodeString() intended for similar use in printf()-like calls, and
yes, it has bit them too:
<r-devel/r-project-sprint-2023#65>)
@badasahog is correct that the allocation is almost free (just subtract
from the stack pointer) and faster to access because stack is already
in the cache. Having to initialise the buffers that will be overwritten
anyway is not ideal, but we won't measure any changes in performance
due to this. The drawback in slightly increased stack usage is not
noticeable either because R is already very stack-hungry.
The only two things I really dislike is (1) over-allocating the space
for the strlim() argument and (2) having to spell out the buffer size in
multiple places, which is probably what the original author of strlim()
and friends was trying to avoid. @badasahog is also correct that this
way we can get compiler warnings about mismatching buffer sizes. If the
buffer argument is declared as pointer (char* and size_t instead of
char[static 500]), there's no compile-time check for overflow. So
that's a trade-off between the code being "nice to read" (or at least
what we're used to reading) and "following best practices" (according
to @badasahog's style guide. Would you mind sharing it, or your overall
plans for the code base?).
None of my dislikes really matter. This code works either way. The
garbage fires are elsewhere.
|
|
I don't have long standing or specific safety/style guidelines for the code. What I will say is that I'm not a fan of the |
|
the only data.table style guide I know of is https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md#coding-style |
MichaelChirico
left a comment
There was a problem hiding this comment.
Approving based on Ivan's comment.
|
This is producing new compiler warnings, e.g.: Is there any workaround? |

Using a statically allocated buffer to avoid breaking lifetime limitations is unsafe. It's always better to pass the buffer in as a parameter.
It's also faster to do it this way, static storage duration is slow and can't be aliased with other memory nearly as easily.