Skip to content

removed usage of static buffers#7005

Merged
MichaelChirico merged 13 commits into
masterfrom
staticBufferRemoval
Jun 27, 2025
Merged

removed usage of static buffers#7005
MichaelChirico merged 13 commits into
masterfrom
staticBufferRemoval

Conversation

@badasahog
Copy link
Copy Markdown
Contributor

@badasahog badasahog commented May 21, 2025

Using a statically allocated buffer to avoid breaking lifetime limitations is unsafe. It's always better to pass the buffer in as a parameter.

It's also faster to do it this way, static storage duration is slow and can't be aliased with other memory nearly as easily.

@badasahog badasahog requested a review from aitap as a code owner May 21, 2025 15:19
@badasahog badasahog changed the title copied commit from local branch removed usage of static buffers May 21, 2025
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.69%. Comparing base (ab7c931) to head (89dee7d).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7005      +/-   ##
==========================================
- Coverage   98.69%   98.69%   -0.01%     
==========================================
  Files          79       79              
  Lines       14680    14678       -2     
==========================================
- Hits        14489    14487       -2     
  Misses        191      191              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2025

No obvious timing issues in HEAD=staticBufferRemoval
Comparison Plot

Generated via commit 89dee7d

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 45 seconds
Installing different package versions 38 seconds
Running and plotting the test cases 2 minutes and 26 seconds

@badasahog
Copy link
Copy Markdown
Contributor Author

@aitap I moved this to a local branch as requested. Can we get it merged?

@tdhock
Copy link
Copy Markdown
Member

tdhock commented May 21, 2025

this may have performance implications.
what kind of data would be able to show any performance differences of this pr?

@badasahog
Copy link
Copy Markdown
Contributor Author

@tkhock the performance gain would be tiny because it doesn't happen in a loop.

I still insist this is best practice, and severely minimizes the potential for bugs in the future while still being much simpler. Fewer lines.

@badasahog
Copy link
Copy Markdown
Contributor Author

badasahog commented May 26, 2025

Is this pr dead? I'm not sure of the status.

I have 4 outstanding PRs, none of them have comments from maintainers.

Is there something wrong with them? I'm happy to adjust if necessary.

@tdhock
Copy link
Copy Markdown
Member

tdhock commented May 26, 2025

removing static means new allocations for every function call, and maybe a performance hit (not benefit).
sometimes in data.table we want to do things which in other contexts may be considered "unsafe" and that is for a performance benefit.
the existing fread performance tests do not show any slow downs.

also it is normal to wait some time before code review.

@MichaelChirico
Copy link
Copy Markdown
Member

I would prefer Ivan to weigh in here, I would need to do a lot of background reading to review competently.

We have no review time SLO. For a volunteer project, a review period of ~1 week is pretty acceptable IMO. We do not meet this in general at the moment, something we are working to improve.

We are also a review team of primarily R authors -- changes like this touch on deeper aspects of C coding that are outside my (I won't speak directly for others here) comfort zone. They are very welcome improvements, but also take more time to review. Thank you for your contributions & for your patience.

@badasahog
Copy link
Copy Markdown
Contributor Author

@tdhock It doesn't mean new allocations, it's all on the stack. Because it's of a known size, it's part of the stack frame, it doesn't require fiddling with the frame pointer like alloca or VLAs.

If effectively just changes what part of memory is being used. It's always better to use the hot stack instead of global memory.

It's also better practice in this case, because it means that the function is effectively pure. No need to worry about overriding a previous buffer without noticing (this is something that would not get picked up by static analysis, because it's well defined behavior.)

@MichaelChirico
Copy link
Copy Markdown
Member

MichaelChirico commented May 27, 2025

Read a bit more carefully. I share Toby's concern about performance -- we have to think carefully about the tradeoff here since the static buffer is only allocated once, whereas this change entails allocation + '\0'-initialization at each invocation.

However, all of these helpers are only invoked in error/warning/verbose branches. Performance is not a concern there.

while still being much simpler. Fewer lines.

This part, I'm not so sure. It's a bit harder to reason about a static buffer, but the compound literal is a bit atypical and ugly, as well as being a C99 feature (I forget off the top of my head if we rely on C99 features elsewhere...) I don't think this changes my assessment though -- I think the change is good to go.

Note that @aitap just discussed the drawbacks of the static buffer approach very recently: #6987 (comment). With the change to a local buffer, we can probably make the verbose messages more helpful as well, as intimated there. I'll leave that for a different PR.

I will leave this for a few more days for Ivan to weigh in if he finds the time.

@badasahog
Copy link
Copy Markdown
Contributor Author

@MichaelChirico when you say c99 features, do you mean features that aren't in c89 or features that aren't in c11? I'm confident that this is future proofed all the way to c23.

@MichaelChirico
Copy link
Copy Markdown
Member

I guess it would be c89 at issue, yes, the page I linked gives references to c99, c11, c17 and c23.

@badasahog
Copy link
Copy Markdown
Contributor Author

badasahog commented May 27, 2025

This codebase already relies on features that aren't in c89
-fixed size integers
-restrict qualified pointers
-inline functions
-designated initializers
-data/code is desegregated
-integer division
-single line comments

@aitap
Copy link
Copy Markdown
Member

aitap commented May 28, 2025 via email

@badasahog
Copy link
Copy Markdown
Contributor Author

I don't have long standing or specific safety/style guidelines for the code.

What I will say is that I'm not a fan of the x=y; style, gives me no room to breathe. It's also not super consistent, so I would suggest a final decision be made to unify the style. I prefer x = y;.

@tdhock
Copy link
Copy Markdown
Member

tdhock commented Jun 3, 2025

the only data.table style guide I know of is https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md#coding-style

Copy link
Copy Markdown
Member

@MichaelChirico MichaelChirico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on Ivan's comment.

@MichaelChirico MichaelChirico merged commit 037bc89 into master Jun 27, 2025
12 checks passed
@MichaelChirico MichaelChirico deleted the staticBufferRemoval branch June 27, 2025 16:28
@MichaelChirico
Copy link
Copy Markdown
Member

This is producing new compiler warnings, e.g.:

freadR.h:27:51: note: in definition of macro ‘DTWARN’
   27 | #define DTWARN(...) warningsAreErrors ? halt__(1, __VA_ARGS__) : warning(__VA_ARGS__)
      |                                                   ^~~~~~~~~~~
fread.c:2766:71: warning: ISO C forbids empty initializer braces before C23 [-Wpedantic]
 2766 |           DTi + row1line, ncol, tt, strlim(skippedFooter, (char[500]) {}, 500));
      |                                                                       ^

Is there any workaround?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants