rbindlist: unprotect longestLevels inside the loop#7794
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7794 +/- ##
=======================================
Coverage 98.85% 98.85%
=======================================
Files 87 87
Lines 17137 17140 +3
=======================================
+ Hits 16941 16944 +3
Misses 196 196 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Generated via commit ba8a7a7 Download link for the artifact containing the test results: ↓ atime-results.zip
|
| test(2375.6, print(data.table(x=rep("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 1e6)), topn=1), output="1000000: ABCDEFGHIJKLM...", options=list(width=25, datatable.prettyprint.char=NULL)) | ||
|
|
||
| # rbindlist() used to put O(ncol(...)) > R_PPStackSize = 50000 items on the protect stack, #7793 | ||
| x = as.data.table(as.list(1:50001)) |
There was a problem hiding this comment.
Per ?Memory, max-ppsize can be set up to 500,000, and I'm not aware of a way to poll it (at a glance R_PPStackSize appears to be non-API)
Probably best to just point to ?Memory and mention here "this test assumes the default value of 50,000", which has been the default for 15+ years, but if it does change in the future, it would be good to know the reason 50,001 was chosen.
There was a problem hiding this comment.
When I start a fresh R session with R --max-ppsize=500000, I can query it with commandArgs(). Is there any other API safe version to set it?
So maybe something like
max_ppsize = local({
arg = grep("^--max-ppsize=", commandArgs(FALSE), value = TRUE)
if (length(arg)) {
v = suppressWarnings(as.integer(sub("^--max-ppsize=", "", tail(arg, 1L))))
if (!is.na(v)) return(v)
}
50000L
})and then guard the test with max_ppsize?
|
Does atime find a performance / memory improvement from the change? Does |
Co-Authored-By: Michael Chirico <chiricom@google.com>
ben-schwen
left a comment
There was a problem hiding this comment.
Change LGTM.
Maybe guard with max_ppsize as suggested in comment above
Co-Authored-By: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
* test case * Protect coercedForFactor deeper in the stack * Unprotect longestLevels inside the loop * NEWS entry * Keep listNames protected * Clarify test comment Co-Authored-By: Michael Chirico <chiricom@google.com> * Detect --max-ppsize Co-Authored-By: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> --------- Co-authored-by: Michael Chirico <chiricom@google.com> Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>


Similar to how other protected objects are managed inside the per-column loop (
cn,getAttrib(..., R_ClassSymbol)),marks,thisCol), keeplongestLevelsat a known position in the protection stack and unprotect at the end of the loop iteration when it reaches the top of the stack. This requires movingcoercedForFactordeeper in the protection stack, because it needs to remain protected between the loop iterations. As a result,nprotectremains independent from the input size.Fixes: #7793