Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@

1. `unique()` returns a copy the case when `nrows(x) <= 1` instead of a mutable alias, [#5932](https://github.com/Rdatatable/data.table/pull/5932). This is consistent with existing `unique()` behavior when the input has no duplicates but more than one row. Thanks to @brookslogan for the report and @dshemetov for the fix.

2. `data.table` did not always expand ALTREP columns when assigning a full column by reference. This could result in the target column getting modified unintentionally if the next call to the data.table was a modification by reference of the source column. E.g. in `DT[, b := as.character(a)]` the string conversion gets deferred and subsequent modification of column `a` would also modify column `b`, [#5400](https://github.com/Rdatatable/data.table/issues/5400). Thanks to @aquasync for the report and Václav Tlapák for the PR.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more specific? "did not always", well, when?

Trying to put myself in the shoes of a user scanning the NEWS for bugs that might affect me, this one might seem "scary" without more clarity.


## NOTES

1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
Expand Down
33 changes: 27 additions & 6 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -2270,15 +2270,36 @@ test(753.1, DT[,c("x1","x2"):=4:6, verbose = TRUE], data.table(a=letters[1:3],x=
output = "RHS for item 2 has been duplicated")
test(753.2, DT[2,x2:=7L], data.table(a=letters[1:3],x=3:1,x1=4:6,x2=c(4L,7L,6L),key="a"))
DT = data.table(a=letters[3:1],x=1:3,y=4:6)
test(754.1, DT[,c("x1","y1","x2"):=list(x,y)], error="Supplied 3 columns to be assigned 2 items. Please see NEWS for v1.12.2")
test(754.2, DT[,c("x1","y1","x2"):=list(x,y,x)], data.table(a=letters[3:1],x=1:3,y=4:6,x1=1:3,y1=4:6,x2=1:3))
test(754.01, DT[,c("x1","y1","x2"):=list(x,y)], error="Supplied 3 columns to be assigned 2 items. Please see NEWS for v1.12.2")
test(754.02, DT[,c("x1","y1","x2"):=list(x,y,x)], data.table(a=letters[3:1],x=1:3,y=4:6,x1=1:3,y1=4:6,x2=1:3))
# And non-recycling i.e. that a single column copy does copy the column
DT = data.table(a=1:3)
test(754.3, DT[,b:=a][1,a:=4L][2,b:=5L], data.table(a=INT(4,2,3),b=INT(1,5,3)))
test(754.4, DT[,b:=a][3,b:=6L], data.table(a=INT(4,2,3),b=INT(4,2,6)))
test(754.5, DT[,a:=as.character(a),verbose=TRUE], output="Direct plonk.*no copy")
test(754.03, DT[,b:=a][1,a:=4L][2,b:=5L], data.table(a=INT(4,2,3),b=INT(1,5,3)))
test(754.04, DT[,b:=a][3,b:=6L], data.table(a=INT(4,2,3),b=INT(4,2,6)))
test(754.05, DT[,a:=as.numeric(a),verbose=TRUE], output="Direct plonk.*no copy")
RHS = as.integer(DT$a)
test(754.6, DT[,a:=RHS,verbose=TRUE], output="RHS for item 1 has been duplicated")
test(754.06, DT[,a:=RHS,verbose=TRUE], output="RHS for item 1 has been duplicated")
# Expand ALTREPS in assign.c, #5400
# String conversion gets deferred
test(754.07, DT[,a:=as.character(a),verbose=TRUE], output="RHS for item 1 has been duplicated")
# Executing the code inside of test expands the ALTREP so we repeat the code
# in order to check the result after a further assignment
DT = data.table(a=1:3)
DT[,b:=as.character(a)]
DT[, a:=5L]
test(754.08, DT, data.table(a=5L, b=as.character(1:3)))
# This function returns an ALTREP wrapper if the input is at least length 64
Comment thread
MichaelChirico marked this conversation as resolved.
testFun = function(x) {
x[FALSE] = 1
x
}
DT = data.table(id=1:64, col1=0, col2=0)
test(754.09, DT[, col1 := testFun(col2), verbose = TRUE], output="RHS for item 1 has been duplicated")
DT = data.table(id=1:64, col1=0, col2=0)
DT[, col1 := testFun(col2)]
DT[, col2 := 999]
test(754.10, DT, data.table(id=1:64, col1=0, col2=999))
rm(testFun)

# Used to test warning on redundant by (#2282) but by=.EACHI has now superseded
DT = data.table(a=letters[1:3],b=rep(c("d","e"),each=3),x=1:6,key="a,b")
Expand Down
7 changes: 4 additions & 3 deletions src/assign.c
Original file line number Diff line number Diff line change
Expand Up @@ -498,11 +498,12 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
if (length(rows)==0 && targetlen==vlen && (vlen>0 || nrow==0)) {
if ( MAYBE_SHARED(thisvalue) || // set() protects the NAMED of atomic vectors from .Call setting arguments to 2 by wrapping with list
(TYPEOF(values)==VECSXP && i>LENGTH(values)-1) || // recycled RHS would have columns pointing to others, #185.
(TYPEOF(values)!=VECSXP && i>0) // assigning the same values to a second column. Have to ensure a copy #2540
(TYPEOF(values)!=VECSXP && i>0) || // assigning the same values to a second column. Have to ensure a copy #2540
ALTREP(thisvalue)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a comment here on why this test exists would be appropriate IMO

) {
if (verbose) {
Rprintf(_("RHS for item %d has been duplicated because NAMED==%d MAYBE_SHARED==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"),
i+1, NAMED(thisvalue), MAYBE_SHARED(thisvalue), length(values), length(cols));
Rprintf(_("RHS for item %d has been duplicated because NAMED==%d MAYBE_SHARED==%d ALTREP==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"),
i+1, NAMED(thisvalue), MAYBE_SHARED(thisvalue), ALTREP(thisvalue), length(values), length(cols));
}
thisvalue = copyAsPlain(thisvalue); // PROTECT not needed as assigned as element to protected list below.
} else {
Expand Down