Simplified and Extended "Updating by Reference" Section in Joins Vignette#6847
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6847 +/- ##
=======================================
Coverage 98.69% 98.69%
=======================================
Files 79 79
Lines 14677 14677
=======================================
Hits 14486 14486
Misses 191 191 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I think it is best to ping author of this vignette. I haven't had yet opportunity to read the original version. |
| - The on = .(id = product_id) ensures that updates happen based on matching IDs. | ||
| - This method modifies Products in place, avoiding unnecessary copies. | ||
|
|
||
| 2) If we need to get the latest price and date (instead of all matches), we can still use := efficiently: |
There was a problem hiding this comment.
I'm not sure we've cleanly explained the difference between this join and the previous one. The only difference I see is tail() vs. last(), am I missing something?
IIUC the difference between tail() and last() would be that last() can skip NA values, right?
| - We add a new `last_updated` column to track when the price was last changed. | ||
| - The `by = .EACHI` ensures that the `tail` function is applied for each product in `ProductPriceHistory`. | ||
| #### Let's update our `Products` table with the latest price from `ProductPriceHistory`: | ||
| ```{r Simple One-to-One Update} |
There was a problem hiding this comment.
are these valid {knitr} chunk names? regardless, please use machine-readable names (a la https://style.tidyverse.org/files.html)
|
@lucasmation @MichaelChirico thanks. |
|
@venom1204, what is the update on this, have you tested the code . |
|
@MichaelChirico I’ve minimized the changes I introduced while ensuring they contain sufficient information and remain easy to understand. Could you review them and let me know if any further simplification is needed, or if they are good as they are? |
|
@iagogv3 I’ve incorporated your suggestions into my latest version. Could you review the updates and let me know if any further improvements are needed? Your feedback would be greatly appreciated! |
|
cc @lucasmation, could you take a look at the updated vignette and see if it helps your understanding? |
| `:=`(price = last(i.price), last_updated = last(i.date)), | ||
| by = .EACHI] | ||
| ``` | ||
| - `by = .EACHI` groups by i's rows (1 group per Products row). |
There was a problem hiding this comment.
Wouldn't be 1 group per ProductPriceHistory row?
| Products[ProductPriceHistory, on = .(id = product_id), | ||
| (my_var_name) := i.price] | ||
| ``` | ||
| - This approach allows flexibility in specifying columns programmatically. |
There was a problem hiding this comment.
The examples in https://stackoverflow.com/a/44592473 include what I believe is another interesting point, that the on = option may also depend on the my_var_name (on = paste0(my_var_name, "==from"))
| update_cols <- intersect(c("price", "category", "stock"), names(ProductPriceHistory)) | ||
|
|
||
| for (col in update_cols) { | ||
| Products[ProductPriceHistory, on = .(id = product_id), (col) := get(paste0("i.", col))]} |
There was a problem hiding this comment.
Here may be better to use env than get as explained in https://rdatatable.gitlab.io/data.table/articles/datatable-programming.html#retired-interfaces
|
Thank you! @venom1204 I added some comments inline. |
|
I’ve updated the entire section accordingly. Apologies for being away for a while — I was engaged elsewhere. |
MichaelChirico
left a comment
There was a problem hiding this comment.
OK, I think this is in a "good enough" state now.
|
@MichaelChirico Sorry to chime in - I don't know whether it escaped notice (possibly down to my hopeless github skills) but I'd commented that the
Mentioned at #6846 (comment) and the benchmark at #6997 showing a |
could you file a new issue & ideally a follow-up PR? thank you! |
Simplified and extended the Updating by Reference section in the Joins in data.table vignette.
closes #6846
in this i
@jangorecki can you please review this when you have time and suggest me if any changes are needed
thank you