Skip to content

updated bmerge() to support joins on complex columns with zero imaginary part, treating them as double#7085

Open
venom1204 wants to merge 4 commits into
masterfrom
issue6627
Open

updated bmerge() to support joins on complex columns with zero imaginary part, treating them as double#7085
venom1204 wants to merge 4 commits into
masterfrom
issue6627

Conversation

@venom1204
Copy link
Copy Markdown
Contributor

@venom1204 venom1204 commented Jun 23, 2025

closes #6627

in this pr I

  • Added checks in bmerge() to detect and handle complex types.
  • If all Im(x) == 0, column is coerced to double .
  • Errors clearly identify the offending column when imaginary parts are non-zero.

hi @tdhock @joshhwuu @MichaelChirico can you please review this when you have time.
thanks.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.69%. Comparing base (e18779a) to head (906d3af).
⚠️ Report is 344 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7085      +/-   ##
==========================================
- Coverage   98.69%   98.69%   -0.01%     
==========================================
  Files          79       79              
  Lines       14677    14692      +15     
==========================================
+ Hits        14486    14500      +14     
- Misses        191      192       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 23, 2025

  • HEAD=issue6627 slower P<0.001 for melt improved in #5054
    Comparison Plot

Generated via commit 906d3af

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 46 seconds
Installing different package versions 41 seconds
Running and plotting the test cases 2 minutes and 24 seconds

@tdhock
Copy link
Copy Markdown
Member

tdhock commented Jun 26, 2025

I do not use complex numbers so I don't really understand what is a typical use case, can you please explain?

@venom1204

This comment was marked as off-topic.

@tdhock
Copy link
Copy Markdown
Member

tdhock commented Jun 26, 2025

that is not a typical use case

@tdhock
Copy link
Copy Markdown
Member

tdhock commented Jun 26, 2025

you should add an example in an Rd file with a typical use case

@venom1204
Copy link
Copy Markdown
Contributor Author

I've included an example and a brief description in the .Rd file.
Could you please check if this meets the requirements?

Thanks!

Comment thread man/data.table.Rd
try(products_all[sales, on = .(id = product_id), nomatch = 0])

\dontshow{
rm(products_clean, sales, products_all)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have any rm() commands in dontshow in other Rd files, please remove.

Comment thread man/data.table.Rd
products_all = data.table(id = c(101+0i, 102+1i, 103+0i), name = c("widget", "gadget", "thingamajig"))

# The join fails because the entire 'id' column is checked first.
try(products_all[sales, on = .(id = product_id), nomatch = 0])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove.
examples should show typical use cases, not errors.

Comment thread man/data.table.Rd
Comment on lines +237 to +238
\item If a \code{complex} column contains values with only a zero imaginary part (e.g., \code{10+0i}), it is treated as a \code{double} for the join, allowing it to match with \code{integer} and \code{double} columns successfully.
\item If any value in a \code{complex} join column has a non-zero imaginary part (e.g., \code{10+2i}), the join will stop with an error, as there is no defined way to sort or match such a number against a real number.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this feature is desirable. For the new example using current master I get

> products_clean[sales, on = .(id = product_id), nomatch = 0]
Erreur dans bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
  typeof x.id (complex) != typeof i.product_id (integer)

which seems totally reasonable and actionable. If I want a join, I would need to convert the types to be the same.

So I would suggest closing this pr and opening a new one that clarifies the documentation. What do you think @jangorecki @MichaelChirico @ben-schwen @aitap ?
(I don't really use complex numbers so I don't understand if this is a typical or desirable use case, but I guess any user is capable of converting complex to another joinable type)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would not add examples of joining of complex columns into the manual. It is just too uncommon. Unit tests yes, NEWS entry yes, but manual examples not really.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank you for the detailed feedback.
I understand there are two key takeaways from the discussion:

  • Documentation: I will remove the examples from the .Rd file along with the rm() command, and I will open a new PR specifically for the documentation cleanup.
  • Feature Design: There is an ongoing higher-level discussion about whether data.table should automatically coerce complex to double for joins, or if it's better to retain the existing behavior and require the user to handle coercion explicitly.

Please let me know the final decision on the feature design and how you'd like me to proceed. Thanks!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please ask @MichaelChirico for review because he filed the original issue.

@venom1204 venom1204 requested a review from tdhock July 11, 2025 01:12
@venom1204 venom1204 marked this pull request as ready for review July 11, 2025 19:14
@venom1204
Copy link
Copy Markdown
Contributor Author

Hi @MichaelChirico, could you please share the next steps for this PR so I can proceed towards closing it?
Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bmerge() needs to handle mixed complex-(int,dbl,i64) joins explicitly

4 participants