Skip to content

Ensure FPS/CUR selected idxs are unique in the case of zero-score#224

Closed
jwa7 wants to merge 6 commits into
scikit-learn-contrib:mainfrom
jwa7:main
Closed

Ensure FPS/CUR selected idxs are unique in the case of zero-score#224
jwa7 wants to merge 6 commits into
scikit-learn-contrib:mainfrom
jwa7:main

Conversation

@jwa7

@jwa7 jwa7 commented Mar 26, 2024

Copy link
Copy Markdown
Collaborator

Attempting to fix #206

Hello!

I have encountered the issue raised by Alex in the above issue, whilst working with the equisolve wrapper for TensorMap-based sample/feature selection (i.e. in https://github.com/lab-cosmo/equisolve/blob/main/src/equisolve/numpy/sample_selection.py and co).

I have adapted Alex's example into a few unit tests for both FPS and CUR sample/feature selection, and attempted to fix it. However, there is something I'm not understanding. While the FPS tests now pass, there are a couple of (different) CUR ones that do not.

With this PR I was hoping to get some feedback/help from the skmatter dev team on this. Thanks! :)

Contributor (creator of PR) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

For Reviewer

  • CHANGELOG updated if important change?

📚 Documentation preview 📚: https://scikit-matter--224.org.readthedocs.build/en/224/

@rosecers

Copy link
Copy Markdown
Collaborator

Are you still looking for feedback on this? Good measure to tag one of us for input.

@jwa7

jwa7 commented Jun 20, 2024

Copy link
Copy Markdown
Collaborator Author

Hey! This isn't something I'm actively working on at the moment, but I'll be sure to ping you for feedback when I / we get back round to it :)

@mhellstr

Copy link
Copy Markdown

I had the problem that CUR sample selection would give duplicate selected indices, for example X.selected_idx_ ==
[ 801 936 1308 253 1000 480 183 486 303 977 43 366 734 243 363 88 859 440 798 398 709 263 796
383 214 654 854 508 865 384 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643
413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413
643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643
413 643 413 643 413 643 413 643]

I cannot comment on the validity of the code, but this branch solves this problem to instead give unique predictions like this, X.selected_idx_ ==
[ 801 936 1308 253 1000 480 183 486 303 977 43 366 734 243 363 88 859 440 798 398 709 263 796
383 214 654 854 508 865 384 413 643 216 555 408 564 892 716 673 99 107 386 137 55 1 499
368 390 359 218 237 130 530 661 439 311 318 542 669 830 268 208 215 903 418 855 994 664 631
879 199 306 869 258 884 1418 123 700 266 1282 608 519 351 389 5 652 553 257 934 24 1455 270
825 744 1114 470 757 572 1259 15]

@ceriottm

Copy link
Copy Markdown
Collaborator

Moved over to #265

@ceriottm ceriottm closed this Jul 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zero scores result in repeated selection and wrong scores at least for FPS

4 participants