Skip to content

Use lsjson for searches#551

Merged
JoeZiminski merged 17 commits into
mainfrom
use_lsjson_for_searches
Jul 30, 2025
Merged

Use lsjson for searches#551
JoeZiminski merged 17 commits into
mainfrom
use_lsjson_for_searches

Conversation

@JoeZiminski
Copy link
Copy Markdown
Member

@JoeZiminski JoeZiminski commented Jul 14, 2025

In #503 @cs7-shrey introduced a really nice way of performing searchers for files or folders to the central drive. This leverages rclone directly, meaning we can use one function for every type of search, significantly reducing code duplication.

This PR introduces makes this the default method by which datashuttle performs searches to the local or remote. Using lsjson to get all file and folders in the local / remote path, these are then filtered on our end. It is very difficult to use rlcone's --include or --filter or search for folders, so this way of performing the filtering was chosen. This could be slightly slower, but in general these projects are not going to be too huge, and the alternative to use rclone filtering directly would end up very convoluted and error-prone due to the folder-searching issue.

This PR also adds some tests to file transfer to cover a few additional cases, mixing wildcards and the search keywords such as all_sub.

This is a backend change so no docs changed are required.

@JoeZiminski JoeZiminski requested a review from cs7-shrey July 18, 2025 12:35
Copy link
Copy Markdown
Collaborator

@cs7-shrey cs7-shrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @JoeZiminski this looks good. I have added comments for some minor changes.

Comment thread datashuttle/utils/folders.py Outdated
Comment thread tests/tests_transfers/local_filesystem/test_transfer_special_arguments.py Outdated
Comment thread tests/tests_transfers/ssh/test_ssh_transfer.py Outdated
Comment on lines +95 to +103
pathtable = pathtable[
pathtable["parent_sub"]
.fillna("")
.apply(lambda x: fnmatch.fnmatch(x, "*date*"))
]

pathtable = pathtable[
pathtable["parent_datatype"].apply(lambda x: x == "funcimg")
]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think if we could move this logic to get_expected_transferred_paths, would it be better? or would that complicate things inside that function?

Copy link
Copy Markdown
Member Author

@JoeZiminski JoeZiminski Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a tough one @cs7-shrey , it would be nice to centralise the logic but it would make the tests a little harder to follow and the signature might be slightly complex if I have understood correctly e.g. self.get_expected_transferred_paths(pathtable, "parent_sub", lambda x: fnmatch.fmatch(x, "*date")). In this case because the benefits of centralising logic are less important in tests, we could leave as-is for readability?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's totally fine, it's a very straightforward filtering so i think we can leave it as it is.

Comment thread tests/tests_transfers/local_filesystem/test_transfer_special_arguments.py Outdated
@JoeZiminski JoeZiminski requested a review from cs7-shrey July 22, 2025 23:01
Copy link
Copy Markdown
Collaborator

@cs7-shrey cs7-shrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @JoeZiminski, just one doubt, do we delete these functions or are we keeping them for now just to be safe?

@JoeZiminski
Copy link
Copy Markdown
Member Author

Hey @cs7-shrey good catch! Cheers I removed them, there is also a neat package vulture that can find unused functions / variables in your code, it couldn't find anything further (except a load of false positives for TUI-related things) and so I think that should be it.

@JoeZiminski JoeZiminski merged commit 22c9beb into main Jul 30, 2025
16 checks passed
cs7-shrey pushed a commit to cs7-shrey/datashuttle that referenced this pull request Jul 30, 2025
* Add get_rclone_config_name_local and rename get_rclone_config_name to get_rclone_config_name_central.

* Add new transfer function.

* Fix rename in tests.

* Tidy up the search functions.

* Fix circular import.

* Adding more tests and fixing an edge case.

* Add wildcards to local filesystem transfer tests.

* Added tests to ssh.

* Remove unused function.

* Move teardown to fix tests on macos.

* Revert get_rclone_config_name_central name change.

* Add documentation to tests.

* Remove unecessary sorted.

* Fix cyclical import.

* Remove unecessary wildcard.

* Remove unused functions.
cs7-shrey pushed a commit that referenced this pull request Aug 1, 2025
* Add get_rclone_config_name_local and rename get_rclone_config_name to get_rclone_config_name_central.

* Add new transfer function.

* Fix rename in tests.

* Tidy up the search functions.

* Fix circular import.

* Adding more tests and fixing an edge case.

* Add wildcards to local filesystem transfer tests.

* Added tests to ssh.

* Remove unused function.

* Move teardown to fix tests on macos.

* Revert get_rclone_config_name_central name change.

* Add documentation to tests.

* Remove unecessary sorted.

* Fix cyclical import.

* Remove unecessary wildcard.

* Remove unused functions.
@JoeZiminski JoeZiminski deleted the use_lsjson_for_searches branch November 18, 2025 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants