Use lsjson for searches#551
Conversation
… get_rclone_config_name_central.
…-unit/datashuttle into use_lsjson_for_searches
cs7-shrey
left a comment
There was a problem hiding this comment.
Hey @JoeZiminski this looks good. I have added comments for some minor changes.
| pathtable = pathtable[ | ||
| pathtable["parent_sub"] | ||
| .fillna("") | ||
| .apply(lambda x: fnmatch.fnmatch(x, "*date*")) | ||
| ] | ||
|
|
||
| pathtable = pathtable[ | ||
| pathtable["parent_datatype"].apply(lambda x: x == "funcimg") | ||
| ] |
There was a problem hiding this comment.
do you think if we could move this logic to get_expected_transferred_paths, would it be better? or would that complicate things inside that function?
There was a problem hiding this comment.
Its a tough one @cs7-shrey , it would be nice to centralise the logic but it would make the tests a little harder to follow and the signature might be slightly complex if I have understood correctly e.g. self.get_expected_transferred_paths(pathtable, "parent_sub", lambda x: fnmatch.fmatch(x, "*date")). In this case because the benefits of centralising logic are less important in tests, we could leave as-is for readability?
There was a problem hiding this comment.
Yeah that's totally fine, it's a very straightforward filtering so i think we can leave it as it is.
cs7-shrey
left a comment
There was a problem hiding this comment.
Hey @JoeZiminski, just one doubt, do we delete these functions or are we keeping them for now just to be safe?
|
Hey @cs7-shrey good catch! Cheers I removed them, there is also a neat package vulture that can find unused functions / variables in your code, it couldn't find anything further (except a load of false positives for TUI-related things) and so I think that should be it. |
* Add get_rclone_config_name_local and rename get_rclone_config_name to get_rclone_config_name_central. * Add new transfer function. * Fix rename in tests. * Tidy up the search functions. * Fix circular import. * Adding more tests and fixing an edge case. * Add wildcards to local filesystem transfer tests. * Added tests to ssh. * Remove unused function. * Move teardown to fix tests on macos. * Revert get_rclone_config_name_central name change. * Add documentation to tests. * Remove unecessary sorted. * Fix cyclical import. * Remove unecessary wildcard. * Remove unused functions.
* Add get_rclone_config_name_local and rename get_rclone_config_name to get_rclone_config_name_central. * Add new transfer function. * Fix rename in tests. * Tidy up the search functions. * Fix circular import. * Adding more tests and fixing an edge case. * Add wildcards to local filesystem transfer tests. * Added tests to ssh. * Remove unused function. * Move teardown to fix tests on macos. * Revert get_rclone_config_name_central name change. * Add documentation to tests. * Remove unecessary sorted. * Fix cyclical import. * Remove unecessary wildcard. * Remove unused functions.
In #503 @cs7-shrey introduced a really nice way of performing searchers for files or folders to the central drive. This leverages
rclonedirectly, meaning we can use one function for every type of search, significantly reducing code duplication.This PR introduces makes this the default method by which datashuttle performs searches to the local or remote. Using
lsjsonto get all file and folders in the local / remote path, these are then filtered on our end. It is very difficult to userlcone's--includeor--filteror search for folders, so this way of performing the filtering was chosen. This could be slightly slower, but in general these projects are not going to be too huge, and the alternative to userclonefiltering directly would end up very convoluted and error-prone due to the folder-searching issue.This PR also adds some tests to file transfer to cover a few additional cases, mixing wildcards and the search keywords such as
all_sub.This is a backend change so no docs changed are required.