The fsspec find interface accepts maxdepth, but the adlfs implementation ignores this parameter, so it always lists every file under the directory recursively.
See here (adlfs/adlfs/spec.py, line 1128 in f15c37a):
def find(self, path, withdirs=False, prefix="", **kwargs):
pyarrow's wrapper of fsspec filesystems uses fs.find to list the contents of a directory. It relies on the maxdepth parameter when the user specifies recursive=False in pyarrow.
import pyarrow.fs

# fsspec_fs is an existing adlfs filesystem instance.
fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(fsspec_fs))
file_infos = fs.get_file_info(
    pyarrow.fs.FileSelector("path", recursive=False)
)
# Expected: only the top level of path is listed.
# Actual: all files under path are listed recursively.
An easy fix is to adopt the gcsfs implementation:
https://github.com/fsspec/gcsfs/blob/ad684a5b3f25d46eeb5c3aebdbe647056a5e312b/gcsfs/core.py#L1441-L1444
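
To make the expected behavior concrete, here is a minimal sketch of what honoring maxdepth means for a flat recursive listing. It is not the gcsfs or adlfs code: `limit_depth` is a hypothetical helper, and the paths are made up for illustration.

```python
def limit_depth(paths, base, maxdepth=None):
    """Keep only paths that sit at most `maxdepth` levels below `base`.

    Hypothetical helper for illustration; maxdepth=None means no limit.
    """
    if maxdepth is not None and maxdepth < 1:
        raise ValueError("maxdepth must be at least 1")

    base = base.strip("/")
    prefix = base + "/" if base else ""

    kept = []
    for path in paths:
        stripped = path.strip("/")
        if not stripped.startswith(prefix):
            continue  # not under base at all
        relative = stripped[len(prefix):]
        if not relative:
            continue  # this is base itself
        # A direct child of base has depth 1, a grandchild depth 2, etc.
        depth = relative.count("/") + 1
        if maxdepth is None or depth <= maxdepth:
            kept.append(path)
    return kept


# Made-up flat listing, as a fully recursive find would return it.
listing = [
    "container/data/a.parquet",
    "container/data/b.parquet",
    "container/data/sub/c.parquet",
    "container/data/sub/deep/d.parquet",
]

# What a non-recursive listing (maxdepth=1) should contain.
top_level = limit_depth(listing, "container/data", maxdepth=1)
```

Filtering after a full recursive listing still pays the cost of enumerating everything, so a real fix would more likely limit the listing itself; the sketch only shows which entries a caller expects back.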