Commit 429f0b9

minor cleanup
1 parent 1b500c1 commit 429f0b9

3 files changed

Lines changed: 17 additions & 39 deletions

README.md

Lines changed: 4 additions & 5 deletions
@@ -2,11 +2,10 @@
 [![codecov](https://codecov.io/gh/andhus/dirhash-python/branch/master/graph/badge.svg)](https://codecov.io/gh/andhus/dirhash-python)
 
 # dirhash
-A lightweight python module and tool for computing the hash of any
+A lightweight python module and CLI for computing the hash of any
 directory based on its files' structure and content.
-- Supports any hashing algorithm of Python's built-in `hashlib` module
-- `.gitignore` style "wildmatch" patterns for expressive filtering of files to
-include/exclude.
+- Supports all hashing algorithms of Python's built-in `hashlib` module.
+- Glob/wildcard (".gitignore style") path matching for expressive filtering of files to include/exclude.
 - Multiprocessing for up to [6x speed-up](#performance)
 
 The hash is computed according to the [Dirhash Standard](https://github.com/andhus/dirhash), which is designed to allow for consistent and collision resistant generation/verification of directory hashes across implementations.
@@ -68,7 +67,7 @@ and executing `hashlib` code.
 The main effort to boost performance is support for multiprocessing, where the
 reading and hashing is parallelized over individual files.
 
-As a reference, let's compare the performance of the `dirhash` [CLI](https://github.com/andhus/dirhash/dirhash-python/cli.py)
+As a reference, let's compare the performance of the `dirhash` [CLI](https://github.com/andhus/dirhash-python/cli.py)
 with the shell command:
 
 `find path/to/folder -type f -print0 | sort -z | xargs -0 md5 | md5`

src/dirhash/__init__.py

Lines changed: 6 additions & 6 deletions
@@ -105,14 +105,14 @@ def dirhash(
 # Path Selection and Filtering
 Provided glob/wildcard (".gitignore style") match-patterns determine what
 paths within the `directory` to include when computing the hash value. Paths
-*relative to the root `directory` (i.e. excluding the name of the directory
-itself) are matched against the patterns.
+*relative to the root `directory`* (i.e. excluding the name of the root
+directory itself) are matched against the patterns.
 The `match` argument represent what should be *included* - as opposed
-to `ignore` patterns for which matches are *excluded*. Using `ignore` is
+to the `ignore` argument for which matches are *excluded*. Using `ignore` is
 just short for adding the same patterns to the `match` argument with the
 prefix "!", i.e. the calls bellow are equivalent:
-`dirhash(..., match=['*', '!<pattern>'])`
-`dirhash(..., ignore=['<pattern>'])`
+`dirhash(..., match=["*", "!<pattern>"])`
+`dirhash(..., ignore=["<pattern>"])`
 To validate which paths are included, call `dirhash.included_paths` with
 the same values for the arguments: `match`, `ignore`, `linked_dirs`,
 `linked_files` and `empty_dirs` to get a list of all paths that will be
@@ -348,7 +348,7 @@ class Filter(RecursionFilter):
 # Arguments
 match: Iterable[str] - An iterable of glob/wildcard (".gitignore style")
 match patterns for selection of which files and directories to include.
-Paths *relative to the root `directory` (i.e. excluding the name of the
+Paths *relative to the root `directory`* (i.e. excluding the name of the
 root directory itself) are matched against the provided patterns. For
 example, to include all files, except for hidden ones use:
 `match=['*', '!.*']` Default `None` which is equivalent to `['*']`,
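
The docstrings changed in this file describe `ignore` as shorthand for adding `!`-prefixed patterns to `match`. A minimal sketch of that equivalence follows. The helper names `combine_patterns` and `is_included` are hypothetical and not part of `dirhash`, and the matcher uses the standard library's `fnmatch` with a simplified last-match-wins rule rather than the full ".gitignore style" semantics the library implements.

```python
from fnmatch import fnmatch


def combine_patterns(match=None, ignore=None):
    """Fold `ignore` patterns into `match` by prefixing each with "!".

    Hypothetical helper for illustration; not the dirhash implementation.
    """
    patterns = list(match) if match is not None else ["*"]
    patterns += ["!" + pattern for pattern in (ignore or [])]
    return patterns


def is_included(relative_path, patterns):
    """Return True if the last pattern matching `relative_path` is an include.

    Simplified: directory-only patterns, "**" and anchoring are not handled.
    """
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        if fnmatch(relative_path, pattern[1:] if negated else pattern):
            included = not negated
    return included


# Both spellings from the docstring yield the same pattern list.
explicit = combine_patterns(match=["*", "!*.pyc"])
shorthand = combine_patterns(ignore=["*.pyc"])
print(explicit == shorthand)
print([p for p in ["a.py", "b.pyc", "docs/c.md"] if is_included(p, shorthand)])
```

Paths are given relative to the root directory, as the docstring requires, so the name of the root itself never takes part in the matching.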

src/dirhash/cli.py

Lines changed: 7 additions & 28 deletions
@@ -43,10 +43,11 @@ def get_kwargs(args):
 choices=dirhash.algorithms_available,
 default='md5',
 help=(
-'Hashing algorithm to use. Always available: {}. Additionally available '
-'on current platform: {}. Note that the same algorithm may appear '
-'multiple times in this set under different names (thanks to '
-'OpenSSL) [https://docs.python.org/2/library/hashlib.html]'.format(
+'Hashing algorithm to use, by default "md5". Always available: {}. '
+'Additionally available on current platform: {}. Note that the same '
+'algorithm may appear multiple times in this set under different names '
+'(thanks to OpenSSL) '
+'[https://docs.python.org/2/library/hashlib.html]'.format(
 sorted(dirhash.algorithms_guaranteed),
 sorted(dirhash.algorithms_available - dirhash.algorithms_guaranteed)
 )
@@ -77,7 +78,7 @@ def get_kwargs(args):
 nargs='+',
 default=['*'],
 help=(
-'String of match-patterns, separated by blank space. NOTE: patterns '
+'One or several patterns for paths to include. NOTE: patterns '
 'with an asterisk must be in quotes ("*") or the asterisk '
 'preceded by an escape character (\*).'
 ),
@@ -88,7 +89,7 @@ def get_kwargs(args):
 nargs='+',
 default=None,
 help=(
-'String of ignore-patterns, separated by blank space. NOTE: patterns '
+'One or several patterns for paths to exclude. NOTE: patterns '
 'with an asterisk must be in quotes ("*") or the asterisk '
 'preceded by an escape character (\*).'
 ),
@@ -175,27 +176,5 @@ def get_kwargs(args):
 return vars(parser.parse_args(args))
 
 
-# def preprocess_kwargs(kwargs):
-# match_kwargs = {}
-# for kwarg in ['match', 'ignore']:
-# match_kwargs[kwarg] = kwargs.pop(kwarg)
-# match_patterns = dirhash.get_match_patterns(**match_kwargs)
-#
-# filtering_kwargs = {
-# 'match': match_patterns,
-# 'linked_dirs': kwargs.pop('linked_dirs'),
-# 'linked_files': kwargs.pop('linked_files'),
-# 'empty_dirs': kwargs.pop('empty_dirs'),
-# }
-# protocol_kwargs = {
-# 'allow_cyclic_links': kwargs.pop('allow_cyclic_links'),
-# 'entry_properties': kwargs.pop('properties') or ["data", "name"]
-# }
-# kwargs['filtering'] = filtering_kwargs
-# kwargs['protocol'] = protocol_kwargs
-#
-# return kwargs
-
-
 if __name__ == '__main__': # pragma: no cover
     main()
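
The help texts changed in this file describe options that accept one or several patterns (`nargs='+'`) alongside an algorithm option restricted by `choices`. A minimal `argparse` sketch of that behaviour follows. The flag names `--algorithm`, `--match` and `--ignore` and the function `build_parser` are assumptions, since the diff does not show them, and `hashlib` is used directly in place of the `dirhash` attributes.

```python
import argparse
import hashlib


def build_parser():
    """Build a parser resembling the options touched in this diff.

    Flag names are assumptions; only `nargs`, `default` and `choices`
    mirror what the diff shows.
    """
    parser = argparse.ArgumentParser(prog="dirhash-sketch")
    parser.add_argument(
        "--algorithm",
        choices=sorted(hashlib.algorithms_available),
        default="md5",
        help='Hashing algorithm to use, by default "md5".',
    )
    parser.add_argument(
        "--match",
        nargs="+",  # one or several values collected into a list
        default=["*"],
        help="One or several patterns for paths to include.",
    )
    parser.add_argument(
        "--ignore",
        nargs="+",
        default=None,
        help="One or several patterns for paths to exclude.",
    )
    return parser


# Passing a list bypasses the shell, so no quoting of "*" is needed here.
args = build_parser().parse_args(["--match", "*.py", "*.md", "--ignore", ".*"])
print(vars(args))
```

On a real command line the same call would need quoting, for example `--match "*.py" "*.md"`, because an unquoted asterisk is expanded by the shell before the program sees it.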