Huggingface revision pinning#1281
Conversation
In much the same way as unpinned container images benefit from digest pinning, fixing a model, dataset or file to a revision digest uniquely and immutably fixes use to a paricular model snapshot (commit)
for more information, see https://pre-commit.ci
ericwb
left a comment
There was a problem hiding this comment.
Looks mostly good to go after noted fixes.
- Add an entry for CWE 494 - Use string.hexdigits - Set to 18.6 release - Remove Copywright - Order after markupsafe
| IMPROPER_CHECK_OF_EXCEPT_COND = 703 | ||
| INCORRECT_PERMISSION_ASSIGNMENT = 732 | ||
| INAPPROPRIATE_ENCODING_FOR_OUTPUT_CONTEXT = 838 | ||
| DOWNLOAD_OF_CODE_WITHOUT_INTEGRITY_CHECK = 494 |
There was a problem hiding this comment.
Please keep these sorted by number. Makes it easier to detect whether we already have a constant for that CWE
for more information, see https://pre-commit.ci
| # Commit hashes: 40 chars (full SHA) or 7+ chars (short SHA) | ||
| if isinstance(revision_to_check, str): | ||
| # Remove quotes if present | ||
| revision_str = str(revision_to_check).strip("\"'") |
There was a problem hiding this comment.
This might result in some unexpected results in the case of more complicated string formation. Like f"{foo} '{bar}' """
There was a problem hiding this comment.
I will iterate on this some more, thanks for the quick reviews! much appreciated.
ericwb
left a comment
There was a problem hiding this comment.
Looks good enough to merge IMO
|
What happens for library code that accepts user-supplied revisions? Think of any function wrapping def load_dataset_wrapped(dataset_name, dataset_kwargs):
return load_dataset(
self.dataset_name,
revision=dataset_kwargs["revision"],
...? cc @ElenaKhaustova (Please let us know if we should rather open a new issue about this) |
In much the same way as unpinned container images benefit from digest pinning, fixing a model, dataset or file to a revision digest uniquely and immutably fixes use to a particular model snapshot (commit)
Example run: