feat: resolve repo type and fetch accordingly#435
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Wauplin
left a comment
There was a problem hiding this comment.
Thanks for the ping! I've mostly checked the huggingface_hub integration
danieldk
left a comment
There was a problem hiding this comment.
Great PR! Added two comments.
My main worry is that some public functions (install_kernel, get_variants, may have missed some) get a public repo_type argument, which we'll have to remove again if we switch fully to kernel-repos only. Is there some way we can get rid of them? (e.g. by doing automatic detection inside the functions)
| is_new_branch = False | ||
| if branch is not None: | ||
| is_new_branch = not _branch_exists(api, repo_id, branch) | ||
| is_new_branch = not _branch_exists(api, repo_id, branch, repo_type="model") |
There was a problem hiding this comment.
Fixed to model because kernels upload is deprecated?
There was a problem hiding this comment.
yea exactly, this is because the python upload only support uploading model repo prior to being deprecated, so now all of the call explicitly use repo_type="model" instead of updating the deprecated path for kernel types. happy to change if it makes more sense to upload to kernel repos, but intentionally avoided adding the new functionality for now.
There was a problem hiding this comment.
Yeah no point updating deprecated functionality.
sayakpaul
left a comment
There was a problem hiding this comment.
Thanks for the PR!
Main comment is using _resolve_repo_type() for fetching the repo type inline instead of calling it as a part of other functions such as select_revision_or_version. This feels more idiomatic to me.
Also, I think we should add the deprecation now. If we do so, we could also add a test to make sure deprecation warnings are raised accordingly.
| is_new_branch = False | ||
| if branch is not None: | ||
| is_new_branch = not _branch_exists(api, repo_id, branch) | ||
| is_new_branch = not _branch_exists(api, repo_id, branch, repo_type="model") |
There was a problem hiding this comment.
Yeah no point updating deprecated functionality.
| repo_type = _resolve_repo_type(repo_id) | ||
| versions = {} | ||
| for tag in _get_hf_api().list_repo_refs(repo_id).tags: | ||
| for tag in _get_hf_api().list_repo_refs(repo_id, repo_type=repo_type).tags: |
There was a problem hiding this comment.
(nit): Similar to the above function, do we want?
tags = _get_hf_api().list_repo_refs(repo_id, repo_type=repo_type).tags| """ | ||
| if isinstance(version_spec, int): | ||
| versions = _get_available_versions(repo_id) | ||
| versions, repo_type = _get_available_versions(repo_id) |
There was a problem hiding this comment.
_get_available_versions() returning versions as well as repo_type looks confusing to me. Isn't it better to just call _resolve_repo_type(repo_id) to determine the repo_type?
| # TODO: add back the deprecation warning when we remove support for the "model" repo type. | ||
| # we should also add a removal date in the warning message for a final migration date. | ||
| # warnings.warn( | ||
| # f"Repository '{repo_id}' uses deprecated repo type 'model'; " | ||
| # "use 'kernel' instead.", | ||
| # DeprecationWarning, | ||
| # stacklevel=2, | ||
| # ) |
There was a problem hiding this comment.
Should we not add it right now? If so, we could add a deprecation date (maybe two release cycles) and instructions to do it the proper way (re-uploading the repo as a kernel type)?
IIUC, after the model repos are removed, the above LOC will definitely fail. Deprecation shouldn't mean complete removal; it should indicate that removal is coming and change is needed.
Or am I misunderstanding something?
0135bc0 to
e8095a9
Compare
Wauplin
left a comment
There was a problem hiding this comment.
Made a new pass on the PR and dropping "model" indeed made the implementation simpler ^^
I've left some comments, mostly to know if you need new things to be updated on the huggingface_hub side or not.
(also a nit, the PR title and description are outdated now)
| repo_id=repo_id, | ||
| operations=list(chunk), | ||
| revision=revision, | ||
| repo_type="model", |
There was a problem hiding this comment.
no repo_type="kernel" here? (same above)
There was a problem hiding this comment.
no repo_type="kernel" in this case since we plan to fully remove this upload file and all of the upload logic on the python cli in a following PR.
I removed repo_type="model" but realize that was unneeded - however in this case I don't think is technically changes any of the logic since if I understand correctly the default functionality is to push to model repos.
however as mentioned this will all be removed in the next PR I open since we are cleaning up deprecated paths for the v0.14.0 release
There was a problem hiding this comment.
I removed repo_type="model" but realize that was unneeded - however in this case I don't think is technically changes any of the logic since if I understand correctly the default functionality is to push to model repos.
yes indeed, "model" is the default repo type
no repo_type="kernel" in this case since we plan to fully remove this upload file and all of the upload logic on the python cli in a following PR.
Got it!
| # Ensure that we are testing with a non-existing kernel, so that we know | ||
| # that the kernel must be local. | ||
| with pytest.raises(RepositoryNotFoundError): | ||
| with pytest.raises(HfHubHTTPError): |
There was a problem hiding this comment.
RepositoryNotFoundError was more specific. Did it change? Or it is that RepositoryNotFoundError is not raised for kernel repos (and in such a case we should fix it on huggingface_hub side)
There was a problem hiding this comment.
oh thanks good catch, RepositoryNotFoundError is a better error but I think throw kernel repos throw a 401 instead of a 404 when a repo cannot be found (example in this error). We may also need a update on the server side to improve the error.
not sure if we should block this PR with that update, maybe it can be a small followup PR? what do you think?
There was a problem hiding this comment.
I think throw kernel repos throw a 401 instead of a 404 when a repo cannot be found
That depends whether user is authenticated or not. It's not specific to kernels. You can test that by opening
- https://huggingface.co/api/models/huggingface/doesnt_exist
- https://huggingface.co/api/kernels/huggingface/doesnt_exist
in your browser. In normal mode you get a 404 and in incognito (i.e. logged out) you get a 401. This is because repo might exist and be private (so unauthorized). The theory behind it is debatable, but low prio as we could argue for both 401 and 404 .
So here I would advice to either make authenticated calls or catch HfHubHTTPError, it's fine
There was a problem hiding this comment.
ah okay that makes sense thanks for the info! will keep HfHubHTTPError for now
| assert version() == "1" | ||
|
|
||
|
|
||
| @pytest.mark.skip(reason="Tags are not supported on kernel repos") |
There was a problem hiding this comment.
"Tags are not supported on kernel repos"
Is this something that you need resolved on huggingface_hub side? If yes I can work on a PR, if not maybe just remove the test/feature if it doesn't exist anymore
There was a problem hiding this comment.
I'm not sure we need any changes on the huggingface_hub side, this test is skipped because it tests the backward compatible path where we used tags instead of branches.
this test and the _old locking path is also going to be removed in a follow up PR as this path is deprecated an we are removing support in v0.14.0
| assert linear.n_calls == 2 | ||
|
|
||
|
|
||
| @pytest.mark.skip(reason="Tags are not supported on kernel repos") |
There was a problem hiding this comment.
same questions here for tags and RepositoryNotFoundError
There was a problem hiding this comment.
tried to use RepositoryNotFoundError but ran into the 401 issue noted above. I've reverted back to HfHubHTTPError so CI passes but also this test will be removed in a follow up PR
thanks for pointing these things out!
| def install_kernel_all_variants( | ||
| repo_id: str, | ||
| revision: str, | ||
| *, |
| package_name, variant_path = install_kernel( | ||
| repo_id, revision=revision, backend=backend, user_agent=user_agent | ||
| repo_id, | ||
| revision, |
There was a problem hiding this comment.
Not using a kwarg for revision seems like a regression. Also see the other comments about kwargs.
There was a problem hiding this comment.
thanks for pointing out, updated as mentioned above
This PR adds support for fetching from kernel repo types and falls back to fetching from model repo types if the kernel repo does not exist