Add embed-openclip-rn101-yfcc15m model (FP16, 86-tag default vocab) #38
Open
andriiryzhkov wants to merge 1 commit into
Adds `embed-openclip-rn101-yfcc15m`, a ResNet-101 image embedder for tag suggestion and image-similarity search.

Why this one and not a stronger CLIP variant: every other CLIP training corpus (LAION, WIT-400M, DataComp, MetaCLIP, WebLI) is a web scrape with no per-image consent. YFCC15M is 15M Flickr photos uploaded under Creative Commons, the one option that meets the project's consent-based training-data criterion. The cost is a lower benchmark score (~31% ImageNet zero-shot vs ~67% for LAION ViT-B-32), but in actual photo-library use the gap is much smaller than that number suggests.

Ships `model.onnx` (60 MB FP16, mean/std + L2 norm baked in) plus `tags.json`: 86 precomputed centroids for cold-start tag suggestions before users have enough data of their own. The text encoder runs at convert time only and is not shipped.
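
For reviewers unfamiliar with centroid-based tagging, here is a minimal sketch of how cold-start suggestions could be ranked from the shipped centroids. It is an illustration under assumptions, not the code in this PR: the `{tag: vector}` mapping, the `suggest_tags` helper, and the toy 3-dimensional vectors are all hypothetical, and the actual `tags.json` layout is not described here. It assumes suggestions are ranked by cosine similarity between the image embedding and each tag centroid.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def suggest_tags(embedding, centroids, top_k=3):
    """Rank tags by cosine similarity between an image embedding and tag centroids.

    `centroids` is a hypothetical {tag: vector} mapping; the real tags.json
    layout may differ.
    """
    query = l2_normalize(embedding)
    scored = []
    for tag, centroid in centroids.items():
        unit = l2_normalize(centroid)
        # Dot product of two unit vectors equals their cosine similarity.
        score = sum(q * c for q, c in zip(query, unit))
        scored.append((tag, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors for illustration only; real embeddings are much wider.
toy_centroids = {
    "beach": [1.0, 0.0, 0.0],
    "forest": [0.0, 1.0, 0.0],
    "city": [0.0, 0.0, 1.0],
}
toy_embedding = [0.9, 0.1, 0.0]
ranked = suggest_tags(toy_embedding, toy_centroids, top_k=2)
print([tag for tag, _ in ranked])
```

Because the model output is described as already L2-normalized, the query-side normalization in this sketch would be a no-op in practice; it is kept so the example works with arbitrary input vectors.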