Full refactor of the library #16
Merged

69 commits:
- 64f38a5 (meilame-tayebjee) feat: add HF dependencies (as a group)
- 3703f48 (meilame-tayebjee) feat: add WordPiece tokenizer
- 1266287 (meilame-tayebjee) chore: rename file to ngram
- d2563ea (meilame-tayebjee) feat: improve base tokenizer, add HF abstract
- ae045ab (meilame-tayebjee) feat: change inheritance to HFTokenizer
- c6eac58 (meilame-tayebjee) feat(dataset): init
- c25eb36 (meilame-tayebjee) fix: add update of vocab size in post training
- d897bef (meilame-tayebjee) fix: categorical tensors set to None instead of empty tensors when no…
- 51be1d1 (meilame-tayebjee) feat: add ruff and datasets dep
- b53a10d (meilame-tayebjee) feat: first working example for model/module
- 6f3417c (meilame-tayebjee) chore: fix signature
- c600f18 (meilame-tayebjee) chore: default value for batch_idx in predict
- 85cb8b8 (meilame-tayebjee) feat!: violently modularize and simplify forward+checking
- dc863ff (meilame-tayebjee) chore: remove tokenizer (now it is ngram tokenizer)
- 064b73f (meilame-tayebjee) feat!(components): first working example with full modularity
- 164cccf (meilame-tayebjee) fix: avoid bugs with numpy arrays in boolean contexts
- c5b9673 (meilame-tayebjee) feat: add smooth imports for HF and output_dim field
- a0fe18c (meilame-tayebjee) feat!(wrapper class): finalize orchestration tokenizer, dataset, mode…
- ddd7cec (meilame-tayebjee) fix: return only optimizer when scheduler is none
- 32e6805 (meilame-tayebjee) feat(test): clean tests (wip)
- 8fdaf0c (meilame-tayebjee) chore: clean
- a7f71d3 (meilame-tayebjee) feat: enable choosing context size in tokenizer
- 0a9eda5 (meilame-tayebjee) chore: pin_memory to default False (avoid warning on CPU run)
- 6d951fe (meilame-tayebjee) feat: add __repr__ for all components
- c31ad43 (meilame-tayebjee) chore: format
- 956b7a3 (meilame-tayebjee) feat!(HF): enable load from pretrained
- a497697 (meilame-tayebjee) chore: update description
- 2fda9c2 (meilame-tayebjee) feat: __call__ for tokenizers is tokenize
- 13b9de4 (meilame-tayebjee) feat(tokenizers): clean __call__ and __repr__, add offset return for e…
- f55452b (meilame-tayebjee) feat!(explainability): finalize explainability feature at word and ch…
- 0262109 (meilame-tayebjee) chore: remove useless file
- 6bdb750 (meilame-tayebjee) fix: typo in trainer_params max_epochs
- 830a45c (meilame-tayebjee) feat!(tokenizer): ensure output is consistent across all tokenizers
- c7307f5 (meilame-tayebjee) fix: move hf-dep to optional dependencies
- a5b3e4d (meilame-tayebjee) Merge branch 'main' into hf_tokenizer
- 934b041 (meilame-tayebjee) feat!(attention): enable attention logic
- 5e150b2 (meilame-tayebjee) fix: check if categorical vars are present before checking their arrays
- 162e296 (meilame-tayebjee) fix: no persistent_workers if num_workers=0
- 1591bd9 (meilame-tayebjee) fix: closing parenthesis
- 1af9e53 (meilame-tayebjee) fix: truncation=True is needed
- 4ca1807 (meilame-tayebjee) add ipywidgets
- 927a5e7 (meilame-tayebjee) fix: check_Y problem of indexes
- 7fdb4e3 (meilame-tayebjee) fix: truncation=True is needed
- 4e36940 (meilame-tayebjee) remove unnecessary print
- d44d051 (meilame-tayebjee) progress on doc
- a179c37 (meilame-tayebjee) fix: load model on CPU to avoid problems after training
- ea26799 (meilame-tayebjee) progress on docs
- 269c76a (meilame-tayebjee) fix!(explainability): remove nan words and fix plotting
- 89cc8fe (micedre) examples: fix basic_classification after refactor
- 1b62eee (micedre) fix check for categorical variable
- 704fe14 (micedre) adapt examples to new package architecture
- be28866 (meilame-tayebjee) Merge branch 'hf_tokenizer' of https://github.com/InseeFrLab/torchTex…
- be4acf2 (meilame-tayebjee) chore: first draft of example notebook. WIP
- 0f9b4b4 (meilame-tayebjee) refactor: replace cpu_run with accelerator in TrainingConfig
- 5102f82 (meilame-tayebjee) feat!(tokenizer-ngram): add very fast ngram tokenizer
- ab58e26 (meilame-tayebjee) doc: clean example notebook
- 45ace28 (meilame-tayebjee) fix: better handling of truncation to avoid warning
- b2e797b (meilame-tayebjee) doc: fix readme
- 84b118b (meilame-tayebjee) fix: allow tokenizer not to have train attribute
- 3c0a85a (meilame-tayebjee) feat(ngram): add return offsets and word_ids + fix output_dim
- ab70485 (meilame-tayebjee) fix: update vocab_size after training
- 27a11bb (meilame-tayebjee) fix: add a flag for return_word_ids
- 823467b (meilame-tayebjee) fix: add a flag for return_word_ids
- 93a6e80 (meilame-tayebjee) Merge branch 'hf_tokenizer' of https://github.com/InseeFrLab/torchTex…
- 4e2ffa5 (meilame-tayebjee) fix: replace _build_vocab by train
- 519a32d (meilame-tayebjee) feat(test): add test of all pipeline with different tokenizers
- 6017a22 (meilame-tayebjee) chore: remove old file
- aa70919 (meilame-tayebjee) fix: right command to install HF dependencies in warning
- 41a15f0 (meilame-tayebjee) chore: change HF opt. dep. group name to huggingface
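Commit 164cccf ("fix: avoid bugs with numpy arrays in boolean contexts") refers to a classic NumPy pitfall: using a multi-element array directly in an `if` raises "The truth value of an array with more than one element is ambiguous". A minimal sketch of the kind of guard such a fix typically introduces; the `has_categorical` helper and its name are illustrative assumptions, not the library's actual code:

```python
import numpy as np

def has_categorical(categorical_vars):
    """Safely test whether categorical variables were provided.

    `if categorical_vars:` raises ValueError for a NumPy array with
    more than one element, so we branch on type explicitly.
    """
    if categorical_vars is None:
        return False
    if isinstance(categorical_vars, np.ndarray):
        return categorical_vars.size > 0
    return len(categorical_vars) > 0

print(has_categorical(None))                 # False
print(has_categorical(np.array([1, 2, 3])))  # True
```

The same pattern covers commit 5e150b2 ("check if categorical vars are present before checking their arrays"): test for `None` before inspecting array contents.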
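Commit ddd7cec ("fix: return only optimizer when scheduler is none") describes a common Lightning-style bug: `configure_optimizers` must not return a scheduler slot containing `None`. A hedged sketch of the pattern, written as a standalone function with plain PyTorch (the `scheduler_fn` parameter is an assumption for illustration, not the library's API):

```python
import torch

def configure_optimizers(model, lr=1e-3, scheduler_fn=None):
    """Return the optimizer alone when no scheduler is configured,
    instead of an (optimizers, schedulers) pair containing None."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    if scheduler_fn is None:
        return optimizer  # a bare optimizer is a valid return shape
    return [optimizer], [scheduler_fn(optimizer)]

model = torch.nn.Linear(4, 2)
opt = configure_optimizers(model)  # no scheduler: bare optimizer
opts, scheds = configure_optimizers(
    model,
    scheduler_fn=lambda o: torch.optim.lr_scheduler.StepLR(o, step_size=1),
)
```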
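Commits 0a9eda5 ("pin_memory to default False") and 162e296 ("no persistent_workers if num_workers=0") both target PyTorch `DataLoader` warnings: `persistent_workers=True` is invalid without worker processes, and `pin_memory=True` triggers a warning on CPU-only runs. A sketch of a loader factory applying both guards; the `make_loader` wrapper is a hypothetical helper, not the library's actual interface:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader(dataset, batch_size=32, num_workers=0, pin_memory=False):
    """Build a DataLoader that avoids two warnings:
    persistent_workers only when worker processes exist, and
    pin_memory off by default so CPU runs stay silent."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
        persistent_workers=num_workers > 0,  # invalid with num_workers=0
    )

ds = TensorDataset(torch.arange(10).float())
loader = make_loader(ds, batch_size=4)
```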
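Commit 5102f82 ("feat!(tokenizer-ngram): add very fast ngram tokenizer") and the ngram commits around it concern character n-gram tokenization, the fastText-style technique such classifiers typically use. A toy illustration of the core idea (boundary markers plus a sliding window); this is an assumption about the general technique, not the library's implementation:

```python
def char_ngrams(word, n=3):
    """Extract character n-grams from a word, with '<' and '>'
    boundary markers so prefixes and suffixes are distinguishable."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("cat"))  # ['<ca', 'cat', 'at>']
```

Boundary markers let the model tell apart, say, the suffix "at>" from the same characters mid-word, which is why they appear in most n-gram tokenizers.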