Refactor Decoder Tests#93
Conversation
| if isinstance(common_batch_sizes, str): | ||
| common_batch_sizes = [int(bs) for bs in common_batch_sizes.split(",")] | ||
| if isinstance(COMMON_BATCH_SIZES, str): | ||
| COMMON_BATCH_SIZES = [int(bs) for bs in COMMON_BATCH_SIZES.split(",")] |
There was a problem hiding this comment.
Not merged yet but fya get_env_to_int_list method from Gaurav's PR here would be helpful here
b6e36d4 to
1fdea45
Compare
1fdea45 to
9ef02a9
Compare
| model, | ||
| micro_model_path, | ||
| validation_zero_info, | ||
| ): |
There was a problem hiding this comment.
Now that we are doing a restructuring and splitting up validation level 1 and 0, it might be a good opportunity to give a description here of what each validation level is doing. If not in this PR, we could do in a follow up PR
There was a problem hiding this comment.
Hey @JRosenkranz, I rebased this PR and added a short description for level 0 / 1 for now. Happy to continue cleanup / add better docstrings in follow-up PRs as well 😄
|
bot:test |
|
bot:test |
b6e36d4 to
d2551b9
Compare
| ) | ||
| skip_assertions = os.environ.get("FMS_TEST_SHAPES_SKIP_ASSERTIONS", {}) | ||
| validation_info_dir = os.environ.get( | ||
| SKIP_ASSERTIONS = os.environ.get("FMS_TEST_SHAPES_SKIP_ASSERTIONS", {}) |
There was a problem hiding this comment.
Can we pull some of this env var setup outside of this script to use with the other pytests please?
There was a problem hiding this comment.
Definitely! I'll do it in a different PR to try to keep things as isolated as possible here if that's ok 🙂
There was a problem hiding this comment.
Totally ok and makes a lot of sense, thank you!
| model_path, batch_size, seq_length, max_new_tokens, persistent_model | ||
| ##### Common utils | ||
| # metric calculator based on the cross-entropy and mean diff for each decode step | ||
| def _metric_calculator(r: torch.Tensor, t: torch.Tensor): |
There was a problem hiding this comment.
I believe we use this in a few places, not necessarily for this PR but we might want to move this out into a utility
There was a problem hiding this comment.
Yup, looks like it! I will open some other cleanup PRs for stuff like this / clean up some of the env var stuff @tharapalanivel had asked for since this one is already a lot to look at
| warmup_model( | ||
| model, input_ids, max_new_tokens, compile_dynamic_sendnn, **extra_kwargs | ||
| ) | ||
| def _get_aiu_model(model_path, gptq_kwargs, persistent_model_inst): |
There was a problem hiding this comment.
I think I prefer the persistent model calling this in the current version with get_or_create. Is there a specific reason we moved this?
There was a problem hiding this comment.
The main the reason was the cache test, because the branch I based it off of was not using the persistent model fixture, and I wanted to avoid changing the tests too much while cleaning them up, since I also wasn't very familiar with what they were testing. I agree and put it back to just use get_or_create though, and will just use that in the cache test also!
| return cpu_validation_info | ||
|
|
||
| # Don't save iter 0 for AIU only | ||
| skip_save = device == "aiu" and token_iter == 0 |
There was a problem hiding this comment.
I believe we are supposed to save every iteration here
There was a problem hiding this comment.
Sounds good, removed it!
2e42e7c to
1e369b2
Compare
|
bot:test |
tharapalanivel
left a comment
There was a problem hiding this comment.
Will need another rebase and lint fixes but lgtm once the bot tests also pass, thanks @alex-jw-brooks!
Signed-off-by: Avery Blanchard <avery.blanchard@ibm.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
commit ed571f728a351f8dd92737be5554c3dc46f71a30
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Tue Jul 29 09:20:06 2025 -0600
Remove cache tests
commit 2848f7b2785b91c60c536b8993c3193c40c381ea
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Mon Jul 28 08:07:01 2025 -0600
Add leading underscores, revert model name
commit c30b7b70a0f6e464d3212fd9bed4f9ea33f9de93
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Mon Jul 28 07:15:08 2025 -0600
Explictly clear cache paths
commit 42aaf666d7f8ffb2fb611df7ad2d06b48e480dd7
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Mon Jul 28 07:14:23 2025 -0600
Set the cache dir in conftest
commit b978e7225f02bf1d9a5f7b919ca6cbe2ee8d641a
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Mon Jul 28 06:15:10 2025 -0600
run formatting
commit 8d64df08333991927c45f9a982ddaf95f39c94cf
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Fri Jul 25 11:18:13 2025 -0600
refactor cache miss into fixture
commit 0b524b8c818495cb646add2adfc27a2884ac8de5
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Fri Jul 25 07:11:09 2025 -0600
Consolidate cache test with common
commit d8a36d405a101e101ab9ede3b8d12fa3026cd01f
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Fri Jul 25 06:41:13 2025 -0600
Run cache test first
commit 2efb797fb21587e9136b314c44ec56c658636826
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Fri Jul 25 05:48:25 2025 -0600
Finish splitting out common shape test helpers
commit 4ae73dea18848005f86d1c9bcdf29f153711330f
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Fri Jul 25 05:28:31 2025 -0600
refactor most of common shape test
commit 083afdc3a468649ec4b0bbadc921d40b47e37498
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Thu Jul 24 14:08:20 2025 -0600
Move torch sendnn cache dir to common
commit e9b576381a738c59f91d5fc904ceaa2a0e410864
Author: Alex-Brooks <Alex.Brooks@ibm.com>
Date: Thu Jul 24 14:02:06 2025 -0600
Use caps for constants, common post proc
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
1e369b2 to
dd355c1
Compare
|
bot:test |
|
bot:test |
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
bot:test |
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
bot:test |
|
|
||
| # NOTE: we should configure the cachedir before importing torchsendnn's | ||
| # graph cache to prevent it from being initialized in the wrong place. | ||
| os.environ["TORCH_SENDNN_CACHE_DIR"] = os.path.join(os.getcwd(), ".cache") |
There was a problem hiding this comment.
should this be a setdefault, this way it will only set it if a user did not already specify it?
There was a problem hiding this comment.
Good point! Changed it
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
3471879 to
70e84f5
Compare
|
bot:test |
This PR builds on top of #20 to try to make the tests more reusable.
Summary of changes from the above branch are:
- Splits the common shapes test out into more understandable helpers that are then reused in the cache test in the follow-up PR
- Renames some stuff to better align with conventions