
Add size divisibility specialization#150

Merged
voltjia merged 2 commits into master from add-size-divisibility-specialization
May 8, 2026
Conversation

@voltjia
Collaborator

@voltjia voltjia commented May 8, 2026

Summary

  • Pass int64:16 divisibility hints to Triton for size parameters whose dim is divisible by 16, enabling vectorized loads alongside the existing stride-1 contiguity hint.
  • Extend the C++ dispatcher to pick the most-specific variant by also checking shape[dim] % 16 == 0, falling back to less-specialized variants otherwise.
  • Rename the existing stride-spec vocabulary to contiguity so the new divisibility axis sits alongside it without ambiguity (variant suffixes go from s_<dims> to divisibility_<...>_contiguity_<...>).
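As a rough illustration of the second bullet, the dispatcher's fallback behavior can be thought of as a first-match scan over variants ordered from most to least specialized. The sketch below is Python under assumed names (`select_variant`, the variant-table layout); the actual dispatcher described in this PR is generated C++.

```python
# Illustrative sketch only: the real dispatcher is generated C++ in
# ninetoothed's AOT output; `select_variant` and the tuple layout of
# `variants` are hypothetical names for this example.

DIVISIBILITY = 16  # granularity of the int64:16 divisibility hint


def select_variant(variants, shape, strides):
    """Return the first (most-specific) variant whose requirements hold.

    `variants` is assumed pre-sorted from most to least specialized;
    each entry is (divisible_dims, contiguous_dims, kernel_name).
    """
    for divisible_dims, contiguous_dims, kernel in variants:
        # shape[dim] % 16 == 0 check from the PR description
        divisible = all(shape[d] % DIVISIBILITY == 0 for d in divisible_dims)
        # existing stride-1 contiguity check
        contiguous = all(strides[d] == 1 for d in contiguous_dims)
        if divisible and contiguous:
            return kernel
    raise ValueError("no matching variant")


# Example variant table for a 2-D tensor, most to least specialized,
# using the renamed suffix vocabulary from the PR.
variants = [
    ((0, 1), (1,), "divisibility_0_1_contiguity_1"),
    ((), (1,), "contiguity_1"),
    ((), (), "generic"),
]
```

With this table, a (32, 48) contiguous tensor hits the fully specialized variant, a (30, 48) contiguous tensor falls back to the contiguity-only variant, and a non-contiguous tensor falls through to the generic one.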

Testing

pytest output:

============================= test session starts ==============================
platform linux -- Python 3.10.16, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/ninetoothed
configfile: pyproject.toml
plugins: anyio-4.12.1, xdist-3.8.0, cov-7.0.0, typeguard-4.4.4
collected 213 items

tests/test_add.py .                                                      [  0%]
tests/test_addmm.py ..                                                   [  1%]
tests/test_aot.py .........                                              [  5%]
tests/test_aot_auto_tuning.py ....                                       [  7%]
tests/test_attention.py ........                                         [ 11%]
tests/test_auto_tuner.py ....                                            [ 13%]
tests/test_clone.py ....                                                 [ 15%]
tests/test_conv2d.py ....                                                [ 16%]
tests/test_data_ptr.py .                                                 [ 17%]
tests/test_debugging.py .                                                [ 17%]
tests/test_dropout.py .                                                  [ 18%]
tests/test_eval.py ........                                              [ 22%]
tests/test_expand.py .                                                   [ 22%]
tests/test_generation.py ............................................... [ 44%]
.............................                                            [ 58%]
tests/test_getitem.py ..........                                         [ 62%]
tests/test_ipynb.py .                                                    [ 63%]
tests/test_jagged.py ................                                    [ 70%]
tests/test_matmul.py ..                                                  [ 71%]
tests/test_max_pool2d.py ..                                              [ 72%]
tests/test_naming.py .......                                             [ 76%]
tests/test_pad.py ................................................       [ 98%]
tests/test_pow.py .                                                      [ 99%]
tests/test_softmax.py .                                                  [ 99%]
tests/test_unsqueeze.py .                                                [100%]

======================= 213 passed in 3161.00s (0:52:41) =======================


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ebd64b6dd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread: src/ninetoothed/aot.py
@voltjia voltjia merged commit eea40cd into master May 8, 2026
8 checks passed
@voltjia voltjia deleted the add-size-divisibility-specialization branch May 8, 2026 14:48