
feat: triton matmul kernel adjusted, now is closer to HW behavior #82

Merged
BrandonGroth merged 9 commits into foundation-model-stack:main from chichun-charlie-liu:fp24_acc_trun_chunk8 on Apr 4, 2025


@chichun-charlie-liu
Collaborator

Description of the change

Add a flag, truncate_then_accumulate, that allows each partial dot product (i.e., one output from a CUDA instruction) to be truncated before it is accumulated into the final sum. (The previous behavior was to accumulate into the final sum first and then truncate.)
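The two orderings can be illustrated outside Triton. The following is a minimal NumPy sketch, not the PR's kernel. The helper names truncate_mantissa and chunked_matmul are hypothetical. "Truncation" is assumed to mean zeroing the low float32 mantissa bits (round toward zero). The defaults chunk=8 and keep_bits=15 are illustrative values loosely suggested by the branch name fp24_acc_trun_chunk8, not values confirmed by the PR. When the flag is set, the running sum is assumed to stay in full float32.

```python
import numpy as np


def truncate_mantissa(x, keep_bits):
    """Hypothetical helper: keep only the top `keep_bits` of the 23 float32
    mantissa bits, zeroing the rest (round toward zero). This is an assumed
    model of accumulator truncation, not the PR's exact scheme."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be in [0, 23]")
    x = np.array(x, dtype=np.float32)  # private contiguous copy
    drop = 23 - keep_bits
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)
    bits = x.view(np.uint32)  # reinterpret the same memory as raw bits
    bits &= mask              # in place, so `x` now holds the truncated values
    return x


def chunked_matmul(a, b, chunk=8, keep_bits=15, truncate_then_accumulate=False):
    """Hypothetical emulation: split K into chunks, compute one partial dot
    product per chunk, and combine the partials in one of two orderings."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[0]:
        raise ValueError("expected 2-D operands with matching inner dimension")
    acc = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    for k0 in range(0, a.shape[1], chunk):
        # One partial dot product over a K-chunk (stand-in for one instruction's output).
        partial = a[:, k0:k0 + chunk] @ b[k0:k0 + chunk, :]
        if truncate_then_accumulate:
            # Assumed new ordering: truncate the partial, then add it to the sum.
            acc = acc + truncate_mantissa(partial, keep_bits)
        else:
            # Assumed previous ordering: add to the sum, then truncate the sum.
            acc = truncate_mantissa(acc + partial, keep_bits)
    return acc


# Small demo: compare both orderings against a float64 reference.
rng = np.random.default_rng(0)
a = rng.random((4, 64), dtype=np.float32)
b = rng.random((64, 5), dtype=np.float32)
ref = a.astype(np.float64) @ b.astype(np.float64)
for flag in (False, True):
    out = chunked_matmul(a, b, truncate_then_accumulate=flag)
    print("truncate_then_accumulate =", flag, "max abs error =", np.abs(out - ref).max())
```

Setting keep_bits=23 disables truncation in this sketch, so both orderings reduce to the same plain float32 chunked accumulation. Lowering keep_bits is a simple way to exaggerate any difference between them.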

Related issue number

How to verify the PR

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

Signed-off-by: cliu-us <cliu@us.ibm.com>
…descriptions to avoid confusion

Signed-off-by: cliu-us <cliu@us.ibm.com>
@chichun-charlie-liu changed the title from "enhance: triton matmul kernel adjusted, now is closer to HW behavior" to "feat: triton matmul kernel adjusted, now is closer to HW behavior" on Mar 27, 2025
@github-actions (bot) added the feat label on Mar 27, 2025
@BrandonGroth merged commit dc2ad5d into foundation-model-stack:main on Apr 4, 2025
11 checks passed
@chichun-charlie-liu deleted the fp24_acc_trun_chunk8 branch on April 7, 2025 14:13
@chichun-charlie-liu linked an issue that may be closed by this pull request on May 8, 2025


Development

Successfully merging this pull request may close these issues.

Add accumulation triton kernel

2 participants