feat: triton matmul kernel adjusted, now is closer to HW behavior by chichun-charlie-liu · Pull Request #82 · foundation-model-stack/fms-model-optimizer

chichun-charlie-liu · 2025-03-27T15:50:26Z

Description of the change

Add a flag, truncate_then_accumulate, that truncates each partial dot product (i.e., the output of a single CUDA instruction) before it is accumulated into the final sum. Previously, partial products were first accumulated into the final sum, and the sum was truncated afterward.
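The difference between the two orderings can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Triton kernel from this PR. Truncation is modeled by a hypothetical helper, truncate_mantissa, that rounds to float32 and then zeroes all but mant_bits mantissa bits. Each chunk_size-long slice of the K dimension stands in for the partial dot product produced by one CUDA instruction. The running sum is assumed to be truncated after every addition in both modes. The names chunked_dot, chunk_size and mant_bits are hypothetical; only the truncate_then_accumulate flag name comes from this PR.

```python
import struct


def truncate_mantissa(x: float, mant_bits: int) -> float:
    """Round x to float32, then keep only the top `mant_bits` mantissa bits.

    Hypothetical helper: zeroing the low mantissa bits rounds toward zero in
    magnitude; the sign and exponent bits are left untouched.
    """
    if not 0 <= mant_bits <= 23:
        raise ValueError("mant_bits must be in [0, 23] for a float32 container")
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    drop = 23 - mant_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the `drop` lowest bits
    (y,) = struct.unpack(">f", struct.pack(">I", bits))
    return y


def chunked_dot(a, b, chunk_size, mant_bits, truncate_then_accumulate):
    """Dot product over K in chunks, with a reduced-precision running sum.

    Hypothetical model: each chunk's partial dot product plays the role of one
    hardware instruction's output and is computed in full Python-float precision.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    acc = 0.0
    for start in range(0, len(a), chunk_size):
        stop = start + chunk_size
        partial = sum(x * y for x, y in zip(a[start:stop], b[start:stop]))
        if truncate_then_accumulate:
            # New ordering: truncate the partial before it reaches the sum.
            partial = truncate_mantissa(partial, mant_bits)
        # Both orderings: truncate the running sum after each addition (assumption).
        acc = truncate_mantissa(acc + partial, mant_bits)
    return acc


if __name__ == "__main__":
    a = [1.0, 1.0, 1.0, 1.0]
    b = [0.25, 0.0625, 1.0, 0.1875]  # chunk partials (size 2): 0.3125, 1.1875
    print(chunked_dot(a, b, chunk_size=2, mant_bits=2, truncate_then_accumulate=False))  # 1.5
    print(chunked_dot(a, b, chunk_size=2, mant_bits=2, truncate_then_accumulate=True))   # 1.25
```

With only 2 mantissa bits kept, truncating the second partial (1.1875 becomes 1.0) before adding it drops the bits that would otherwise carry into the sum, so truncate-then-accumulate returns 1.25 while accumulate-then-truncate returns 1.5.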

Related issue number

How to verify the PR

Was the PR tested

I have added >=1 unit test(s) for every new method I have added.
I have ensured all unit tests pass

Signed-off-by: cliu-us <cliu@us.ibm.com>


chichun-charlie-liu added 7 commits March 3, 2025 14:24

0f7201e enable chunk_size=8
262db69 bug fix, M, N incorrect when using chunk_size 8 padding/dilation
eea8426 modify triton matmul to allow the formula C=A*B+C
a4ea093 bug fix
cf39d3d SageAttn method is not ideal to verify accumulator precision, change descriptions to avoid confusion
434e68f linting
e2960d5 minor linting
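One way to read the commit that lets the Triton matmul compute C=A*B+C is that the existing C seeds the accumulator instead of zero. A minimal pure-Python sketch of that semantics follows, assuming row-major nested lists. matmul_add is a hypothetical name, not the kernel's API, and the K loop is split into chunk_size pieces only to mirror the chunked accumulation discussed in this PR.

```python
def matmul_add(a, b, c, chunk_size=8):
    """Return a new matrix equal to A @ B + C; the inputs are not modified.

    Hypothetical sketch: each output element's accumulator starts from
    C[i][j], and the K dimension is consumed chunk_size elements at a time.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of a and b must match")
    n = len(b[0])
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (M, N)")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = c[i][j]  # seed the accumulator with the existing C value
            for start in range(0, k, chunk_size):
                stop = min(start + chunk_size, k)
                # Partial dot product for this K-chunk, added into the running sum.
                acc += sum(a[i][kk] * b[kk][j] for kk in range(start, stop))
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[1, 1], [1, 1]]
    print(matmul_add(A, B, C, chunk_size=1))  # [[20, 23], [44, 51]]
```

Seeding the accumulator with C means the addition happens inside the same reduction loop, so any truncation applied to the running sum also applies to the contribution from C.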

chichun-charlie-liu changed the title from "enhance: triton matmul kernel adjusted, now is closer to HW behavior" to "feat: triton matmul kernel adjusted, now is closer to HW behavior" on Mar 27, 2025

github-actions Bot added the feat label Mar 27, 2025

chichun-charlie-liu added 2 commits March 27, 2025 22:51

aba51f8 enable "dilation" for fp8, if chunk_size<32
dbd540d minor linting

chichun-charlie-liu marked this pull request as ready for review March 28, 2025 18:24

chichun-charlie-liu requested review from BrandonGroth, andrea-fasoli, kcirred, nwang-ibm and tharapalanivel as code owners March 28, 2025 18:24

BrandonGroth approved these changes Apr 4, 2025


BrandonGroth merged commit dc2ad5d into foundation-model-stack:main Apr 4, 2025
11 checks passed

chichun-charlie-liu deleted the fp24_acc_trun_chunk8 branch April 7, 2025 14:13

chichun-charlie-liu linked an issue on May 8, 2025 that may be closed by this pull request: Add accumulation triton kernel #106 (Closed)

BrandonGroth merged 9 commits into foundation-model-stack:main from chichun-charlie-liu:fp24_acc_trun_chunk8
