LLVM Bump to c27444ab4976dd9ff131212f87463f9945ab28d7 #393
Open
jorickert wants to merge 4 commits into
Signed-off-by: Boyana Norris <brnorris03@gmail.com>
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
Signed-off-by: Rickert, Jonas <Jonas.Rickert@amd.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Christopher Munoz <chrismunoz1019@gmail.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: JiQiu <qiuji@iscas.ac.cn>
Signed-off-by: Chen Tong <chentong@us.ibm.com>
Signed-off-by: Tong Chen <chentong@zaiu.pok.stglabs.ibm.com>
Signed-off-by: Tong Chen <chentong@us.ibm.com>
Signed-off-by: Kumarappan <kumarappan.thiyagarajan@multicorewareinc.com>
Signed-off-by: Arkar-Hema <hema.bhaskar@multicorewareinc.com>
Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
Signed-off-by: Sunny Anand <sunnyanand.979@gmail.com>
Signed-off-by: logeshwaranmcw <logeshwaran.elanchelian@multicorewareinc.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Co-authored-by: Jonas Rickert <Jonas.Rickert@amd.com>
Co-authored-by: Christopher Munoz <32556579+christopherlmunoz@users.noreply.github.com>
Co-authored-by: Haruki Imai <imaihal@jp.ibm.com>
Co-authored-by: Tung D. Le <tung@jp.ibm.com>
Co-authored-by: qjivy <qiuji@iscas.ac.cn>
Co-authored-by: Tong Chen <chentong@us.ibm.com>
Co-authored-by: Sunny Anand <164108690+Sunny-Anand@users.noreply.github.com>
Co-authored-by: kumarappan-cmyk <kumarappan.thiyagarajan@multicorewareinc.com>
Co-authored-by: Arkar-Hema <hema.bhaskar@multicorewareinc.com>
Co-authored-by: Andreas Fehlner <fehlner@arcor.de>
Co-authored-by: logeshwaranmcw <155156061+logeshwaranmcw@users.noreply.github.com>
Signed-off-by: Jonas Rickert <jonas.rickert@amd.com>
0f71648 to e500b7d
Author
Targeting Xilinx/llvm-project#584
mgehre-amd approved these changes on Jul 4, 2025
30b5d30 to 114e05b
114e05b to 081786c
AMD changes: Update lowering and tests for onnx->tosa conversions that are not upstream
Partial cherry-pick of f03b287
LLVM update 43d71ba (onnx#3086)
update float types, tosa, other misc changes
fix buildOnnxToTosaPaddingConstOp
fix lit tests (wip)
update doc
use stablehlo tagged version
fixed more lit tests
fix .clang-format
fix lit (wip)
revert .clang-format change
fix lit tests
fix formatting
lit tests pass (except jni -- not tested)
manually fix formatting; can't get clang-format to do it on any of my machines
revert lit test changes unrelated to update
update llvm and stablehlo shas, misc minor updates
remove non-existent passes
lit updates (wip)
Bump Upsample to Opset 10 and change the opset versioning to allow skipping over opset versions if a newer, backwards compatible one exists. (onnx/onnx-mlir#3065)
Bump Upsample to Opset 10
This is a non-functional change; the only difference is that Upsample was marked as deprecated with Opset 10.
Use a map of the available opset versions in onnx to select the node opset to use.
Introduces a new build-time generated map that contains all versions of an operation as defined by onnx. To determine the opset version for a node/op:
1. Determine the latest valid opset version. This is the newest version in this opset-version-map that is older than or equal to the current graph opset.
2. Select the newest version from the versions supported by onnx-mlir that is equal to or newer than the latest valid opset version. This allows it to skip over opset versions that have a newer backwards compatible version. Example:
Versions in onnx and supported by onnx-mlir: [3, 5].
Graph opset version to node version: 3 -> 3, 4 -> 3, 5 -> 5
Versions in onnx: [7, 9, 10]. Version 10 is backwards compatible with version 9.
Versions supported by onnx-mlir: [7, 10].
Graph opset version to node version: 7 -> 7, 8 -> 7, 9 -> 10, 10 -> 10
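The two-step selection above can be sketched in a few lines of Python. This is only an illustration, not the onnx-mlir implementation: the function name `select_node_opset` and its parameters are hypothetical. Read literally, "newest" in step 2 would map graph opset 7 to node version 10 in the second example, so the sketch assumes the oldest supported version that is not older than the latest valid one is meant, which is the reading that reproduces both examples.

```python
from bisect import bisect_right


def select_node_opset(graph_opset, onnx_versions, supported_versions):
    """Pick the op version to import for a node (hypothetical helper)."""
    onnx_versions = sorted(onnx_versions)
    # Step 1: newest version defined by onnx that is <= the graph opset.
    idx = bisect_right(onnx_versions, graph_opset)
    if idx == 0:
        raise ValueError("op is not defined at this graph opset")
    latest_valid = onnx_versions[idx - 1]
    # Step 2 (assumed reading): oldest supported version >= latest_valid.
    candidates = [v for v in sorted(supported_versions) if v >= latest_valid]
    if not candidates:
        raise ValueError("no supported version is new enough")
    return candidates[0]


# Second example from the description: onnx has [7, 9, 10], onnx-mlir has [7, 10].
mapping = {g: select_node_opset(g, [7, 9, 10], [7, 10]) for g in (7, 8, 9, 10)}
```

With these inputs, graph opset 9 resolves to node version 10 because version 9 is not in the supported list and 10 is the next one that is.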
Improve scripts (onnx/onnx-mlir#3089)
Bump various ops to opset 21, adding int4/uint4 and 8 bit float support. (onnx/onnx-mlir#3064)
Add support for TensorProto::UINT4/INT4
Upgrade onnx.Cast to opset 21
Bump various ops to opset 21.
These are all backwards compatible version bumps, only adding support for int4/uint4.
Bumped ops:
Flatten
Identity
If
Loop
Pad
Reshape
Scan
Shape
Size
Squeeze
Transpose
Unsqueeze
Added minimal support to do some timing of OM Runtime functionality (onnx/onnx-mlir#3095)
adding __errno_location call for MVS (onnx/onnx-mlir#3099)
Rewriting pattern to remove WhereOp and EqualOp. (onnx/onnx-mlir#3094)
Rewrite ONNXWhereOp and ONNXEqualOp into a newly created ConcatOp.
Enable NNPA saturation by default and change the option to --nnpa-disable-saturation (onnx/onnx-mlir#3101)
Enable NNPA saturation by default and change the option to --nnpa-disable-saturation
removing weak attribute of errno (onnx/onnx-mlir#3103)
Fix the custom build link for docs/Docker.md (onnx/onnx-mlir#3104)
Python driver for torch model (onnx/onnx-mlir#3093)
implementation
format
test
py format
torch.compile
refine
add debug
respond
response
format
implement (onnx/onnx-mlir#3108: Improve the c++ driver build-run-onnx-lib.sh)
Followups for torch model driver (onnx/onnx-mlir#3106)
simplify
complete
fix
fix
Fix an error in ZHighConstantPropagation for QuantizedStick (onnx/onnx-mlir#3112)
Add z17 for -march (onnx/onnx-mlir#3113)
done
convert
fix
format
Decompose Hardswish into simpler ONNX ops (onnx/onnx-mlir#3107)
Decompose and lower Hardswish
Provide the decomposition as a compile-time option, with krnl dialect lowering as the default
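For reference, the ONNX specification defines HardSwish as `x * HardSigmoid(x)` with `alpha = 1/6` and `beta = 0.5`, so one natural decomposition is a HardSigmoid followed by a Mul. The scalar Python sketch below only illustrates that identity; whether the pass in this commit emits exactly these two ops is an assumption, and the function names are made up for the example.

```python
def hard_sigmoid(x, alpha=1.0 / 6.0, beta=0.5):
    # ONNX HardSigmoid: clamp the affine function alpha * x + beta to [0, 1].
    return max(0.0, min(1.0, alpha * x + beta))


def hard_swish(x):
    # ONNX HardSwish, written as Mul(x, HardSigmoid(x)).
    return x * hard_sigmoid(x)


# Evaluate the decomposed form on a few sample points.
samples = [hard_swish(v) for v in (-6.0, -1.0, 0.0, 1.0, 6.0)]
```

Inputs at or below -3 map to zero and inputs at or above 3 pass through unchanged, which is what makes the op cheap to express with clamp-style primitives.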
Reorder relu to maxpool optimization pass in ONNX dialect (onnx/onnx-mlir#3109)
Reorder Relu and maxpool optimization
Swap Relu and maxpool only when Relu is not a consumer of conv
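The swap is legal because Relu is monotonically non-decreasing and therefore commutes with taking a maximum. The pure-Python check below demonstrates this on a 1-D signal; it is a sketch of the underlying identity only, and `relu` and `max_pool_1d` are hypothetical helpers rather than onnx-mlir code.

```python
def relu(xs):
    # Element-wise max(0, x).
    return [max(0.0, x) for x in xs]


def max_pool_1d(xs, kernel, stride):
    # Max over each window of `kernel` elements, no padding.
    return [max(xs[i:i + kernel]) for i in range(0, len(xs) - kernel + 1, stride)]


signal = [-3.0, 1.0, -2.0, 4.0, -1.0, -5.0]
pool_then_relu = relu(max_pool_1d(signal, kernel=2, stride=2))
relu_then_pool = max_pool_1d(relu(signal), kernel=2, stride=2)
same_result = pool_then_relu == relu_then_pool
```

Applying Relu after the pooling touches fewer elements, since the pooled tensor is smaller than its input.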
Move onnx.Constant before the root op when fusing onnx ops (onnx/onnx-mlir#3119: Fix "operand #0 does not dominate this use" when fusing onnx.ConstantOp)
Support QLinearMatMul on CPU (onnx/onnx-mlir#3117)
Support QLinearMatMul on CPU
Update black-format-check.yml (onnx/onnx-mlir#3118)
Merge nested concat Ops optimization pass in ONNX dialect (onnx/onnx-mlir#3111)
Merging nested concat ops
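As a rough picture of what merging nested concats means, the sketch below flattens a Concat whose inputs are themselves Concats on the same axis. The `Concat` class and `merge_nested_concat` function are hypothetical stand-ins for the IR, and any extra conditions the real pattern checks (for example how many users the inner op has) are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Concat:
    axis: int
    inputs: list


def merge_nested_concat(node):
    """Inline inner Concat inputs that use the same axis as the outer one."""
    merged = []
    for inp in node.inputs:
        if isinstance(inp, Concat):
            inner = merge_nested_concat(inp)
            if inner.axis == node.axis:
                # Same axis: the inner operands can be spliced in directly.
                merged.extend(inner.inputs)
                continue
            inp = inner
        merged.append(inp)
    return Concat(node.axis, merged)


flat = merge_nested_concat(Concat(1, [Concat(1, ["a", "b"]), "c"]))
```

An inner Concat on a different axis is left in place, because splicing its operands would change the result shape.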
Enhance shape inference for ONNX Reshape (onnx/onnx-mlir#3122)
Add a special case in shape inference for reshape
update zdnn-v1.1.2 (onnx/onnx-mlir#3130)
Updating supported ops on NNPA md for z17. (onnx/onnx-mlir#3120)
starting to update new z17 NNPA ops
fix CVE-2025-32434 (onnx/onnx-mlir#3135)
Fuse consecutive clips pattern (onnx/onnx-mlir#3132)
Fuse consecutive clips pattern
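Two back-to-back clips can be replaced by a single clip whose bounds are the inner bounds clamped into the outer range. The sketch below checks that identity on scalars; it assumes each clip has `lo <= hi`, uses hypothetical helper names, and does not claim to match the exact conditions of the pattern added in this commit.

```python
def clip(x, lo, hi):
    # Same semantics as ONNX Clip on a scalar: min(max(x, lo), hi).
    return min(max(x, lo), hi)


def fuse_clip_bounds(inner, outer):
    """Bounds of one clip equivalent to clip(clip(x, *inner), *outer)."""
    (a, b), (c, d) = inner, outer
    # Clamping the inner bounds into [c, d] also covers disjoint ranges.
    return clip(a, c, d), clip(b, c, d)


lo, hi = fuse_clip_bounds((0.0, 6.0), (-1.0, 4.0))
fused = [clip(x, lo, hi) for x in (-2.0, 3.0, 9.0)]
chained = [clip(clip(x, 0.0, 6.0), -1.0, 4.0) for x in (-2.0, 3.0, 9.0)]
```

When the two ranges overlap, the fused bounds reduce to `max(a, c)` and `min(b, d)`.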
Replace deprecated applyPatternsAndFoldGreedily with applyPatternsGreedily. This function also folds by default, so the change is NFC
Fix clang-format
Replace bufferization::createOwnershipBasedBufferDeallocationPass with mlir::createConvertBufferizationToMemRefPass
Update onnx-to-tosa reshape lit test
Move gemm_to_fc tests to gemm_to_matmul
Change tosaBuilder::mul function signature to make clear that the shift is an int8
Disable buffer_loop_hoisting test as it gets completely optimized away
Guard against dynamic dim in result
Use resize operation input and output type to calculate the border, instead of using the calculated numerator/denominator
Guard against linear interpolation of integer types
Add test for disallowed onnx.Resize on ints with linear interpolation to tosa
Add 'Pure' annotation to some krnl ops and recreate documentation
Build stablehlo with static libs
Disable memref.prefetch since it does not work with the new bufferization
Conv add const where the constant is a scalar (onnx/onnx-mlir#3145)
added support for Celu op (onnx/onnx-mlir#3139)
Fix some warnings related to stickification for NNPA (onnx/onnx-mlir#3147)
Removing duplicate file docs/SupportedONNXOps-NNPA-supplement.md (onnx/onnx-mlir#3146)
migrated instance/group normalization from decompose to canonicalize (onnx/onnx-mlir#3148: Instance and Group norm needs shape inference)
Fusion of Matmul add covering the stacked/unstacked/bcast1/bcast23 patterns (onnx/onnx-mlir#3140)
Support --march=native (onnx/onnx-mlir#3134)
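For readers unfamiliar with the op, ONNX defines Celu as `max(0, x) + min(0, alpha * (exp(x / alpha) - 1))`. The scalar sketch below evaluates that formula, assuming `alpha > 0`; it is not the lowering added by the commit, and the function name is illustrative only.

```python
import math


def celu(x, alpha=1.0):
    # Positive part passes through unchanged.
    positive = max(0.0, x)
    # Negative part: alpha * (exp(x / alpha) - 1), evaluated only on
    # min(x, 0) so that large positive x cannot overflow exp (alpha > 0).
    negative = alpha * math.expm1(min(x, 0.0) / alpha)
    return positive + negative


values = [celu(v, alpha=2.0) for v in (-4.0, -1.0, 0.0, 3.0)]
```

The function is the identity for non-negative inputs and saturates towards `-alpha` for very negative ones.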
changes
format
linkage
lib
fix another error on s390x
lower UB to LLVM since vector.shape_cast is lowered to UB