Use fp32 accumulation in SkipLayerNorm/EmbedLayerNorm CUDA kernels #28682
Open
tianleiwu wants to merge 4 commits into
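The title refers to accumulating the LayerNorm statistics (sum and sum of squares) in fp32 instead of in the fp16 input type. The sketch below illustrates why that matters. It is not the CUDA code from this PR. It is a hypothetical NumPy model in which a single "thread" walks one row sequentially. `mean_var_sequential` is an illustrative name. Real kernels use parallel block reductions, which shrink the error but do not remove it when the accumulator is fp16.

```python
import numpy as np


def mean_var_sequential(x, acc_dtype):
    """Hypothetical illustration: accumulate sum and sum of squares of row `x`
    one element at a time in `acc_dtype`, then derive mean and variance.
    This models accumulator precision only, not the real kernel."""
    s = acc_dtype(0)
    sq = acc_dtype(0)
    for v in x:
        v = acc_dtype(v)           # widen (or keep) the element in the accumulator type
        s = acc_dtype(s + v)       # running sum, rounded to acc_dtype each step
        sq = acc_dtype(sq + v * v)  # running sum of squares, same rounding
    n = acc_dtype(len(x))
    mean = s / n
    var = sq / n - mean * mean     # E[x^2] - E[x]^2
    return float(mean), float(var)


row = np.ones(4096, dtype=np.float16)  # constant row: true mean 1.0, variance 0.0

# fp16 accumulator: above 2048 the fp16 spacing is 2, so 2048 + 1 rounds back to 2048
print(mean_var_sequential(row, np.float16))  # (0.5, 0.25)

# fp32 accumulator: 4096 is exactly representable, so the statistics are correct
print(mean_var_sequential(row, np.float32))  # (1.0, 0.0)
```

With an fp16 accumulator, the running sum stalls at 2048. The computed mean is therefore halved, and a row that should have zero variance reports 0.25. Normalizing with these statistics would visibly distort the output. Keeping the inputs in fp16 and widening only the accumulator avoids this at little cost.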