[XPU] Fix RuntimeError in addcmul when tensor2 is a CPU scalar tensor #3608
@copilot fix the CI failure
Fixed in 27ec115. Three lint issues were addressed:
@copilot fix lint error
@copilot How does CUDA process this CPU input? Is this fix aligned with CUDA?
The XPU fix is aligned with CUDA. Here's how CUDA handles the CPU scalar case (in
Both implementations:
The XPU fix matches the CUDA pattern exactly. The only structural difference is that CUDA extracts the logic into a named helper function while XPU uses captured-value functors, which is consistent with the rest of the XPU kernel style.
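As a rough illustration of what "captured-value functor" means here, the following pure-Python sketch contrasts a three-argument element-wise function with a two-argument one that captures the scalar. It is only a model of the idea: `addcmul_three_arg`, `addcmul_captured_scalar`, and `apply_elementwise` are hypothetical names, not functions from the XPU or CUDA kernels.

```python
from typing import Callable, List, Sequence


def addcmul_three_arg(value: float) -> Callable[[float, float, float], float]:
    """Element-wise function that reads all three tensor operands."""
    def fn(self_val: float, t1_val: float, t2_val: float) -> float:
        return self_val + value * t1_val * t2_val
    return fn


def addcmul_captured_scalar(value: float, t2_scalar: float) -> Callable[[float, float], float]:
    """Two-argument variant: the scalar tensor2 value is captured, not iterated over."""
    def fn(self_val: float, t1_val: float) -> float:
        return self_val + value * t1_val * t2_scalar
    return fn


def apply_elementwise(fn: Callable[..., float], *columns: Sequence[float]) -> List[float]:
    """Toy stand-in for an element-wise kernel launch over equally sized inputs."""
    return [fn(*args) for args in zip(*columns)]


self_vals = [1.0, 2.0, 3.0]
t1_vals = [4.0, 5.0, 6.0]
t2_scalar = 2.0  # plays the role of the CPU scalar tensor2
value = 0.5

# Three-argument path: the scalar has to be expanded to a full operand.
expanded = apply_elementwise(addcmul_three_arg(value), self_vals, t1_vals, [t2_scalar] * 3)

# Captured-value path: only two operands are iterated.
captured = apply_elementwise(addcmul_captured_scalar(value, t2_scalar), self_vals, t1_vals)
```

Both paths compute `self + value * tensor1 * tensor2`; the captured variant simply never needs the scalar to be an operand of the iteration.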
@copilot Are the CI failures related to the changes in this PR?
The CI failures are not related to this PR's changes. The 21 new failures fall into these categories:
None of these touch pointwise ops or the addcmul kernel. Our changes are limited to
Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/2eb174d7-463e-49b6-80b8-5c3ef0029f6d Co-authored-by: laifenxiawucha <184592536+laifenxiawucha@users.noreply.github.com>
…g-format fix Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/c1669bef-ca65-42e3-bfa5-3a3de8924609 Co-authored-by: Stonepia <12094956+Stonepia@users.noreply.github.com>
Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/c0ea05f8-8757-47a9-a0cb-d08cfd8a6b0b Co-authored-by: Stonepia <12094956+Stonepia@users.noreply.github.com>
ac40dfd to 81cafd9
@copilot New UT failures detected in op_ut tests (6 total). Please check these new failures and analyze whether they are caused by the PR changes. Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status. Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26208134635 New failures:
... and 1 more failures. See full list in artifact. Instructions:
None of the 6 new failures are caused by this PR. Our changes are limited to two files:
The failing tests touch entirely different areas:
These failures are pre-existing issues or regressions from other commits merged into the branch (e.g. |
`torch.addcmul` on XPU raised `RuntimeError: iter.device(arg).is_xpu()` whenever `tensor2` was a CPU scalar tensor, because `gpu_kernel` asserts all operands are on XPU with no fallback for CPU scalars.

Changes

`src/ATen/native/xpu/sycl/PointwiseOpsKernels.cpp`
- `AddcmulCpuScalarFunctor` and `AddcmulComplexCpuScalarFunctor`: 2-argument functors that capture the CPU scalar value of `tensor2` as a runtime constant, eliminating it from the iterator before dispatch.
- In `addcmul_kernel`, detect `iter.is_cpu_scalar(3)`, extract the scalar via `iter.scalar_value<>`, call `iter.remove_operand(3)`, then dispatch the 2-arg functor, matching the pattern used by `opmath_gpu_kernel_with_scalars` for binary CPU scalar handling.
- Use `AT_DISPATCH_ALL_TYPES_AND2` (not `_COMPLEX`) in the non-complex CPU-scalar branch, since complex types are already handled by the preceding `if` block.

`test/repro/test_addcmul_cpu_scalar.py`: Reproducer covering all affected dtypes (`float32`, `float64`, `complex64`, `complex128`, `int8`, `int16`, `int32`, `int64`, `uint8`).

Test:
`test/xpu/test_torch_xpu.py::TestTorchDeviceTypeXPU::test_addcmul_use_cpu_scalar_True_xpu_*`
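The detect / extract / remove / dispatch flow in the PR description can be modeled with a small stdlib-only sketch. This is a toy, not the ATen code: `Operand`, `ToyIterator`, `toy_gpu_kernel`, and `toy_addcmul` are hypothetical stand-ins, and only the method names `is_cpu_scalar`, `scalar_value`, and `remove_operand` are borrowed from the description. Operand index 3 is assumed to be `tensor2`, as in the description, with index 0 as the output.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Operand:
    device: str         # "xpu" or "cpu"
    data: List[float]   # a one-element list models a 0-dim (scalar) tensor


@dataclass
class ToyIterator:
    # Assumed layout: [output, self, tensor1, tensor2]
    operands: List[Operand]

    def is_cpu_scalar(self, idx: int) -> bool:
        op = self.operands[idx]
        return op.device == "cpu" and len(op.data) == 1

    def scalar_value(self, idx: int) -> float:
        return self.operands[idx].data[0]

    def remove_operand(self, idx: int) -> None:
        del self.operands[idx]


def toy_gpu_kernel(it: ToyIterator, fn: Callable[..., float]) -> None:
    """Models the strict check: every remaining operand must live on XPU."""
    for i, op in enumerate(it.operands):
        if op.device != "xpu":
            raise RuntimeError(f"operand {i} is not on xpu")
    inputs = [op.data for op in it.operands[1:]]
    # No broadcasting is modeled; inputs are assumed to have equal length.
    it.operands[0].data = [fn(*args) for args in zip(*inputs)]


def toy_addcmul(it: ToyIterator, value: float) -> None:
    if it.is_cpu_scalar(3):
        scalar = it.scalar_value(3)   # read the scalar on the host
        it.remove_operand(3)          # the kernel no longer sees tensor2
        toy_gpu_kernel(it, lambda a, b: a + value * b * scalar)
    else:
        toy_gpu_kernel(it, lambda a, b, c: a + value * b * c)


def make_iter() -> ToyIterator:
    return ToyIterator([
        Operand("xpu", [0.0, 0.0, 0.0]),   # output
        Operand("xpu", [1.0, 2.0, 3.0]),   # self
        Operand("xpu", [4.0, 5.0, 6.0]),   # tensor1
        Operand("cpu", [2.0]),             # tensor2 as a CPU scalar
    ])


it = make_iter()
toy_addcmul(it, 0.5)
result = it.operands[0].data
```

Calling `toy_gpu_kernel` directly on `make_iter()` with a three-argument function raises `RuntimeError` in this model, which mirrors the failure mode the PR describes; routing through `toy_addcmul` avoids it because the CPU operand is removed first.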