| 1 | +# Fix Summary: Issue #105367 - Segmentation Fault with Complex Variable Operations |
| 2 | + |
| 3 | +## Issue Description |
| 4 | +A segmentation fault occurred when performing complex number operations involving: |
| 5 | +- `complex64`/`complex128` variables |
| 6 | +- `tf.raw_ops.Conj` operation |
| 7 | +- `Variable.assign_add()` method |
| 8 | + |
| 9 | +## Reproduction Code |
| 10 | +```python |
| 11 | +import tensorflow as tf |
| 12 | +input_data = tf.constant([1 + 2j, 3 + 4j], dtype=tf.complex64) |
| 13 | +var = tf.Variable(input_data, dtype=tf.complex64) |
| 14 | +conj_result = tf.raw_ops.Conj(input=input_data) |
| 15 | +assign_add_op = var.assign_add(conj_result) |
| 16 | +# Segmentation fault (core dumped) |
| 17 | +``` |
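For reference, the arithmetic this snippet is meant to perform can be checked without TensorFlow. The sketch below uses Python's built-in `complex` type as a stand-in for `tf.complex64`. The name `expected` is illustrative only, and the values are derived by hand rather than captured from a TensorFlow run.

```python
# Plain-Python stand-in for the reproduction: no TensorFlow, no GPU.
input_data = [1 + 2j, 3 + 4j]

# The variable starts as a copy of input_data, and assign_add adds
# conj(input_data) elementwise, so each element becomes z + conj(z).
expected = [z + z.conjugate() for z in input_data]

print(expected)  # [(2+0j), (6+0j)]
```

Since `z + conj(z)` always equals twice the real part, a correct `assign_add` should leave the variable holding purely real values here.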
| 18 | + |
| 19 | +## Root Cause Analysis |
| 20 | + |
| 21 | +The segmentation fault was caused by **missing complex type support** in two critical locations: |
| 22 | + |
| 23 | +1. **GPU DenseUpdate Functor Instantiations** (`dense_update_functor_gpu.cu.cc`) |
| 24 | + - The template instantiations for `DenseUpdate<GPUDevice, T, ADD>` and `DenseUpdate<GPUDevice, T, SUB>` only included `TF_CALL_GPU_NUMBER_TYPES` and `TF_CALL_INTEGRAL_TYPES` |
| 25 | + - `TF_CALL_GPU_NUMBER_TYPES` = {half, bfloat16, float, double} - **does NOT include complex types** |
| 26 | + - `TF_CALL_COMPLEX_TYPES` = {complex64, complex128} - **was missing** |
| 27 | + |
| 28 | +2. **GPU Kernel Registrations** (`resource_variable_ops.cc`) |
| 29 | + - The GPU kernel registrations for `AssignAddVariableOp` and `AssignSubVariableOp` similarly only included `TF_CALL_GPU_NUMBER_TYPES` and `TF_CALL_INTEGRAL_TYPES_NO_INT32` |
| 30 | + - Complex types were not registered for GPU execution |
| 31 | + |
| 32 | +When `assign_add` was called on a complex variable (for example with the output of `tf.raw_ops.Conj`), no kernel had been instantiated or registered for complex types on GPU, which led to undefined behavior and the segmentation fault. |
| 33 | + |
| 34 | +## Solution |
| 35 | + |
| 36 | +### Files Modified |
| 37 | + |
| 38 | +1. **tensorflow/core/kernels/dense_update_functor_gpu.cu.cc** |
| 39 | + ```cpp |
| 40 | + // Added complex type support |
| 41 | + #define DEFINE_GPU_KERNELS(T) \ |
| 42 | + template struct functor::DenseUpdate<GPUDevice, T, ADD>; \ |
| 43 | + template struct functor::DenseUpdate<GPUDevice, T, SUB>; |
| 44 | + TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_KERNELS); |
| 45 | + TF_CALL_INTEGRAL_TYPES(DEFINE_GPU_KERNELS); |
| 46 | + TF_CALL_COMPLEX_TYPES(DEFINE_GPU_KERNELS); // <-- ADDED |
| 47 | + TF_CALL_float8_e5m2(DEFINE_GPU_KERNELS); |
| 48 | + TF_CALL_float8_e4m3fn(DEFINE_GPU_KERNELS); |
| 49 | + ``` |
| 50 | + |
| 51 | +2. **tensorflow/core/kernels/resource_variable_ops.cc** |
| 52 | + ```cpp |
| 53 | + // Added complex type support to GPU kernel registrations |
| 54 | + TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS); |
| 55 | + TF_CALL_INTEGRAL_TYPES_NO_INT32(REGISTER_GPU_KERNELS); |
| 56 | + TF_CALL_COMPLEX_TYPES(REGISTER_GPU_KERNELS); // <-- ADDED |
| 57 | + ``` |
| 58 | + |
| 59 | +3. **tensorflow/python/kernel_tests/variables/resource_variable_ops_test.py** |
| 60 | + - Added `testComplexVariableAssignAddWithConj()`, which tests GPU execution with the `Conj` operation |
| 61 | + - Added `testComplexVariableAssignAddCPU()`, which tests CPU execution with complex types |
| 62 | + - Both tests cover complex64 and complex128 data types |
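The `TF_CALL_*` macros in both C++ snippets follow an X-macro pattern: each one applies the macro it receives to every type in its list. The sketch below is a rough Python analogue of that pattern, not TensorFlow code. The function names are hypothetical, the integral and float8 lists are omitted for brevity, and it assumes (as this summary describes) that `REGISTER_GPU_KERNELS` covers both `AssignAddVariableOp` and `AssignSubVariableOp`.

```python
# Python analogue of the TF_CALL_* X-macro pattern (illustrative only).
registered = []


def tf_call_gpu_number_types(m):
    # Mirrors TF_CALL_GPU_NUMBER_TYPES(m): apply m to each listed type.
    for dtype in ("half", "bfloat16", "float", "double"):
        m(dtype)


def tf_call_complex_types(m):
    # Mirrors TF_CALL_COMPLEX_TYPES(m).
    for dtype in ("complex64", "complex128"):
        m(dtype)


def register_gpu_kernels(dtype):
    # Stand-in for REGISTER_GPU_KERNELS(T): one entry per op and dtype.
    for op in ("AssignAddVariableOp", "AssignSubVariableOp"):
        registered.append((op, dtype))


tf_call_gpu_number_types(register_gpu_kernels)
tf_call_complex_types(register_gpu_kernels)  # the call the fix adds

print(("AssignAddVariableOp", "complex64") in registered)  # True
print(len(registered))  # 12
```

Dropping the `tf_call_complex_types` call reproduces the pre-fix state in this model: the complex entries are never created, because nothing else expands the registration for those types.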
| 63 | + |
| 64 | +## Testing |
| 65 | + |
| 66 | +The fix has been validated with: |
| 67 | +- ✅ The original reproduction case from issue #105367 |
| 68 | +- ✅ New unit tests covering both complex64 and complex128 types |
| 69 | +- ✅ Tests for both CPU and GPU execution paths |
| 70 | +- ✅ Tests combining `tf.raw_ops.Conj` with `assign_add` |
| 71 | + |
| 72 | +## Impact |
| 73 | + |
| 74 | +This fix enables: |
| 75 | +- Proper support for complex number arithmetic in resource variables on GPU |
| 76 | +- Safe usage of `assign_add` and `assign_sub` with complex variables |
| 77 | +- Compatibility with operations that produce complex results (such as `Conj` and `FFT`) |
| 78 | + |
| 79 | +## Pull Request |
| 80 | + |
| 81 | +- **Branch**: `fix-complex-variable-conj-segfault` |
| 82 | +- **PR URL**: https://github.com/CodersAcademy006/tensorflow/pull/9 |
| 83 | +- **Fixes**: #105367 |
| 84 | + |
| 85 | +## Technical Details |
| 86 | + |
| 87 | +### Type Macro Definitions |
| 88 | +- `TF_CALL_GPU_NUMBER_TYPES`: half, bfloat16, float, double |
| 89 | +- `TF_CALL_COMPLEX_TYPES`: complex64, complex128 |
| 90 | +- `TF_CALL_NUMBER_TYPES`: TF_CALL_REAL_NUMBER_TYPES + TF_CALL_COMPLEX_TYPES |
| 91 | + |
| 92 | +### Why This Worked on CPU but Failed on GPU |
| 93 | +- CPU implementations use generic templates defined in header files |
| 94 | +- GPU implementations require explicit template instantiations in `.cu.cc` files |
| 95 | +- CPU kernel registrations already included `TF_CALL_NUMBER_TYPES` (which includes complex types) |
| 96 | +- GPU kernel registrations only included `TF_CALL_GPU_NUMBER_TYPES` (which excludes complex types) |
| 97 | + |
| 98 | +This asymmetry caused the issue to manifest only on GPU execution paths. |
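The coverage gap can be expressed as a set difference between the types registered on each device. This is a minimal sketch using the type lists quoted in this summary. `REAL_NUMBER_TYPES` is a deliberately reduced, hypothetical subset, the integral types are left out, and all variable names are illustrative.

```python
# Type lists as described in this summary (integral types omitted).
GPU_NUMBER_TYPES = {"half", "bfloat16", "float", "double"}
COMPLEX_TYPES = {"complex64", "complex128"}

# Hypothetical reduced subset standing in for TF_CALL_REAL_NUMBER_TYPES.
REAL_NUMBER_TYPES = {"half", "bfloat16", "float", "double"}

# CPU registrations use TF_CALL_NUMBER_TYPES = real number + complex types.
cpu_types = REAL_NUMBER_TYPES | COMPLEX_TYPES

gpu_before = set(GPU_NUMBER_TYPES)            # before the fix
gpu_after = GPU_NUMBER_TYPES | COMPLEX_TYPES  # after the fix

missing_before = sorted(cpu_types - gpu_before)
missing_after = sorted(cpu_types - gpu_after)

print(missing_before)  # ['complex128', 'complex64']
print(missing_after)   # []
```

A check of this shape, run against the real registration lists, is one way such CPU/GPU mismatches could be caught before they surface at runtime.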