I've run into a problem where replacing values in a DataFrame column corrupts the column's original data:
irb> df = RedAmber::DataFrame.new(val: [352, 256, 4, 0]);
irb> df.assign(val: df[:val].replace(df[:val] == 0, 1))
=>
#<RedAmber::DataFrame : 4 x 1 Vector, 0x000000000003bf60>
      val
  <uint8>
0      96
1       0
2       4
3       1
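This happens because the result column takes the data type of the replacement value (here uint8, since the replacement is 1) instead of keeping the original column's type, so 352 and 256 wrap around to 96 and 0. While this behavior is consistent, it is not intuitive; I think the result should keep the original column's data type. How about changing this behavior?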
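To make the corruption concrete, here is a minimal plain-Ruby sketch of the arithmetic involved. It does not call RedAmber at all; `wrap_to_uint8` is a hypothetical helper that only mimics what an unchecked narrowing to uint8 does (keeping the low 8 bits).

```ruby
# Plain-Ruby illustration of uint8 wraparound; no RedAmber calls are used.
UINT8_MODULUS = 256 # uint8 can hold 0..255

# Hypothetical helper: keep only the low 8 bits, as an unchecked
# narrowing cast to uint8 would.
def wrap_to_uint8(value)
  value % UINT8_MODULUS
end

original = [352, 256, 4, 0]
replaced = original.map { |v| v.zero? ? 1 : v }   # the intended result
stored   = replaced.map { |v| wrap_to_uint8(v) }  # what fits in a uint8 column

p replaced # => [352, 256, 4, 1]
p stored   # => [96, 0, 4, 1]
```

The `stored` values match the output above: only the 0 was meant to change, but 352 and 256 are also altered.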
Keeping the original column's type has its own downside, though: if the column type is uint8 and the replacement value is a double or a large integer, the replacement value would be corrupted instead. It would be useful to have an easier way to cast data types, or to let the replace method take the data type as a keyword argument, e.g., .replace(…, data_type: :double).
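One possible alternative to an explicit keyword is to choose a result type wide enough for both the column values and the replacement. The sketch below is only an illustration in plain Ruby: `common_type` and the type symbols are hypothetical, not part of RedAmber, and it assumes a non-empty list of non-negative integers or floats.

```ruby
# Hypothetical sketch: pick a type wide enough for every value.
# The symbols are illustrative labels, not RedAmber API.
UNSIGNED_LIMITS = {
  uint8: 2**8 - 1,
  uint16: 2**16 - 1,
  uint32: 2**32 - 1,
  uint64: 2**64 - 1
}.freeze

# Assumes a non-empty array of non-negative Integers and/or Floats.
def common_type(values)
  return :double if values.any?(Float)

  max = values.max
  # Hash keeps insertion order, so the narrowest fitting type wins.
  UNSIGNED_LIMITS.each { |type, limit| return type if max <= limit }
  raise RangeError, "#{max} does not fit in uint64"
end

p common_type([352, 256, 4, 0] + [1]) # => :uint16
p common_type([4, 0] + [1000])        # => :uint16
p common_type([4, 0] + [1.5])         # => :double
```

With a rule like this, neither the original values nor the replacement would be narrowed, at the cost of the column type sometimes changing.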
I'd appreciate any comments or ideas on this matter.
Thanks.