Commit 570d95c

Improve usage of polymorphism in columns (rapidsai#21030)
This PR reduces the amount of ColumnBase subclass-specific specialization present in method implementations in ColumnBase itself. It also completes some improvements around `__cuda_array_interface__` and mask setting. These improvements will help as we work to establish a cleaner model for the ColumnBase<->pylibcudf.Column interop.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#21030
1 parent 3cbc9bb commit 570d95c

6 files changed

Lines changed: 252 additions & 181 deletions

python/cudf/cudf/core/column/categorical.py

Lines changed: 18 additions & 0 deletions
@@ -38,6 +38,7 @@
         DtypeObj,
         ScalarLike,
     )
+    from cudf.core.buffer import Buffer
     from cudf.core.column import (
         ColumnBase,
         DatetimeColumn,
@@ -626,6 +627,23 @@ def copy(self, deep: bool = True) -> Self:
     def memory_usage(self) -> int:
         return self.categories.memory_usage + self.codes.memory_usage
 
+    @classmethod
+    def _deserialize_plc_column(
+        cls,
+        header: dict,
+        dtype: DtypeObj,
+        data: Buffer | None,
+        mask: Buffer | None,
+        children: list[ColumnBase],
+    ) -> plc.Column:
+        """Construct plc.Column from codes child for categorical columns.
+
+        Categorical columns store data as integer codes referencing categories.
+        The plc_column must be constructed from the codes child column, which
+        contains the actual integer data, rather than from the data/mask buffers.
+        """
+        return children.pop(0).plc_column
+
     @staticmethod
     def _concat(
         objs: MutableSequence[CategoricalColumn],
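The override pattern in this hunk can be illustrated without cudf or a GPU. The following is a minimal sketch, not the actual cudf implementation: `PlcColumn`, `Column`, and the `deserialize` driver are hypothetical stand-ins for `plc.Column`, `ColumnBase`, and whatever base-class code calls the hook, and the `header` and `dtype` parameters are omitted for brevity. It only shows how a base class can call an overridable classmethod instead of checking for the categorical subclass itself.

```python
from dataclasses import dataclass, field


@dataclass
class PlcColumn:
    """Hypothetical stand-in for pylibcudf.Column: it just holds values."""

    values: list


@dataclass
class Column:
    """Hypothetical stand-in for ColumnBase."""

    plc_column: PlcColumn
    children: list = field(default_factory=list)

    @classmethod
    def _deserialize_plc_column(cls, data, mask, children):
        # Default hook (assumed behavior): build the low-level column
        # from the data buffer. The mask is ignored in this sketch.
        return PlcColumn(list(data))

    @classmethod
    def deserialize(cls, data, mask, children):
        # The base class calls the hook and never inspects the concrete type.
        return cls(cls._deserialize_plc_column(data, mask, children))


class CategoricalColumn(Column):
    @classmethod
    def _deserialize_plc_column(cls, data, mask, children):
        # Override: the integer codes live in the first child, not in `data`.
        return children.pop(0).plc_column


# A plain integer column holding the codes, then a categorical built from it.
codes = Column.deserialize([0, 1, 0, 2], None, [])
cat = CategoricalColumn.deserialize(None, None, [codes])

# Each code is an index into the categories.
categories = ["a", "b", "c"]
decoded = [categories[i] for i in cat.plc_column.values]
```

As in the diff, `children.pop(0)` removes the codes child from the list that was passed in, so the caller sees a shorter list afterwards.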
