
Commit 96321e6

feat: add StructLikeSet for deduplicating StructLike rows
Adds StructLikeSet<bool kValidate = true>, a hash set for StructLike rows backed by an internal arena allocator.

Key design points:
- Deep-copies inserted rows into a monotonic_buffer_resource arena; string data and nested struct/list/map scalars are fully materialized so the set owns its memory independently of the caller
- Transparent heterogeneous lookup: Contains() does not allocate a temporary key
- Hash and equality semantics match the Java reference implementation (String.hashCode, StructLikeHash, ListHash; float/double use canonical NaN bits and distinguish ±0.0)
- Schema validation (field count + scalar type) on Insert/Contains; can be disabled via kValidate=false (UncheckedStructLikeSet) when the caller guarantees conformance
- Internal Arena wrapper containers use std::pmr::vector
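The Java-compatible hashing mentioned in the message can be illustrated with a minimal sketch. This is not the code from this commit: the helper names (JavaStringHash, JavaFloatHash, JavaDoubleHash) are hypothetical, the string helper assumes ASCII-only input (Java hashes UTF-16 code units, which coincide with bytes only for ASCII), and the float/double helpers follow the documented behavior of Java's Float.hashCode and Double.hashCode, which may differ in detail from what StructLikeSet does internally.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper: Java's String.hashCode, h = 31 * h + c per code unit.
// Assumption: ASCII-only input, so each byte equals one UTF-16 code unit.
inline int32_t JavaStringHash(std::string_view s) {
  uint32_t h = 0;  // unsigned arithmetic wraps like Java's int overflow
  for (unsigned char c : s) {
    h = 31u * h + c;
  }
  return static_cast<int32_t>(h);
}

// Hypothetical helper: Float.hashCode semantics. Every NaN payload collapses
// to the canonical bit pattern 0x7fc00000; +0.0f and -0.0f keep distinct bits.
inline int32_t JavaFloatHash(float v) {
  uint32_t bits = 0x7fc00000u;
  if (!std::isnan(v)) {
    std::memcpy(&bits, &v, sizeof(bits));
  }
  return static_cast<int32_t>(bits);
}

// Hypothetical helper: Double.hashCode semantics, folding the 64-bit pattern
// into 32 bits as bits ^ (bits >>> 32), with canonical NaN 0x7ff8000000000000.
inline int32_t JavaDoubleHash(double v) {
  uint64_t bits = 0x7ff8000000000000ull;
  if (!std::isnan(v)) {
    std::memcpy(&bits, &v, sizeof(bits));
  }
  return static_cast<int32_t>(static_cast<uint32_t>(bits ^ (bits >> 32)));
}
```

Hashing the raw bit pattern rather than the numeric value is what keeps +0.0 and -0.0 in separate buckets, while the explicit NaN branch ensures that all NaN payloads hash alike.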
1 parent 0596ef5 commit 96321e6


7 files changed, +1124 -0 lines changed


src/iceberg/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -109,6 +109,7 @@ set(ICEBERG_SOURCES
 util/property_util.cc
 util/snapshot_util.cc
 util/string_util.cc
+util/struct_like_set.cc
 util/temporal_util.cc
 util/timepoint.cc
 util/transform_util.cc

src/iceberg/meson.build

Lines changed: 1 addition & 0 deletions
@@ -127,6 +127,7 @@ iceberg_sources = files(
 'util/property_util.cc',
 'util/snapshot_util.cc',
 'util/string_util.cc',
+'util/struct_like_set.cc',
 'util/temporal_util.cc',
 'util/timepoint.cc',
 'util/transform_util.cc',

src/iceberg/test/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -116,6 +116,7 @@ add_iceberg_test(util_test
 formatter_test.cc
 location_util_test.cc
 string_util_test.cc
+struct_like_set_test.cc
 transform_util_test.cc
 truncate_util_test.cc
 url_encoder_test.cc

src/iceberg/test/meson.build

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ iceberg_tests = {
 'formatter_test.cc',
 'location_util_test.cc',
 'string_util_test.cc',
+'struct_like_set_test.cc',
 'transform_util_test.cc',
 'truncate_util_test.cc',
 'url_encoder_test.cc',
