feat: add find field (by id and name) support to schema#180
feat: add find field (by id and name) support to schema#180Xuanwo merged 14 commits intoapache:mainfrom
Conversation
nullccxsy
commented
Aug 15, 2025
- add insensitive way to find schemafield(list, struct, map)
- change the complexity of find name to O(1)
- test insensitive way to find schemafield(list, struct, map)
adc4817 to
ec73749
Compare
3b0653a to
ae4515e
Compare
0394c01 to
ef4b2de
Compare
src/iceberg/schema.cc
Outdated
| NametoIdVisitor::GetPath(path, std::string(field.name()), case_sensitive_); | ||
| short_full_path = GetPath(short_path, std::string(field.name()), case_sensitive_); | ||
| name_to_id_[full_path] = field.field_id(); | ||
| name_to_id_.emplace(short_full_path, field.field_id()); |
There was a problem hiding this comment.
Check the return value and error out if failed.
There was a problem hiding this comment.
If it fails, it means that the short name conflicts with the canonical name. In this case, the short name is discarded, so don't need to check the return value.
8319453 to
8ba65bf
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Thanks for the update! I think we are close to the finish line.
|
Please rename to title to |
|
Please note that the java impl has |
src/iceberg/schema.h
Outdated
| Result<Status> InitIdToIndexMap() const; | ||
| Result<Status> InitNameToIndexMap() const; | ||
| Result<Status> InitLowerCaseNameToIndexMap() const; |
There was a problem hiding this comment.
Are they "Ensure" rather than Init?
There was a problem hiding this comment.
May name it LazyIdToIndexMap ?
There was a problem hiding this comment.
The purpose here is not "Init" ( Maybe reinitialize ) Mapping, it do Ensure the mapping to be "initialized", I guess EnusreIdToIndexMap() is ok to me.
Generally, LazyIdToIndexMap seems to be another api like Result<const map<...>&> LazyIdToIndexMap
mapleFU
left a comment
There was a problem hiding this comment.
Personally I feel a bit afraid about lazy initialize mapping, since it would be thread-unsafetly. But maybe it's not a severe problem here.
For lazy initialization, we may leverage |
So this requires a mutex and three once_flag? It might be huge but it's ok to me... |
ok, this will be fixed |
We can support this is another patch, currently we can add a todo here. |
0f85748 to
f0dbff4
Compare
d5f8d92 to
4df8eb7
Compare
- Add GetFieldByName method to support nested field queries with dot notation - Implement InitNameToIndexMap and InitIdToIndexMap for field mapping - Introduce SchemaFieldVisitor for schema traversal - Implement name pruning logic for element and value in nested structure
- Introduce IdVisitor for schema traversal to init id_to_index_(Map) - Introduce NameVisitor for schema traversal to init name_to_index_(Map), lowercase_name_to_index_(map)
…ava impl 1. Refactor IdVisitor: Add VisitNested() to handle NestedType, make Visit() generic for all Type hierarchy (avoid anti-pattern) 2. Merge id_to_index_ + full_schemafield_ into id_to_field_ (std::unordered_map<int32_t, const ref<SchemaField>>) to reduce memory footprint 3. Align with Java impl's idToField design (support future alias/identifier fields without refactor)
- change the type of id from size_t to int_32 - rename GetPath to BuildPath - fix bug new_short_path should build from short_path, not path
…sult call in macros - Added logic to detect and handle duplicate names in NameToIdVisitor to prevent invalid schema errors. - Fixed duplicate result call issue in src/iceberg/util/macros.h by optimizing ICEBERG_RETURN_UNEXPECTED macro.
… conversion function - Implemented detection for duplicate field IDs, Names in visitors, returning kInvalidSchema error with message. - Updated lowercase conversion in NameToIdVisitor
…reduce copies and unify initialization - Switched field_ and schema_ members in test classes to std::unique_ptr for unique ownership and to avoid duplicate and avoid creating duplicate objects.
… std::string creation
17bcac2 to
9aa221d
Compare
Xuanwo
left a comment
There was a problem hiding this comment.
Thank you for working on this!