Commit 9e384cc

apacheGH-514: [Parquet] Infer schema when projection is null in ParquetReader
This change enables reading Parquet files without an explicit projection schema. If `options.projection` is not provided, the reader now infers the Iceberg schema from the Parquet file's Arrow schema using the Arrow C++ API.

* Modified `src/iceberg/parquet/parquet_reader.cc`:
  * Removed the null check for `projection` in `Open`.
  * Implemented `InferIcebergSchema` and `ConvertArrowType` to convert `arrow::Schema` to `iceberg::Schema` directly, avoiding complex C-ABI/nanoarrow dependencies.
  * Used the inferred schema when `projection` is null.
  * Used the `::arrow::` prefix to avoid namespace ambiguity.
* Added `src/iceberg/test/parquet_reader_no_projection_test.cc` to verify the fix.
* Updated `src/iceberg/test/CMakeLists.txt` to register the new test file.

Co-authored-by: wgtmac <4684607+wgtmac@users.noreply.github.com>
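The fallback described above (use the caller's projection if there is one, otherwise use the schema inferred from the file) can be sketched as follows. `Field`, `Schema`, `ReaderOptions`, and `ResolveReadSchema` are hypothetical stand-ins, not the reader's actual types or functions. In the real reader the fallback schema comes from `InferIcebergSchema` walking the file's Arrow schema, and this sketch simply takes that result as an argument.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for iceberg::Schema and the reader options.
struct Field {
  std::string name;
  std::string type;
  bool nullable;
};

struct Schema {
  std::vector<Field> fields;
};

struct ReaderOptions {
  // May be null: in that case the reader should not reject the call but
  // fall back to the schema inferred from the file.
  std::shared_ptr<Schema> projection;
};

// Returns the schema to read with: the explicit projection when present,
// otherwise the schema inferred from the file (passed in as `file_schema`).
std::shared_ptr<Schema> ResolveReadSchema(const ReaderOptions& options,
                                          std::shared_ptr<Schema> file_schema) {
  if (options.projection) {
    return options.projection;
  }
  return file_schema;
}
```

Passing the inferred schema in, rather than computing it inside the helper, keeps the sketch independent of Arrow; it only models the control flow the commit message describes.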
1 parent e4c20cf commit 9e384cc

1 file changed

Lines changed: 3 additions & 2 deletions

src/iceberg/parquet/parquet_reader.cc

```diff
@@ -26,6 +26,7 @@
 #include <string>
 
 #include <arrow/c/bridge.h>
+#include <arrow/extension_type.h>
 #include <arrow/memory_pool.h>
 #include <arrow/record_batch.h>
 #include <arrow/result.h>
@@ -110,15 +111,15 @@ Result<std::shared_ptr<Type>> ConvertArrowType(
 const auto& time_type = static_cast<const ::arrow::Time64Type&>(*type);
 if (time_type.unit() != ::arrow::TimeUnit::MICRO) {
 return InvalidSchema("Unsupported time unit for Arrow time type: {}",
-time_type.unit());
+static_cast<int>(time_type.unit()));
 }
 return iceberg::time();
 }
 case ::arrow::Type::TIMESTAMP: {
 const auto& timestamp_type = static_cast<const ::arrow::TimestampType&>(*type);
 if (timestamp_type.unit() != ::arrow::TimeUnit::MICRO) {
 return InvalidSchema("Unsupported time unit for Arrow timestamp type: {}",
-timestamp_type.unit());
+static_cast<int>(timestamp_type.unit()));
 }
 if (timestamp_type.timezone().empty()) {
 return iceberg::timestamp();
```