Commit 373fbaa

apacheGH-514: [Parquet] Infer schema when projection is null in ParquetReader
This change enables reading Parquet files without an explicit projection schema. If `options.projection` is not provided, the reader now infers the Iceberg schema from the Parquet file's Arrow schema.

* Modified `src/iceberg/parquet/parquet_reader.cc` to remove the null check for `projection` and add schema inference logic using `FromArrowSchema`.
* Added `src/iceberg/test/parquet_reader_no_projection_test.cc` to verify the fix.
* Updated `src/iceberg/CMakeLists.txt` to include `arrow_c_data_guard_internal.cc` and `schema_internal.cc` in the bundle target to resolve linker errors.

Co-authored-by: wgtmac <4684607+wgtmac@users.noreply.github.com>
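The fallback described in the commit message can be pictured with a minimal, self-contained sketch. This is not the code in `parquet_reader.cc`: `Field`, `Schema`, `ReaderOptions`, `InferFromFileSchema`, and `ResolveReadSchema` are hypothetical stand-ins invented for illustration, and the real change converts the file's Arrow schema with `FromArrowSchema` rather than copying a struct.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the iceberg-cpp types.
struct Field {
  int id;
  std::string name;
};

struct Schema {
  std::vector<Field> fields;
};

struct ReaderOptions {
  // May be null when the caller does not supply a projection.
  std::shared_ptr<Schema> projection;
};

// Hypothetical placeholder for the inference step. The commit message says the
// real reader derives the Iceberg schema from the file's Arrow schema; here the
// file schema is simply copied.
std::shared_ptr<Schema> InferFromFileSchema(const Schema& file_schema) {
  return std::make_shared<Schema>(file_schema);
}

// Use the caller's projection when present; otherwise fall back to a schema
// inferred from the file instead of rejecting the request.
std::shared_ptr<Schema> ResolveReadSchema(const ReaderOptions& options,
                                          const std::shared_ptr<Schema>& file_schema) {
  assert(file_schema != nullptr && "the sketch assumes the file always has a schema");
  if (options.projection != nullptr) {
    return options.projection;
  }
  return InferFromFileSchema(*file_schema);
}
```

Under these assumptions, a caller that leaves `projection` unset reads every column in the file, which matches how the test below opens the reader with only `path` and `io`.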
1 parent 2aeb1ae commit 373fbaa

2 files changed

Lines changed: 6 additions & 2 deletions

src/iceberg/CMakeLists.txt

Lines changed: 3 additions & 1 deletion
@@ -173,6 +173,7 @@ if(ICEBERG_BUILD_BUNDLE)
   set(ICEBERG_BUNDLE_SOURCES
       arrow/arrow_fs_file_io.cc
       arrow/metadata_column_util.cc
+      arrow_c_data_guard_internal.cc
       avro/avro_data_util.cc
       avro/avro_direct_decoder.cc
       avro/avro_direct_encoder.cc
@@ -185,7 +186,8 @@ if(ICEBERG_BUILD_BUNDLE)
       parquet/parquet_reader.cc
       parquet/parquet_register.cc
       parquet/parquet_schema_util.cc
-      parquet/parquet_writer.cc)
+      parquet/parquet_writer.cc
+      schema_internal.cc)
 
   # Libraries to link with exported libiceberg_bundle.{so,a}.
   set(ICEBERG_BUNDLE_STATIC_BUILD_INTERFACE_LIBS)

src/iceberg/test/parquet_reader_no_projection_test.cc

Lines changed: 3 additions & 1 deletion
@@ -20,6 +20,8 @@
 #include <arrow/array.h>
 #include <arrow/c/bridge.h>
 #include <arrow/json/from_string.h>
+#include <arrow/record_batch.h>
+#include <arrow/table.h>
 #include <arrow/type.h>
 #include <gtest/gtest.h>
 
@@ -28,6 +30,7 @@
 #include "iceberg/file_reader.h"
 #include "iceberg/file_writer.h"
 #include "iceberg/parquet/parquet_register.h"
+#include "iceberg/schema.h"
 #include "iceberg/schema_internal.h"
 #include "iceberg/test/matchers.h"
 #include "iceberg/type.h"
@@ -117,7 +120,6 @@ TEST_F(ParquetReaderNoProjectionTest, ReadWithoutProjection) {
   auto reader_result = ReaderFactoryRegistry::Open(
       FileFormatType::kParquet, {.path = temp_parquet_file_, .io = file_io_});
 
-  // This is expected to fail currently
   ASSERT_THAT(reader_result, IsOk())
       << "Failed to create reader: " << reader_result.error().message;
   auto reader = std::move(reader_result.value());
