Commit 1c7e71e

[fix](scan) Fix OOB crash in partition column generation for Iceberg/Paimon tables (#62177)
### What problem does this PR solve?

Problem Summary:

```
3# raise at ../sysdeps/posix/raise.c:27
4# abort at ./stdlib/abort.c:81
5# 0x0000556DDAC000A1 in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
6# std::vector, std::allocator >, std::allocator, std::allocator > > >::operator[](unsigned long) const at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_vector.h:1282
7# doris::FileScanner::_generate_partition_columns() at ./be/build_ASAN/../src/exec/scan/file_scanner.cpp:1653
8# doris::FileScanner::_get_next_reader() at ./be/build_ASAN/../src/exec/scan/file_scanner.cpp:957
9# doris::FileScanner::_get_block_wrapped(doris::RuntimeState*, doris::Block*, bool*) at ./be/build_ASAN/../src/exec/scan/file_scanner.cpp:439
10# doris::FileScanner::_get_block_impl(doris::RuntimeState*, doris::Block*, bool*) at ./be/build_ASAN/../src/exec/scan/file_scanner.cpp:403
11# doris::Scanner::get_block(doris::RuntimeState*, doris::Block*, bool*) at ./be/build_ASAN/../src/exec/scan/scanner.cpp:143
12# doris::Scanner::get_block_after_projects(doris::RuntimeState*, doris::Block*, bool*) at ./be/build_ASAN/../src/exec/scan/scanner.cpp:119
13# doris::ScannerScheduler::_scanner_scan(std::shared_ptr, std::shared_ptr) at ./be/build_ASAN/../src/exec/scan/scanner_scheduler.cpp:179
14# doris::ScannerScheduler::submit(std::shared_ptr, std::shared_ptr)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const at ./be/build_ASAN/../src/exec/scan/scanner_scheduler.cpp:76
```

### Release note

BE crashes with SIGABRT (vector out-of-bounds) in `FileScanner::_generate_partition_columns()` when scanning Iceberg or Paimon partitioned tables.

Root cause: `_partition_slot_index_map` is built once from the first scan range's `columns_from_path_keys`, but different ranges may have different `columns_from_path` sizes due to:

1. **Iceberg partition evolution**: Tables evolving from non-identity transforms (e.g., `day(ts)`) to identity transforms (`ts`). `IcebergUtils.getPartitionInfoMap()` returns null for non-identity transforms, so `setIcebergParams()` leaves `columnsFromPath` as an empty list for those ranges while populating it for identity-transform ranges.
2. **Paimon mixed reader paths**: Native reader splits (Parquet/ORC) call `setPaimonPartitionValues(partitionInfoMap)`, but JNI scanner splits (for merge-required or unsupported-format data) skip this call entirely, leaving `paimonPartitionValues` as null.

The root cause is that `createFileRangeDesc()` unconditionally called `setColumnsFromPath([])` for all ranges, setting the Thrift `__isset` flag to true even with an empty list. When BE sees `__isset=true`, it enters the partition column loop and crashes on out-of-bounds access for ranges that were never populated with actual partition values.

Fix:

- **FileQueryScanNode.createFileRangeDesc()**: Only call `setColumnsFromPath()` and `setColumnsFromPathKeys()` when `columnsFromPathKeys` is non-empty. This keeps `__isset=false` for Iceberg/Paimon ranges that have no path-derived partition keys, so BE skips partition column generation entirely for those ranges. Ranges that do have partition values (set later by `setIcebergParams`/`setPaimonParams`) work correctly since they explicitly call `setColumnsFromPath()` with actual values.
- **PaimonScanNode**: Additionally set partition values on JNI scanner splits (previously missing), enabling runtime filter partition pruning for JNI splits.

Fix a BE crash (SIGABRT) that could occur when querying Iceberg tables with partition evolution or Paimon tables with mixed-format data splits.
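The failure mode described above can be modeled in a few lines. The following is a minimal Java sketch, not the actual BE code (which is C++): `Range`, `buildSlotIndexMap`, and `generatePartitionColumns` are hypothetical names, and a `null` list stands in for a Thrift field whose `__isset` flag is false. It assumes the index map is built once from the first range and then reused for every later range.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionColumnSketch {
    /** Hypothetical stand-in for one scan range; a null list models __isset == false. */
    static final class Range {
        final List<String> columnsFromPath;

        Range(List<String> columnsFromPath) {
            this.columnsFromPath = columnsFromPath;
        }

        boolean isSetColumnsFromPath() {
            return columnsFromPath != null;
        }
    }

    /** Built once from the first range's keys: partition key -> index into columnsFromPath. */
    static Map<String, Integer> buildSlotIndexMap(List<String> firstRangeKeys) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < firstRangeKeys.size(); i++) {
            map.put(firstRangeKeys.get(i), i);
        }
        return map;
    }

    /**
     * Looks up one value per mapped key. A range whose field is unset is skipped;
     * a range whose field is set but shorter than the map expects fails with
     * IndexOutOfBoundsException, the Java analogue of the vector OOB abort.
     */
    static List<String> generatePartitionColumns(Map<String, Integer> slotIndexMap, Range range) {
        List<String> out = new ArrayList<>();
        if (!range.isSetColumnsFromPath()) {
            return out; // field not set: no partition columns to generate
        }
        for (Map.Entry<String, Integer> e : slotIndexMap.entrySet()) {
            out.add(e.getKey() + "=" + range.columnsFromPath.get(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("ts");
        Map<String, Integer> slotIndexMap = buildSlotIndexMap(keys);

        List<String> values = new ArrayList<>();
        values.add("2024-01-01");
        System.out.println(generatePartitionColumns(slotIndexMap, new Range(values)));
        System.out.println(generatePartitionColumns(slotIndexMap, new Range(null)));
    }
}
```

In this model, a range constructed with an empty (but non-null) list reproduces the crash, while a range constructed with `null` is skipped, which is the behavior the fix relies on.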
1 parent 9125b17 commit 1c7e71e

2 files changed

Lines changed: 7 additions & 3 deletions

fe/fe-core/src/main/java/org/apache/doris/datasource/FileQueryScanNode.java

Lines changed: 4 additions & 2 deletions
@@ -539,8 +539,10 @@ private TFileRangeDesc createFileRangeDesc(FileSplit fileSplit, List<String> col
         // fileSize only be used when format is orc or parquet and TFileType is broker
         // When TFileType is other type, it is not necessary
         rangeDesc.setFileSize(fileSplit.getFileLength());
-        rangeDesc.setColumnsFromPath(columnsFromPath);
-        rangeDesc.setColumnsFromPathKeys(columnsFromPathKeys);
+        if (!columnsFromPathKeys.isEmpty()) {
+            rangeDesc.setColumnsFromPath(columnsFromPath);
+            rangeDesc.setColumnsFromPathKeys(columnsFromPathKeys);
+        }
 
         rangeDesc.setFileType(fileSplit.getLocationType());
         rangeDesc.setPath(fileSplit.getPath().toStorageLocation().toString());
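To illustrate why the guard matters, here is a small self-contained Java sketch. `RangeDesc` is a hypothetical, simplified stand-in for the Thrift-generated `TFileRangeDesc`, written on the assumption that a list field counts as set as soon as a setter assigns any non-null list, including an empty one. `createUnguarded` and `createGuarded` are illustrative helpers, not methods from the Doris code base.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeDescGuardSketch {
    /** Hypothetical, simplified model of a Thrift struct with two optional list fields. */
    static final class RangeDesc {
        private List<String> columnsFromPath;
        private List<String> columnsFromPathKeys;

        void setColumnsFromPath(List<String> values) {
            this.columnsFromPath = values;
        }

        void setColumnsFromPathKeys(List<String> keys) {
            this.columnsFromPathKeys = keys;
        }

        // Assumption: "set" means non-null, so an empty list still counts as set.
        boolean isSetColumnsFromPath() {
            return columnsFromPath != null;
        }

        boolean isSetColumnsFromPathKeys() {
            return columnsFromPathKeys != null;
        }
    }

    /** Models the old behavior: both fields are always assigned, even when empty. */
    static RangeDesc createUnguarded(List<String> values, List<String> keys) {
        RangeDesc desc = new RangeDesc();
        desc.setColumnsFromPath(values);
        desc.setColumnsFromPathKeys(keys);
        return desc;
    }

    /** Models the patched behavior: fields stay unset when there are no path keys. */
    static RangeDesc createGuarded(List<String> values, List<String> keys) {
        RangeDesc desc = new RangeDesc();
        if (!keys.isEmpty()) {
            desc.setColumnsFromPath(values);
            desc.setColumnsFromPathKeys(keys);
        }
        return desc;
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<>();
        System.out.println("unguarded isSet: " + createUnguarded(none, none).isSetColumnsFromPath());
        System.out.println("guarded isSet:   " + createGuarded(none, none).isSetColumnsFromPath());
    }
}
```

A range with non-empty keys behaves the same under both helpers; only the empty case differs, which matches the intent stated in the commit message that ranges populated later with actual values are unaffected.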

fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java

Lines changed: 3 additions & 1 deletion
@@ -443,7 +443,9 @@ public List<Split> getSplits(int numBackends) throws UserException {
                 if (ignoreSplitType == SessionVariable.IgnoreSplitType.IGNORE_JNI) {
                     continue;
                 }
-                splits.add(new PaimonSplit(dataSplit));
+                PaimonSplit jniSplit = new PaimonSplit(dataSplit);
+                jniSplit.setPaimonPartitionValues(partitionInfoMap);
+                splits.add(jniSplit);
                 ++paimonSplitNum;
             }
 