Commit 3f4f55d

[ml] Fix out-of-bounds read in filtered cluster split
In RClusterLoader::LoadTrainingClusterInto, the boundary entry was read from rdfEntries unconditionally, before valCount was checked, so when one side of the split is empty the read went past the end of the vector. This was silent in non-hardened builds but trips libstdc++ assertions (e.g. with libcxxhardeningfast), aborting test09_filtered_last_chunk.

Compute the boundary from rdfEntries only when the corresponding index is in bounds, and fall back to the cluster endpoint otherwise.

(cherry picked from commit 6183bad)
1 parent 428633d commit 3f4f55d

1 file changed

Lines changed: 14 additions & 4 deletions

tree/ml/inc/ROOT/ML/RClusterLoader.hxx

@@ -410,10 +410,20 @@ public:
          trainIsPrefix = coin(g);
       }
 
-      // The boundary is the raw entry index of the first entry assigned to validation.
-      // Stable across epochs since the same filter always produces the same ordered entries.
-      const std::uint64_t trainBoundaryEntry = trainIsPrefix ? rdfEntries[trainCount] : rdfEntries[valCount];
-      const std::uint64_t boundary = (valCount > 0) ? trainBoundaryEntry : endRow;
+      // The boundary is the raw entry index that splits train and val sub-ranges within the
+      // cluster. Stable across epochs since the same filter always produces the same ordered
+      // entries. When one side has no filtered entries we fall back to the cluster endpoint that
+      // collapses that side to an empty range, avoiding an out-of-bounds access into rdfEntries
+      // (whose size is totalFiltered, so rdfEntries[totalFiltered] is OOB and trips libstdc++
+      // hardened-mode assertions).
+      std::uint64_t boundary;
+      if (trainIsPrefix) {
+         // train = [startRow, boundary), val = [boundary, endRow)
+         boundary = (trainCount < totalFiltered) ? rdfEntries[trainCount] : endRow;
+      } else {
+         // train = [boundary, endRow), val = [startRow, boundary)
+         boundary = (valCount < totalFiltered) ? rdfEntries[valCount] : endRow;
+      }
 
       const std::uint64_t trainStart = trainIsPrefix ? startRow : boundary;
       const std::uint64_t trainEnd = trainIsPrefix ? boundary : endRow;
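
The guarded lookup in this hunk can be exercised in isolation. The following is a minimal sketch, not code from ROOT: `SplitBoundary` is a hypothetical free function, and it assumes that `trainCount + valCount` equals `rdfEntries.size()` (called `totalFiltered` in the patch) and that `rdfEntries` is sorted in ascending order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone helper mirroring the patched logic; not part of ROOT.
// Assumes trainCount + valCount == rdfEntries.size().
std::uint64_t SplitBoundary(const std::vector<std::uint64_t> &rdfEntries, std::size_t trainCount,
                            std::size_t valCount, bool trainIsPrefix, std::uint64_t endRow)
{
   const std::size_t totalFiltered = rdfEntries.size();
   // Number of filtered entries on the prefix side, which is also the index of
   // the first filtered entry that belongs to the suffix side.
   const std::size_t idx = trainIsPrefix ? trainCount : valCount;
   // idx == totalFiltered means the suffix side is empty: return the cluster
   // endpoint instead of reading one element past the end of the vector.
   return (idx < totalFiltered) ? rdfEntries[idx] : endRow;
}
```

With filtered entries {102, 105, 109} in a cluster whose end row is 110, a prefix train side holding all three entries yields a boundary of 110, so the validation range [110, 110) is empty and no element past the end of the vector is read.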
