[SPARK-57695][CORE] Make TestUtils.recursiveList null-safe by reusing Utils.recursiveList#56901
Closed
LuciferYang wants to merge 1 commit into
Closed
[SPARK-57695][CORE] Make TestUtils.recursiveList null-safe by reusing Utils.recursiveList#56901LuciferYang wants to merge 1 commit into
LuciferYang wants to merge 1 commit into
Conversation
… Utils.recursiveList TestUtils.recursiveList duplicated the directory walk already provided by Utils.recursiveList (SparkFileUtils.recursiveList) and carried the same issues: it called File.listFiles without a null check and offered no linear-time guarantee. Delegate to Utils.recursiveList instead of keeping a separate copy, so the helper picks up the null-safety and O(n) traversal from SPARK-57530. Follow-up to SPARK-57530; should land after it.
MaxGekk
approved these changes
Jun 30, 2026
MaxGekk
left a comment
Member
There was a problem hiding this comment.
0 blocking, 0 non-blocking, 0 nits.
Clean dedup — delegating TestUtils.recursiveList to the SPARK-57530-fixed Utils.recursiveList is behavior-preserving. LGTM.
Verification
- Identical result set + precondition: both return every entry (files and subdirectories) at every level, once;
require(f.isDirectory)is preserved by the delegate, so the throw on a non-directory argument is unchanged. - The only delta — DFS → BFS order — is unobservable for every current caller: the order-sensitive ones filter to a single element before
.head(JsonSuiteviacoalesce(1);PartitionedWriteSuiteassertsfiles.length == 1first;FileMetadataStructSuitereads the single.parquet);CSVSuite/TextSuitemirrorJsonSuite; the rest use.count/.filter/ set ops. - Null-safety (unreadable dir → warn-and-skip instead of NPE, from SPARK-57530) is the intended improvement and isn't reached by the readable temp-dir test callers.
- The delegate has dedicated coverage in
SparkFileUtilsSuite; no new test needed for a pure delegation.
Member
|
The failed CI is not related to the PR, I believe. +1, LGTM. Merging to master/4.x. |
MaxGekk
pushed a commit
that referenced
this pull request
Jun 30, 2026
… Utils.recursiveList
### What changes were proposed in this pull request?
`TestUtils.recursiveList` duplicated the recursive directory walk already provided by `Utils.recursiveList` (i.e. `SparkFileUtils.recursiveList`). This PR removes the duplicated copy and delegates to `Utils.recursiveList`:
```scala
// before
def recursiveList(f: File): Array[File] = {
require(f.isDirectory)
val current = f.listFiles
current ++ current.filter(_.isDirectory).flatMap(recursiveList)
}
// after
def recursiveList(f: File): Array[File] = Utils.recursiveList(f)
```
### Why are the changes needed?
The duplicated implementation carried the same two issues that were just fixed in `SparkFileUtils.recursiveList` under [SPARK-57530](https://issues.apache.org/jira/browse/SPARK-57530):
1. It called `File.listFiles` without a null check, so an IO error (or the directory being removed mid-walk) would throw an NPE.
2. The `current ++ ... .flatMap(...)` form had no linear-time guarantee.
By delegating to `Utils.recursiveList`, `TestUtils.recursiveList` automatically picks up the null-safety (a directory that cannot be listed is skipped with a warning instead of throwing) and the O(n) traversal from SPARK-57530, and the duplicated logic is removed.
This is a follow-up to SPARK-57530, which is already merged to master.
### Does this PR introduce _any_ user-facing change?
No. `TestUtils` is a test-only `private[spark]` helper; this is an internal refactor with no behavior change for any successful directory walk.
### How was this patch tested?
Existing tests that use `TestUtils.recursiveList` continue to exercise it. The behavioral contract for a readable directory tree is unchanged (same set of files returned); the only difference is that an unreadable directory is now skipped with a warning rather than throwing, matching `Utils.recursiveList`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code
Closes #56901 from LuciferYang/SPARK-57695-testutils-recursivelist.
Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit a4ce62b)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
TestUtils.recursiveListduplicated the recursive directory walk already provided byUtils.recursiveList(i.e.SparkFileUtils.recursiveList). This PR removes the duplicated copy and delegates toUtils.recursiveList:Why are the changes needed?
The duplicated implementation carried the same two issues that were just fixed in
SparkFileUtils.recursiveListunder SPARK-57530:File.listFileswithout a null check, so an IO error (or the directory being removed mid-walk) would throw an NPE.current ++ ... .flatMap(...)form had no linear-time guarantee.By delegating to
Utils.recursiveList,TestUtils.recursiveListautomatically picks up the null-safety (a directory that cannot be listed is skipped with a warning instead of throwing) and the O(n) traversal from SPARK-57530, and the duplicated logic is removed.This is a follow-up to SPARK-57530, which is already merged to master.
Does this PR introduce any user-facing change?
No.
TestUtilsis a test-onlyprivate[spark]helper; this is an internal refactor with no behavior change for any successful directory walk.How was this patch tested?
Existing tests that use
TestUtils.recursiveListcontinue to exercise it. The behavioral contract for a readable directory tree is unchanged (same set of files returned); the only difference is that an unreadable directory is now skipped with a warning rather than throwing, matchingUtils.recursiveList.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code