Skip to content

[SPARK-57695][CORE] Make TestUtils.recursiveList null-safe by reusing Utils.recursiveList#56901

Closed
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:SPARK-57695-testutils-recursivelist
Closed

[SPARK-57695][CORE] Make TestUtils.recursiveList null-safe by reusing Utils.recursiveList#56901
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:SPARK-57695-testutils-recursivelist

Conversation

@LuciferYang

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

TestUtils.recursiveList duplicated the recursive directory walk already provided by Utils.recursiveList (i.e. SparkFileUtils.recursiveList). This PR removes the duplicated copy and delegates to Utils.recursiveList:

// before
def recursiveList(f: File): Array[File] = {
  require(f.isDirectory)
  val current = f.listFiles
  current ++ current.filter(_.isDirectory).flatMap(recursiveList)
}

// after
def recursiveList(f: File): Array[File] = Utils.recursiveList(f)

Why are the changes needed?

The duplicated implementation carried the same two issues that were just fixed in SparkFileUtils.recursiveList under SPARK-57530:

  1. It called File.listFiles without a null check, so an IO error (or the directory being removed mid-walk) would throw an NPE.
  2. The current ++ ... .flatMap(...) form had no linear-time guarantee.

By delegating to Utils.recursiveList, TestUtils.recursiveList automatically picks up the null-safety (a directory that cannot be listed is skipped with a warning instead of throwing) and the O(n) traversal from SPARK-57530, and the duplicated logic is removed.

This is a follow-up to SPARK-57530, which is already merged to master.

Does this PR introduce any user-facing change?

No. TestUtils is a test-only private[spark] helper; this is an internal refactor with no behavior change for any successful directory walk.

How was this patch tested?

Existing tests that use TestUtils.recursiveList continue to exercise it. The behavioral contract for a readable directory tree is unchanged (same set of files returned); the only difference is that an unreadable directory is now skipped with a warning rather than throwing, matching Utils.recursiveList.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

… Utils.recursiveList

TestUtils.recursiveList duplicated the directory walk already provided by
Utils.recursiveList (SparkFileUtils.recursiveList) and carried the same issues:
it called File.listFiles without a null check and offered no linear-time
guarantee. Delegate to Utils.recursiveList instead of keeping a separate copy,
so the helper picks up the null-safety and O(n) traversal from SPARK-57530.

Follow-up to SPARK-57530; should land after it.

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 0 non-blocking, 0 nits.
Clean dedup — delegating TestUtils.recursiveList to the SPARK-57530-fixed Utils.recursiveList is behavior-preserving. LGTM.

Verification

  • Identical result set + precondition: both return every entry (files and subdirectories) at every level, once; require(f.isDirectory) is preserved by the delegate, so the throw on a non-directory argument is unchanged.
  • The only delta — DFS → BFS order — is unobservable for every current caller: the order-sensitive ones filter to a single element before .head (JsonSuite via coalesce(1); PartitionedWriteSuite asserts files.length == 1 first; FileMetadataStructSuite reads the single .parquet); CSVSuite/TextSuite mirror JsonSuite; the rest use .count / .filter / set ops.
  • Null-safety (unreadable dir → warn-and-skip instead of NPE, from SPARK-57530) is the intended improvement and isn't reached by the readable temp-dir test callers.
  • The delegate has dedicated coverage in SparkFileUtilsSuite; no new test needed for a pure delegation.

@MaxGekk

MaxGekk commented Jun 30, 2026

Copy link
Copy Markdown
Member

The failed CI is not related to the PR, I believe.

+1, LGTM. Merging to master/4.x.
Thank you, @LuciferYang.

@MaxGekk MaxGekk closed this in a4ce62b Jun 30, 2026
MaxGekk pushed a commit that referenced this pull request Jun 30, 2026
… Utils.recursiveList

### What changes were proposed in this pull request?

`TestUtils.recursiveList` duplicated the recursive directory walk already provided by `Utils.recursiveList` (i.e. `SparkFileUtils.recursiveList`). This PR removes the duplicated copy and delegates to `Utils.recursiveList`:

```scala
// before
def recursiveList(f: File): Array[File] = {
  require(f.isDirectory)
  val current = f.listFiles
  current ++ current.filter(_.isDirectory).flatMap(recursiveList)
}

// after
def recursiveList(f: File): Array[File] = Utils.recursiveList(f)
```

### Why are the changes needed?

The duplicated implementation carried the same two issues that were just fixed in `SparkFileUtils.recursiveList` under [SPARK-57530](https://issues.apache.org/jira/browse/SPARK-57530):

1. It called `File.listFiles` without a null check, so an IO error (or the directory being removed mid-walk) would throw an NPE.
2. The `current ++ ... .flatMap(...)` form had no linear-time guarantee.

By delegating to `Utils.recursiveList`, `TestUtils.recursiveList` automatically picks up the null-safety (a directory that cannot be listed is skipped with a warning instead of throwing) and the O(n) traversal from SPARK-57530, and the duplicated logic is removed.

This is a follow-up to SPARK-57530, which is already merged to master.

### Does this PR introduce _any_ user-facing change?

No. `TestUtils` is a test-only `private[spark]` helper; this is an internal refactor with no behavior change for any successful directory walk.

### How was this patch tested?

Existing tests that use `TestUtils.recursiveList` continue to exercise it. The behavioral contract for a readable directory tree is unchanged (same set of files returned); the only difference is that an unreadable directory is now skipped with a warning rather than throwing, matching `Utils.recursiveList`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes #56901 from LuciferYang/SPARK-57695-testutils-recursivelist.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit a4ce62b)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants