[SPARK-57525][CONNECT][4.1] Declarative Pipelines should not throw NoSuchElementException when a run fails without an attached cause#56900

Open
LuciferYang wants to merge 1 commit into apache:branch-4.1 from LuciferYang:SPARK-57525-4.1
@LuciferYang

Contributor

What changes were proposed in this pull request?

`PipelinesHandler.startRun` rethrows a failed pipeline run to the Spark Connect client via `runFailureEvent.foreach { event => throw event.error.get }`. However, `event.error` is `None` for run termination reasons that carry no cause (`UnexpectedRunFailure` and `FailureStoppingFlow` both have `cause = None`), so `event.error.get` raised a `NoSuchElementException`, crashing the handler and hiding the real failure from the client.
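The failure mode described above can be reproduced outside Spark. The following is a minimal sketch, not the actual handler code: `RunFailureEvent` and `rethrowUnsafe` are hypothetical stand-ins that only mirror the shape of the original `throw event.error.get` expression.

```scala
import scala.util.Try

// Hypothetical stand-in for the run-failure event; not the real Spark class.
final case class RunFailureEvent(message: String, error: Option[Throwable])

object NoneGetDemo {
  // Mirrors the shape of the original rethrow: an unconditional .get on an Option.
  def rethrowUnsafe(event: RunFailureEvent): Nothing = throw event.error.get

  def main(args: Array[String]): Unit = {
    // A failure with a termination message but no attached cause.
    val noCause = RunFailureEvent("Run failed unexpectedly.", None)
    val result = Try(rethrowUnsafe(noCause))
    // The caller sees the exception from Option.get, not the run-failure message.
    println(result.failed.get.getClass.getName) // prints java.util.NoSuchElementException
  }
}
```

The termination message is never consulted here, which is why the client only sees the `NoSuchElementException`.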

This PR extracts the rethrow into `throwRunFailure`: when the failure has a cause it is rethrown unchanged; when it does not, a `SparkException` with a new `PIPELINE_RUN_FAILED` error condition is thrown, carrying the run's termination message. `PIPELINE_RUN_FAILED` (rather than `INTERNAL_ERROR`) is used so that operational outcomes such as `FailureStoppingFlow` are not mislabeled as Spark bugs.
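The two branches can be sketched as follows. This is an illustration under stated assumptions, not the code in the patch: the real `throwRunFailure` signature is not shown in this description, and `RunFailure` and `ConditionException` are hypothetical stand-ins (the latter for a `SparkException` carrying an error condition), so that the example runs without a Spark dependency.

```scala
// Hypothetical stand-in for a SparkException that carries an error condition.
final class ConditionException(val condition: String, message: String)
  extends RuntimeException(s"[$condition] $message")

// Hypothetical stand-in for a run failure: a termination message and an optional cause.
final case class RunFailure(message: String, cause: Option[Throwable])

object ThrowRunFailureSketch {
  // Rethrow the cause unchanged when present; otherwise throw a dedicated
  // error that carries the termination message instead of calling .get.
  def throwRunFailure(failure: RunFailure): Nothing = failure.cause match {
    case Some(cause) => throw cause
    case None => throw new ConditionException("PIPELINE_RUN_FAILED", failure.message)
  }
}
```

Matching on the `Option` makes the no-cause branch explicit, so a missing cause can no longer surface as an unrelated exception.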

Why are the changes needed?

A run that fails without an attached cause (e.g. `UnexpectedRunFailure`, or a flow that fails to stop) currently surfaces to the Connect client as an opaque `NoSuchElementException` ("None.get") instead of the actual run-failure message. That masks the real problem and looks like an internal error. These reasons reach this code via the asynchronous `onCompletion` path, where `PipelineExecution.runPipeline`'s own catch never fires.

Does this PR introduce any user-facing change?

Yes. When a pipeline run fails without an attached cause, the Spark Connect client now receives a `PIPELINE_RUN_FAILED` error carrying the run's termination message (e.g. "Run failed unexpectedly.") instead of a `NoSuchElementException`.

How was this patch tested?

A new `PipelinesHandlerSuite` unit-tests `throwRunFailure` for both cases: the cause-present case rethrows the original cause, and the no-cause case throws a `PIPELINE_RUN_FAILED` `SparkException` carrying the termination message (verified with `checkError`, using the real `UnexpectedRunFailure` and `FailureStoppingFlow` messages). The cause-less termination reasons cannot be triggered deterministically through the end-to-end run path, so the rethrow is unit-tested directly. `SparkThrowableSuite` validates the new error condition.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56594 from LuciferYang/sdp-run-failure-no-cause.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>