Skip to content

[SPARK-57774][CORE] DriverLogger should not propagate log4j internal exception to caller#56890

Closed
pan3793 wants to merge 2 commits into
apache:masterfrom
pan3793:SPARK-driverlog-ignoreexceptions
Closed

[SPARK-57774][CORE] DriverLogger should not propagate log4j internal exception to caller#56890
pan3793 wants to merge 2 commits into
apache:masterfrom
pan3793:SPARK-driverlog-ignoreexceptions

Conversation

@pan3793

@pan3793 pan3793 commented Jun 30, 2026

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Flip the setIgnoreExceptions from false to true in DriverLogger

Why are the changes needed?

https://logging.apache.org/log4j/2.x/plugin-reference.html

ignoreExceptions - If "true" (default) exceptions encountered when appending events are logged; otherwise they are propagated to the caller.

The recent log4j 2.25.0 ~ 2.25.3 and 2.26.0 have known internal bugs and may fail the Spark caller, this has been observed from the GHA several times.

One example https://github.com/apache/spark/actions/runs/27894816608/attempts/1

[info] - provider reports error after FS leaves safe mode *** FAILED *** (3 seconds, 25 milliseconds)
[info]   org.apache.logging.log4j.core.appender.AppenderLoggingException: An exception occurred processing Appender _DriverLogAppender
[info]   at org.apache.logging.log4j.core.appender.DefaultErrorHandler.error(DefaultErrorHandler.java:95)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.handleAppenderError(AppenderControl.java:169)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:162)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:133)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:124)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:88)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:711)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:669)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:645)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:589)
[info]   at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:92)
[info]   at org.apache.logging.log4j.core.Logger.log(Logger.java:187)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2970)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2922)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2904)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2648)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:2587)
[info]   at org.apache.logging.slf4j.Log4jLogger.warn(Log4jLogger.java:253)
[info]   at org.apache.spark.internal.Logging.logWarning(Logging.scala:299)
[info]   at org.apache.spark.internal.Logging.logWarning$(Logging.scala:298)
[info]   at org.apache.spark.SparkFunSuite.logWarning(SparkFunSuite.scala:33)
...

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Review and monitor CI.

Was this patch authored or co-authored using generative AI tooling?

GLM 5.2 is used for issue analysis.

@pan3793

pan3793 commented Jun 30, 2026

Copy link
Copy Markdown
Member Author

@viirya, this flag comes from the log4j 1.x => 2.x migration (SPARK-6305), do you have special reason to set ignoreExceptions=false here?

cc @LuciferYang @dongjoon-hyun @huaxingao @ramanathan1504

@LuciferYang LuciferYang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this change, as long as it does not introduce unintended behavioral changes.

@pan3793 pan3793 changed the title [SPARK-57774][CORE] DriverLogger should ignore log4j internal exception [SPARK-57774][CORE] DriverLogger should not propagate log4j internal exception to caller Jun 30, 2026
builder.setIgnoreExceptions(false)
// SPARK-57774: Set to true to avoid propagating log4j internal exception
// (e.g., due to log4j bugs) to the caller.
builder.setIgnoreExceptions(true)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a trade-off. With true here, a real failure to write the local driver log (disk full, permission loss mid-run) is now silently handled by log4j's internal error handler rather than surfacing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have any logic to handle exceptions thrown by logging calls, if users really care about that, maybe they should customize log4j's internal error handler?

@uros-b uros-b left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @yaooqinn @HyukjinKwon to please take a look ^^

@viirya

viirya commented Jun 30, 2026

Copy link
Copy Markdown
Member

@viirya, this flag comes from the log4j 1.x => 2.x migration (SPARK-6305), do you have special reason to set ignoreExceptions=false here?

No, but on the contrary, we generally shouldn't ask why we should ignore the exceptions?

@pan3793

pan3793 commented Jun 30, 2026

Copy link
Copy Markdown
Member Author

we generally shouldn't ask why we should ignore the exceptions?

@viirya I think the method/variable name here is misleading; the ignoreExceptions actually indicates whether to propagate the exception to the caller, not ignore it silently.

@viirya viirya left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Verified the propagation mechanism against the bundled log4j-core-2.25.4: in AppenderControl.handleAppenderError, the error handler runs regardless, and ignoreExceptions only gates whether the wrapped AppenderLoggingException is additionally rethrown to the caller. So flipping it to true keeps the failure visible via the error handler while stopping it from escaping into arbitrary logInfo/logWarning/logError call sites — which is exactly the failure in the linked stack trace.

The one tradeoff — genuine appender I/O errors (disk full, permissions) no longer propagate — is the right call for a best-effort driver-log feature, and those errors are still logged. Agree no unit test is warranted; reproducing requires a buggy log4j throwing from inside append.

@viirya

viirya commented Jun 30, 2026

Copy link
Copy Markdown
Member

Small correction to my note above on the tradeoff: when the appender fails, the error is surfaced via log4j's own StatusLogger (which defaults to System.err), not written into the driver log file itself — I was imprecise saying "still logged." So a disk-full/permission failure on the local driver-log file doesn't go silent, but you see it on stderr rather than in the driver log.

Two caveats on that: DefaultErrorHandler throttles repeats (~5-minute interval), so a persistent write failure won't produce one stderr line per event — most are suppressed. And the authoritative signal for the DFS-sync path is unaffected anyway, since DfsAsyncWriter reports its own failures through Spark's logError. None of this changes the conclusion; just making the tradeoff precise.

@pan3793 pan3793 closed this in f5de55e Jul 1, 2026
pan3793 added a commit that referenced this pull request Jul 1, 2026
…exception to caller

### What changes were proposed in this pull request?

Flip the `setIgnoreExceptions` from false to true in `DriverLogger`

### Why are the changes needed?

https://logging.apache.org/log4j/2.x/plugin-reference.html

> ignoreExceptions - If "true" (default) exceptions encountered when appending events are logged; otherwise they are propagated to the caller.

The recent log4j 2.25.0 ~ 2.25.3 and 2.26.0 have [known internal bugs](#51719 (comment)) and may fail the Spark caller, this has been observed from the GHA several times.

One example https://github.com/apache/spark/actions/runs/27894816608/attempts/1
```
[info] - provider reports error after FS leaves safe mode *** FAILED *** (3 seconds, 25 milliseconds)
[info]   org.apache.logging.log4j.core.appender.AppenderLoggingException: An exception occurred processing Appender _DriverLogAppender
[info]   at org.apache.logging.log4j.core.appender.DefaultErrorHandler.error(DefaultErrorHandler.java:95)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.handleAppenderError(AppenderControl.java:169)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:162)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:133)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:124)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:88)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:711)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:669)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:645)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:589)
[info]   at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:92)
[info]   at org.apache.logging.log4j.core.Logger.log(Logger.java:187)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2970)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2922)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2904)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2648)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:2587)
[info]   at org.apache.logging.slf4j.Log4jLogger.warn(Log4jLogger.java:253)
[info]   at org.apache.spark.internal.Logging.logWarning(Logging.scala:299)
[info]   at org.apache.spark.internal.Logging.logWarning$(Logging.scala:298)
[info]   at org.apache.spark.SparkFunSuite.logWarning(SparkFunSuite.scala:33)
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Review and monitor CI.

### Was this patch authored or co-authored using generative AI tooling?

GLM 5.2 is used for issue analysis.

Closes #56890 from pan3793/SPARK-driverlog-ignoreexceptions.

Authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit f5de55e)
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 added a commit that referenced this pull request Jul 1, 2026
…exception to caller

### What changes were proposed in this pull request?

Flip the `setIgnoreExceptions` from false to true in `DriverLogger`

### Why are the changes needed?

https://logging.apache.org/log4j/2.x/plugin-reference.html

> ignoreExceptions - If "true" (default) exceptions encountered when appending events are logged; otherwise they are propagated to the caller.

The recent log4j 2.25.0 ~ 2.25.3 and 2.26.0 have [known internal bugs](#51719 (comment)) and may fail the Spark caller, this has been observed from the GHA several times.

One example https://github.com/apache/spark/actions/runs/27894816608/attempts/1
```
[info] - provider reports error after FS leaves safe mode *** FAILED *** (3 seconds, 25 milliseconds)
[info]   org.apache.logging.log4j.core.appender.AppenderLoggingException: An exception occurred processing Appender _DriverLogAppender
[info]   at org.apache.logging.log4j.core.appender.DefaultErrorHandler.error(DefaultErrorHandler.java:95)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.handleAppenderError(AppenderControl.java:169)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:162)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:133)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:124)
[info]   at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:88)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:711)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:669)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:645)
[info]   at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:589)
[info]   at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:92)
[info]   at org.apache.logging.log4j.core.Logger.log(Logger.java:187)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2970)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2922)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2904)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2648)
[info]   at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:2587)
[info]   at org.apache.logging.slf4j.Log4jLogger.warn(Log4jLogger.java:253)
[info]   at org.apache.spark.internal.Logging.logWarning(Logging.scala:299)
[info]   at org.apache.spark.internal.Logging.logWarning$(Logging.scala:298)
[info]   at org.apache.spark.SparkFunSuite.logWarning(SparkFunSuite.scala:33)
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Review and monitor CI.

### Was this patch authored or co-authored using generative AI tooling?

GLM 5.2 is used for issue analysis.

Closes #56890 from pan3793/SPARK-driverlog-ignoreexceptions.

Authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit f5de55e)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793

pan3793 commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

thanks, merged to master/4.x/4.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants