[SPARK-57774][CORE] DriverLogger should not propagate log4j internal exception to caller#56890
[SPARK-57774][CORE] DriverLogger should not propagate log4j internal exception to caller#56890pan3793 wants to merge 2 commits into
Conversation
|
@viirya, this flag comes from the log4j 1.x => 2.x migration (SPARK-6305), do you have special reason to set |
LuciferYang
left a comment
There was a problem hiding this comment.
I'm fine with this change, as long as it does not introduce unintended behavioral changes.
| builder.setIgnoreExceptions(false) | ||
| // SPARK-57774: Set to true to avoid propagating log4j internal exception | ||
| // (e.g., due to log4j bugs) to the caller. | ||
| builder.setIgnoreExceptions(true) |
There was a problem hiding this comment.
This seems like a trade-off. With true here, a real failure to write the local driver log (disk full, permission loss mid-run) is now silently handled by log4j's internal error handler rather than surfacing.
There was a problem hiding this comment.
I don't think we have any logic to handle exceptions thrown by logging calls, if users really care about that, maybe they should customize log4j's internal error handler?
No, but on the contrary, we generally shouldn't ask why we should ignore the exceptions? |
@viirya I think the method/variable name here is misleading; the |
viirya
left a comment
There was a problem hiding this comment.
LGTM. Verified the propagation mechanism against the bundled log4j-core-2.25.4: in AppenderControl.handleAppenderError, the error handler runs regardless, and ignoreExceptions only gates whether the wrapped AppenderLoggingException is additionally rethrown to the caller. So flipping it to true keeps the failure visible via the error handler while stopping it from escaping into arbitrary logInfo/logWarning/logError call sites — which is exactly the failure in the linked stack trace.
The one tradeoff — genuine appender I/O errors (disk full, permissions) no longer propagate — is the right call for a best-effort driver-log feature, and those errors are still logged. Agree no unit test is warranted; reproducing requires a buggy log4j throwing from inside append.
|
Small correction to my note above on the tradeoff: when the appender fails, the error is surfaced via log4j's own Two caveats on that: |
…exception to caller ### What changes were proposed in this pull request? Flip the `setIgnoreExceptions` from false to true in `DriverLogger` ### Why are the changes needed? https://logging.apache.org/log4j/2.x/plugin-reference.html > ignoreExceptions - If "true" (default) exceptions encountered when appending events are logged; otherwise they are propagated to the caller. The recent log4j 2.25.0 ~ 2.25.3 and 2.26.0 have [known internal bugs](#51719 (comment)) and may fail the Spark caller, this has been observed from the GHA several times. One example https://github.com/apache/spark/actions/runs/27894816608/attempts/1 ``` [info] - provider reports error after FS leaves safe mode *** FAILED *** (3 seconds, 25 milliseconds) [info] org.apache.logging.log4j.core.appender.AppenderLoggingException: An exception occurred processing Appender _DriverLogAppender [info] at org.apache.logging.log4j.core.appender.DefaultErrorHandler.error(DefaultErrorHandler.java:95) [info] at org.apache.logging.log4j.core.config.AppenderControl.handleAppenderError(AppenderControl.java:169) [info] at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:162) [info] at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:133) [info] at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:124) [info] at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:88) [info] at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:711) [info] at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:669) [info] at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:645) [info] at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:589) [info] at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:92) [info] at org.apache.logging.log4j.core.Logger.log(Logger.java:187) [info] at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2970) [info] at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2922) [info] at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2904) [info] at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2648) [info] at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:2587) [info] at org.apache.logging.slf4j.Log4jLogger.warn(Log4jLogger.java:253) [info] at org.apache.spark.internal.Logging.logWarning(Logging.scala:299) [info] at org.apache.spark.internal.Logging.logWarning$(Logging.scala:298) [info] at org.apache.spark.SparkFunSuite.logWarning(SparkFunSuite.scala:33) ... ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Review and monitor CI. ### Was this patch authored or co-authored using generative AI tooling? GLM 5.2 is used for issue analysis. Closes #56890 from pan3793/SPARK-driverlog-ignoreexceptions. Authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org> (cherry picked from commit f5de55e) Signed-off-by: Cheng Pan <chengpan@apache.org>
…exception to caller ### What changes were proposed in this pull request? Flip the `setIgnoreExceptions` from false to true in `DriverLogger` ### Why are the changes needed? https://logging.apache.org/log4j/2.x/plugin-reference.html > ignoreExceptions - If "true" (default) exceptions encountered when appending events are logged; otherwise they are propagated to the caller. The recent log4j 2.25.0 ~ 2.25.3 and 2.26.0 have [known internal bugs](#51719 (comment)) and may fail the Spark caller, this has been observed from the GHA several times. One example https://github.com/apache/spark/actions/runs/27894816608/attempts/1 ``` [info] - provider reports error after FS leaves safe mode *** FAILED *** (3 seconds, 25 milliseconds) [info] org.apache.logging.log4j.core.appender.AppenderLoggingException: An exception occurred processing Appender _DriverLogAppender [info] at org.apache.logging.log4j.core.appender.DefaultErrorHandler.error(DefaultErrorHandler.java:95) [info] at org.apache.logging.log4j.core.config.AppenderControl.handleAppenderError(AppenderControl.java:169) [info] at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:162) [info] at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:133) [info] at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:124) [info] at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:88) [info] at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:711) [info] at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:669) [info] at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:645) [info] at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:589) [info] at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:92) [info] at org.apache.logging.log4j.core.Logger.log(Logger.java:187) [info] at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2970) [info] at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2922) [info] at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2904) [info] at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2648) [info] at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:2587) [info] at org.apache.logging.slf4j.Log4jLogger.warn(Log4jLogger.java:253) [info] at org.apache.spark.internal.Logging.logWarning(Logging.scala:299) [info] at org.apache.spark.internal.Logging.logWarning$(Logging.scala:298) [info] at org.apache.spark.SparkFunSuite.logWarning(SparkFunSuite.scala:33) ... ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Review and monitor CI. ### Was this patch authored or co-authored using generative AI tooling? GLM 5.2 is used for issue analysis. Closes #56890 from pan3793/SPARK-driverlog-ignoreexceptions. Authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org> (cherry picked from commit f5de55e) Signed-off-by: Cheng Pan <chengpan@apache.org>
|
thanks, merged to master/4.x/4.2 |
What changes were proposed in this pull request?
Flip the
setIgnoreExceptionsfrom false to true inDriverLoggerWhy are the changes needed?
https://logging.apache.org/log4j/2.x/plugin-reference.html
The recent log4j 2.25.0 ~ 2.25.3 and 2.26.0 have known internal bugs and may fail the Spark caller, this has been observed from the GHA several times.
One example https://github.com/apache/spark/actions/runs/27894816608/attempts/1
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Review and monitor CI.
Was this patch authored or co-authored using generative AI tooling?
GLM 5.2 is used for issue analysis.