With the recent --show-logs flag, we switch the deploy mode to client so that EMR steps can capture the driver stdout.
Unfortunately, client mode doesn't work with additional archives provided via the --archives flag or the --conf spark.archives parameter. See https://issues.apache.org/jira/browse/SPARK-36088 for a related issue.
To support --show-logs in cluster mode, we'd need to parse the step stderr logs to retrieve the YARN application ID, then fetch the YARN application logs from S3.
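As a rough illustration of the first half of that approach, the sketch below pulls a YARN application ID out of step stderr text. It assumes the ID appears in the usual application_<cluster timestamp>_<sequence> form somewhere in the log; the function name and the sample log line are hypothetical, and fetching the application logs from S3 is left out because the log location depends on the cluster's logging configuration.

```python
import re
from typing import Optional

# YARN application IDs have the form application_<cluster timestamp>_<sequence>.
YARN_APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def extract_yarn_application_id(stderr_text: str) -> Optional[str]:
    """Return the first YARN application ID found in the text, or None.

    Hypothetical helper: assumes the step stderr contains the ID at least once.
    """
    match = YARN_APP_ID_PATTERN.search(stderr_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Illustrative log line only, not copied from a real EMR step.
    sample = "INFO Client: Submitted application application_1689012345678_0002"
    print(extract_yarn_application_id(sample))
```

Returning None when nothing matches lets the caller decide whether to fall back to client mode or report that logs are unavailable.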