URITools uses a static field:
public static String s3Region = null;
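This works in single-JVM setups, but in Spark/EMR each executor runs in its own JVM. A value assigned to URITools.s3Region on the driver is not propagated to the executors: static fields are not serialized with tasks, so each executor still sees the initial null and S3 access there may use the wrong/default region.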
Proposal
Initialize s3Region from a system property and/or environment variable so it can be configured via EMR/Spark settings:
public class URITools {
    public static int cloudThreads = 256;

    // -Ds3Region=... takes precedence; otherwise fall back to AWS_REGION (null if neither is set)
    public static String s3Region =
        System.getProperty(
            "s3Region",
            System.getenv("AWS_REGION")
        );

    public static boolean useS3CredentialsWrite = true;
    public static boolean useS3CredentialsRead = true;
}
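The property can then be passed to both the driver and the executor JVMs through the Spark configuration:
spark.driver.extraJavaOptions=-Ds3Region=us-west-2
spark.executor.extraJavaOptions=-Ds3Region=us-west-2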
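To make the lookup order easy to check outside Spark, here is a minimal sketch of the same resolution logic as a standalone helper. The class S3RegionResolver and its resolve methods are hypothetical and not part of URITools. Unlike the one-liner above, this sketch also treats a blank value as unset, which is an assumption on my side and not something the proposal requires.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of URITools.
public class S3RegionResolver {

    // Returns the "s3Region" system property if set and non-blank,
    // otherwise the AWS_REGION environment variable, otherwise null.
    static String resolve(Properties props, Map<String, String> env) {
        String fromProperty = props.getProperty("s3Region");
        if (fromProperty != null && !fromProperty.trim().isEmpty()) {
            return fromProperty.trim();
        }
        String fromEnv = env.get("AWS_REGION");
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return fromEnv.trim();
        }
        return null; // caller falls back to whatever default it used before
    }

    // Convenience overload that reads the real JVM properties and environment.
    static String resolve() {
        return resolve(System.getProperties(), System.getenv());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("s3Region", "us-west-2");
        Map<String, String> env = Collections.singletonMap("AWS_REGION", "eu-central-1");

        // The system property wins over the environment variable.
        System.out.println(resolve(props, env)); // prints "us-west-2"

        // Without the property, the environment variable is used.
        System.out.println(resolve(new Properties(), env)); // prints "eu-central-1"

        // With neither set, the result is null, as with the current static field.
        System.out.println(resolve(new Properties(), Collections.<String, String>emptyMap())); // prints "null"
    }
}
```

Passing the properties and environment in as arguments keeps the logic testable without touching the real JVM state. In URITools the static initializer could simply call the no-argument resolve().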